Sample records for GO-based similarity measure

  1. Effectively integrating information content and structural relationship to improve the GO-based similarity measure between proteins

    Li, Bo; Feltus, F Alex; Zhou, Jizhong; Luo, Feng


    The Gene Ontology (GO) provides a knowledge base to effectively describe proteins. However, measuring similarity between proteins based on GO remains a challenge. In this paper, we propose a new similarity measure, the information coefficient similarity measure (SimIC), to effectively integrate both the information content (IC) of GO terms and the structural information of the GO hierarchy to determine the similarity between proteins. Testing on yeast proteins, our results show that SimIC efficiently addresses the shallow annotation issue in GO, thus improving the correlations between GO similarities of yeast proteins and their expression similarities as well as between GO similarities of yeast proteins and their sequence similarities. Furthermore, we demonstrate that the proposed SimIC is superior in predicting yeast protein interactions. We predict 20484 yeast protein-protein interactions (PPIs) between 2462 proteins based on the high SimIC values of biological process (BP) and cellular component (CC). Examining the...
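    The record above describes combining the information content (IC) of GO terms with GO's hierarchical structure to score protein similarity. As a hedged illustration only (not the authors' SimIC formula), the sketch below shows a common IC-based construction: pairwise term similarity as the IC of the most informative common ancestor, aggregated over two proteins' annotation sets by a best-match average. The helper names, the `ic` table, and the `common_ancestors` callable are illustrative assumptions.

```python
def term_similarity(t1, t2, ic, common_ancestors):
    """Pairwise GO-term similarity: IC of the most informative common
    ancestor (a standard IC-based choice, not necessarily SimIC)."""
    shared = common_ancestors(t1, t2)            # set of shared GO ancestors
    return max((ic[a] for a in shared), default=0.0)

def protein_similarity(terms_a, terms_b, ic, common_ancestors):
    """Best-match average of term similarities over two annotation sets."""
    if not terms_a or not terms_b:
        return 0.0
    sim = lambda x, y: term_similarity(x, y, ic, common_ancestors)
    best_a = [max(sim(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(sim(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))
```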

  2. Learning Biological Networks via Bootstrapping with Optimized GO-based Gene Similarity

    Taylor, Ronald C.; Sanfilippo, Antonio P.; McDermott, Jason E.; Baddeley, Robert L.; Riensche, Roderick M.; Jensen, Russell S.; Verhagen, Marc


    Microarray gene expression data provide a unique information resource for learning biological networks using "reverse engineering" methods. However, there are a variety of cases in which we know which genes are involved in a given pathology of interest, but we do not have enough experimental evidence to support the use of fully-supervised/reverse-engineering learning methods. In this paper, we explore a novel semi-supervised approach in which biological networks are learned from a reference list of genes and a partial set of links for these genes extracted automatically from PubMed abstracts, using a knowledge-driven bootstrapping algorithm. We show how new relevant links across genes can be iteratively derived using a gene similarity measure based on the Gene Ontology that is optimized on the input network at each iteration. We describe an application of this approach to the TGFB pathway as a case study and show how the ensuing results prove the feasibility of the approach as an alternate or complementary technique to fully supervised methods.

  3. Enriching regulatory networks by bootstrap learning using optimised GO-based gene similarity and gene links mined from PubMed abstracts

    Taylor, Ronald C.; Sanfilippo, Antonio P.; McDermott, Jason E.; Baddeley, Robert L.; Riensche, Roderick M.; Jensen, Russell S.; Verhagen, Marc; Pustejovsky, James


    Transcriptional regulatory networks are being determined using “reverse engineering” methods that infer connections based on correlations in gene state. Corroboration of such networks through independent means such as evidence from the biomedical literature is desirable. Here, we explore a novel approach, a bootstrapping version of our previous Cross-Ontological Analytic method (XOA) that can be used for semi-automated annotation and verification of inferred regulatory connections, as well as for discovery of additional functional relationships between the genes. First, we use our annotation and network expansion method on a biological network learned entirely from the literature. We show how new relevant links between genes can be iteratively derived using a gene similarity measure based on the Gene Ontology that is optimized on the input network at each iteration. Second, we apply our method to annotation, verification, and expansion of a set of regulatory connections found by the Context Likelihood of Relatedness algorithm.

  4. Similarity measures for face recognition

    Vezzetti, Enrico


    Face recognition has several applications, including security (authentication and identification of device users and criminal suspects) and medicine (corrective surgery and diagnosis). Facial recognition programs rely on algorithms that can compare and compute the similarity between two sets of images. This eBook explains some of the similarity measures used in facial recognition systems in a single volume. Readers will learn about various measures including Minkowski distances, Mahalanobis distances, Hausdorff distances, cosine-based distances, among other methods. The book also summarizes errors that may occur in face recognition methods. Computer scientists "facing face" and looking to select and test different methods of computing similarities will benefit from this book. The book is also a useful tool for students undertaking computer vision courses.

  5. Estimating similarity of XML Schemas using path similarity measure

    Veena Trivedi


    In this paper, an attempt has been made to develop an algorithm which estimates the similarity of XML Schemas using multiple similarity measures. For performing the task, the XML Schema element information has been represented in the form of strings and four different similarity measure approaches have been employed. To further improve the similarity measure, an overall similarity measure has also been calculated. The approach used in this paper is distinctive, as it calculates the similarity between two XML Schemas using four approaches and gives an integrated value for the similarity measure.

  6. Comparison of hydrological similarity measures

    Rianna, Maura; Ridolfi, Elena; Manciola, Piergiorgio; Napolitano, Francesco; Russo, Fabio


    The use of a traditional at-site approach for the statistical characterization and simulation of spatio-temporal precipitation fields has a major recognized drawback: the estimation of rare events is affected by the uncertainty of at-site sample statistical inference, because of the limited length of records. In order to overcome the lack of at-site observations, the regional frequency approach uses the idea of substituting space for time to estimate design floods. Conventional regional frequency analysis estimates quantile values at a specific site from multi-site analysis. The main idea is that homogeneous sites, once pooled together, have similar probability distribution curves of extremes, except for a scaling factor. The method for pooling groups of sites can be based on geographical or climatological considerations. In this work the region of influence (ROI) pooling method is compared with an entropy-based one. The ROI is a flexible pooling-group approach which defines for each site its own "region" formed by a unique set of similar stations. The similarity is found through the Euclidean distance metric in the attribute space. Here an alternative approach based on entropy is introduced to cluster homogeneous sites. The core idea is that homogeneous sites share a redundant (i.e. similar) amount of information. Homogeneous sites are pooled through a hierarchical selection based on the mutual information index (i.e. a measure of redundancy). The method is tested on precipitation data in central Italy.

  7. Similarity measures for protein ensembles

    Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper


    Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformatio...

  8. Similarity measures for protein ensembles

    Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper


    Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations...... a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single...

  9. Shape Similarity Measures of Linear Entities


    The essence of feature matching technology lies in how to measure the similarity of spatial entities. Among all the possible similarity measures, the shape similarity measure is one of the most important because it is easy to collect the necessary parameters and it matches well with human intuition. In this paper a new shape similarity measure of linear entities, based on the differences of direction change along each line, is presented and its effectiveness is illustrated.

  10. Stability of similarity measurements for bipartite networks

    Liu, Jian-Guo; Pan, Xue; Guo, Qiang; Zhou, Tao


    Similarity is a fundamental measure in network analyses and machine learning algorithms, with wide applications ranging from personalized recommendation to socio-economic dynamics. We argue that an effective similarity measurement should guarantee stability even under some information loss. With six bipartite networks, we investigate the stabilities of fifteen similarity measurements by comparing the similarity matrices of two data samples which are randomly divided from the original data sets. Results show that the fifteen measurements can be well classified into three clusters according to their stabilities, and measurements in the same cluster have similar mathematical definitions. In addition, we develop a top-$n$-stability method for personalized recommendation, and find that the unstable similarities would recommend false information to users, and that the performance of recommendation would be largely improved by using stable similarity measurements. This work provides a novel dimension to analyze and eval...

  11. Cluster Tree Based Hybrid Document Similarity Measure

    M. Varshana Devi


    A cluster tree based hybrid similarity measure is established to measure the hybrid similarity. In a cluster tree, the hybrid similarity measure can be calculated for random data even when the data are not co-occurring and generate different views. Different views of the tree can be combined, choosing the one which is most significant in cost. A method is proposed to combine the multiple views, where multiple views represented by different distance measures are merged into a single cluster. Compared with traditional statistical methods, the cluster tree based hybrid similarity gives better feasibility for intelligence-based search. It also helps in improving dimensionality reduction and semantic analysis.

  12. Distance and Similarity Measures for Soft Sets

    Kharal, Athar


    In [P. Majumdar, S. K. Samanta, Similarity measure of soft sets, New Mathematics and Natural Computation 4(1)(2008) 1-12], the authors use matrix-representation-based distances of soft sets to introduce matching-function and distance-based similarity measures. We first give counterexamples to show that their Definition 2.7 and Lemma 3.5(3) contain errors, then improve their Lemma 4.4, making it a corollary of our result. The fundamental assumption of Majumdar et al. has been shown to be flawed. This motivates us to introduce set-operations-based measures. We present a case (Example 28) where the Majumdar-Samanta similarity measure produces an erroneous result but the measure proposed herein decides correctly. Several properties of the new measures have been presented and finally the new similarity measures have been applied to the problem of financial diagnosis of firms.

  13. Appropriate Similarity Measures for Author Cocitation Analysis

    N.J.P. van Eck (Nees Jan); L. Waltman (Ludo)


    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similar

  14. Gait signal analysis with similarity measure.

    Lee, Sanghyuk; Shin, Seungsoo


    Human gait decision was carried out with the help of similarity measure design. The gait signal was collected through a hardware implementation including an all-in-one sensor, a control unit, and a notebook with connector. Each gait signal was considered as high dimensional data; therefore, high dimensional data analysis was handled via a heuristic technique, namely the similarity measure. Each human pattern, such as walking, sitting, standing, and stepping up, was obtained through experiment. From the results of the analysis, we identified the overlapped and nonoverlapped data relations; similarity measure analysis was illustrated, and a comparison with a conventional similarity measure was carried out. Hence, nonoverlapped data similarity analysis provided the clue to solving the similarity of high dimensional data. The considered high dimensional data analysis was designed with consideration of neighborhood information. The proposed similarity measure was applied to identify the behavior patterns of different persons, and different behaviours of the same person. The obtained analysis can be extended to organize a health monitoring system, especially for elderly persons.

  15. Gait Signal Analysis with Similarity Measure

    Sanghyuk Lee


    Human gait decision was carried out with the help of similarity measure design. The gait signal was collected through a hardware implementation including an all-in-one sensor, a control unit, and a notebook with connector. Each gait signal was considered as high dimensional data; therefore, high dimensional data analysis was handled via a heuristic technique, namely the similarity measure. Each human pattern, such as walking, sitting, standing, and stepping up, was obtained through experiment. From the results of the analysis, we identified the overlapped and nonoverlapped data relations; similarity measure analysis was illustrated, and a comparison with a conventional similarity measure was carried out. Hence, nonoverlapped data similarity analysis provided the clue to solving the similarity of high dimensional data. The considered high dimensional data analysis was designed with consideration of neighborhood information. The proposed similarity measure was applied to identify the behavior patterns of different persons, and different behaviours of the same person. The obtained analysis can be extended to organize a health monitoring system, especially for elderly persons.

  16. Mechanisms for similarity matching in disparity measurement

    Ross eGoutcher


    Early neural mechanisms for the measurement of binocular disparity appear to operate in a manner consistent with cross-correlation-like processes. Consequently, cross-correlation, or cross-correlation-like procedures have been used in a range of models of disparity measurement. Using such procedures as the basis for disparity measurement creates a preference for correspondence solutions that maximise the similarity between local left and right eye image regions. Here, we examine how observers’ perception of depth in an ambiguous stereogram is affected by manipulations of luminance and orientation-based image similarity. Results show a strong effect of coarse-scale luminance similarity manipulations, but a relatively weak effect of finer-scale manipulations of orientation similarity. This is in contrast to the measurements of depth obtained from a standard cross-correlation model. This model shows strong effects of orientation similarity manipulations and weaker effects of luminance similarity. In order to account for these discrepancies, the standard cross-correlation approach may be modified to include an initial spatial frequency filtering stage. The performance of this adjusted model most closely matches human psychophysical data when spatial frequency filtering favours coarser scales. This is consistent with the operation of disparity measurement processes where spatial frequency and disparity tuning are correlated, or where disparity measurement operates in a coarse-to-fine manner.

  17. Measure of Node Similarity in Multilayer Networks

    Mollgaard, Anders; Dammeyer, Jesper; Jensen, Mogens H; Lehmann, Sune; Mathiesen, Joachim


    The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analysing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data on face-to-face contacts. We find that even strongly connected individuals are not more similar with respect to basic personality traits than randomly chosen pairs of individuals. In contrast, several socio-demographics variables have a significant degree of similarity. We further observe that similarity might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a...

  18. Measure of Node Similarity in Multilayer Networks.

    Anders Mollgaard

    The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analysing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data on face-to-face contacts. We find that even strongly connected individuals are not more similar with respect to basic personality traits than randomly chosen pairs of individuals. In contrast, several socio-demographics variables have a significant degree of similarity. We further observe that similarity might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a variable such as gender, our measure reveals a transition from similarity between nodes connected with links of relatively low weight to dis-similarity for the nodes connected by the strongest links. We finally analyze the overlap between layers in the network for different levels of acquaintanceships.

  19. Measure of Node Similarity in Multilayer Networks

    Mollgaard, Anders; Zettler, Ingo; Dammeyer, Jesper; Jensen, Mogens H.; Lehmann, Sune; Mathiesen, Joachim


    The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analysing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data on face-to-face contacts. We find that even strongly connected individuals are not more similar with respect to basic personality traits than randomly chosen pairs of individuals. In contrast, several socio-demographics variables have a significant degree of similarity. We further observe that similarity might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a variable such as gender, our measure reveals a transition from similarity between nodes connected with links of relatively low weight to dis-similarity for the nodes connected by the strongest links. We finally analyze the overlap between layers in the network for different levels of acquaintanceships. PMID:27300084

  20. Measure of Node Similarity in Multilayer Networks

    Møllgaard, Anders; Zettler, Ingo; Dammeyer, Jesper


    The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analysing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large...... university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data...... might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a variable such as gender, our measure reveals a transition from similarity between nodes connected with links of relatively low weight to dis-similarity for the nodes connected by the strongest...

  1. Similarity indices I: what do they measure.

    Johnston, J.W.


    A method for estimating the effects of environmental effusions on ecosystems is described. The characteristics of 25 similarity indices used in studies of ecological communities were investigated. The type of data structure, to which these indices are frequently applied, was described as consisting of vectors of measurements on attributes (species) observed in a set of samples. A general similarity index was characterized as the result of a two-step process defined on a pair of vectors. In the first step an attribute similarity score is obtained for each attribute by comparing the attribute values observed in the pair of vectors. The result is a vector of attribute similarity scores. These are combined in the second step to arrive at the similarity index. The operation in the first step was characterized as a function, g, defined on pairs of attribute values. The second operation was characterized as a function, F, defined on the vector of attribute similarity scores from the first step. Usually, F was a simple sum or weighted sum of the attribute similarity scores. It is concluded that similarity indices should not be used as the test statistic to discriminate between two ecological communities.
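    The record above characterizes a general similarity index as a two-step construction: an attribute-wise score g applied to each pair of attribute values, followed by an aggregation F over the resulting score vector. A minimal sketch of that structure follows; the concrete g (a min/max ratio) and F (a plain sum) are illustrative choices, not taken from the paper.

```python
def similarity_index(x, y, g=None, F=sum):
    """Two-step similarity index: attribute-wise function g, then aggregation F."""
    if g is None:
        # illustrative attribute score: 1 when both values are 0, else min/max ratio
        g = lambda a, b: 1.0 if a == b == 0 else min(a, b) / max(a, b)
    attribute_scores = [g(a, b) for a, b in zip(x, y)]
    return F(attribute_scores)

# e.g., species abundance vectors observed in two samples
print(similarity_index([3, 0, 5], [4, 0, 5]))
```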

  2. Measurement of Similarity in Academic Contexts

    Omid Mahian


    We propose some reflections, comments and suggestions about the measurement of similar and matched content in scientific papers and documents, and the need to develop appropriate tools and standards for an ethically fair and equitable treatment of authors.

  3. New hyperspectral discrimination measure for spectral similarity

    Du, Yingzi; Chang, Chein-I.; Ren, Hsuan; D'Amico, Francis M.; Jensen, James O.


    Spectral angle mapper (SAM) has been widely used as a spectral similarity measure for multispectral and hyperspectral image analysis. It has been shown to be equivalent to Euclidean distance when the spectral angle is relatively small. Most recently, a stochastic measure, called spectral information divergence (SID), has been introduced to model the spectrum of a hyperspectral image pixel as a probability distribution so that spectral variations can be captured more effectively in a stochastic manner. This paper develops a new hyperspectral spectral discriminant measure, which is a mixture of SID and SAM. More specifically, let xi and xj denote two hyperspectral image pixel vectors with their corresponding spectra specified by si and sj. SAM is the spectral angle of xi and xj and is denoted by SAM(si,sj). Similarly, SID measures the information divergence between xi and xj and is denoted by SID(si,sj). The new measure, referred to as the (SID,SAM)-mixed measure, has two variations defined by SID(si,sj) × tan[SAM(si,sj)] and SID(si,sj) × sin[SAM(si,sj)], where tan[SAM(si,sj)] and sin[SAM(si,sj)] are the tangent and the sine of the angle between the two spectral vectors. The (SID,SAM)-mixed measure combines the strengths of SID and SAM in spectral discriminability. In order to demonstrate its utility, a comparative study is conducted among the new measure, SID and SAM, where the discriminatory power of the (SID,SAM)-mixed measure is significantly improved over SID and SAM.
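    Following the definitions quoted in the abstract, a small numpy sketch of SAM, SID, and the (SID,SAM)-mixed measure is given below. The SID form (symmetric relative entropy of the spectra normalized to probability distributions) and the epsilon guards are assumptions made to keep the example self-contained.

```python
import numpy as np

def sam(s_i, s_j):
    """Spectral angle between two spectral vectors."""
    cos = np.dot(s_i, s_j) / (np.linalg.norm(s_i) * np.linalg.norm(s_j))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def sid(s_i, s_j, eps=1e-12):
    """Spectral information divergence: symmetric KL divergence of the
    spectra normalized to probability distributions."""
    p = s_i / (s_i.sum() + eps)
    q = s_j / (s_j.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps)))
                 + np.sum(q * np.log((q + eps) / (p + eps))))

def sid_sam_mixed(s_i, s_j, variant="tan"):
    """(SID,SAM)-mixed measure: SID scaled by tan or sin of the spectral angle."""
    angle = sam(s_i, s_j)
    return sid(s_i, s_j) * (np.tan(angle) if variant == "tan" else np.sin(angle))
```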

  4. Efficient Video Similarity Measurement and Search

    Cheung, S-C S


    The amount of information on the world wide web has grown enormously since its creation in 1990. Duplication of content is inevitable because there is no central management on the web. Studies have shown that many similar versions of the same text documents can be found throughout the web. This redundancy problem is more severe for multimedia content such as web video sequences, as they are often stored in multiple locations and different formats to facilitate downloading and streaming. Similar versions of the same video can also be found, unknown to content creators, when web users modify and republish original content using video editing tools. Identifying similar content can benefit many web applications and content owners. For example, it will reduce the number of similar answers to a web search and identify inappropriate use of copyrighted content. In this dissertation, we present a system architecture and corresponding algorithms to efficiently measure, search, and organize similar video sequences found in any large database such as the web.

  5. Recovery geospatial objects using semantic similarity measures

    Neili Machado-García


    In this paper we propose a methodology based on the semantic processing of geographic objects for the classification of soils according to the New Version of the Genetic Classification of Soils of Cuba. The method consists of five stages: conceptualization, synthesis, query processing, retrieval and management. The result is a system for geospatial information management applying the semantic similarity measure of Resnik. As a case study, we consider the region of San Jose de las Lajas, located in the province of Mayabeque.

  6. Image Steganalysis with Binary Similarity Measures

    Kharrazi Mehdi


    We present a novel technique for steganalysis of images that have been subjected to embedding by steganographic algorithms. The seventh and eighth bit planes in an image are used for the computation of several binary similarity measures. The basic idea is that the correlation between the bit planes as well as the binary texture characteristics within the bit planes will differ between a stego image and a cover image. These telltale marks are used to construct a classifier that can distinguish between stego and cover images. We also provide experimental results using some of the latest steganographic algorithms. The proposed scheme is found to have complementary performance vis-à-vis Farid's scheme in that they outperform each other in alternate embedding techniques.

  7. InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology.

    Peng, Jiajie; Li, Hongxiang; Liu, Yongzhuang; Juan, Liran; Jiang, Qinghua; Wang, Yadong; Chen, Jin


    The Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, web-based applications built on these existing methods compute gene functional similarities and provide purely text-based outputs; without a graphical visualization interface, the results are difficult to interpret. We present InteGO2, a web tool that allows researchers to calculate GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. InteGO2 is an easy-to-use HTML5-based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface. InteGO2 can be accessed via .

  8. GO-based Functional Dissimilarity of Gene Sets

    Aguilar-Ruiz Jesús S


    Background: The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may differ in function. For each functionality, other proteins that exhibit the same activity may also participate. Therefore, an identification of the most common function for all of the genes involved in a biological process is important in evaluating the functional similarity of groups of genes, and a quantification of functional coherence can help to clarify the role of a group of genes working together. Results: To implement this approach to functional assessment, we present GFD (GO-based Functional Dissimilarity), a novel dissimilarity measure for evaluating groups of genes based on the most relevant functions of the whole set. The measure assigns a numerical value to the gene set for each of the three GO sub-ontologies. Conclusions: Results show that GFD performs robustly when applied to gene sets of known functionality (extracted from KEGG). It performs particularly well on randomly generated gene sets. An ROC analysis reveals that the performance of GFD in evaluating the functional dissimilarity of gene sets is very satisfactory. A comparative analysis against other functional measures, such as GS2 and those presented by Resnik and Wang, also demonstrates the robustness of GFD.

  9. Measuring semantic similarities by combining gene ontology annotations and gene co-function networks.

    Peng, Jiajie; Uygun, Sahra; Kim, Taehyong; Wang, Yadong; Rhee, Seung Y; Chen, Jin


    Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms. We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families. Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited. Supplementary information and software are available at .

  10. Similarity measure application to fault detection of flight system

    KIM J H; LEE S H; WANG Hong-mei


    A fault detection technique is introduced with a similarity measure. The characteristics of a conventional similarity measure based on fuzzy numbers are discussed. With the help of a distance measure, a similarity measure is constructed explicitly. The designed distance-based similarity measure is applicable to general fuzzy membership functions, including non-convex fuzzy membership functions, whereas the fuzzy-number-based similarity measure is limited in calculating the similarity of general fuzzy membership functions. The applicability of the proposed similarity measure to general fuzzy membership structures is proven by examining the definition. To decide fault detection of the flight system, the experimental data (pitching moment coefficients and lift coefficients) are transformed into fuzzy membership functions. The distance-based similarity measure is applied to the obtained fuzzy membership functions, and similarity computation and analysis are carried out with the fault and normal operation coefficients.
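    The abstract states that the similarity measure is built explicitly from a distance measure and applies to general (including non-convex) membership functions. The sketch below shows one common way such a construction can look, s(A, B) = 1 − d(A, B) with d a normalized L1 distance between sampled membership functions; it is an assumption-laden illustration, not the paper's exact definition, and the Gaussian membership functions stand in for the fuzzified flight coefficients.

```python
import numpy as np

def distance_based_similarity(mu_a, mu_b, x):
    """Similarity of two fuzzy membership functions sampled on grid x:
    s = 1 - d, with d a normalized L1 distance (always in [0, 1])."""
    mu_a, mu_b = np.asarray(mu_a), np.asarray(mu_b)
    d = np.trapz(np.abs(mu_a - mu_b), x) / (x[-1] - x[0])
    return 1.0 - d

x = np.linspace(-1.0, 1.0, 201)
normal = np.exp(-(x - 0.0) ** 2 / 0.05)   # stand-in for normal-operation coefficients
faulty = np.exp(-(x - 0.3) ** 2 / 0.05)   # stand-in for fault-case coefficients
print(distance_based_similarity(normal, faulty, x))
```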

  11. A Survey of Binary Similarity and Distance Measures

    Seung-Seok Choi


    The binary feature vector is one of the most common representations of patterns, and similarity and distance measures play a critical role in many problems such as clustering and classification. Ever since Jaccard proposed a similarity measure to classify ecological species in 1901, numerous binary similarity and distance measures have been proposed in various fields. Applying appropriate measures results in more accurate data analysis. Notwithstanding, few comprehensive surveys on binary measures have been conducted. Hence we collected 76 binary similarity and distance measures used over the last century and reveal their correlations through the hierarchical clustering technique.
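    Since the survey deals with classic binary coefficients, a short sketch of three of the best-known ones (Jaccard, Dice, and simple matching), computed from the usual 2x2 contingency counts, may help make the family concrete; the handling of all-zero vectors is an illustrative choice.

```python
def binary_counts(x, y):
    """Contingency counts for two equal-length binary vectors."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)   # both present
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)   # both absent
    return a, b, c, d

def jaccard(x, y):
    a, b, c, _ = binary_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 1.0

def dice(x, y):
    a, b, c, _ = binary_counts(x, y)
    return 2 * a / (2 * a + b + c) if (a + b + c) else 1.0

def simple_matching(x, y):
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)
```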

  12. A New Trajectory Similarity Measure for GPS Data

    Ismail, Anas


    We present a new algorithm for measuring the similarity between trajectories, and in particular between GPS traces. We call this new similarity measure the Merge Distance (MD). Our approach is robust against subsampling and supersampling. We perform experiments to compare this new similarity measure with the two main approaches that have been used so far: Dynamic Time Warping (DTW) and the Euclidean distance. © 2015 ACM.

  13. Similarity

    Apostol, Tom M. (Editor)


    In this 'Project Mathematics!' series, sponsored by the California Institute of Technology (Caltech), the mathematical concept of similarity is presented. The history and real-life applications are discussed using actual film footage and computer animation. Terms used and various concepts of size, shape, ratio, area, and volume are demonstrated. The similarity of polygons, solids, congruent triangles, internal ratios, perimeters, and line segments using the previously mentioned concepts is shown.

  14. A Framework for Analysis of Music Similarity Measures

    Jensen, Jesper Højvang; Christensen, Mads G.; Jensen, Søren Holdt


    To analyze specific properties of music similarity measures that the commonly used genre classification evaluation procedure does not reveal, we introduce a MIDI based test framework for music similarity measures. We introduce the framework by example and thus outline an experiment to analyze...... the dependency of a music similarity measure on the instrumentation of a song compared to the melody, and to analyze its sensitivity to transpositions. Using the outlined experiment, we analyze music similarity measures from three software packages, namely Marsyas, MA toolbox and Intelligent Sound Processing...

  15. Similarity Measurement of Web Sessions Based on Sequence Alignment

    LI Chaofeng; LU Yansheng


    The task of clustering Web sessions is to group Web sessions based on similarity, maximizing the intra-group similarity while minimizing the inter-group similarity. The first and foremost question to be considered in clustering Web sessions is how to measure the similarity between Web sessions. However, there are many shortcomings in traditional measurements. This paper introduces a new method for measuring similarities between Web pages that takes into account not only the URL but also the viewing time of the visited Web page. We then give a new method to measure the similarity of Web sessions using sequence alignment and describe the similarity of Web page access in detail. Experiments have proved that our method is valid and efficient.
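    The abstract combines a page-level similarity (URL plus viewing time) with sequence alignment over whole sessions. The sketch below illustrates that idea with a Needleman-Wunsch style dynamic program; the page-level scoring, the gap penalty, and the length normalization are stand-ins, not the paper's exact formulas.

```python
def page_similarity(p, q):
    """Placeholder page similarity in [0, 1] from (url, viewing_time) pairs."""
    url_p, time_p = p
    url_q, time_q = q
    url_score = 1.0 if url_p == url_q else 0.0
    time_score = min(time_p, time_q) / max(time_p, time_q)
    return 0.5 * (url_score + time_score)

def session_similarity(s1, s2, gap=-0.5):
    """Alignment score between two sessions (lists of (url, time) pairs),
    normalized by the longer session length."""
    n, m = len(s1), len(s2)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + page_similarity(s1[i - 1], s2[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    return dp[n][m] / max(n, m, 1)
```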

  16. Self-similar measures on the Julia sets of polynomials


    If the immediate basin of infinity of a polynomial P(z) contains at least one of its critical points, then there is a self-similar measure on its Julia set, and if all the critical points of P(z) lie in the immediate basin of infinity, then the self-similar measure is unique.

  17. JacUOD: A New Similarity Measurement for Collaborative Filtering

    Hui-Feng Sun; Jun-Liang Chen; Gang Yu; Chuan-Chang Liu; Yong Peng; Guang Chen; Bo Cheng


    Collaborative filtering (CF) has been widely applied to recommender systems, since it can assist users to discover their favorite items. Similarity measurement that measures the similarity between two users or items is critical to CF. However, traditional similarity measurement approaches for memory-based CF can be strongly improved. In this paper, we propose a novel similarity measurement, named Jaccard Uniform Operator Distance (JacUOD), to effectively measure the similarity. Our JacUOD approach aims at unifying similarity comparison for vectors in different multidimensional vector spaces. Compared with traditional similarity measurement approaches, JacUOD properly handles dimension-number difference for different vector spaces. We conduct experiments based on the well-known MovieLens datasets, and take user-based CF as an example to show the effectiveness of our approach. The experimental results show that our JacUOD approach achieves better prediction accuracy than traditional similarity measurement approaches.

  18. Semantic similarity measure in biomedical domain leverage web search engine.

    Chen, Chi-Huang; Hsieh, Sheau-Ling; Weng, Yung-Ching; Chang, Wen-Yung; Lai, Feipei


    Semantic similarity measures play an essential role in Information Retrieval and Natural Language Processing. In this paper we propose a page-count-based semantic similarity measure and apply it in biomedical domains. Previous research in semantic-web-related applications has deployed various semantic similarity measures. Despite the usefulness of these measurements in those applications, measuring semantic similarity between two terms remains a challenging task. The proposed method exploits page counts returned by a Web search engine. We define various similarity scores for two given terms P and Q, using the page counts for querying P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using lexico-syntactic patterns with page counts. These different similarity scores are integrated using support vector machines, to leverage the robustness of semantic similarity measures. Experimental results on two datasets achieve correlation coefficients of 0.798 on the dataset provided by A. Hliaoutakis, 0.705 on the dataset provided by T. Pedersen with physician scores and 0.496 on the dataset provided by T. Pedersen et al. with expert scores.
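    The scores in the abstract are built from page counts for P, Q, and "P AND Q". A frequently used instance of such a page-count score is a WebJaccard-style coefficient, sketched below; the `page_count` callable (a search-engine query returning hit counts) and the low-count threshold are assumptions for illustration.

```python
def web_jaccard(p, q, page_count, threshold=5):
    """Page-count-based similarity of terms p and q: hits for P, Q and
    'P AND Q'. `page_count` is an assumed search-engine callable."""
    n_p, n_q = page_count(p), page_count(q)
    n_pq = page_count(f"{p} AND {q}")
    if n_pq < threshold:            # guard against noise from rare co-occurrence
        return 0.0
    return n_pq / (n_p + n_q - n_pq)
```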

  19. Measurement of Similarity for Spatial Directions Between Areal Objects

    DING Hong; GUO Qingsheng; DU Xiaohu


    Similarity for spatial directions plays an important role in GIS. In this paper, the conventional approaches are analyzed. Based on areal objects represented as raster data, the authors propose two new methods for measuring similarity among spatial directions. One is to measure the similarity among spatial directions based on the features of raster data and the changes of distances between spatial objects; the other is to measure the similarity among spatial directions according to the variation of each raster cell centroid angle. The two methods overcome the complexity of measuring similarity among spatial directions with the direction matrix model and address the limitation of small changes in direction. The two methods are simple and have broader applicability.

  20. A New Similarity measure for taxonomy based on edge counting

    Manjula Shenoy.K


    This paper introduces a new similarity measure based on edge counting in a taxonomy like WordNet or an ontology. Measurement of similarity between text segments or concepts is very useful for many applications like information retrieval, ontology matching, text mining, and question answering. Several measures have been developed for measuring similarity between two concepts; out of these we see that the measure given by Wu and Palmer [1] is simple and gives good performance. Our measure is based on their measure but strengthens it. The Wu and Palmer [1] measure has the disadvantage that it does not consider how far apart the concepts are semantically. In our measure we include the shortest path between the concepts and the depth of the whole taxonomy together with the distances used in Wu and Palmer [1]. The measure also addresses the following disadvantage: in some situations, the similarity of two elements of an IS-A ontology contained in the neighbourhood exceeds the similarity value of two elements contained in the same hierarchy. Our measure introduces a penalization factor for this case based upon the shortest length between the concepts and the depth of the whole taxonomy.
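    As background for the measure described above, the baseline Wu and Palmer [1] score on a small IS-A taxonomy is sketched below: 2·depth(LCS) / (depth(c1) + depth(c2)), with depths counted as edge distances from the root. The paper's additional shortest-path and taxonomy-depth penalization is not reproduced; the networkx-based helper is an illustrative implementation choice.

```python
import networkx as nx

def wu_palmer(taxonomy, root, c1, c2):
    """Wu-Palmer similarity on a parent->child DiGraph."""
    depth = nx.shortest_path_length(taxonomy, root)          # edge distance from root
    common = (set(nx.ancestors(taxonomy, c1)) | {c1}) & \
             (set(nx.ancestors(taxonomy, c2)) | {c2})
    lcs = max(common, key=lambda n: depth[n])                 # deepest common subsumer
    total = depth[c1] + depth[c2]
    return 2.0 * depth[lcs] / total if total else 1.0

taxonomy = nx.DiGraph([("entity", "animal"), ("animal", "dog"),
                       ("animal", "cat"), ("entity", "plant")])
print(wu_palmer(taxonomy, "entity", "dog", "cat"))            # 2*1 / (2+2) = 0.5
```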

  1. Jacobians for Lebesgue registration for a range of similarity measures

    Sporring, Jon; Darkner, Sune

    In [Darkner and Sporring, 2011], a framework was presented based on locally orderless images and Lebesgue integration, resulting in a fast algorithm for registration using normalized mutual information as the dissimilarity measure. This report extends the algorithm to arbitrary complex similarity measu...... measures and supplies the full derivatives of a range of common dissimilarity measures as well as their obvious extensions....

  2. A vector-based, multidimensional scanpath similarity measure

    Jarodzka, Halszka; Holmqvist, Kenneth; Nyström, Marcus


    Jarodzka, H., Holmqvist, K., & Nyström, M. (2010, March). A vector-based, multidimensional scanpath similarity measure. Presentation at the Eye Tracking Research & Application Symposium (ETRA), Austin, Texas, USA.

  3. Cosine Similarity Measure of Interval Valued Neutrosophic Sets

    Said Broumi


    In this paper, we define a new cosine similarity between two interval valued neutrosophic sets based on Bhattacharya’s distance [19]. The notions of interval valued neutrosophic sets (IVNS, for short) will be used as vector representations in 3D-vector space. Based on the comparative analysis of the existing similarity measures for IVNS, we find that our proposed similarity measure is better and more robust. An illustrative example of pattern recognition shows that the proposed method is simple and effective.
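    A heavily hedged sketch of the general idea follows: each element of an interval-valued neutrosophic set is a triple of truth/indeterminacy/falsity intervals, which can be reduced to a 3D vector (here via interval midpoints) and compared with the cosine of the angle between vectors, averaged over elements. This is a simplification for illustration, not the exact measure defined in the paper.

```python
import numpy as np

def ivns_cosine_similarity(A, B, eps=1e-12):
    """A and B are lists of elements; each element is
    ([T_lo, T_hi], [I_lo, I_hi], [F_lo, F_hi])."""
    def midpoint_vector(elem):
        return np.array([(lo + hi) / 2.0 for lo, hi in elem])
    sims = []
    for ea, eb in zip(A, B):
        va, vb = midpoint_vector(ea), midpoint_vector(eb)
        sims.append(np.dot(va, vb) /
                    (np.linalg.norm(va) * np.linalg.norm(vb) + eps))
    return float(np.mean(sims))
```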

  4. Semantic Referencing - Determining Context Weights for Similarity Measurement

    Janowicz, Krzysztof; Adams, Benjamin; Raubal, Martin

    Semantic similarity measurement is a key methodology in various domains ranging from cognitive science to geographic information retrieval on the Web. Meaningful notions of similarity, however, cannot be determined without taking additional contextual information into account. One way to make similarity measures context-aware is by introducing weights for specific characteristics. Existing approaches to automatically determine such weights are rather limited or require application specific adjustments. In the past, the possibility to tweak similarity theories until they fit a specific use case has been one of the major criticisms for their evaluation. In this work, we propose a novel approach to semi-automatically adapt similarity theories to the user's needs and hence make them context-aware. Our methodology is inspired by the process of georeferencing images in which known control points between the image and geographic space are used to compute a suitable transformation. We propose to semi-automatically calibrate weights to compute inter-instance and inter-concept similarities by allowing the user to adjust pre-computed similarity rankings. These known control similarities are then used to reference other similarity values.

  5. A Measure of Similarity Between Trajectories of Vessels

    Le QI


    The measurement of similarity between trajectories of vessels is one of the key problems that must be addressed to promote the development of maritime intelligent traffic systems (ITS). In this study, a new model of trajectory similarity measurement was established to improve data processing efficiency in dynamic applications and to reflect the actual sailing behaviors of vessels. In this model, a feature point detection algorithm was proposed to extract feature points, reduce data storage space and save computational resources. A new synthesized distance algorithm was also created to measure the similarity between trajectories by using the extracted feature points. An experiment was conducted to measure the similarity between the real trajectories of vessels, whose growth required measurements to be conducted over different voyages. The results show that the similarity measurement between the vessel trajectories is efficient and correct. Comparison of the synthesized distance with the sailing behaviors of vessels proves that the results are consistent with actual situations. The experimental results demonstrate the promising application of the proposed model in studying vessel traffic and in supplying reliable data for the development of maritime ITS.

  6. Generalized Framework for Similarity Measure of Time Series

    Hongsheng Yin


    Currently, there is no definitive and uniform description for the similarity of time series, which results in difficulties for relevant research on this topic. In this paper, we propose a generalized framework to measure the similarity of time series. In this generalized framework, whether the time series is univariable or multivariable, and linearly or nonlinearly transformed, the similarity of time series is uniformly defined using norms of vectors or matrices. The definitions of the similarity of time series in the original space and the transformed space are proved to be equivalent. Furthermore, we also extend the theory on similarity of univariable time series to multivariable time series. We present some experimental results on published time series datasets tested with the proposed similarity measure function of time series. Through the proofs and experiments, it can be claimed that the similarity measure functions of linear multivariable time series based on the norm distance of the covariance matrix and of nonlinear multivariable time series based on kernel functions are reasonable and practical.
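    The framework defines time-series similarity through norms of vectors or matrices; for linear multivariable series the abstract mentions the norm distance between covariance matrices. A hedged sketch of that case follows, with the distance-to-similarity mapping 1/(1+d) chosen purely for illustration.

```python
import numpy as np

def covariance_norm_similarity(X, Y):
    """Similarity of two multivariable time series (rows = time steps,
    columns = variables) from the Frobenius-norm distance between their
    covariance matrices, mapped into (0, 1]."""
    cov_x = np.cov(X, rowvar=False)
    cov_y = np.cov(Y, rowvar=False)
    d = np.linalg.norm(cov_x - cov_y, ord="fro")
    return 1.0 / (1.0 + d)
```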

  7. Measures of Similarity for Command and Control Situation Analysis


    Resnik proposed to use information content [17]. He reasoned that if we consider the taxonomy to be a source of information, then we could effectively... determined by the specific probabilistic profile of that source. From this, Resnik proposed that the degree of similarity between concepts within a... [17] Philip Resnik. Semantic similarity in a taxonomy: An information-based measure and its application to problems of
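    The fragment above points to Resnik's information-content measure. A minimal sketch of that classic definition follows: IC(c) = -log p(c), with p(c) the probability that a corpus occurrence falls under concept c (counts propagated to ancestors), and sim(c1, c2) the IC of the most informative common ancestor. The networkx representation and the `counts` table are illustrative assumptions.

```python
import math
import networkx as nx

def information_content(taxonomy, counts):
    """IC(c) = -log p(c), with counts propagated up a parent->child DiGraph."""
    total = sum(counts.values())
    ic = {}
    for c in taxonomy.nodes:
        subsumed = nx.descendants(taxonomy, c) | {c}
        p = sum(counts.get(n, 0) for n in subsumed) / total
        ic[c] = -math.log(p) if p > 0 else float("inf")
    return ic

def resnik_similarity(taxonomy, ic, c1, c2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = (nx.ancestors(taxonomy, c1) | {c1}) & (nx.ancestors(taxonomy, c2) | {c2})
    return max((ic[c] for c in common), default=0.0)
```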

  8. Rotational invariant similarity measurement for content-based image indexing

    Ro, Yong M.; Yoo, Kiwon


    We propose a similarity matching technique for content-based image retrieval. The proposed technique is invariant to image rotation. Since image contents for indexing and retrieval may be arbitrarily extracted from a still image or a key frame of video, the rotation invariance of the image feature description is important for general application of content-based image indexing and retrieval. In this paper, we propose a rotation-invariant similarity measurement incorporating texture features based on the human visual system (HVS). To reduce computational complexity, we employed hierarchical similarity distance searching. To verify the method, experiments with the MPEG-7 data set are performed.

  9. Computing Semantic Similarity Measure Between Words Using Web Search Engine

    Pushpa C N


    Semantic similarity measures between words play an important role in information retrieval, natural language processing and various tasks on the web. In this paper, we have proposed a Modified Pattern Extraction Algorithm to compute the supervised semantic similarity measure between words by combining both the page count method and the web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word-pairs and non-synonymous word-pairs. The proposed Modified Pattern Extraction Algorithm outperforms, achieving a correlation value of 89.8 percent.

  10. Anonymous indexing of health conditions for a similarity measure.

    Song, Insu; Marsh, Nigel V


    A health social network is an online information service which facilitates information sharing between closely related members of a community with the same or a similar health condition. Over the years, many automated recommender systems have been developed for social networking in order to help users find their communities of interest. For health social networking, the ideal source of information for measuring similarities of patients is the medical information of the patients. However, it is not desirable that such sensitive and private information be shared over the Internet. This is also true for many other security sensitive domains. A new information-sharing scheme is developed where each patient is represented as a small number of (possibly disjoint) d-words (discriminant words) and the d-words are used to measure similarities between patients without revealing sensitive personal information. The d-words are simple words like "food,'' and thus do not contain identifiable personal information. This makes our method an effective one-way hashing of patient assessments for a similarity measure. The d-words can be easily shared on the Internet to find peers who might have similar health conditions.

  11. Application of the Frequency Spectrum to Spectral Similarity Measures

    Ke Wang


    Several frequency-based spectral similarity measures, derived from commonly-used ones, are developed for hyperspectral image classification based on the frequency domain. Since the frequency spectrum (magnitude spectrum) of the original signature for each pixel from hyperspectral data can clearly reflect the spectral features of different types of land covers, we replace the original spectral signature with its frequency spectrum for calculating the existing spectral similarity measure. The frequency spectrum is symmetrical around the direct current (DC) component; thus, we take one-half of the frequency spectrum, from the DC component to the highest frequency component, as the input signature. Furthermore, considering the fact that the low frequencies include most of the frequency energy, we can optimize the classification result by choosing the ratio of the frequency spectrum (from the DC component to the highest frequency component) involved in the calculation. In our paper, the frequency-based measures based on the spectral angle mapper (SAM), spectral information divergence (SID), spectral correlation mapper (SCM), Euclidean distance (ED), normalized Euclidean distance (NED) and SID × sin(SAM) (SsS) measures are called the F-SAM, F-SID, F-SCM, F-ED, F-NED and F-SsS, respectively. In the experiment, three commonly-used hyperspectral remote sensing images are employed as test data. The frequency-based measures proposed here are compared to the corresponding existing ones in terms of classification accuracy. The classification results by parameter optimization are also analyzed. The results show that, although not all frequency-based spectral similarity measures are better than the original ones, some frequency-based measures, such as the F-SsS and F-SID, exhibit a relatively better performance and have more robust applications than the other spectral similarity measures.
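    The replacement of each spectral signature by half of its magnitude spectrum, optionally truncated to a low-frequency fraction, can be sketched directly with numpy; the F-SAM variant is shown below. The `ratio` parameter mirrors the ratio selection discussed in the abstract, and the exact truncation convention is an assumption.

```python
import numpy as np

def half_magnitude_spectrum(signature, ratio=1.0):
    """Magnitude spectrum from the DC component to the highest frequency,
    keeping only the lowest `ratio` fraction of those components."""
    mag = np.abs(np.fft.fft(signature))
    half = mag[: len(mag) // 2 + 1]
    keep = max(1, int(round(ratio * len(half))))
    return half[:keep]

def f_sam(sig_a, sig_b, ratio=1.0):
    """F-SAM: the spectral angle computed on (partial) frequency spectra."""
    fa = half_magnitude_spectrum(np.asarray(sig_a, dtype=float), ratio)
    fb = half_magnitude_spectrum(np.asarray(sig_b, dtype=float), ratio)
    cos = np.dot(fa, fb) / (np.linalg.norm(fa) * np.linalg.norm(fb))
    return np.arccos(np.clip(cos, -1.0, 1.0))
```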

  12. A Semantic Similarity Measure for Expressive Description Logics

    d'Amato, Claudia; Esposito, Floriana


    A totally semantic measure is presented which is able to calculate a similarity value between concept descriptions, between a concept description and an individual, or between individuals expressed in an expressive description logic. It is applicable to symbolic descriptions although it uses a numeric approach for the calculation. Considering that Description Logics stand as the theoretical framework for ontological knowledge representation and reasoning, the proposed measure can be effectively used for agglomerative and divisive clustering tasks applied to the semantic web domain.

  13. A vector-based, multidimensional scanpath similarity measure

    Jarodzka, Halszka; Kenneth, Holmqvist; Marcus, Nyström


    Jarodzka, H., Holmqvist, K., & Nyström, M. (2010). A vector-based, multidimensional scanpath similarity measure. In C. Morimoto & H. Istance (Eds.), Proceedings of the 2010 Symposium on Eye Tracking Research & Applications ETRA ’10 (pp. 211-218). New York, NY: ACM.

  14. Fuzzy Relational Databases: Representational Issues and Reduction Using Similarity Measures.

    Prade, Henri; Testemale, Claudette


    Compares and expands upon two approaches to dealing with fuzzy relational databases. The proposed similarity measure is based on a fuzzy Hausdorff distance and estimates the mismatch between two possibility distributions using a reduction process. The consequences of the reduction process on query evaluation are studied. (Author/EM)

  15. Information Theoretic Similarity Measures for Content Based Image Retrieval.

    Zachary, John; Iyengar, S. S.


    Content-based image retrieval is based on the idea of extracting visual features from images and using them to index images in a database. Proposes similarity measures and an indexing algorithm based on information theory that permits an image to be represented as a single number. When used in conjunction with vectors, this method displays…

  16. Ultrasound specific similarity measures for three-dimensional mosaicing

    Wachinger, Christian; Navab, Nassir


    The introduction of 2D array ultrasound transducers enables the instantaneous acquisition of ultrasound volumes in clinical practice. The next step is the combination of several scans to create compounded volumes that provide an extended field-of-view, so-called mosaics. The correct alignment of multiple images, which is a complex task, forms the basis of mosaicing. Especially the simultaneous intensity-based registration has many properties making it a good choice for ultrasound mosaicing in comparison to the pairwise one. Fundamental to each registration approach is a suitable similarity measure. So far, only standard measures like SSD, NCC, CR, and MI were used for mosaicing, which implicitly assume additive Gaussian distributed noise. For ultrasound images, which are degraded by speckle patterns, alternative noise models based on multiplicative Rayleigh distributed noise were proposed in the field of motion estimation. Setting these models into the maximum likelihood estimation framework, which enables the mathematical modeling of the registration process, led us to ultrasound-specific bivariate similarity measures. Subsequently, we used an extension of the maximum likelihood estimation framework, which we developed in a previous work, to also derive multivariate measures. They allow us to perform ultrasound-specific simultaneous registration for mosaicing. These measures have a higher potential than the aforementioned standard measures since they are specifically designed to cope with problems arising from the inherent contamination of ultrasound images by speckle patterns. The results of experiments that we conducted on a typical mosaicing scenario with only partly overlapping images confirm this assumption.

  17. A fingerprint based metric for measuring similarities of crystalline structures.

    Zhu, Li; Amsler, Maximilian; Fuhrer, Tobias; Schaefer, Bastian; Faraji, Somayeh; Rostami, Samare; Ghasemi, S Alireza; Sadeghi, Ali; Grauzinyte, Migle; Wolverton, Chris; Goedecker, Stefan


    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and, in particular, it allows structures to be distinguished. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.
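
    The record above describes per-atom environment fingerprints compared under a configurational distance. As a rough illustration only (not the authors' actual fingerprint), the sketch below uses sorted neighbour distances as a crude per-atom descriptor and an optimal atom-to-atom assignment, which keeps the resulting distance independent of atom ordering; the descriptor, the function names, and the neglect of periodic boundary conditions are all assumptions of this sketch.

        # Hypothetical sketch: per-atom environment descriptors compared under an
        # optimal assignment, so the distance does not depend on atom ordering.
        # Real crystal fingerprints would also use the cell vectors (periodicity).
        import numpy as np
        from scipy.spatial.distance import cdist
        from scipy.optimize import linear_sum_assignment

        def atom_fingerprints(positions, k=8):
            """Sorted distances to the k nearest neighbours as a crude per-atom descriptor."""
            d = cdist(positions, positions)
            d.sort(axis=1)
            return d[:, 1:k + 1]                       # drop the zero self-distance

        def configurational_distance(pos_a, pos_b, k=8):
            """Permutation-invariant distance between two structures with equal atom counts."""
            fa, fb = atom_fingerprints(pos_a, k), atom_fingerprints(pos_b, k)
            cost = cdist(fa, fb)                       # descriptor-space distances, atom vs atom
            rows, cols = linear_sum_assignment(cost)   # optimal atom-to-atom matching
            return cost[rows, cols].mean()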

  18. Preserving Differential Privacy for Similarity Measurement in Smart Environments

    Kok-Seng Wong


    Full Text Available Advances in both sensor technologies and network infrastructures have encouraged the development of smart environments to enhance people's lives and lifestyles. However, collecting and storing users' data in smart environments poses severe privacy concerns because these data may contain sensitive information about the subject. Hence, privacy protection is now an emerging issue that we need to consider, especially when data sharing is essential for analysis purposes. In this paper, we consider the case where two agents in the smart environment want to measure the similarity of their collected or stored data. We use the similarity coefficient function FSC as the measurement metric for the comparison under a differential privacy model. Unlike existing solutions, our protocol can facilitate more than one request to compute FSC without modifying the protocol. Our solution ensures privacy protection for both the inputs and the computed FSC results.
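
    The FSC protocol itself is not reproduced in this record; as a hedged illustration of the underlying idea, the sketch below releases a Jaccard-style similarity coefficient under epsilon-differential privacy via the Laplace mechanism. The choice of Jaccard as the coefficient, the function names and the unit sensitivity are assumptions of this sketch, not the paper's construction.

        # Illustration only: a similarity coefficient perturbed with Laplace noise
        # calibrated to sensitivity/epsilon before release (the Laplace mechanism).
        import numpy as np

        def jaccard(a: set, b: set) -> float:
            return len(a & b) / len(a | b) if (a or b) else 1.0

        def private_similarity(a: set, b: set, epsilon: float, sensitivity: float = 1.0) -> float:
            """Release the similarity with Laplace(sensitivity/epsilon) noise added."""
            # the sensitivity bound would have to be derived for the coefficient actually used
            noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
            return float(np.clip(jaccard(a, b) + noise, 0.0, 1.0))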

  19. A fingerprint based metric for measuring similarities of crystalline structures

    Zhu, Li; Fuhrer, Tobias; Schaefer, Bastian; Grauzinyte, Migle; Goedecker, Stefan [Department of Physics, Universität Basel, Klingelbergstr. 82, 4056 Basel (Switzerland); Amsler, Maximilian [Department of Physics, Universität Basel, Klingelbergstr. 82, 4056 Basel (Switzerland); Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208 (United States); Faraji, Somayeh; Rostami, Samare; Ghasemi, S. Alireza [Institute for Advanced Studies in Basic Sciences, P.O. Box 45195-1159, Zanjan (Iran, Islamic Republic of); Sadeghi, Ali [Physics Department, Shahid Beheshti University, G. C., Evin, 19839 Tehran (Iran, Islamic Republic of); Wolverton, Chris [Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208 (United States)


    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and, in particular, it allows structures to be distinguished. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.

  20. A fingerprint based metric for measuring similarities of crystalline structures

    Zhu, Li; Fuhrer, Tobias; Schaefer, Bastian; Faraji, Somayeh; Rostami, Samara; Ghasemi, S Alireza; Sadeghi, Ali; Grauzinyte, Migle; Wolverton, Christopher; Goedecker, Stefan


    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and allow configurational distances between crystalline structures to be defined that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and, in particular, it allows structures to be distinguished. The new method is a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.

  1. Identifying Cover Songs Using Information-Theoretic Measures of Similarity

    Foster, Peter; Dixon, Simon; Klapuri, Anssi


    This paper investigates methods for quantifying similarity between audio signals, specifically for the task of cover song detection. We consider an information-theoretic approach, where we compute pairwise measures of predictability between time series. We compare discrete-valued approaches operating on quantised audio features to continuous-valued approaches. In the discrete case, we propose a method for computing the normalised compression distance, where we account for correlation betw...
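
    The normalised compression distance mentioned above has a compact standard form; the sketch below computes it with zlib over two byte strings (for example, quantised feature sequences serialised to bytes). zlib is a stand-in compressor chosen for this sketch, not necessarily the one used in the paper.

        # Normalised compression distance:
        # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
        import zlib

        def ncd(x: bytes, y: bytes) -> float:
            cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
            cxy = len(zlib.compress(x + y))
            return (cxy - min(cx, cy)) / max(cx, cy)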


    S. Anitha Elavarasi


    Full Text Available Learning is the process of generating useful information from a huge volume of data. Learning can be either supervised (e.g. classification) or unsupervised (e.g. clustering). Clustering is the process of grouping a set of physical objects into classes of similar objects. Objects in the real world consist of both numerical and categorical data. Categorical data cannot be analyzed like numerical data because of the absence of an inherent ordering. This paper describes ten different clustering algorithms, their methodology and the factors influencing their performance. Each algorithm is evaluated using real-world datasets and its pros and cons are specified. The various similarity/dissimilarity measures applied to categorical data and their performance are also discussed. The time complexity defines the amount of time taken by an algorithm to perform the elementary operations. The time complexity of the various algorithms is discussed and their performance on real-world data such as mushroom, zoo, soybean, cancer, vote, car and iris is measured. In this survey, cluster accuracy and error rate are evaluated for four different clustering algorithms (K-modes, fuzzy K-modes, ROCK and Squeezer), two different similarity measures (DISC and Overlap), and DILCA applied to hierarchical and partitional algorithms.
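
    Of the measures named in this survey, the overlap measure is the simplest to state; the sketch below shows it for two categorical records (DISC and DILCA weight per-attribute matches in more sophisticated, data-driven ways and are not reproduced here).

        # Overlap similarity: the fraction of attributes on which two categorical records agree.
        def overlap_similarity(x, y):
            if len(x) != len(y):
                raise ValueError("records must have the same number of attributes")
            return sum(a == b for a, b in zip(x, y)) / len(x)

        # e.g. overlap_similarity(["red", "round", "small"], ["red", "oval", "small"]) == 2/3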

  3. Locally adaptive Nakagami-based ultrasound similarity measures.

    Wachinger, Christian; Klein, Tassilo; Navab, Nassir


    The derivation of statistically optimal similarity measures for intensity-based registration is possible by modeling the underlying image noise distribution. The parameters of these distributions are, however, commonly set heuristically across all images. In this article, we show that the estimation of the parameters on the present images largely improves the registration, which is a consequence of the more accurate characterization of the image noise. More precisely, instead of having constant parameters over the entire image domain, we estimate them on patches, leading to a local adaptation of the similarity measure. While this basic idea of creating locally adaptive metrics is interesting for various fields of application, we present the derivation for ultrasound imaging. The domain of ultrasound is particularly appealing for this approach, due to the inherent contamination with speckle noise. Furthermore, there exist detailed analyses of suitable noise distributions in the literature. We present experiments for applying a bivariate Nakagami distribution that facilitates modeling of several scattering scenarios prominent in medical ultrasound. Depending on the number of scatterers per resolution cell and the presence of coherent structures, different Nakagami parameters are required to obtain a valid approximation of the intensity statistics and to account for distributional locality. Our registration results on radio-frequency ultrasound data confirm the theoretical necessity for a spatial adaptation of similarity metrics.
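
    The abstract does not give the estimator used for the patch-wise parameters; as a hedged sketch, the snippet below estimates the Nakagami shape (m) and spread (omega) per patch with the standard method of moments, which is one common way to obtain such local estimates. The patch size and function names are assumptions of this sketch.

        # Method-of-moments Nakagami estimates computed per patch of an envelope image.
        import numpy as np

        def nakagami_moments(patch):
            x2 = np.asarray(patch, dtype=float) ** 2
            omega = x2.mean()                      # spread: E[x^2]
            m = omega ** 2 / x2.var()              # shape: (E[x^2])^2 / Var(x^2)
            return m, omega

        def local_parameters(image, patch=32):
            """(m, omega) on non-overlapping patches, keyed by the patch's top-left corner."""
            h, w = image.shape
            params = {}
            for i in range(0, h - patch + 1, patch):
                for j in range(0, w - patch + 1, patch):
                    params[(i, j)] = nakagami_moments(image[i:i + patch, j:j + patch])
            return params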

  4. SemioSem: A Semiotic-Based Similarity Measure

    Aimé, Xavier; Furst, Frédéric; Kuntz, Pascale; Trichet, Francky

    This paper introduces a new similarity measure called SemioSem. The first originality of this measure, which is defined in the context of a semiotic-based approach, is to consider the three dimensions of the conceptualization underlying a domain ontology: the intension (i.e. the properties used to define the concepts), the extension (i.e. the instances of the concepts) and the expression (i.e. the terms used to denote both the concepts and the instances). Thus, SemioSem aims at aggregating and improving existing extensional-based and intensional-based measures, with an original expressional one. The second originality of this measure is to be context-sensitive, and in particular user-sensitive. Indeed, SemioSem is based on multiple information sources: (1) a textual corpus, validated by the end-user, which must reflect the domain underlying the ontology under consideration, (2) a set of instances known by the end-user, (3) an ontology enriched with the end-user's perception of how important each property associated with a concept c is for defining c, and (4) the emotional state of the end-user. The importance of each source can be modulated according to the context of use, and SemioSem remains valid even if one of the sources is missing. This makes our measure more flexible, more robust and closer to the end-user's judgment than other similarity measures, which are usually based on only one aspect of a conceptualization and never take the end-user's perceptions and purposes into account.

  5. Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems

    Cattuto, Ciro; Hotho, Andreas; Stumme, Gerd


    Social bookmarking systems allow users to organise collections of resources on the Web in a collaborative fashion. The increasing popularity of these systems as well as first insights into their emergent semantics have made them relevant to disciplines like knowledge extraction and ontology learning. The problem of devising methods to measure the semantic relatedness between tags and characterizing it semantically is still largely open. Here we analyze three measures of tag relatedness: tag co-occurrence, cosine similarity of co-occurrence distributions, and FolkRank, an adaptation of the PageRank algorithm to folksonomies. Each measure is computed on tags from a large-scale dataset crawled from a social bookmarking system. To provide a semantic grounding of our findings, a connection to WordNet (a semantic lexicon for the English language) is established by mapping tags into synonym sets of WordNet, and applying there well-known metrics of semantic similarity. Our results clearly expose differe...
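
    The second of the three measures analysed above, cosine similarity of co-occurrence distributions, has a direct formulation; the sketch below assumes each tag is summarised by a dictionary of co-occurrence counts with other tags, which is a simplification for illustration.

        # Cosine similarity between the co-occurrence count vectors of two tags.
        import numpy as np

        def cooccurrence_cosine(counts_a: dict, counts_b: dict) -> float:
            """counts_*: other tag -> number of posts in which it co-occurs with the tag of interest."""
            keys = sorted(set(counts_a) | set(counts_b))
            va = np.array([counts_a.get(k, 0) for k in keys], dtype=float)
            vb = np.array([counts_b.get(k, 0) for k in keys], dtype=float)
            denom = np.linalg.norm(va) * np.linalg.norm(vb)
            return float(va @ vb / denom) if denom else 0.0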

  6. Performance evaluation of similarity measures for dense multimodal stereovision

    Yaman, Mustafa; Kalkan, Sinan


    Multimodal imaging systems have recently been drawing attention in fields such as medical imaging, remote sensing, and video surveillance systems. In such systems, estimating depth has become possible due to the promising progress of multimodal matching techniques. We perform a systematic performance evaluation of similarity measures frequently used in the literature for dense multimodal stereovision. The evaluated measures include mutual information (MI), sum of squared distances, normalized cross-correlation, census transform, local self-similarity (LSS) as well as descriptors adapted to multimodal settings, like scale invariant feature transform (SIFT), speeded-up robust features (SURF), histogram of oriented gradients (HOG), binary robust independent elementary features, and fast retina keypoint (FREAK). We evaluate the measures over datasets we generated, compiled, and provided as a benchmark and compare the performances using the Winner Takes All method. The datasets are (1) synthetically modified versions of four popular pairs from the Middlebury Stereo Dataset (namely, Tsukuba, Venus, Cones, and Teddy) and (2) our own multimodal image pairs acquired using the infrared and the electro-optical cameras of a Kinect device. The results show that MI and HOG provide promising results for multimodal imagery, and FREAK, SURF, SIFT, and LSS can be considered as alternatives depending on the multimodality level and the computational complexity requirements of the intended application.
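
    Mutual information, which the evaluation above finds promising for multimodal imagery, is commonly estimated from a joint intensity histogram; a minimal sketch of that estimate for two image patches follows (the bin count and function name are choices of this sketch).

        # Mutual information of two patches estimated from a joint intensity histogram.
        import numpy as np

        def mutual_information(patch_a, patch_b, bins=32):
            joint, _, _ = np.histogram2d(patch_a.ravel(), patch_b.ravel(), bins=bins)
            pxy = joint / joint.sum()
            px, py = pxy.sum(axis=1), pxy.sum(axis=0)
            nz = pxy > 0
            return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))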

  7. Wavelet matrix transform for time-series similarity measurement

    HU Zhi-kun; XU Fei; GUI Wei-hua; YANG Chun-hua


    A time-series similarity measurement method based on wavelet and matrix transform was proposed, and its anti-noise ability, sensitivity and accuracy were discussed. The time-series sequences were compressed into wavelet subspace, and the sample feature vectors and orthogonal bases of the sample time-series sequences were obtained by K-L transform. Then an inner product transform was carried out to project the analyzed time-series sequence onto the orthogonal bases and obtain the analyzed feature vectors. The similarity between a sample feature vector and an analyzed feature vector was calculated by the Euclidean distance. Taking fault waveforms of power electronic devices as an example, the experimental results show that the proposed method has a low feature vector dimension, its anti-noise ability is 30 times that of the plain wavelet method, its sensitivity is 1/3 that of the plain wavelet method, and its accuracy is higher than that of the wavelet singular value decomposition method. The proposed method can be applied to similarity matching and indexing for large time-series databases.

  8. Fractal Video Coding Using Fast Normalized Covariance Based Similarity Measure

    Ravindra E. Chaudhari


    Full Text Available A fast normalized covariance based similarity measure for fractal video compression with quadtree partitioning is proposed in this paper. To increase the speed of fractal encoding, a simplified expression for the covariance between range and overlapped domain blocks within a search window is implemented in the frequency domain. All the covariance coefficients are normalized by using the standard deviation of the overlapped domain blocks and are efficiently calculated in one computation by using two different approaches, namely, FFT based and sum table based. Results of these two approaches are compared and they are almost equal to each other in all aspects, except the memory requirement. Based on the proposed simplified similarity measure, gray level transformation parameters are computationally modified and isometry transformations are performed using the rotation/reflection properties of the IFFT. Quadtree decomposition is used to partition the larger range blocks, that is, 16 × 16, based on the target level of motion-compensated prediction error. Experimental results show that the proposed method can increase the encoding speed and compression ratio by 66.49% and 9.58%, respectively, as compared to the NHEXS method, with an increase in PSNR of 0.41 dB. Compared to H.264, the proposed method can save 20% of compression time with marginal variation in PSNR and compression ratio.
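
    The per-block quantity described above, covariance normalised by the standard deviation of the domain block, can be written directly; the sketch below shows it in plain spatial form for a single range/domain pair, leaving out the FFT and sum-table accelerations that are the paper's contribution.

        # Normalised covariance between a range block and a candidate domain block.
        import numpy as np

        def normalized_covariance(range_block, domain_block):
            """Covariance of the two blocks divided by the domain block's standard deviation."""
            r = range_block - range_block.mean()
            d = domain_block - domain_block.mean()
            sd = domain_block.std()
            return float(np.mean(r * d) / sd) if sd else 0.0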

  9. Sentence based semantic similarity measure for blog-posts

    Aziz, Mehwish


    Blogs, online digital diary-like applications on Web 2.0, have opened a new and easy way for every Internet user to voice opinions, thoughts, and likes and dislikes to the world. The blogosphere is without doubt the largest user-generated content repository, full of knowledge. The potential of this knowledge is still to be explored. Knowledge discovery from this new genre is quite difficult and challenging, as it is totally different from other popular genres of web applications like the World Wide Web (WWW). Blog posts, unlike web documents, are small in size, thus lack context and contain relaxed grammatical structures. Hence, standard text similarity measures fail to provide good results. In this paper, the specialized requirements for comparing a pair of blog posts are thoroughly investigated. Based on this, we propose a novel algorithm for a sentence-oriented semantic similarity measure of a pair of blog posts. We applied this algorithm to a subset of the political blogosphere of Pakistan, to cluster the blogs on different issues of political...

  10. The Edit Distance as a Measure of Perceived Rhythmic Similarity

    Olaf Post


    Full Text Available The ‘edit distance’ (or ‘Levenshtein distance’) measure of distance between two data sets is defined as the minimum number of editing operations – insertions, deletions, and substitutions – that are required to transform one data set to the other (Orpen and Huron, 1992). This measure of distance has been applied frequently and successfully in music information retrieval, but rarely in predicting human perception of distance. In this study, we investigate the effectiveness of the edit distance as a predictor of perceived rhythmic dissimilarity under simple rhythmic alterations. Approaching rhythms as a set of pulses that are either onsets or silences, we study two types of alterations. The first experiment is designed to test the model’s accuracy for rhythms that are relatively similar; whether rhythmic variations with the same edit distance to a source rhythm are also perceived as relatively similar by human subjects. In addition, we observe whether the salience of an edit operation is affected by its metric placement in the rhythm. Instead of using a rhythm that regularly subdivides a 4/4 meter, our source rhythm is a syncopated 16-pulse rhythm, the son. Results show a high correlation between the predictions by the edit distance model and human similarity judgments (r = 0.87); a higher correlation than for the well-known generative theory of tonal music (r = 0.64). In the second experiment, we seek to assess the accuracy of the edit distance model in predicting relatively dissimilar rhythms. The stimuli used are random permutations of the son’s inter-onset intervals: 3-3-4-2-4. The results again indicate that the edit distance correlates well with the perceived rhythmic dissimilarity judgments of the subjects (r = 0.76). To gain insight into the relationships between the individual rhythms, the results are also presented by means of graphic phylogenetic trees.
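
    Treating a rhythm as a string of onsets and silences makes the measure concrete; the sketch below encodes the 16-pulse son as such a string and computes the standard dynamic-programming edit distance. The variant rhythm shown is a hypothetical example, not one of the study's stimuli.

        # Rhythms as strings of pulses ('x' = onset, '.' = silence); the edit distance is the
        # minimum number of insertions, deletions and substitutions between two such strings.
        def edit_distance(a: str, b: str) -> int:
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                curr = [i]
                for j, cb in enumerate(b, 1):
                    curr.append(min(prev[j] + 1,                # deletion
                                    curr[j - 1] + 1,            # insertion
                                    prev[j - 1] + (ca != cb)))  # substitution
                prev = curr
            return prev[-1]

        son = "x..x..x...x.x..."        # the syncopated son rhythm, inter-onset intervals 3-3-4-2-4
        variant = "x..x..x..x..x..."    # a hypothetical variation of it
        # edit_distance(son, variant) counts how many pulse edits separate the two rhythms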

  11. Psychophysical similarity measure based on multi-dimensional scaling for retrieval of similar images of breast masses on mammograms

    Nishimura, Kohei; Muramatsu, Chisako; Oiwa, Mikinao; Shiraiwa, Misaki; Endo, Tokiko; Doi, Kunio; Fujita, Hiroshi


    For retrieving reference images which may be useful to radiologists in their diagnosis, it is necessary to determine a reliable similarity measure which would agree with radiologists' subjective impression. In this study, we propose a new similarity measure for retrieval of similar images, which may assist radiologists in the distinction between benign and malignant masses on mammograms, and investigated its usefulness. In our previous study, to take into account the subjective impression, the psychophysical similarity measure was determined by use of an artificial neural network (ANN), which was employed to learn the relationship between radiologists' subjective similarity ratings and image features. In this study, we propose a psychophysical similarity measure based on multi-dimensional scaling (MDS) in order to improve the accuracy in retrieval of similar images. Twenty-seven images of masses, 3 each from 9 different pathologic groups, were selected, and the subjective similarity ratings for all possible 351 pairs were determined by 8 expert physicians. MDS was applied using the average subjective ratings, and the relationship between each output axis and image features was modeled by the ANN. The MDS-based psychophysical measures were determined by the distance in the modeled space. With a leave-one-out test method, the conventional psychophysical similarity measure was moderately correlated with subjective similarity ratings (r=0.68), whereas the psychophysical measure based on MDS was highly correlated (r=0.81). The result indicates that a psychophysical similarity measure based on MDS would be useful in the retrieval of similar images.


    R. Malini


    Full Text Available This study aims to increase the retrieval efficiency of the proposed image retrieval system on the basis of color content. A new idea of feature extraction based on a color perception histogram is proposed. First, the color histogram is constructed for the HSV image. Secondly, the true color and grey color components are identified based on hue and intensity. The weights for the true and grey color components are calculated using the NBS distance. An updated histogram is constructed using the weighted true and grey color values. The color features extracted from the updated histograms of the query image and of all images in the image database are compared with an existing color histogram based technique using multiple similarity measures. Experimental results show that the proposed image retrieval based on the color perception histogram gives higher retrieval performance in terms of high average precision and average recall with less computational complexity.

  13. Similarity analysis between chromosomes of Homo sapiens and monkeys with correlation coefficient, rank correlation coefficient and cosine similarity measures.

    Someswara Rao, Chinta; Viswanadha Raju, S


    In this paper, we consider correlation coefficient, rank correlation coefficient and cosine similarity measures for evaluating similarity between Homo sapiens and monkeys. We used DNA chromosomes of genome-wide genes to determine the correlation between chromosomal content and evolutionary relationship. The similarity among H. sapiens and monkeys is measured for a total of 210 chromosomes related to 10 species. The similarity measures of these different species show the relationship between H. sapiens and monkeys. This similarity will be helpful in theft identification, maternity identification, disease identification, etc.
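
    The three measures named above are available off the shelf; a small sketch applying them to two numerically encoded sequences (for example, bases mapped to numbers) is given below, with the encoding itself left out because it is not specified in this record.

        # Pearson correlation, Spearman rank correlation and cosine similarity for two sequences.
        import numpy as np
        from scipy.stats import pearsonr, spearmanr

        def three_similarities(x, y):
            x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
            pearson = pearsonr(x, y)[0]          # correlation coefficient
            spearman = spearmanr(x, y)[0]        # rank correlation coefficient
            cosine = float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
            return pearson, spearman, cosine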

  14. Multiscale Hybrid Nonlocal Means Filtering Using Modified Similarity Measure

    Zahid Hussain Shamsi


    Full Text Available A new multiscale implementation of nonlocal means filtering (MHNLM for image denoising is proposed. The proposed algorithm also introduces a modification of the similarity measure for patch comparison. Assuming the patch as an oriented surface, the notion of a normal vectors patch is introduced. The inner product of these normal vectors patches is defined and then used in the weighted Euclidean distance of intensity patches as the weight factor. The algorithm involves two steps: the first step is a multiscale implementation of an accelerated nonlocal means filtering in the discrete stationary wavelet domain to obtain a refined version of the noisy patches for later comparison. The next step is to apply the proposed modification of standard nonlocal means filtering to the noisy image using the reference patches obtained in the first step. These refined patches contain less noise, and consequently the computation of normal vectors and partial derivatives is more precise. Experimental results show equivalent or better performance of the proposed algorithm compared to various state-of-the-art algorithms.

  15. Concept Vector for Similarity Measurement Based on Hierarchical Domain Structure

    Hong Zhe Liu; Hong Bao; Xu


    The concept vector model generalizes standard representations of similarity concept in terms of tree-like structure. In the model, each concept node in the hierarchical tree has ancestor and descendent concept nodes composing its relevancy nodes, thus a concept node is represented as a concept vector according to its relevancy nodes' density and the similarity of the two concepts is obtained by computing cosine similarity between their vectors. In addition, the model is adjusted in terms of l...

  16. Evolution of workflow similarity measures in service discovery

    Wombacher, Andreas; Rozie, M.; Schoop, M.; Huemer, C.; Rebstock, M.; Bichler, M.


    Service discovery of state-dependent services has to take workflow aspects into account. To increase the usability of a query result, the results should be ordered with regard to their relevance, that is, the similarity of the query and the result list entry. Although there exist service discovery s

  17. Piloting an empirical study on measures for workflow similarity

    Wombacher, Andreas; Rozie, M.

    Service discovery of state dependent services has to take workflow aspects into account. To increase the usability of a service discovery, the result list of services should be ordered with regard to the relevance of the services. Means of ordering a list of workflows due to their similarity with

  18. Investigation of psychophysical similarity measures for selection of similar images in the diagnosis of clustered microcalcifications on mammograms.

    Muramatsu, Chisako; Li, Qiang; Schmidt, Robert; Shiraishi, Junji; Doi, Kunio


    The presentation of images with lesions of known pathology that are similar to an unknown lesion may be helpful to radiologists in the diagnosis of challenging cases for improving the diagnostic accuracy and also for reducing variation among different radiologists. The authors have been developing a computerized scheme for automatically selecting similar images with clustered microcalcifications on mammograms from a large database. For similar images to be useful, they must be similar from the point of view of the diagnosing radiologists. In order to select such images, subjective similarity ratings were obtained for a number of pairs of clustered microcalcifications by breast radiologists for establishment of a "gold standard" of image similarity, and the gold standard was employed for determination and evaluation of the selection of similar images. The images used in this study were obtained from the Digital Database for Screening Mammography developed by the University of South Florida. The subjective similarity ratings for 300 pairs of images with clustered microcalcifications were determined by ten breast radiologists. The authors determined a number of image features which represent the characteristics of clustered microcalcifications that radiologists would use in their diagnosis. For determination of objective similarity measures, an artificial neural network (ANN) was employed. The ANN was trained with the average subjective similarity ratings as teacher and selected image features as input data. The ANN was trained to learn the relationship between the image features and the radiologists' similarity ratings; therefore, once the training was completed, the ANN was able to determine the similarity, called a psychophysical similarity measure, which was expected to be close to radiologists' impressions, for an unknown pair of clustered microcalcifications. By use of a leave-one-out test method, the best combination of features was selected. The correlation

  19. A Feature-Based Structural Measure: An Image Similarity Measure for Face Recognition

    Noor Abdalrazak Shnain


    Full Text Available Facial recognition is one of the most challenging and interesting problems within the field of computer vision and pattern recognition. During the last few years, it has gained special attention due to its importance in relation to current issues such as security, surveillance systems and forensics analysis. Despite this high level of attention to facial recognition, the success is still limited by certain conditions; there is no method which gives reliable results in all situations. In this paper, we propose an efficient similarity index that resolves the shortcomings of the existing measures of feature and structural similarity. This measure, called the Feature-Based Structural Measure (FSM), combines the best features of the well-known SSIM (structural similarity index measure) and FSIM (feature similarity index measure) approaches, striking a balance between performance for similar and dissimilar images of human faces. In addition to the statistical structural properties provided by SSIM, edge detection is incorporated in FSM as a distinctive structural feature. Its performance is tested for a wide range of PSNR (peak signal-to-noise ratio), using ORL (Olivetti Research Laboratory, now AT&T Laboratory Cambridge) and FEI (Faculty of Industrial Engineering, São Bernardo do Campo, São Paulo, Brazil) databases. The proposed measure is tested under conditions of Gaussian noise; simulation results show that the proposed FSM outperforms the well-known SSIM and FSIM approaches in its efficiency of similarity detection and recognition of human faces.

  20. Density-based similarity measures for content based search

    Hush, Don R [Los Alamos National Laboratory; Porter, Reid B [Los Alamos National Laboratory; Ruggiero, Christy E [Los Alamos National Laboratory


    We consider the query by multiple example problem where the goal is to identify database samples whose content is similar to a collection of query samples. To assess the similarity we use a relative content density which quantifies the relative concentration of the query distribution to the database distribution. If the database distribution is a mixture of the query distribution and a background distribution then it can be shown that database samples whose relative content density is greater than a particular threshold ρ are more likely to have been generated by the query distribution than the background distribution. We describe an algorithm for predicting samples with relative content density greater than ρ that is computationally efficient and possesses strong performance guarantees. We also show empirical results for applications in computer network monitoring and image segmentation.

  1. Map similarity measurement and its application to the Sado estuary

    Sandra Caeiro


    Full Text Available In the past thirty years GIS technology has progressed from computer mapping to spatial database management, and more recently, to quantitative map analysis and modeling. However, most applications still rely on visual analysis for determining similarity within and among maps. The aim of this study is to compare maps of homogenous areas computed from estuarine sediment characterization indicators, using different approaches. These maps were defined using three different interpolation methods. Different Kappa statistics, visual map overlays or components of agreement and disagreement due to chance, quantity and location were used for single cell and/or neighborhood (hard and soft map comparison. Although the three methods were computed with different statistical techniques, their results are similar, supporting the choice of any of the methods as equivalent and thus of equal value to be used as management units of the estuary. Hence the significance of choosing one of the methods is reduced.
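
    Of the comparison statistics mentioned above, the single-cell (hard) Kappa is the most standard; a minimal sketch for two categorical maps follows, with the neighbourhood (soft) variants and the quantity/location components of agreement omitted.

        # Cohen's Kappa between two categorical maps: cell-by-cell agreement corrected for
        # the agreement expected by chance from each map's category proportions.
        import numpy as np

        def kappa(map_a, map_b):
            a, b = np.asarray(map_a).ravel(), np.asarray(map_b).ravel()
            observed = np.mean(a == b)
            expected = sum(np.mean(a == c) * np.mean(b == c) for c in np.union1d(a, b))
            return 1.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)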

  2. Adaptive Sampling for High Throughput Data Using Similarity Measures

    Bulaevskaya, V. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Sales, A. P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)


    The need for adaptive sampling arises in the context of high throughput data because the rates of data arrival are many orders of magnitude larger than the rates at which they can be analyzed. A very fast decision must therefore be made regarding the value of each incoming observation and its inclusion in the analysis. In this report we discuss one approach to adaptive sampling, based on the new data point’s similarity to the other data points being considered for inclusion. We present preliminary results for one real and one synthetic data set.

  3. [Evaluation and improvement of a measure of drug name similarity, vwhtfrag, in relation to subjective similarities and experimental error rates].

    Tamaki, Hirofumi; Satoh, Hiroki; Hori, Satoko; Sawada, Yasufumi


    Confusion of drug names is one of the most common causes of drug-related medical errors. A similarity measure of drug names, "vwhtfrag", was developed to discriminate whether drug name pairs are likely to cause confusion errors, and to provide information that would be helpful to avoid errors. The aim of the present study was to evaluate and improve vwhtfrag. Firstly, we evaluated the correlation of vwhtfrag with subjective similarity or error rate of drug name pairs in psychological experiments. Vwhtfrag showed a higher correlation to subjective similarity (college students: r=0.84) or error rate than did other conventional similarity measures (htco, cos1, edit). Moreover, name pairs that showed coincidences of the initial character strings had a higher subjective similarity than those which had coincidences of the end character strings and had the same vwhtfrag. Therefore, we developed a new similarity measure (vwhtfrag+), in which coincidence of initial character strings in name pairs is weighted by 1.53 times over coincidence of end character strings. Vwhtfrag+ showed a higher correlation to subjective similarity than did unmodified vwhtfrag. Further studies appear warranted to examine in detail whether vwhtfrag+ has superior ability to discriminate drug name pairs likely to cause confusion errors.

  4. simDEF: definition-based semantic similarity measure of gene ontology terms for functional similarity analysis of genes.

    Pesaranghader, Ahmad; Matwin, Stan; Sokolova, Marina; Beiko, Robert G


    Measures of protein functional similarity are essential tools for function prediction, evaluation of protein-protein interactions (PPIs) and other applications. Several existing methods perform comparisons between proteins based on the semantic similarity of their GO terms; however, these measures are highly sensitive to modifications in the topological structure of GO, tend to be focused on specific analytical tasks and concentrate on the GO terms themselves rather than considering their textual definitions. We introduce simDEF, an efficient method for measuring semantic similarity of GO terms using their GO definitions, which is based on the Gloss Vector measure commonly used in natural language processing. The simDEF approach builds optimized definition vectors for all relevant GO terms, and expresses the similarity of a pair of proteins as the cosine of the angle between their definition vectors. Relative to existing similarity measures, when validated on a yeast reference database, simDEF improves correlation with sequence homology by up to 50%, shows a correlation improvement of more than 4% with gene expression in the biological process hierarchy of GO, and increases PPI predictability by more than 2.5% in F1 score for the molecular function hierarchy. Datasets, results and source code are available online. Supplementary data are available at Bioinformatics online.
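
    The construction of the optimized definition vectors is the paper's contribution and is not reproduced in this record; the sketch below only illustrates the final comparison step described above, representing a protein by the sum of its GO terms' definition vectors and taking the cosine of the angle between two such vectors. The aggregation by summation is an assumption of this sketch.

        # Cosine between aggregated definition vectors of two proteins' GO term sets.
        import numpy as np

        def protein_similarity(term_vectors_a, term_vectors_b):
            """term_vectors_*: list of 1-D definition vectors for one protein's GO terms."""
            va = np.sum(term_vectors_a, axis=0)
            vb = np.sum(term_vectors_b, axis=0)
            denom = np.linalg.norm(va) * np.linalg.norm(vb)
            return float(va @ vb / denom) if denom else 0.0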

  5. A New Similarity Measure between Intuitionistic Fuzzy Sets and Its Application to Pattern Recognition

    Yafei Song


    Full Text Available As a generalization of ordinary fuzzy sets, the concept of the intuitionistic fuzzy set (IFS), characterized both by a membership degree and by a nonmembership degree, is a more flexible way to cope with uncertainty. Similarity measures of intuitionistic fuzzy sets are used to indicate the similarity degree between intuitionistic fuzzy sets. Although many similarity measures for intuitionistic fuzzy sets have been proposed in previous studies, some of them cannot satisfy the axioms of similarity or provide counterintuitive cases. In this paper, a new similarity measure and a weighted similarity measure between IFSs are proposed. It is proved that the proposed similarity measures satisfy the properties of the axiomatic definition for similarity measures. Comparison between the previous similarity measures and the proposed similarity measure indicates that the proposed similarity measure does not produce any counterintuitive cases. Moreover, it is demonstrated that the proposed similarity measure is capable of discriminating differences between patterns.

  6. Collaborative Personalized Web Recommender System using Entropy based Similarity Measure

    Mehta, Harita; Bedi, Punam; Dixit, V S


    On the Internet, web surfers searching for information always strive for recommendations. Generating recommendations becomes more difficult as the information domain grows exponentially day by day. In this paper, we calculate entropy-based similarity between users to address the scalability problem. Using this concept, we have implemented an online user-based collaborative web recommender system. In this model-based collaborative system, the user session is divided into two levels and entropy is calculated at both. It is shown that, from the set of valuable recommenders obtained at level I, only those recommenders having lower entropy at level II than at level I serve as trustworthy recommenders. Finally, top-N recommendations are generated from such trustworthy recommenders for an online user.

  7. Similarity Measures, Author Cocitation Analysis, and Information Theory

    Leydesdorff, Loet


    The use of Pearson's correlation coefficient in Author Cocitation Analysis was compared with Salton's cosine measure in a number of recent contributions. Unlike the Pearson correlation, the cosine is insensitive to the number of zeros. However, one has the option of applying a logarithmic transformation in correlation analysis. Information calculus is based on the logarithmic transformation and provides non-parametric statistics. Using this methodology one can cluster a document set in...

  8. Similarity Measures, Author Cocitation Analysis, and Information Theory

    Leydesdorff, Loet


    The use of Pearson's correlation coefficient in Author Cocitation Analysis was compared with Salton's cosine measure in a number of recent contributions. Unlike the Pearson correlation, the cosine is insensitive to the number of zeros. However, one has the option of applying a logarithmic transformation in correlation analysis. Information calculus is based on the logarithmic transformation and provides non-parametric statistics. Using this methodology one can cluster a document set in a precise way and express the differences in terms of bits of information. The algorithm is explained and used on the data set which was made the subject of this discussion.
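
    The zero-sensitivity point above is easy to check numerically: appending co-absent zeros to two cocitation profiles leaves the cosine unchanged but shifts the Pearson correlation. The small sketch below uses hypothetical counts purely for illustration.

        # Cosine is invariant to padding both profiles with zeros; Pearson is not.
        import numpy as np
        from scipy.stats import pearsonr

        def cosine(x, y):
            x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
            return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

        a, b = [3, 0, 2, 5], [1, 4, 0, 2]        # hypothetical cocitation counts
        a2, b2 = a + [0, 0, 0], b + [0, 0, 0]    # the same profiles padded with co-absences
        # cosine(a, b) == cosine(a2, b2), whereas pearsonr(a, b)[0] != pearsonr(a2, b2)[0]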

  9. Strong Similarity Measures for Ordered Sets of Documents in Information Retrieval.

    Egghe, L.; Michel, Christine


    Presents a general method to construct ordered similarity measures in information retrieval based on classical similarity measures for ordinary sets. Describes a test of some of these measures in an information retrieval system that extracted ranked document sets and discusses the practical usability of the ordered similarity measures. (Author/LRW)

  10. Improved cosine similarity measures of simplified neutrosophic sets for medical diagnoses.

    Ye, Jun


    In pattern recognition and medical diagnosis, the similarity measure is an important mathematical tool. To overcome some disadvantages of existing cosine similarity measures of simplified neutrosophic sets (SNSs) in vector space, this paper proposes improved cosine similarity measures of SNSs based on the cosine function, including single valued neutrosophic cosine similarity measures and interval neutrosophic cosine similarity measures. Then, weighted cosine similarity measures of SNSs are introduced by taking into account the importance of each element. Further, a medical diagnosis method using the improved cosine similarity measures is proposed to solve medical diagnosis problems with simplified neutrosophic information. The improved cosine similarity measures between SNSs were introduced based on the cosine function. Then, we compared the improved cosine similarity measures of SNSs with existing cosine similarity measures of SNSs by numerical examples to demonstrate their effectiveness and rationality in overcoming some shortcomings of existing cosine similarity measures of SNSs in some cases. In the medical diagnosis method, we can find a proper diagnosis from the cosine similarity measures between the symptoms and the considered diseases, which are represented by SNSs. Then, the medical diagnosis method based on the improved cosine similarity measures was applied to two medical diagnosis problems to show the applications and effectiveness of the proposed method. Both numerical examples demonstrated that the improved cosine similarity measures of SNSs based on the cosine function can overcome the shortcomings of the existing cosine similarity measures between two vectors in some cases. In the two medical diagnosis problems, the diagnoses obtained using the various similarity measures of SNSs were identical, demonstrating the effectiveness and rationality of the diagnosis method proposed in this paper. The improved cosine measures of SNSs based on cosine

  11. C-Rank: A Link-based Similarity Measure for Scientific Literature Databases

    Yoon, Seok-Ho; Park, Sunju


    As the number of people who use scientific literature databases grows, the demand for literature retrieval services has been steadily increased. One of the most popular retrieval services is to find a set of papers similar to the paper under consideration, which requires a measure that computes similarities between papers. Scientific literature databases exhibit two interesting characteristics that are different from general databases. First, the papers cited by old papers are often not included in the database due to technical and economic reasons. Second, since a paper references the papers published before it, few papers cite recently-published papers. These two characteristics cause all existing similarity measures to fail in at least one of the following cases: (1) measuring the similarity between old, but similar papers, (2) measuring the similarity between recent, but similar papers, and (3) measuring the similarity between two similar papers: one old, the other recent. In this paper, we propose a new ...

  12. Similarity Measures and Entropy for Vague Sets

    李艳红; 迟忠先; 阎德勤


    The paper compares and analyzes existing methods for similarity measures between Vague values, provides a new similarity measure method, discusses its normal characteristics, and gives some related theorems. At the same time, it analyzes the application of fuzzy similarity measures to Vague similarity measures and gives their normal forms, such as similarity measures between Vague sets, between elements, and their weighted similarity measures. Finally, Vague entropy rules for two kinds of cases are presented and the corresponding Vague entropy expressions are provided. The content of this paper is of practical significance in fields such as fuzzy decision-making, Vague clustering, pattern recognition, and data mining.

  13. Comparison of Various Similarity Measures for Average Image Hash in Mobile Phone Application

    Farisa Chaerul Haviana, Sam; Taufik, Muhammad


    One of the main issues in Content-Based Image Retrieval (CBIR) is the similarity measure applied to the resulting image hashes. The key challenge is to find the most beneficial distance or similarity measure for calculating similarity in terms of speed and computing cost, especially on devices with limited computing capabilities such as mobile phones. In this study, we take the twelve most common and popular distance or similarity measures, implement them in a mobile phone application, and compare and study them. The results show that all similarity measures implemented in this study performed equally well in the mobile phone application. This gives more possibilities for method combinations to be implemented for image retrieval.
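
    The study does not list its twelve measures in this record; as one representative pairing, the sketch below computes an average hash with Pillow and a Hamming-based similarity between two such hashes. The hash size and function names are choices of this sketch.

        # Average hash (downscale, greyscale, threshold at the mean) and a Hamming-based similarity.
        import numpy as np
        from PIL import Image

        def average_hash(path, size=8):
            pixels = np.asarray(Image.open(path).convert("L").resize((size, size)), dtype=float)
            return (pixels > pixels.mean()).ravel()

        def hamming_similarity(hash_a, hash_b):
            """1.0 for identical hashes, 0.0 when every bit differs."""
            return 1.0 - float(np.mean(hash_a != hash_b))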

  14. Evaluation of discrimination measures to characterize spectrally similar leaves of African Savannah trees

    Dudeni, N


    Full Text Available in establishing similarities between spectra and are also functional in the identification of vegetation types. The stochastic spectral similarity measures, such as spectral information divergence (SID), describe the spectral properties essential for discriminating...

  15. A New Similarity Measure between Intuitionistic Fuzzy Sets and Its Application to Pattern Recognition

    Yafei Song; Xiaodan Wang; Lei Lei; Aijun Xue


    As a generalization of ordinary fuzzy sets, the concept of the intuitionistic fuzzy set (IFS), characterized both by a membership degree and by a nonmembership degree, is a more flexible way to cope with uncertainty. Similarity measures of intuitionistic fuzzy sets are used to indicate the similarity degree between intuitionistic fuzzy sets. Although many similarity measures for intuitionistic fuzzy sets have been proposed in previous studies, some of them cannot satisfy the axioms of similarity ...

  16. An alternative approach to measure similarity between two deterministic transient signals

    Shin, Kihong


    In many practical engineering applications, it is often required to measure the similarity of two signals to gain insight into the conditions of a system. For example, an application that monitors machinery can regularly measure the signal of the vibration and compare it to a healthy reference signal in order to monitor whether or not any fault symptom is developing. Also in modal analysis, a frequency response function (FRF) from a finite element model (FEM) is often compared with an FRF from experimental modal analysis. Many different similarity measures are applicable in such cases, and correlation-based similarity measures may be most frequently used among these such as in the case where the correlation coefficient in the time domain and the frequency response assurance criterion (FRAC) in the frequency domain are used. Although correlation-based similarity measures may be particularly useful for random signals because they are based on probability and statistics, we frequently deal with signals that are largely deterministic and transient. Thus, it may be useful to develop another similarity measure that takes the characteristics of the deterministic transient signal properly into account. In this paper, an alternative approach to measure the similarity between two deterministic transient signals is proposed. This newly proposed similarity measure is based on the fictitious system frequency response function, and it consists of the magnitude similarity and the shape similarity. Finally, a few examples are presented to demonstrate the use of the proposed similarity measure.
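
    The paper's fictitious-system measure is not reproduced in this record; for reference, the frequency response assurance criterion (FRAC) mentioned above as the frequency-domain, correlation-based baseline can be written as a normalised inner product of two complex FRFs sampled at the same frequency lines.

        # FRAC(H1, H2) = |sum(conj(H1) * H2)|^2 / (sum(|H1|^2) * sum(|H2|^2))
        import numpy as np

        def frac(h1, h2):
            h1, h2 = np.asarray(h1, dtype=complex), np.asarray(h2, dtype=complex)
            num = np.abs(np.vdot(h1, h2)) ** 2        # vdot conjugates the first argument
            den = np.vdot(h1, h1).real * np.vdot(h2, h2).real
            return float(num / den)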

  17. Measurement of Characteristic Self-Similarity and Self-Diversity for Complex Mechanical Systems

    ZHOU Meili; LAI Jiangfeng


    Based on similarity science and complex system theory, a new concept of characteristic self-diversity and the corresponding relations between self-similarity and self-diversity for complex mechanical systems are presented in this paper. Methods for measuring system self-similarity and self-diversity between a main system and its sub-systems are studied. Numerical calculations show that the characteristic self-similarity and self-diversity measurement method is valid. A new theory and method of self-similarity and self-diversity measurement for complex mechanical systems is presented.

  18. A Novel Method for Spectral Similarity Measure by Fusing Shape and Amplitude Features

    J. G. Ding


    Full Text Available Spectral similarity measure is the basis of spectral information extraction. The description of spectral features is the key to spectral similarity measure. To express the spectral shape and amplitude features reasonably, this paper presents the definition of shape and amplitude feature vectors, constructs the shape feature distance vector and amplitude feature distance vector, proposes a spectral similarity measure by fusing shape and amplitude features (SAF), and discloses the relationship of fusing SAF with Euclidean distance and spectral information divergence. Different measures were tested on the basis of the United States Geological Survey (USGS) mineral_beckman_430 data. Generally, measures integrating SAF achieve the highest accuracy, followed by measures based on shape features and measures based on amplitude features. Among measures integrating SAF, fusing SAF shows the highest accuracy. Fusing SAF expresses the measured results as the inner product of the shape and amplitude feature distance vectors, which integrates spectral shape and amplitude features well. Fusing SAF is superior to other similarity measures that integrate SAF, such as the spectral similarity scale, the spectral pan-similarity measure, and the normalized spectral similarity score (NS3).

  19. A new gene ontology-based measure for the functional similarity of gene products

    QI Guo-long; QIAN Shi-yu; FANG Ji-qian


    Background: Although biomedical ontologies have standardized the representation of gene products across species and databases, a method for determining the functional similarities of gene products has not yet been developed. Methods: We proposed a new semantic similarity measure based on Gene Ontology that considers the semantic influences from all of the ancestor terms in a graph. Our measure was compared with Resnik's measure in two applications, which were based on the association of the measure with gene co-expression and with protein-protein interactions. Results: The results showed a considerable association between the semantic similarity and the expression correlation and between the semantic similarity and the protein-protein interactions, and our measure performed the best overall. Conclusion: These results reveal the potential value of our newly proposed semantic similarity measure in studying the functional relevance of gene products.
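
    The exact formula of the proposed measure is not given in this record; as an illustration of letting all ancestor terms contribute, the sketch below compares the information-content-weighted ancestor sets of two GO terms (a simGIC-style ratio). This is a stand-in, not the authors' measure, and the ancestors and ic inputs are assumed to be precomputed.

        # Ratio of shared to combined ancestor information content for two GO terms.
        def ancestor_similarity(term_a, term_b, ancestors, ic):
            """ancestors: term -> set of ancestor terms (including the term itself);
            ic: term -> information content."""
            shared = ancestors[term_a] & ancestors[term_b]
            union = ancestors[term_a] | ancestors[term_b]
            return sum(ic[t] for t in shared) / sum(ic[t] for t in union)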

  20. Evaluating the effect of annotation size on measures of semantic similarity

    Kulmanov, Maxat


    Background: Ontologies are widely used as metadata in biological and biomedical datasets. Measures of semantic similarity utilize ontologies to determine how similar two entities annotated with classes from ontologies are, and semantic similarity is increasingly applied in applications ranging from diagnosis of disease to investigation in gene networks and functions of gene products.

  1. Comparison of human face matching behavior and computational image similarity measure

    CHEN WenFeng; LIU ChangHong; LANDER Karen; FU XiaoLan


    Computational similarity measures have been evaluated in a variety of ways, but few of the validated computational measures are based on a high-level, cognitive criterion of objective similarity. In this paper, we evaluate two popular objective similarity measures by comparing them with face matching performance in human observers. The results suggest that these measures are still limited in predicting human behavior, especially rejection behavior, but an objective measure taking advantage of global and local face characteristics may improve the prediction. It is also suggested that humans may set different criteria for "hit" and "rejection", and this may provide implications for biologically-inspired computational systems.

  2. TopoICSim: a new semantic similarity measure based on gene ontology.

    Ehsani, Rezvan; Drabløs, Finn


    The Gene Ontology (GO) is a dynamic, controlled vocabulary that describes the cellular function of genes and proteins according to three major categories: biological process, molecular function and cellular component. It has become widely used in many bioinformatics applications for annotating genes and measuring their semantic similarity, rather than their sequence similarity. Generally speaking, semantic similarity measures involve the GO tree topology, the information content of GO terms, or a combination of both. Here we present a new semantic similarity measure called TopoICSim (Topological Information Content Similarity) which uses information on the specific paths between GO terms based on the topology of the GO tree, and the distribution of information content along these paths. The TopoICSim algorithm was evaluated on two human benchmark datasets based on KEGG pathways and Pfam domains grouped as clans, using GO terms from either the biological process or molecular function. The performance of the TopoICSim measure compared favorably to five existing methods. Furthermore, the TopoICSim similarity was also tested on gene/protein sets defined by correlated gene expression, using three human datasets, and showed improved performance compared to two previously published similarity measures. Finally we used an online benchmarking resource which evaluates any similarity measure against a set of 11 similarity measures in three tests, using gene/protein sets based on sequence similarity, Pfam domains, and enzyme classifications. The results for TopoICSim showed improved performance relative to most of the measures included in the benchmarking, and in particular a very robust performance throughout the different tests. The TopoICSim similarity measure provides a competitive method with robust performance for quantification of semantic similarity between genes and proteins based on GO annotations. An R script for TopoICSim is available at .

  3. How to Normalize Co-Occurrence Data? An Analysis of Some Well-Known Similarity Measures

    N.J.P. van Eck (Nees Jan); L. Waltman (Ludo)


    textabstractIn scientometric research, the use of co-occurrence data is very common. In many cases, a similarity measure is employed to normalize the data. However, there is no consensus among researchers on which similarity measure is most appropriate for normalization purposes. In this paper, we t

  4. A comparison of symbolic similarity measures for finding occurrences of melodic segments

    Janssen, Berit; van Kranenburg, Peter; Volk, A.


    To find occurrences of melodic segments, such as themes, phrases and motifs, in musical works, a well-performing similarity measure is needed to support human analysis of large music corpora. We evaluate the performance of a range of melodic similarity measures to find occurrences of phrases in folk

  5. Similarity measure learning in closed-form solution for image classification.

    Chen, Jing; Tang, Yuan Yan; Chen, C L Philip; Fang, Bin; Shang, Zhaowei; Lin, Yuewei


    Adopting a measure is essential in many multimedia applications. Recently, distance learning is becoming an active research problem. In fact, the distance is the natural measure for dissimilarity. Generally, a pairwise relationship between two objects in learning tasks includes two aspects: similarity and dissimilarity. The similarity measure provides different information for pairwise relationships. However, similarity learning has been paid less attention in learning problems. In this work, firstly, we propose a general framework for similarity measure learning (SML). Additionally, we define a generalized type of correlation as a similarity measure. By a set of parameters, generalized correlation provides flexibility for learning tasks. Based on this similarity measure, we present a specific algorithm under the SML framework, called correlation similarity measure learning (CSML), to learn a parameterized similarity measure over input space. A nonlinear extension version of CSML, kernel CSML, is also proposed. Particularly, we give a closed-form solution avoiding iterative search for a local optimal solution in the high-dimensional space as the previous work did. Finally, classification experiments have been performed on face databases and a handwritten digits database to demonstrate the efficiency and reliability of CSML and KCSML.

  6. Several Similarity Measures of Interval Valued Neutrosophic Soft Sets and Their Application in Pattern Recognition Problems

    Anjan Mukherjee


    The interval valued neutrosophic soft set, introduced by Irfan Deli in 2014 [8], is a generalization of the neutrosophic set introduced by F. Smarandache in 1995 [19], and can be used in real scientific and engineering applications. In this paper the Hamming and Euclidean distances between two interval valued neutrosophic soft sets (IVNS sets) are defined, and similarity measures based on distances between two interval valued neutrosophic soft sets are proposed. A similarity measure based on a set-theoretic approach is also proposed. Some basic properties of similarity measures between two interval valued neutrosophic soft sets are also studied. A decision-making method is established for the interval valued neutrosophic soft set setting using similarity measures between IVNS sets. Finally, an example is given to demonstrate the possible application of similarity measures in pattern recognition problems.
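
    The exact distance formulas are not given in this record; the sketch below shows one straightforward way a normalized Hamming distance over interval endpoints can be turned into a similarity score, under an assumed tuple representation for IVNS elements.

```python
# Each element is ((t_lo, t_hi), (i_lo, i_hi), (f_lo, f_hi)): interval-valued
# truth, indeterminacy and falsity memberships (representation assumed for illustration).
A = [((0.2, 0.4), (0.1, 0.3), (0.5, 0.7)),
     ((0.6, 0.8), (0.0, 0.2), (0.1, 0.3))]
B = [((0.3, 0.5), (0.2, 0.4), (0.4, 0.6)),
     ((0.5, 0.7), (0.1, 0.3), (0.2, 0.4))]

def normalized_hamming_distance(a, b):
    """Mean absolute difference over all six interval endpoints of every element."""
    total = 0.0
    for elem_a, elem_b in zip(a, b):
        for (lo_a, hi_a), (lo_b, hi_b) in zip(elem_a, elem_b):
            total += abs(lo_a - lo_b) + abs(hi_a - hi_b)
    return total / (6 * len(a))

def ivns_similarity(a, b):
    """Distance-based similarity in (0, 1]; identical sets give 1."""
    return 1.0 / (1.0 + normalized_hamming_distance(a, b))

print(round(ivns_similarity(A, B), 3))
```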

  7. CCor: A whole genome network-based similarity measure between two genes.

    Hu, Yiming; Zhao, Hongyu


    Measuring the similarity between genes is often the starting point for building gene regulatory networks. Most similarity measures used in practice consider only pairwise information, with a few also considering network structure. Although the theoretical properties of pairwise measures are well understood in the statistics literature, little is known about the statistical properties of similarity measures based on network structure. In this article, we consider a new whole genome network-based similarity measure, called CCor, that makes use of information on all the genes in the network. We derive a concentration inequality for CCor and compare it with the commonly used Pearson correlation coefficient for inferring network modules. Both theoretical analysis and a real data example demonstrate the advantages of CCor over existing measures for inferring gene modules.


    周孟; 余建坤


    Aiming at the shortcomings of previous Vague set similarity measurement methods, a new definition of similarity between Vague values is presented. Based on this definition, the similarity measure between Vague sets is redefined and its properties are given. Finally, a method for pattern recognition in a Vague environment using the similarity and similarity measure between Vague sets is proposed. Calculations on several application examples show that the proposed Vague set similarity measure has certain advantages and improves the accuracy of Vague set similarity measurement.

  9. Novel similarity measures for face representation based on local binary pattern

    ZHU Shi-hu; FENG Ju-fu


    Successful face recognition based on local binary patterns (LBP) relies on the effective extraction of LBP features and on inferring the similarity between the extracted features. In this paper, we focus on the latter and propose two novel similarity measures, for local matching methods and holistic matching methods respectively. One is the Earth Mover's Distance with Hamming and Lp ground distance (EMD-HammingLp), which is a cross-bin dissimilarity measure for LBP histograms. The other is the IMage Hamming Distance (IMHD), which is a dissimilarity measure for whole LBP images. Experiments on the FERET database show that the two proposed similarity measures outperform the state-of-the-art Chi-square similarity measure for LBP features.
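
    The two proposed measures are not specified in this record, but the Chi-square baseline they are compared against is standard for LBP histograms and can be sketched directly; the toy histograms are hypothetical.

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two LBP histograms (L1-normalized first):
    sum_i (h1_i - h2_i)^2 / (h1_i + h2_i). Smaller means more similar."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    h1 = h1 / (h1.sum() + eps)
    h2 = h2 / (h2.sum() + eps)
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

# Toy 8-bin "LBP histograms" (hypothetical counts).
a = [5, 9, 1, 0, 3, 7, 2, 4]
b = [4, 10, 2, 1, 2, 6, 3, 3]
print(round(chi_square_distance(a, b), 4))
```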

  10. Study on the Similarities and Differences of Body Measurement Terminology between ASTM and China GB Standard

    方方; 张渭源; 张文斌


    The similarities and differences between the ASTM and China GB standards are studied in three aspects: measuring instrument, terminology and applicable field. They are similar in measuring apparatus, but GB has fewer measurements (such as girth, length and width measurements) than ASTM and lacks across-chest width, back width, total crotch length and shoulder slope, which are important measurements in pattern making. ASTM classifies its standards according to the customers' size, gender and age. We therefore suggest that the GB standard could be modified in these respects to better satisfy its users.

  11. IntelliGO: a new vector-based semantic similarity measure including annotation origin

    Devignes Marie-Dominique


    Abstract Background: The Gene Ontology (GO) is a well-known controlled vocabulary describing the biological process, molecular function and cellular component aspects of gene annotation. It has become a widely used knowledge source in bioinformatics for annotating genes and measuring their semantic similarity. These measures generally involve the GO graph structure, the information content of GO aspects, or a combination of both. However, only a few of the semantic similarity measures described so far can handle GO annotations differently according to their origin (i.e. their evidence codes). Results: We present here a new semantic similarity measure called IntelliGO which integrates several complementary properties in a novel vector space model. The coefficients associated with each GO term that annotates a given gene or protein include its information content as well as a customized value for each type of GO evidence code. The generalized cosine similarity measure, used for calculating the dot product between two vectors, has been rigorously adapted to the context of the GO graph. The IntelliGO similarity measure is tested on two benchmark datasets consisting of KEGG pathways and Pfam domains grouped as clans, considering the GO biological process and molecular function terms, respectively, for a total of 683 yeast and human genes and involving more than 67,900 pair-wise comparisons. The ability of the IntelliGO similarity measure to express the biological cohesion of sets of genes compares favourably to four existing similarity measures. For inter-set comparison, it consistently discriminates between distinct sets of genes. Furthermore, the IntelliGO similarity measure allows the influence of weights assigned to evidence codes to be checked. Finally, the results obtained with a complementary reference technique give intermediate but correct correlation values with the sequence similarity, Pfam, and Enzyme classifications when compared to ...

  12. 3D Facial Similarity Measure Based on Geodesic Network and Curvatures

    Junli Zhao


    Automated 3D facial similarity measurement is a challenging and valuable research topic in anthropology and computer graphics. It is widely used in various fields, such as criminal investigation, kinship confirmation, and face recognition. This paper proposes a 3D facial similarity measure based on a combination of geodesic and curvature features. Firstly, a geodesic network is generated for each face, with geodesics and iso-geodesics determined, and the network points are adopted as correspondences across face models. Then, four curvature-related metrics, namely the mean curvature, Gaussian curvature, shape index, and curvedness, are computed for each network point using a weighted average of its neighborhood points. Finally, correlation coefficients according to these metrics are computed as the similarity measures between two 3D face models. Experiments on 3D facial models of different persons and on different 3D facial models of the same person are carried out and compared with a subjective face similarity study. The results show that the geodesic network plays an important role in 3D facial similarity measurement. The similarity measure defined by the shape index is basically consistent with human subjective evaluation, and it measures 3D face similarity more objectively than the other indices.

  13. Random walk-based similarity measure method for patterns in complex object

    Liu Shihu


    This paper discusses the similarity of patterns in complex objects. A complex object is composed both of the attribute information of patterns and of the relational information between patterns. Bearing in mind this specificity, a random walk-based similarity measurement method for patterns is constructed. In this method, the reachability of any two patterns with respect to the relational information is studied, so that the similarity of patterns with respect to the relational information can be calculated. On this basis, an integrated similarity measurement method is proposed, and Algorithms 1 and 2 show the calculation procedure. The method makes full use of both the attribute information and the relational information. Finally, a synthetic example shows that the proposed similarity measurement method is valid.

  14. A New Method for Measuring Text Similarity in Learning Management Systems Using WordNet

    Alkhatib, Bassel; Alnahhas, Ammar; Albadawi, Firas


    As text sources are getting broader, measuring text similarity is becoming more compelling. Automatic text classification, search engines and auto answering systems are samples of applications that rely on text similarity. Learning management systems (LMS) are becoming more important since electronic media is getting more publicly available. As…

  15. Exploring information from the topology beneath the Gene Ontology terms to improve semantic similarity measures.

    Zhang, Shu-Bo; Lai, Jian-Huang


    Measuring the similarity between pairs of biological entities is important in molecular biology. The introduction of the Gene Ontology (GO) provides a promising approach to quantifying the semantic similarity between two genes or gene products. This kind of similarity measure is closely associated with the GO terms annotated to the biological entities under consideration and with the structure of the GO graph. However, previous works in this field have mainly focused on the upper part of the graph and seldom considered the lower part. In this study, we aim to exploit information from the lower part of the GO graph for better semantic similarity. We propose a framework to quantify the similarity beneath a term pair, which takes into account both the information two ancestral terms share and the probability that they co-occur with their common descendants. The effectiveness of our approach was evaluated against seven typical measures on the public platform CESSM, and on protein-protein interaction and gene expression datasets. Experimental results consistently show that the similarity derived from the lower part contributes to a better semantic similarity measure. The promising features of our approach are the following: (1) it provides a mirror model to characterize the information two ancestral terms share with respect to their common descendant; (2) it quantifies the probability that two terms co-occur with their common descendant in an efficient way; and (3) our framework can effectively capture the similarity beneath two terms, which can serve as an add-on to improve traditional semantic similarity measures between two GO terms. The algorithm was implemented in Matlab and is freely available from . Copyright © 2016 Elsevier B.V. All rights reserved.


    Pushpa C N


    Semantic similarity measures play an important role in information retrieval, natural language processing and various web tasks such as relation extraction, community mining, document clustering, and automatic meta-data extraction. In this paper, we propose a Pattern Retrieval Algorithm (PRA) to compute the semantic similarity between words by combining a page count method and a web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word pairs and non-synonymous word pairs. The proposed approach aims to improve the correlation values, precision, recall, and F-measures compared to the existing methods. The proposed algorithm outperforms the existing methods with a correlation value of 89.8%.
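
    The four page-count association measures are not named in this record; commonly used choices derived from web hit counts are Jaccard, Dice, Overlap and PMI, sketched below with hypothetical counts in place of live search-engine queries.

```python
import math

def web_jaccard(p, q, pq):
    """Jaccard coefficient from page counts of P, Q and 'P AND Q'."""
    return 0.0 if pq == 0 else pq / (p + q - pq)

def web_dice(p, q, pq):
    return 0.0 if pq == 0 else 2.0 * pq / (p + q)

def web_overlap(p, q, pq):
    return 0.0 if pq == 0 else pq / min(p, q)

def web_pmi(p, q, pq, n=10**10):
    """Pointwise mutual information; n is an assumed total number of indexed pages."""
    return 0.0 if pq == 0 else math.log2((pq / n) / ((p / n) * (q / n)))

# Hypothetical page counts for a word pair such as ("car", "automobile").
p, q, pq = 2_500_000, 800_000, 300_000
for name, fn in [("Jaccard", web_jaccard), ("Dice", web_dice), ("Overlap", web_overlap)]:
    print(name, round(fn(p, q, pq), 4))
print("PMI", round(web_pmi(p, q, pq), 2))
```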

  17. A Model of Generating Visual Place Cells Based on Environment Perception and Similar Measure

    Yang Zhou


    Generating visual place cells (VPCs) is an important problem in the field of bioinspired navigation. By analyzing the firing characteristics of biological place cells and the existing methods for generating VPCs, a model of generating visual place cells based on environment perception and a similarity measure is abstracted in this paper. The VPC generation process is divided into three phases: environment perception, similarity measurement, and recruitment of a new place cell. According to this process, a specific method for generating VPCs is presented. External reference landmarks are obtained based on local invariant image features, and a similarity function is designed based on the Euclidean distance and a Gaussian function. Simulations validate that the proposed method is effective. The firing characteristics of the generated VPCs are similar to those of biological place cells, and the VPCs' firing fields can be adjusted flexibly by changing the adjustment factor of the firing field (AFFF) and the firing rate threshold (FRT).
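
    A similarity function built from the Euclidean distance and a Gaussian, as described above, can be sketched in a few lines; the descriptor values, the width parameter standing in for the firing-field adjustment factor, and the recruitment threshold are all illustrative assumptions.

```python
import numpy as np

def gaussian_similarity(x, y, sigma=1.0):
    """Similarity decaying with Euclidean distance: exp(-||x - y||^2 / (2 sigma^2)).
    A larger sigma widens the 'firing field' (role of an adjustment factor)."""
    d2 = float(np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2))
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))

# A new place cell could be recruited when the similarity of the current view
# to every stored landmark descriptor falls below a threshold (values assumed).
current = [0.2, 0.8, 0.5]
landmarks = [[0.1, 0.9, 0.4], [0.9, 0.1, 0.3]]
threshold = 0.6
sims = [gaussian_similarity(current, lm, sigma=0.5) for lm in landmarks]
print(sims, "recruit new cell:", max(sims) < threshold)
```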

  18. Construction of Weak and Strong Similarity Measures for Ordered Sets of Documents Using Fuzzy Set Techniques.

    Egghe, L.; Michel, C.


    Ordered sets (OS) of documents are encountered more and more in information distribution systems, such as information retrieval systems. Classical similarity measures for ordinary sets of documents need to be extended to these ordered sets. This is done in this article using fuzzy set techniques. The practical usability of the OS-measures is…

  19. A Fast Method for Measuring the Similarity Between 3D Model and 3D Point Cloud

    Zhang, Zongliang; Li, Jonathan; Li, Xin; Lin, Yangbin; Zhang, Shanxin; Wang, Cheng


    This paper proposes a fast method for measuring the partial Similarity between a 3D Model and a 3D point Cloud (SimMC). Measuring SimMC is crucial for many point cloud-related applications such as 3D object retrieval and inverse procedural modelling. In our method, the surface area of the model and the Distance from Model to point Cloud (DistMC) are exploited as measurements to calculate SimMC. Here, DistMC is defined as the weighted average of the distances between points sampled from the model and the point cloud. Similarly, the Distance from point Cloud to Model (DistCM) is defined as the average of the distances between points in the point cloud and the model. In order to avoid the heavy computational burden brought by the calculation of DistCM in some traditional methods, we define SimMC as the ratio of the weighted surface area of the model to DistMC. Compared to traditional SimMC measuring methods that are only able to measure global similarity, our method is capable of measuring partial similarity by employing a distance-weighted strategy. Moreover, our method is faster than other partial similarity assessment methods. We demonstrate the superiority of our method both on synthetic data and on laser scanning data.
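
    The paper's weighting scheme is not reproduced in this record, but the core computation, a model-to-cloud distance turned into a surface-area ratio, can be sketched as follows (plain averaging replaces the authors' distance weighting; all inputs are toy data).

```python
import numpy as np

def dist_model_to_cloud(model_points, cloud_points):
    """Average nearest-neighbour distance from points sampled on the model to the cloud
    (plain mean instead of the authors' weighting)."""
    m = np.asarray(model_points, dtype=float)   # (M, 3) points sampled from the model surface
    c = np.asarray(cloud_points, dtype=float)   # (N, 3) scanned point cloud
    # Brute-force pairwise distances; fine for toy inputs, use a KD-tree for real data.
    d = np.linalg.norm(m[:, None, :] - c[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

def sim_model_cloud(model_points, cloud_points, model_area):
    """SimMC-style score: ratio of the model surface area to the model-to-cloud distance."""
    return model_area / (dist_model_to_cloud(model_points, cloud_points) + 1e-12)

model = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
cloud = [[0.05, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.1, 0.0], [0.5, 0.5, 0.2]]
print(round(sim_model_cloud(model, cloud, model_area=0.5), 2))
```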

  20. Measure of Similarity between Rough Sets

    徐久成; 沈钧毅; 王国胤


    Applications of rough set theory in incomplete information systems are key to putting rough sets into real applications. In this paper, after analyzing some basic concepts of classical rough set theory and extended rough set theory, measures of similarity are developed between two rough sets in classical rough set theory based on the indiscernibility relation, and between two rough sets in extended rough set theory based on the limited tolerance relation. Some properties of these two similarity measures are then derived. Finally, the two measures are compared.


    Zeng Fanzi; Qiu Zhengding; Li Dongsheng; Yue Jianhai


    Pattern discovery from time series is of fundamental importance. Most pattern discovery algorithms for time series capture the values of the series using some kind of similarity measure. Affected by scale and baseline, value-based methods become problematic when the objective is to capture the shape. Thus, a similarity measure based on shape, the Sh measure, is proposed, and the properties of this similarity and the corresponding proofs are given. A time series shape pattern discovery algorithm based on the Sh measure is then put forward. The proposed algorithm terminates in a finite number of iterations with given computational and storage complexity. Finally, experiments on synthetic datasets and sunspot datasets demonstrate that the time series shape pattern algorithm is valid.
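
    The Sh measure itself is not defined in this record; a standard way to obtain the scale- and baseline-invariance it targets is to z-normalize each series before comparing, as in the stand-in sketch below.

```python
import numpy as np

def znorm(x):
    """Remove baseline (mean) and scale (standard deviation) so only the shape remains."""
    x = np.asarray(x, dtype=float)
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def shape_similarity(x, y):
    """Correlation of z-normalized series: 1 means identical shape regardless of
    offset and amplitude (a stand-in for a shape-based measure, not Sh itself)."""
    return float(np.mean(znorm(x) * znorm(y)))

a = [1, 2, 3, 2, 1, 0, 1, 2]
b = [10, 12, 14, 12, 10, 8, 10, 12]   # same shape as a, shifted and scaled
c = [3, 1, 0, 1, 3, 4, 3, 1]          # roughly inverted shape
print(round(shape_similarity(a, b), 3), round(shape_similarity(a, c), 3))
```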

  2. Hierarchical 3D mechanical parts matching based-on adjustable geometry and topology similarity measurements

    马嵩华; 田凌


    A hierarchical scheme of feature-based model similarity measurement, named CSG_D2, was proposed, in which both geometry similarity and topology similarity are applied. The features of a 3D mechanical part are constructed from a series of primitive features with a tree structure, in the form of a constructive solid geometry (CSG) tree. The D2 shape distributions of these features are extracted for geometry similarity measurement, and the pose vector and non-disappeared proportion of each leaf node are used for topology similarity measurement. Based on these, the dissimilarity between the query and the candidate is assessed by level-by-level CSG tree comparisons. With adjustable weights, the scheme accommodates different emphases on geometry or topology similarity. The assessment results show that CSG_D2 is more discriminative than D2 in precision-recall and similarity-matrix analyses. Finally, an experimental search engine using CSG_D2 is applied to mechanical part reuse, which is convenient for the mechanical design process.

  3. Optimizing top precision performance measure of content-based image retrieval by learning similarity function

    Liang, Ru-Ze


    In this paper we study the problem of content-based image retrieval. In this problem, the most popular performance measure is the top precision measure, and the most important component of a retrieval system is the similarity function used to compare a query image against a database image. However, up to now, there is no existing similarity learning method proposed to optimize the top precision measure. To fill this gap, in this paper, we propose a novel similarity learning method to maximize the top precision measure. We model this problem as a minimization problem with an objective function as the combination of the losses of the relevant images ranked behind the top-ranked irrelevant image, and the squared Frobenius norm of the similarity function parameter. This minimization problem is solved as a quadratic programming problem. The experiments over two benchmark data sets show the advantages of the proposed method over other similarity learning methods when the top precision is used as the performance measure.

  4. Constitutive relation measurement of geological mechanics similar material based on fiber Bragg grating

    You, Zewei; Wang, Yuan; Sun, Yangyang; Zhang, Qinghua; Zhang, Zhenglin; Huang, Xiaodi


    The constitutive relation of geological mechanics similar material is the basis of geological mechanics experiments. It is obtained from the stress and strain curves of a uniaxial compression test on a square specimen. However, the traditional measuring method exhibits a non-negligible error for similar material owing to its boundary effect. An approach based on embedding a bare fiber Bragg grating (FBG) sensor into a specimen is presented for measuring the constitutive relation of the similar material. The error of the traditional approach was examined by simulation, and the results of measurements under different friction conditions were compared. The simulation results reveal that the error increases with friction when the sensor is pasted on the surface; when the sensor is embedded in the middle of the specimen, friction has less effect. Two similar FBG sensors were used in the measurement of similar material for verification: one embedded into the specimen and another pasted on the surface. The friction was varied using silicone oil. Experimental results agreed well with the simulation results, indicating that the approach of embedding a bare FBG into a specimen can measure the constitutive relation precisely and is more accurate than the traditional approach.

  5. The next generation of similarity measures that fully explore the semantics in biomedical ontologies.

    Couto, Francisco M; Pinto, H Sofia


    There is a prominent trend to augment and improve the formality of biomedical ontologies, as shown, for example, by the current effort on adding description logic axioms such as disjointness. One of the key ontology applications that can take advantage of this effort is conceptual (functional) similarity measurement. The presence of description logic axioms in biomedical ontologies makes the current structural or extensional approaches weaker and further away from providing sound semantics-based similarity measures. Although beneficial in small ontologies, the exploration of description logic axioms by semantics-based similarity measures is computationally expensive. This limitation is critical for biomedical ontologies, which normally contain thousands of concepts. Thus, in the process of gaining their rightful place, biomedical functional similarity measures have to take the journey of finding how this rich and powerful knowledge can be fully explored while keeping computational costs feasible. This manuscript aims at promoting and guiding the development of compelling tools that deliver what the biomedical community will require in the near future: a next generation of biomedical similarity measures that efficiently and fully explore the semantics present in biomedical ontologies.

  6. A relation based measure of semantic similarity for Gene Ontology annotations

    Gaudin Benoit


    Abstract Background: Various measures of semantic similarity of terms in bio-ontologies such as the Gene Ontology (GO) have been used to compare gene products. Such similarity measures have been used to annotate uncharacterized gene products and to group gene products into functional groups. There are various ways to measure semantic similarity, using the topological structure of the ontology, the instances (gene products) associated with terms, or a mixture of both. We focus on an instance-level definition of semantic similarity while using the information contained in the ontology, both the graphical structure and the semantics of relations between terms, to provide constraints on our instance-level description. Semantic similarity of terms is extended to annotations by various approaches, either through aggregation operations such as min, max and average or through an extrapolative method. These approaches introduce assumptions about how the semantic similarity of terms relates to the semantic similarity of annotations that do not necessarily reflect how terms relate to each other. Results: We exploit the semantics of relations in the GO to construct an algorithm called SSA that provides the basis of a framework naturally extending instance-based measures of term semantic similarity, such as Resnik's measure, to describing annotations and not just terms. Our measure attempts to correctly interpret how terms combine via their relationships in the ontological hierarchy. SSA uses these relationships to identify the most specific common ancestors between terms. We outline the set of cases in which terms can combine and associate partial-order constraints with each case that order the specificity of terms. These cases form the basis for the SSA algorithm, and the associated constraints provide a set of principles that any improvement on our method should seek to satisfy. Conclusion: We derive a measure of semantic ...
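
    The record contrasts aggregation operators (min, max, average) used to lift term-level similarity to annotation level; the sketch below shows those operators plus the common best-match average, applied to an arbitrary term-similarity function (the toy similarity values are hypothetical, and this is not the SSA algorithm).

```python
import itertools

def annotation_similarity(terms_a, terms_b, term_sim, how="bma"):
    """Lift a term-level similarity to two annotation sets.
    'max' and 'avg' are the simple aggregation operators mentioned above;
    'bma' (best-match average) is a common stronger baseline. Not the SSA algorithm."""
    scores = {(s, t): term_sim(s, t) for s, t in itertools.product(terms_a, terms_b)}
    if how == "max":
        return max(scores.values())
    if how == "avg":
        return sum(scores.values()) / len(scores)
    # Best-match average: each term matched to its best counterpart, then averaged.
    row = sum(max(scores[(s, t)] for t in terms_b) for s in terms_a) / len(terms_a)
    col = sum(max(scores[(s, t)] for s in terms_a) for t in terms_b) / len(terms_b)
    return 0.5 * (row + col)

# Toy term-level similarities (hypothetical values standing in for a Resnik-style measure).
toy = {("t1", "u1"): 0.9, ("t1", "u2"): 0.2, ("t2", "u1"): 0.1, ("t2", "u2"): 0.7}
sim = lambda s, t: toy[(s, t)]
print(annotation_similarity(["t1", "t2"], ["u1", "u2"], sim, "bma"))
```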

  7. A measure of similarity between scientific journals and of diversity of a list of publications

    Cordier, Stéphane


    The aim of this note is to propose a definition of scientific diversity and, as a corollary, a measure of the "interdisciplinarity" of collaborations. With respect to previous studies, the proposed approach consists of two steps: first, the definition of similarity between journals; second, the use of these similarities to characterize the homogeneity (or, on the contrary, the diversity) of a publication list (for an individual or a team).

  8. Behavioral similarity measurement based on image processing for robots that use imitative learning

    Sterpin B., Dante G.; Martinez S., Fernando; Jacinto G., Edwar


    In the field of artificial societies, particularly those based on memetics, imitative behavior is essential for the development of cultural evolution. Applying this concept to robotics, through imitative learning a robot can acquire behavioral patterns from another robot. Assuming that the learning process must have an instructor and at least one apprentice, obtaining a quantitative measurement of their behavioral similarity would be potentially useful, especially in artificial social systems focused on cultural evolution. In this paper the motor behavior of both kinds of robots, for two simple tasks, is represented by 2D binary images, which are processed in order to measure their behavioral similarity. The results shown here were obtained by comparing several similarity measurement methods for binary images.

  9. A framework for unifying ontology-based semantic similarity measures: a study in the biomedical domain.

    Harispe, Sébastien; Sánchez, David; Ranwez, Sylvie; Janaqi, Stefan; Montmain, Jacky


    Ontologies are widely adopted in the biomedical domain to characterize various resources (e.g. diseases, drugs, scientific publications) with non-ambiguous meanings. By exploiting the structured knowledge that ontologies provide, a plethora of ad hoc and domain-specific semantic similarity measures have been defined over the last years. Nevertheless, some critical questions remain: which measure should be defined/chosen for a concrete application? Are some of the, a priori different, measures indeed equivalent? In order to bring some light to these questions, we perform an in-depth analysis of existing ontology-based measures to identify the core elements of semantic similarity assessment. As a result, this paper presents a unifying framework that aims to improve the understanding of semantic measures, to highlight their equivalences and to propose bridges between their theoretical bases. By demonstrating that groups of measures are just particular instantiations of parameterized functions, we unify a large number of state-of-the-art semantic similarity measures through common expressions. The application of the proposed framework and its practical usefulness is underlined by an empirical analysis of hundreds of semantic measures in a biomedical context. Copyright © 2013 Elsevier Inc. All rights reserved.

  10. Experimental examination of similarity measures and preprocessing methods used for image registration

    Svedlow, M.; Mcgillem, C. D.; Anuta, P. E.


    The criterion used to measure the similarity between images and thus find the position where the images are registered is examined. The three similarity measures considered are the correlation coefficient, the sum of the absolute differences, and the correlation function. Three basic types of preprocessing are then discussed: taking the magnitude of the gradient of the images, thresholding the images at their medians, and thresholding the magnitude of the gradient of the images at an arbitrary level to be determined experimentally. These multitemporal registration techniques are applied to remote imagery of agricultural areas.
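
    Two of the three similarity criteria named above, the correlation coefficient and the sum of absolute differences, can be sketched directly on toy image windows (flattening each window into a vector is an implementation choice here, not taken from the paper).

```python
import numpy as np

def correlation_coefficient(a, b):
    """Pearson correlation between two equally sized image windows (higher = better match)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    den = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / den) if den > 0 else 0.0

def sum_abs_diff(a, b):
    """Sum of absolute differences (lower = better match)."""
    return float(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)).sum())

ref  = np.array([[10, 12], [14, 16]])
cand = np.array([[11, 13], [15, 17]])   # same pattern, shifted intensity
print(round(correlation_coefficient(ref, cand), 3), sum_abs_diff(ref, cand))
```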

  11. Applying Statistical Models and Parametric Distance Measures for Music Similarity Search

    Lukashevich, Hanna; Dittmar, Christian; Bastuck, Christoph

    Automatically deriving similarity relations between music pieces is an inherent field of music information retrieval research. Due to the nearly unrestricted amount of musical data, real-world similarity search algorithms have to be highly efficient and scalable. A possible solution is to represent each music excerpt with a statistical model (e.g., a Gaussian mixture model) and thus reduce the computational costs by applying parametric distance measures between the models. In this paper we discuss combinations of different parametric modelling techniques and distance measures and weigh the benefits of each against the others.

  12. Single Valued Neutrosophic Similarity Measures for Multiple Attribute Decision-Making

    Jun Ye


    Similarity measures play an important role in data mining, pattern recognition, decision making, machine learning, image processing, etc. Single valued neutrosophic sets (SVNSs) can describe and handle indeterminate and inconsistent information, which fuzzy sets and intuitionistic fuzzy sets cannot. Therefore, this paper proposes new similarity measures between SVNSs based on the minimum and maximum operators. A multiple attribute decision-making method based on the weighted similarity measure of SVNSs is then established, in which attribute values for alternatives are represented by single valued neutrosophic values (SVNVs) and the attribute weights and the weights of the three independent elements of an SVNV (i.e., truth-membership degree, indeterminacy-membership degree, and falsity-membership degree) are considered in the decision-making method. In the decision making, we utilize the single-valued neutrosophic weighted similarity measure between the ideal alternative and each alternative to rank the alternatives according to the measure values and to select the most desirable one(s). Finally, two practical examples are provided to demonstrate the applications and effectiveness of the single valued neutrosophic multiple attribute decision-making method.
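
    The paper's exact min/max formula is not reproduced in this record; the sketch below shows one natural instance, a component-wise min/max ratio averaged over the truth, indeterminacy and falsity degrees, on assumed toy data.

```python
def svn_similarity(a, b):
    """Similarity between two single valued neutrosophic sets given as lists of
    (truth, indeterminacy, falsity) triples: a component-wise min/max ratio,
    averaged over components and elements (one natural 'min/max' form, not
    necessarily the paper's exact definition)."""
    def ratio(x, y):
        hi = max(x, y)
        return 1.0 if hi == 0 else min(x, y) / hi
    total = 0.0
    for (ta, ia, fa), (tb, ib, fb) in zip(a, b):
        total += ratio(ta, tb) + ratio(ia, ib) + ratio(fa, fb)
    return total / (3 * len(a))

# Toy SVNSs over two elements (values assumed).
A = [(0.7, 0.1, 0.2), (0.4, 0.3, 0.5)]
B = [(0.6, 0.2, 0.2), (0.5, 0.3, 0.4)]
print(round(svn_similarity(A, B), 3))
```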

  13. Phase transitions for the multifractal analysis of self-similar measures

    Testud, B.


    We are interested in the multifractal analysis of a class of self-similar measures with overlaps. This class, for which we obtain explicit formulae for the $L^q$-spectrum, $\tau(q)$, as well as the singularity spectrum $f(\alpha)$, is sufficiently large to point out new phenomena in the multifractal structure of self-similar measures. We show that, unlike in the classical quasi-Bernoulli case, the $L^q$-spectrum $\tau(q)$ of the measures studied can have an arbitrarily large number of non-differentiability points (phase transitions). These singularities occur only for negative values of $q$ and yield measures that do not satisfy the usual multifractal formalism. The weak quasi-Bernoulli property is the key point of most of the arguments.


    S. K. Jayanthi


    In the current scenario, web page result personalization plays a vital role. Nearly 80% of users expect the best results on the first page itself, without persisting to browse further. This research work focuses on two main themes: semantic web search online and domain-based search offline. The first part is to find an effective method that groups similar results together using a BookShelf data structure and organizes the various clusters. The second part focuses on academic domain-based search offline. This paper focuses on finding documents that are similar and on how the vector space model can be used to do so, so more weight is given to the principles and working methodology of similarity propagation. The cosine similarity measure is used for finding the relevancy among the documents.

  15. Feature-enhanced spectral similarity measure for the analysis of hyperspectral imagery

    Li, Qingbo; Niu, Chunyang


    In hyperspectral remote sensing, the surface compositional material can be identified by means of spectral matching algorithms. In many cases, the importance of each spectral band to measure spectral similarity is different, whereas the traditional spectral matching algorithms implicitly assume all wavelength-dependent absorption features are equal. This may yield an unsatisfactory performance for spectral matching. To remedy this deficiency, we propose methods called feature-enhanced spectral similarity measures. They are hybrids of the spectral matching algorithms combined with a feature-enhanced space projection, termed feature-enhanced spectral angle measure, feature-enhanced Euclidean distance measure, feature-enhanced spectral correlation measure, and feature-enhanced spectral information divergence. The proposed methods creatively project the original spectra into spectral feature-enhanced space, in which important features for measuring the spectral similarity will be increased to a high degree, whereas features of low importance will be suppressed. In order to demonstrate the effectiveness of the proposed approaches, performances are compared on real hyperspectral image data from Airborne Visible Infrared Imaging Spectrometer. The proposed methods are found to possess significant improvements over the original four spectral matching algorithms.
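
    Of the four hybrids listed, the underlying Spectral Angle Mapper is the simplest to sketch; the "feature-enhanced" variant below is only a stand-in in which assumed per-band weights emphasize selected bands, not the paper's projection.

```python
import numpy as np

def spectral_angle(s1, s2):
    """Spectral Angle Mapper: angle (radians) between two reflectance spectra.
    Smaller angle = more similar; insensitive to a global illumination scaling."""
    s1, s2 = np.asarray(s1, dtype=float), np.asarray(s2, dtype=float)
    cos = np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def weighted_spectral_angle(s1, s2, w):
    """Stand-in for a 'feature-enhanced' variant: reweight bands before the angle
    (the weights w are assumed; the paper's projection is not reproduced here)."""
    w = np.sqrt(np.asarray(w, dtype=float))
    return spectral_angle(w * np.asarray(s1, dtype=float), w * np.asarray(s2, dtype=float))

a = [0.10, 0.22, 0.35, 0.30]
b = [0.12, 0.20, 0.70, 0.28]
print(round(spectral_angle(a, b), 4), round(weighted_spectral_angle(a, b, [1, 1, 4, 1]), 4))
```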

  16. Superquadric Similarity Measure with Spherical Harmonics in 3D Object Recognition

    XING Weiwei; LIU Weibin; YUAN Baozong


    This paper proposes a novel approach for superquadric similarity measure in 3D object recognition. The 3D objects are represented by a composite volumetric representation of Superquadric (SQ)-based geons, which are the new and powerful volumetric models adequate for 3D recognition. The proposed approach is processed through three stages: first, a novel sampling algorithm is designed for searching Chebyshev nodes on superquadric surface to construct the discrete spherical function representing superquadric 3D shape; secondly, the fast Spherical Harmonic Transform is performed on the discrete spherical function to obtain the rotation invariant descriptor of superquadric; thirdly, the similarity of superquadrics is measured by computing the L2 difference between two obtained descriptors. In addition, an integrated processing framework is presented for 3D object recognition with SQ-based geons from the real 3D data, which implements the approach proposed in this paper for shape similarity measure between SQ-based geons. Evaluation experiments demonstrate that the proposed approach is very efficient and robust for similarity measure of superquadric models. The research lays a foundation for developing SQ-based 3D object recognition systems.

  17. A New Approach to Change Vector Analysis Using Distance and Similarity Measures

    Alan R. Gillespie


    The need to monitor the Earth's surface over a range of spatial and temporal scales is fundamental in ecosystems planning and management. Change Vector Analysis (CVA) is a bi-temporal method of change detection that considers the magnitude and direction of the change vector. However, many multispectral applications do not make use of the direction component. The procedure most commonly used to calculate the direction component from multiband data is the direction cosine, but the number of output direction cosine images is equal to the number of original bands, and their interpretation is complex. This paper proposes a new approach to calculating the spectral direction of change, using the Spectral Angle Mapper and Spectral Correlation Mapper spectral-similarity measures. The chief advantage of this approach is that it generates a single image of change information that is insensitive to illumination variation. In this paper the magnitude component of the spectral similarity was calculated in two ways: as the standard Euclidean distance and as the Mahalanobis distance. In this test the best magnitude measure was the Euclidean distance and the best similarity measure was the Spectral Angle Mapper. The results show that the distance and similarity measures are complementary and need to be applied together.

  18. Fusion of PCA-Based and LDA-Based Similarity Measures for Face Verification

    Kittler Josef


    The problem of fusing similarity measure-based classifiers is considered in the context of face verification. The performance of face verification systems using different similarity measures in two well-known appearance-based representation spaces, namely Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), is experimentally studied. The study is performed for both manually and automatically registered face images. The experimental results confirm that our optimised Gradient Direction (GD) metric within the LDA feature space outperforms the other adopted metrics. Different methods of selection and fusion of the similarity measure-based classifiers are then examined. The experimental results demonstrate that the combined classifiers outperform any individual verification algorithm. In our studies, Support Vector Machines (SVMs) and weighted averaging of similarity measures appear to be the best fusion rules. Another interesting finding is that although features derived from the LDA approach lead to better results than those of the PCA algorithm for all the adopted scoring functions, fusing the PCA- and LDA-based scores improves the performance of the system.

  19. Semantic similarity measures in the biomedical domain by leveraging a web search engine.

    Hsieh, Sheau-Ling; Chang, Wen-Yung; Chen, Chi-Huang; Weng, Yung-Ching


    Various studies of web-related semantic similarity measures have been carried out. However, measuring semantic similarity between two terms remains a challenging task. The traditional ontology-based methodologies have the limitation that both concepts must reside in the same ontology tree(s); unfortunately, in practice, this assumption is not always applicable. On the other hand, if the corpus is sufficiently adequate, corpus-based methodologies can overcome the limitation, and the web is an enormous and continuously growing corpus. Therefore, a method of estimating semantic similarity is proposed that exploits the page counts of two biomedical concepts returned by the Google AJAX web search engine. The features are extracted as the co-occurrence patterns of two given terms P and Q, by querying P, Q, as well as P AND Q, and the web search hit counts of defined lexico-syntactic patterns. The similarity scores of the different patterns are evaluated, by adapting support vector machines for classification, to leverage the robustness of the semantic similarity measures. Experimental results validated against two datasets, dataset 1 provided by A. Hliaoutakis and dataset 2 provided by T. Pedersen, are presented and discussed. In dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. In dataset 2, the proposed method obtains the best correlation coefficients (SNOMED-CT: 0.705; MeSH: 0.723) with physician scores compared with other methods, although the correlation coefficients with coder scores (SNOMED-CT: 0.496; MeSH: 0.539) show the opposite outcome. In conclusion, the semantic similarity findings of the proposed method are close to the physicians' ratings. Furthermore, the study provides a cornerstone investigation for extracting fully relevant information from digitized, free-text medical records in the National Taiwan University Hospital database.

  20. Perceptual similarity of visual patterns predicts dynamic neural activation patterns measured with MEG.

    Wardle, Susan G; Kriegeskorte, Nikolaus; Grootswagers, Tijl; Khaligh-Razavi, Seyed-Mahdi; Carlson, Thomas A


    Perceptual similarity is a cognitive judgment that represents the end-stage of a complex cascade of hierarchical processing throughout visual cortex. Previous studies have shown a correspondence between the similarity of coarse-scale fMRI activation patterns and the perceived similarity of visual stimuli, suggesting that visual objects that appear similar also share similar underlying patterns of neural activation. Here we explore the temporal relationship between the human brain's time-varying representation of visual patterns and behavioral judgments of perceptual similarity. The visual stimuli were abstract patterns constructed from identical perceptual units (oriented Gabor patches) so that each pattern had a unique global form or perceptual 'Gestalt'. The visual stimuli were decodable from evoked neural activation patterns measured with magnetoencephalography (MEG), however, stimuli differed in the similarity of their neural representation as estimated by differences in decodability. Early after stimulus onset (from 50ms), a model based on retinotopic organization predicted the representational similarity of the visual stimuli. Following the peak correlation between the retinotopic model and neural data at 80ms, the neural representations quickly evolved so that retinotopy no longer provided a sufficient account of the brain's time-varying representation of the stimuli. Overall the strongest predictor of the brain's representation was a model based on human judgments of perceptual similarity, which reached the limits of the maximum correlation with the neural data defined by the 'noise ceiling'. Our results show that large-scale brain activation patterns contain a neural signature for the perceptual Gestalt of composite visual features, and demonstrate a strong correspondence between perception and complex patterns of brain activity.

  1. MaSiMe: A Customized Similarity Measure and Its Application for Tag Cloud Refactoring

    Urdiales-Nieto, David; Martinez-Gil, Jorge; Aldana-Montes, José F.

    Nowadays the popularity of tag clouds on websites has increased notably, but their generation is criticized because the lack of control makes them likely to produce inconsistent and redundant results. It is well known that if tags are freely chosen (instead of taken from a given set of terms), synonyms (multiple tags for the same meaning), non-normalized words and heterogeneity of users are likely to arise, lowering the efficiency of content indexing and searching. To solve this problem, we have designed the Maximum Similarity Measure (MaSiMe), a dynamic and flexible similarity measure that is able to take into account and optimize several preferences of the user who wishes to obtain a redundancy-free tag cloud. Moreover, we include an algorithm to effectively compute the measure and a parametric study to determine the best configuration for this algorithm.

  2. On the Minkowski Measurability of Self-Similar Fractals in R^d

    Deniz, Ali; Ozdemir, Yunus; Ratiu, Andrei V; Ureyen, A Ersin


    M. Lapidus and C. Pomerance (1990-1993) and K.J. Falconer (1995) proved that a self-similar fractal in $\mathbb{R}$ is Minkowski-measurable iff it is of non-lattice type. D. Gatzouras (1999) proved that a self-similar fractal in $\mathbb{R}^d$ is Minkowski measurable if it is of non-lattice type (though the actual computation of the content is intractable with his approach) and conjectured that it is not Minkowski measurable if it is of lattice type. Under mild conditions we prove this conjecture, and in the non-lattice case we improve his result in the sense that we express the content of the fractal in terms of the residue of the associated $\zeta$-function at the Minkowski dimension.

  3. Self-organizing maps for measuring similarity of audiovisual speech percepts

    Bothe, Hans-Heinrich

    The goal of this work is to find a way to measure similarity of audiovisual speech percepts. Phoneme-related self-organizing maps (SOM) with a rectangular basis are trained with data material from a (labeled) video film. For the training, a combination of auditory speech features and corresponding ... sentences in German with a balanced phoneme repertoire. As a result it can be stated that (i) the SOM can be trained to map auditory and visual features in a topology-preserving way and (ii) they show strain due to the influence of other audio-visual units. The SOM can be used to measure similarity amongst ... audio-visual speech percepts and to measure coarticulatory effects ...

  4. Natural similarity measures between position frequency matrices with an application to clustering.

    Pape, Utz J; Rahmann, Sven; Vingron, Martin


    Transcription factors (TFs) play a key role in gene regulation by binding to target sequences. In silico prediction of potential binding of a TF to a binding site is a well-studied problem in computational biology. The binding sites for one TF are represented by a position frequency matrix (PFM). The discovery of new PFMs requires the comparison to known PFMs to avoid redundancies. In general, two PFMs are similar if they occur at overlapping positions under a null model. Still, most existing methods compute similarity according to probabilistic distances of the PFMs. Here we propose a natural similarity measure based on the asymptotic covariance between the number of PFM hits incorporating both strands. Furthermore, we introduce a second measure based on the same idea to cluster a set of the Jaspar PFMs. We show that the asymptotic covariance can be efficiently computed by a two dimensional convolution of the score distributions. The asymptotic covariance approach shows strong correlation with simulated data. It outperforms three alternative methods. The Jaspar clustering yields distinct groups of TFs of the same class. Furthermore, a representative PFM is given for each class. In contrast to most other clustering methods, PFMs with low similarity automatically remain singletons. A website to compute the similarity and to perform clustering, the source code and Supplementary Material are available at

  5. Quantifying Similarity and Distance Measures for Vector-Based Datasets: Histograms, Signals, and Probability Distribution Functions


    There are a large number of different possible similarity and distance measures that can be applied to different datasets. In this technical ...

  6. Dependence centrality similarity: Measuring the diversity of profession levels of interests

    Yan, Deng-Cheng; Li, Ming; Wang, Bing-Hong


    To understand the relations between developers and software, we study a collaborative coding platform from the perspective of networks, including follower networks, dependence networks and developer-project bipartite networks. Through analysis of the degree distributions, PageRank and degree-dependent nearest neighbors' centrality, we find that the degree distributions of all networks have a power-law form except the out-degree distributions of the dependence networks. The nearest neighbors' centrality is negatively correlated with degree for developers but fluctuates around the average for projects. In order to measure the diversity of profession levels of interests, a new index called dependence centrality similarity is proposed and the correlation between dependence centrality similarity and degree is investigated. The result shows an obvious negative correlation between dependence centrality similarity and degree.

  7. An ontology-based similarity measure for biomedical data-application to radiology reports.

    Mabotuwana, Thusitha; Lee, Michael C; Cohen-Solal, Eric V


    Determining similarity between two individual concepts or two sets of concepts extracted from a free text document is important for various aspects of biomedicine, for instance, to find prior clinical reports for a patient that are relevant to the current clinical context. Using simple concept matching techniques, such as lexicon based comparisons, is typically not sufficient to determine an accurate measure of similarity. In this study, we tested an enhancement to the standard document vector cosine similarity model in which ontological parent-child (is-a) relationships are exploited. For a given concept, we define a semantic vector consisting of all parent concepts and their corresponding weights as determined by the shortest distance between the concept and parent after accounting for all possible paths. Similarity between the two concepts is then determined by taking the cosine angle between the two corresponding vectors. To test the improvement over the non-semantic document vector cosine similarity model, we measured the similarity between groups of reports arising from similar clinical contexts, including anatomy and imaging procedure. We further applied the similarity metrics within a k-nearest-neighbor (k-NN) algorithm to classify reports based on their anatomical and procedure based groups. 2150 production CT radiology reports (952 abdomen reports and 1128 neuro reports) were used in testing with SNOMED CT, restricted to Body structure, Clinical finding and Procedure branches, as the reference ontology. The semantic algorithm preferentially increased the intra-class similarity over the inter-class similarity, with a 0.07 and 0.08 mean increase in the neuro-neuro and abdomen-abdomen pairs versus a 0.04 mean increase in the neuro-abdomen pairs. Using leave-one-out cross-validation in which each document was iteratively used as a test sample while excluding it from the training data, the k-NN based classification accuracy was shown in all cases to be

  8. The Nonlocal Sparse Reconstruction Algorithm by Similarity Measurement with Shearlet Feature Vector

    Wu Qidi


    Due to the limited accuracy of conventional image restoration methods, this paper presents a nonlocal sparse reconstruction algorithm with similarity measurement. To improve the restoration results, we propose two schemes, for dictionary learning and for sparse coding respectively. In the dictionary learning part, we measure the similarity between patches from the degraded image by constructing a Shearlet feature vector, classify the patches into different classes according to this similarity, and train a cluster dictionary for each class; by cascading these we obtain the universal dictionary. In the sparse coding part, we propose a novel objective function with a coding residual term, which can suppress the residual between the estimated coding and the true sparse coding. Additionally, we derive a self-adaptive regularization parameter for the optimization under the Bayesian framework, which further improves performance. The experimental results indicate that, by taking full advantage of the similar local geometric structure in nonlocal patches and of the coding residual suppression, the proposed method shows an advantage both in visual perception and in PSNR compared to conventional methods.

  9. Predicting the similarity between expressive performances of music from measurements of tempo and dynamics

    Timmers, Renee


    Measurements of tempo and dynamics from audio files or MIDI data are frequently used to get insight into a performer's contribution to music. The measured variations in tempo and dynamics are often represented in different formats by different authors. Few systematic comparisons have been made between these representations. Moreover, it is unknown what data representation comes closest to subjective perception. The reported study tests the perceptual validity of existing data representations by comparing their ability to explain the subjective similarity between pairs of performances. In two experiments, 40 participants rated the similarity between performances of a Chopin prelude and a Mozart sonata. Models based on different representations of the tempo and dynamics of the performances were fitted to these similarity ratings. The results favor other data representations of performances than generally used, and imply that comparisons between performances are made perceptually in a different way than often assumed. For example, the best fit was obtained with models based on absolute tempo and absolute tempo times loudness, while conventional models based on normalized variations, or on correlations between tempo profiles and loudness profiles, did not explain the similarity ratings well.

  10. Self-similar and self-affine sets; measure of the intersection of two copies

    Elekes, Márton; Máthé, András


    Let $K$ be a self-similar or self-affine set in $\mathbb{R}^d$, let $\mu$ be a self-similar or self-affine measure on it, and let $G$ be the group of affine maps, similitudes, isometries or translations of $\mathbb{R}^d$. Under various assumptions (such as separation conditions, or assuming that the transformations are small perturbations, or that $K$ is a so-called Sierpinski sponge) we prove theorems of the following closely related types. Non-stability: there exists a constant $c<1$ such that ... $\mu(K\cap g(K))>0 \iff \operatorname{int}_K (K\cap g(K))$ is nonempty (where $\operatorname{int}_K$ denotes the interior relative to $K$). Extension: the measure $\mu$ has a $G$-invariant extension to $\mathbb{R}^d$. Moreover, in many situations we characterize those $g$ for which $\mu(K\cap g(K))>0$ holds.

  11. An efficient similarity measure technique for medical image registration

    Vilas H Gaidhane; Yogesh V Hote; Vijander Singh


    In this paper, an efficient similarity measure technique is proposed for medical image registration. The proposed approach is based on the Gerschgorin circle theorem. In this approach, image registration is carried out by considering the Gerschgorin bounds of the covariance matrix of two compared images with normalized energy. The beauty of this approach is that there is no need to calculate image features such as eigenvalues and eigenvectors. The technique is superior to other well-known techniques, such as the normalized cross-correlation method and eigenvalue-based similarity measures, since it avoids false registration and requires less computation. The proposed approach is sensitive to small defects and robust to changes in illumination and noise. Experimental results on various synthetic medical images show the effectiveness of the proposed technique for detecting and locating disease in complicated medical images.

  12. A modified statistical pattern recognition approach to measuring the crosslinguistic similarity of Mandarin and English vowels.

    Thomson, Ron I; Nearey, Terrance M; Derwing, Tracey M


    This study describes a statistical approach to measuring crosslinguistic vowel similarity and assesses its efficacy in predicting L2 learner behavior. In the first experiment, using linear discriminant analysis, relevant acoustic variables from vowel productions of L1 Mandarin and L1 English speakers were used to train a statistical pattern recognition model that simultaneously comprised both Mandarin and English vowel categories. The resulting model was then used to determine what categories novel Mandarin and English vowel productions most resembled. The extent to which novel cases were classified as members of a competing language category provided a means for assessing the crosslinguistic similarity of Mandarin and English vowels. In a second experiment, L2 English learners imitated English vowels produced by a native speaker of English. The statistically defined similarity between Mandarin and English vowels quite accurately predicted L2 learner behavior; the English vowel elicitation stimuli deemed most similar to Mandarin vowels were more likely to elicit L2 productions that were recognized as a Mandarin category; English stimuli that were less similar to Mandarin vowels were more likely to elicit L2 productions that were recognized as new or emerging categories.

  13. An Automatic Registration-Fusion Scheme Based on Similarity Measures: An Application to Dental Imaging


    images, the specialist can then perform any quantitative comparisons concerning the evolution of abnormalities (cysts, tooth decay, etc.) or healing...images to evaluate the progression of pathological conditions, such as cysts or tooth decay, or healing processes, as well as the assessment of the...calculation of similarity measures between two dental radiographic images to be registered. Moreover, a fusion process has been developed to combine

  14. Investigation of Time Series Representations and Similarity Measures for Structural Damage Pattern Recognition

    Swartz, R. Andrew


    This paper investigates the time series representation methods and similarity measures for sensor data feature extraction and structural damage pattern recognition. Both model-based time series representation and dimensionality reduction methods are studied to compare the effectiveness of feature extraction for damage pattern recognition. The evaluation of feature extraction methods is performed by examining the separation of feature vectors among different damage patterns and the pattern recognition success rate. In addition, the impact of similarity measures on the pattern recognition success rate and the metrics for damage localization are also investigated. The test data used in this study are from the System Identification to Monitor Civil Engineering Structures (SIMCES) Z24 Bridge damage detection tests, a rigorous instrumentation campaign that recorded the dynamic performance of a concrete box-girder bridge under progressively increasing damage scenarios. A number of progressive damage test case datasets and damage test data with different damage modalities are used. The simulation results show that both time series representation methods and similarity measures have significant impact on the pattern recognition success rate. PMID:24191136

  15. Investigation of time series representations and similarity measures for structural damage pattern recognition.

    Liu, Wenjia; Chen, Bo; Swartz, R Andrew


    This paper investigates the time series representation methods and similarity measures for sensor data feature extraction and structural damage pattern recognition. Both model-based time series representation and dimensionality reduction methods are studied to compare the effectiveness of feature extraction for damage pattern recognition. The evaluation of feature extraction methods is performed by examining the separation of feature vectors among different damage patterns and the pattern recognition success rate. In addition, the impact of similarity measures on the pattern recognition success rate and the metrics for damage localization are also investigated. The test data used in this study are from the System Identification to Monitor Civil Engineering Structures (SIMCES) Z24 Bridge damage detection tests, a rigorous instrumentation campaign that recorded the dynamic performance of a concrete box-girder bridge under progressively increasing damage scenarios. A number of progressive damage test case datasets and damage test data with different damage modalities are used. The simulation results show that both time series representation methods and similarity measures have significant impact on the pattern recognition success rate.
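    As a rough illustration of pairing a model-based time series representation with a similarity measure, the sketch below represents each signal by least-squares autoregressive (AR) coefficients and compares the resulting feature vectors with cosine similarity; the AR order, the synthetic signals and the distance choice are assumptions, not the SIMCES study configuration.

```python
import numpy as np

def ar_features(signal, order=4):
    """Least-squares AR(order) coefficients as a compact model-based representation."""
    x = np.asarray(signal, dtype=float)
    x = (x - x.mean()) / x.std()
    lagged = np.column_stack([x[order - k : len(x) - k] for k in range(1, order + 1)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return coeffs

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    t = np.linspace(0, 10, 2000)
    healthy = np.sin(2 * np.pi * 3.0 * t) + 0.1 * rng.standard_normal(t.size)
    repeat  = np.sin(2 * np.pi * 3.0 * t) + 0.1 * rng.standard_normal(t.size)
    damaged = np.sin(2 * np.pi * 2.6 * t) + 0.1 * rng.standard_normal(t.size)  # shifted frequency

    f_ref, f_rep, f_dmg = ar_features(healthy), ar_features(repeat), ar_features(damaged)
    print("healthy vs healthy :", cosine_similarity(f_ref, f_rep))
    print("healthy vs damaged :", cosine_similarity(f_ref, f_dmg))
```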

  16. Examination of the wavelet-based approach for measuring self-similarity of epileptic electroencephalogram data



    Self-similarity or scale-invariance is a fascinating characteristic found in various signals including electroencephalogram (EEG) signals. A common measure used for characterizing self-similarity or scale-invariance is the spectral exponent. In this study, a computational method for estimating the spectral exponent based on wavelet transform was examined. A series of Daubechies wavelet bases with various numbers of vanishing moments were applied to analyze the self-similar characteristics of intracranial EEG data corresponding to different pathological states of the brain, i.e., ictal and interictal states, in patients with epilepsy. The computational results show that the spectral exponents of intracranial EEG signals obtained during epileptic seizure activity tend to be higher than those obtained during non-seizure periods. This suggests that the intracranial EEG signals obtained during epileptic seizure activity tend to be more self-similar than those obtained during non-seizure periods. The computational results obtained using the wavelet-based approach were validated by comparison with results obtained using the power spectrum method.
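    A hedged sketch of the wavelet-based estimator described here, assuming PyWavelets is available: the variance of Daubechies detail coefficients is computed at each decomposition level and the spectral exponent is taken as the slope of log2-variance versus scale. The wavelet choice and level cap are illustrative, not the study's settings.

```python
import numpy as np
import pywt

def spectral_exponent(signal, wavelet="db4", max_level=8):
    """Estimate the spectral exponent gamma of an approximately 1/f^gamma signal
    from the scale-wise variance of its wavelet detail coefficients."""
    x = np.asarray(signal, dtype=float)
    level = min(max_level, pywt.dwt_max_level(len(x), pywt.Wavelet(wavelet).dec_len))
    coeffs = pywt.wavedec(x, wavelet, level=level)
    details = coeffs[1:]                  # [cD_level, ..., cD_1]; cD_1 is the finest scale
    scales = np.arange(level, 0, -1)      # scale index j for each detail band
    log_var = np.log2([np.var(d) for d in details])
    slope, _ = np.polyfit(scales, log_var, 1)
    return slope                          # log2-variance grows roughly as gamma * j

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    white = rng.standard_normal(2 ** 14)   # flat spectrum, gamma close to 0
    brown = np.cumsum(white)               # integrated noise, gamma close to 2
    print("white noise exponent:", round(spectral_exponent(white), 2))
    print("brown noise exponent:", round(spectral_exponent(brown), 2))
```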

  17. Approach for Text Classification Based on the Similarity Measurement between Normal Cloud Models

    Jin Dai


    Full Text Available The similarity between objects is a core research area of data mining. In order to reduce the interference of the uncertainty of natural language, a similarity measurement between normal cloud models is adopted for text classification research. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed. It can efficiently accomplish the conversion between qualitative concepts and quantitative data. Through the conversion from a text set to a text information table based on the VSM model, the qualitative concepts extracted from texts of the same category are jumped up into a whole-category concept. According to the cloud similarity between a test text and each category concept, the test text is assigned to the most similar category. Comparisons among different text classifiers on different feature selection sets show that CCJU-TC not only adapts well to different text features, but also outperforms traditional classifiers in classification performance.

  18. An Efficient Technique to Implement Similarity Measures in Text Document Clustering using Artificial Neural Networks Algorithm

    K. Selvi


    Full Text Available Pattern recognition with supervised and unsupervised methods, optimization, associative memory and control processes are some of the diverse problems that can be resolved by artificial neural networks. Problem identified: discovering the required information in a massive quantity of data is a challenging task. The model of similarity evaluation is the central element in reaching an understanding of the variables and perceptions that encourage behavior and mediate concern. This study proposes artificial neural network algorithms to resolve similarity measures. In order to apply singular value decomposition, the frequency of word pairs is established in the given document. (1) Tokenization: the splitting of a stream of text into words, phrases, signs, or other significant parts. (2) Stop words: words that are segregated before or after processing natural language data. (3) Porter stemming: this algorithm is mainly used as part of a term normalization process typically performed when setting up information retrieval systems. (4) WordNet: a compiled lexical database for the English language. Based on artificial neural networks, the core of this work extends the proposed n-gram algorithm; phonemes, syllables, letters, words or base pairs are used according to the application. Future work extends the application of the same similarity measures to other neural network algorithms to accomplish improved results.
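    The four preprocessing steps listed above can be sketched with NLTK as below; this covers only the preparation of word-pair counts, not the neural-network similarity stage, and the helper names and toy sentence are assumptions for illustration.

```python
from collections import Counter

import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import PorterStemmer

# One-time corpus downloads (uncomment on first run):
# nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")

def preprocess(text):
    """Tokenise, drop stop words, Porter-stem, and attach WordNet synsets."""
    tokens = nltk.word_tokenize(text.lower())                       # (1) tokenization
    stop = set(stopwords.words("english"))
    content = [t for t in tokens if t.isalpha() and t not in stop]  # (2) stop-word removal
    stemmer = PorterStemmer()
    stems = [stemmer.stem(t) for t in content]                      # (3) Porter stemming
    synsets = {t: wordnet.synsets(t) for t in content}              # (4) WordNet lookup
    return stems, synsets

def word_pair_counts(stems):
    """Frequencies of adjacent word pairs (bigrams), the counts that would feed an
    SVD or neural-network similarity stage in a fuller pipeline."""
    return Counter(zip(stems, stems[1:]))

stems, synsets = preprocess("Artificial neural networks resolve similarity measures in text documents.")
print(stems)
print(word_pair_counts(stems))
```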

  19. TAR: an improved process similarity measure based on unfolding of Petri nets

    WANG Wen-xing; WANG Jian-min


    Determining the degree of similarity between process models is very important for their management, reuse, and analysis. Current approaches either focus on a process model's structural aspect, or are inefficient or imprecise with respect to behavioral similarity. Aiming at these problems, a novel similarity measure was proposed that extends an existing method, the Transition Adjacency Relation (TAR), with improved precision and efficiency. The ability to measure similarity was extended by eliminating duplicate tasks without affecting behavior. For precision, TARs were classified into repeatable and unrepeatable ones to identify whether a TAR is involved in a loop. Two new kinds of TARs were added: one relating to the invisible tasks after the source place and before the sink place, and the other representing implicit dependencies. For efficiency, all TARs were calculated from the unfolding of a labeled Petri net instead of its reachability graph, to avoid state space explosion. Experiments on artificial and real-world process models showed the effectiveness and efficiency of the proposed method.
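    The TAR idea can be illustrated compactly: collect the set of directly-following task pairs for each model and compare the two sets with a Jaccard coefficient, which is the usual TAR-based similarity score. The sketch below extracts TARs from example traces for simplicity, whereas the paper derives them from a Petri-net unfolding, and it does not include the repeatable/unrepeatable or invisible-task refinements.

```python
def tars_from_traces(traces):
    """Collect transition adjacency relations (pairs of directly following tasks)
    from a set of execution traces. A simplification: the paper derives TARs from
    the unfolding of a labelled Petri net rather than from traces."""
    tars = set()
    for trace in traces:
        tars.update(zip(trace, trace[1:]))
    return tars

def tar_similarity(tars_a, tars_b):
    """Jaccard coefficient of two TAR sets."""
    if not tars_a and not tars_b:
        return 1.0
    return len(tars_a & tars_b) / len(tars_a | tars_b)

model_1 = tars_from_traces([["A", "B", "C", "D"], ["A", "C", "B", "D"]])
model_2 = tars_from_traces([["A", "B", "C", "D"]])
print(tar_similarity(model_1, model_2))  # 0.5: half of the adjacency relations are shared
```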


    Haoxiang XIA; Shuguang WANG; Taketoshi YOSHIDA


    Ant-based text clustering is a promising technique that has attracted great research attention. This paper attempts to improve the standard ant-based text-clustering algorithm in two dimensions. On one hand, an ontology-based semantic similarity measure is used in conjunction with the traditional vector-space-model-based measure to provide a more accurate assessment of the similarity between documents. On the other, the ant behavior model is modified to pursue better algorithmic performance. In particular, the ant movement rule is adjusted so as to direct a laden ant toward a dense area of the same type of items as the ant's carrying item, and to direct an unladen ant toward an area that contains an item dissimilar to the surrounding items within its Moore neighborhood. Using WordNet as the base ontology for assessing the semantic similarity between documents, the proposed algorithm is tested with a sample set of documents excerpted from the Reuters-21578 corpus, and the experimental results partly indicate that the proposed algorithm performs better than the standard ant-based text-clustering algorithm and the k-means algorithm.

  1. Self-organizing maps for measuring similarity of audiovisual speech percepts

    Bothe, Hans-Heinrich

    Dependent on the training data, these other units may also be contextually immediate neighboring units. The poster demonstrates the idea with text material spoken by one individual subject using a set of simple audio-visual features. The data material for the training process consists of 44 labeled... visual lip features is used. Phoneme-related receptive fields result on the SOM basis; they are speaker dependent and show individual locations and strain. Overlapping main slopes indicate a high similarity of respective units; distortion or extra peaks originate from the influence of other units... sentences in German with a balanced phoneme repertoire. As a result it can be stated that (i) the SOM can be trained to map auditory and visual features in a topology-preserving way and (ii) they show strain due to the influence of other audio-visual units. The SOM can be used to measure similarity amongst...

  2. FISim: A new similarity measure between transcription factor binding sites based on the fuzzy integral

    Cano Carlos


    Full Text Available Abstract Background Regulatory motifs describe sets of related transcription factor binding sites (TFBSs) and can be represented as position frequency matrices (PFMs). De novo identification of TFBSs is a crucial problem in computational biology which includes the issue of comparing putative motifs with one another and with motifs that are already known. The relative importance of each nucleotide within a given position in the PFMs should be considered in order to compute PFM similarities. Furthermore, biological data are inherently noisy and imprecise. Fuzzy set theory is particularly suitable for modeling imprecise data, whereas fuzzy integrals are highly appropriate for representing the interaction among different information sources. Results We propose FISim, a new similarity measure between PFMs, based on the fuzzy integral of the distance of the nucleotides with respect to the information content of the positions. Unlike existing methods, FISim is designed to consider the higher contribution of better conserved positions to the binding affinity. FISim provides excellent results when dealing with sets of randomly generated motifs, and outperforms the remaining methods when handling real datasets of related motifs. Furthermore, we propose a new cluster methodology based on kernel theory together with FISim to obtain groups of related motifs potentially bound by the same TFs, providing more robust results than existing approaches. Conclusion FISim corrects a design flaw of the most popular methods, whose measures favour similarity of low information content positions. We use our measure to successfully identify motifs that describe binding sites for the same TF and to solve real-life problems. In this study the reliability of fuzzy technology for motif comparison tasks is proven.

  3. A measure of semantic similarity between gene ontology terms based on semantic pathway covering

    LI Rong; CAO Shunliang; LI Yuanyuan; TAN Hao; ZHU Yangyong; ZHONG Yang; LI Yixue


    Semantic similarity between Gene Ontology (GO) terms is critical for resolving semantic heterogeneity when integrating heterogeneous biological databases. Traditionally, distance-based and information content-based measures have been the two major methods. In this paper, a new method based on semantic pathway covering is proposed and an algorithm, the COMBINE algorithm, is presented, which considers the information contents of two given nodes and of all nodes included in the two nodes' pathways. Experiments show that the COMBINE algorithm obtains the highest correlation index compared with distance-based and information content-based algorithms.

  4. Similar Processes Despite Divergent Behavior in Two Commonly Used Measures of Risky Decision Making



    Performance on complex decision-making tasks may depend on a multitude of processes. Two such tasks, the Iowa Gambling Task (IGT) and Balloon Analog Risk Task (BART), are of particular interest because they are associated with real world risky behavior, including illegal drug use. We used cognitive models to disentangle underlying processes in both tasks. Whereas behavioral measures from the IGT and BART were uncorrelated, cognitive models revealed two reliable cross-task associations. Results suggest that the tasks similarly measure loss aversion and decision-consistency processes, but not necessarily the same learning process. Additionally, substance-using individuals (and especially stimulant users) performed worse on the IGT than healthy controls did, and this pattern could be explained by reduced decision consistency. PMID:21836771

  5. A robust co-localisation measurement utilising z-stack image intensity similarities for biological studies.

    Yinhai Wang

    Full Text Available BACKGROUND: Co-localisation is a widely used measurement in immunohistochemical analysis to determine whether fluorescently labelled biological entities, such as cells, proteins or molecules, share the same location. However, the measurement of co-localisation is challenging due to the complex nature of such fluorescent images, especially when multiple focal planes are captured. The current state-of-the-art co-localisation measurements of 3-dimensional (3D) image stacks are biased by noise and cross-overs from non-consecutive planes. METHOD: In this study, we have developed Co-localisation Intensity Coefficients (CICs) and Co-localisation Binary Coefficients (CBCs), which use rich z-stack data from neighbouring focal planes to identify similarities between the image intensities of two, and potentially more, fluorescently-labelled biological entities. This was developed using z-stack images from murine organotypic slice cultures of central nervous system tissue, and two sets of pseudo-data. A large number of non-specific cross-over situations are excluded using this method. The proposed method is also proven to be robust in recognising co-localisations even when images are polluted with a range of noises. RESULTS: The proposed CBCs and CICs produce robust co-localisation measurements which are easy to interpret, resilient to noise and capable of removing a large amount of false positivity, such as non-specific cross-overs. The performance of this method is significantly more accurate than existing measurements, as determined statistically using pseudo datasets of known values. This method provides an important and reliable tool for fluorescent 3D neurobiological studies, and will benefit other biological studies which measure fluorescence co-localisation in 3D.

  6. A cross-lingual similarity measure for detecting biomedical term translations.

    Danushka Bollegala

    Full Text Available Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP)--a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. Moreover, our experimental results covering several language...

  7. A cross-lingual similarity measure for detecting biomedical term translations.

    Bollegala, Danushka; Kontonatsios, Georgios; Ananiadou, Sophia


    Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP)--a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. Moreover, our experimental results covering several language pairs such as
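    A minimal sketch of the intrinsic-feature pipeline under stated assumptions: character n-gram vectors for a toy seed lexicon, a partial least squares regression mapping from the source to the target feature space, and cosine ranking of candidate translations. The PVP projection step and the extrinsic context features are omitted, and the term lists and parameters here are invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics.pairwise import cosine_similarity

# Toy seed lexicon of aligned term pairs (placeholders, not real bilingual data).
source_terms = ["glucose", "insulin", "hepatitis", "nephritis", "glucosuria", "hepatitis b"]
target_terms = ["glucosa", "insulina", "hepatitis", "nefritis", "glucosuria", "hepatitis b"]

# Intrinsic features: character n-grams of each term.
src_vec = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3))
tgt_vec = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3))
X_src = src_vec.fit_transform(source_terms).toarray()
X_tgt = tgt_vec.fit_transform(target_terms).toarray()

# Partial least squares regression maps the source feature space onto the target space.
pls = PLSRegression(n_components=2)
pls.fit(X_src, X_tgt)

# Cross-lingual similarity of a new source term to candidate target terms:
# project the source vector and compare it with candidate target vectors.
query = src_vec.transform(["nephritis b"]).toarray()
projected = pls.predict(query)
candidates = ["nefritis b", "insulina", "glucosa"]
scores = cosine_similarity(projected, tgt_vec.transform(candidates).toarray())[0]
for term, score in sorted(zip(candidates, scores), key=lambda pair: -pair[1]):
    print(f"{term:12s} {score:.3f}")
```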

  8. Measuring user similarity using electric circuit analysis: application to collaborative filtering.

    Yang, Joonhyuk; Kim, Jinwook; Kim, Wonjoon; Kim, Young Hwan


    We propose a new technique for measuring user similarity in collaborative filtering using electric circuit analysis. Electric circuit analysis is used to measure the potential differences between nodes on an electric circuit. In this paper, by applying this method to transaction networks comprising users and items, i.e., the user-item matrix, and by using the full information about the relationship structure of users with respect to item adoption, we overcome the limitations of one-to-one similarity calculation approaches, such as the Pearson correlation, Tanimoto coefficient, and Hamming distance, in collaborative filtering. We found that electric circuit analysis can be successfully incorporated into recommender systems and has the potential to significantly enhance predictability, especially when combined with user-based collaborative filtering. We also propose four types of hybrid algorithms that combine the Pearson correlation method and electric circuit analysis. One of the algorithms exceeds the performance of traditional collaborative filtering by up to 37.5%. This work opens new opportunities for interdisciplinary research between physics and computer science and the development of new recommendation systems.
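    One common way to realise the circuit analogy is to treat the user-item matrix as a bipartite resistor network and read user similarity off the effective resistance between user nodes, computed from the pseudoinverse of the graph Laplacian. The sketch below follows that reading under stated assumptions; the paper's exact formulation and its hybridisation with Pearson correlation are not reproduced.

```python
import numpy as np

# Toy user-item adoption matrix (rows: users, columns: items); 1 = user adopted the item.
R = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
])
n_users, n_items = R.shape

# Bipartite adjacency matrix: each adoption is a unit-conductance resistor
# between a user node and an item node.
A = np.zeros((n_users + n_items, n_users + n_items))
A[:n_users, n_users:] = R
A[n_users:, :n_users] = R.T

# Graph Laplacian and its Moore-Penrose pseudoinverse.
L = np.diag(A.sum(axis=1)) - A
L_pinv = np.linalg.pinv(L)

def effective_resistance(i, j):
    """Effective resistance (potential difference per unit current) between nodes i and j."""
    return L_pinv[i, i] + L_pinv[j, j] - 2 * L_pinv[i, j]

def user_similarity(u, v):
    """Illustrative similarity: users separated by a low effective resistance are similar."""
    return 1.0 / effective_resistance(u, v)

print(user_similarity(0, 1))  # co-adopters of items 0 and 1 -> high similarity
print(user_similarity(0, 2))  # no shared items -> low similarity
```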

  9. The Similarity of Global Value Chains: A Network-Based Measure

    Zhu, Zhen; Puliga, Michelangelo; Chessa, Alessandro; Riccaboni, Massimo


    International trade has been increasingly organized in the form of global value chains (GVCs) where different stages of production are located in different countries. This recent phenomenon has substantial consequences for both trade policy design at the national or regional level and business decision making at the firm level. In this paper, we provide a new method for comparing GVCs across countries and over time. First, we use the World Input-Output Database (WIOD) to construct both the upstream and downstream global value networks, where the nodes are individual sectors in different countries and the links are the value-added contribution relationships. Second, we introduce a network-based measure of node similarity to compare the GVCs between any pair of countries for each sector and each year available in the WIOD. Our network-based similarity is a better measure for node comparison than the existing ones because it takes into account all the direct and indirect relationships between country-sector pair...

  10. Sea Ice Detection Based on an Improved Similarity Measurement Method Using Hyperspectral Data

    Han, Yanling; Li, Jue; Zhang, Yun; Hong, Zhonghua; Wang, Jing


    Hyperspectral remote sensing technology can acquire nearly continuous spectral information and rich sea ice image information, thus providing an important means of sea ice detection. However, the correlation and redundancy among hyperspectral bands reduce the accuracy of traditional sea ice detection methods. Based on the spectral characteristics of sea ice, this study presents an improved similarity measurement method based on linear prediction (ISMLP) to detect sea ice. First, the first original band, carrying a large amount of information, is determined based on mutual information theory. Subsequently, a second original band with the least similarity is chosen using a spectral correlation measure. Finally, subsequent bands are selected through the linear prediction method, and a support vector machine classifier model is applied to classify sea ice. In experiments performed on images of Baffin Bay and Bohai Bay, comparative analyses were conducted between the proposed method and traditional sea ice detection methods. Our proposed ISMLP method achieved the highest classification accuracies (91.18% and 94.22%) in both experiments. These results show that the ISMLP method exhibits better overall performance than the other methods and can be effectively applied to hyperspectral sea ice detection. PMID:28505135

  11. Sparse multivariate measures of similarity between intra-modal neuroimaging datasets

    Maria J. Rosa


    Full Text Available An increasing number of neuroimaging studies are now based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datasets, for instance relying on classifiers to differentiate between effects in the data. However, to fully characterize these effects, multivariate methods able to measure similarities between datasets are needed. One classical technique for estimating the relationship between two datasets is canonical correlation analysis (CCA). However, in the context of high-dimensional data the application of CCA is extremely challenging. A recent extension of CCA, sparse CCA (SCCA), overcomes this limitation by regularizing the model parameters while yielding a sparse solution. In this work, we modify SCCA with the aim of facilitating its application to high-dimensional neuroimaging data and finding meaningful multivariate image-to-image correspondences in intra-modal studies. In particular, we show how the optimal subset of variables can be estimated independently and we look at the information encoded in more than one set of SCCA transformations. We illustrate our framework using Arterial Spin Labelling data to investigate multivariate similarities between the effects of two antipsychotic drugs on cerebral blood flow.

  12. ClusTrack: feature extraction and similarity measures for clustering of genome-wide data sets.

    Halfdan Rydbeck

    Full Text Available Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-level genomic and epigenomic data, e.g. ChIP-based data. We here introduce a general methodology for clustering data sets of coordinates relative to a genome assembly, i.e. genomic tracks. By defining appropriate feature extraction approaches and similarity measures, we allow biologically meaningful clustering to be performed for genomic tracks using standard clustering algorithms. An implementation of the methodology is provided through a tool, ClusTrack, which allows fine-tuned clustering analyses to be specified through a web-based interface. We apply our methods to the clustering of occupancy of the H3K4me1 histone modification in samples from a range of different cell types. The majority of samples form meaningful subclusters, confirming that the definitions of features and similarity capture biological, rather than technical, variation between the genomic tracks. Input data and results are available, and can be reproduced, through a Galaxy Pages document at The clustering functionality is available as a Galaxy tool, under the menu option "Specialized analyzis of tracks", and the submenu option "Cluster tracks based on genome level similarity", at the Genomic HyperBrowser server:

  13. Information content-based Gene Ontology functional similarity measures: which one to use for a given biological data type?

    Gaston K Mazandu

    Full Text Available The current increase in Gene Ontology (GO annotations of proteins in the existing genome databases and their use in different analyses have fostered the improvement of several biomedical and biological applications. To integrate this functional data into different analyses, several protein functional similarity measures based on GO term information content (IC have been proposed and evaluated, especially in the context of annotation-based measures. In the case of topology-based measures, each approach was set with a specific functional similarity measure depending on its conception and applications for which it was designed. However, it is not clear whether a specific functional similarity measure associated with a given approach is the most appropriate, given a biological data set or an application, i.e., achieving the best performance compared to other functional similarity measures for the biological application under consideration. We show that, in general, a specific functional similarity measure often used with a given term IC or term semantic similarity approach is not always the best for different biological data and applications. We have conducted a performance evaluation of a number of different functional similarity measures using different types of biological data in order to infer the best functional similarity measure for each different term IC and semantic similarity approach. The comparisons of different protein functional similarity measures should help researchers choose the most appropriate measure for the biological application under consideration.

  14. SSM-DBSCAN and SSM-OPTICS: Incorporating a new similarity measure for density-based clustering of Web usage data.

    Ms K.Santhisree


    Full Text Available Clustering web sessions means grouping them based on similarity, maximizing intra-group similarity while minimizing inter-group similarity. In this paper we develop a new similarity measure named SSM (Sequence Similarity Measure) and enhance the existing DBSCAN and OPTICS clustering techniques, yielding SSM-DBSCAN and SSM-OPTICS, for clustering web sessions for web personalization. We then adopt various similarity measures, such as Euclidean distance, Jaccard, cosine and fuzzy similarity measures, to measure the similarity of web sessions using sequence alignment and to determine learning behaviors from web usage data. The new measure gives significantly better results than previous measures when comparing similarities between web sessions. We performed a variety of experiments in the context of density-based clustering, using the existing DBSCAN and OPTICS and the developed SSM-DBSCAN and SSM-OPTICS, which are based on sequence alignment to measure similarities between web sessions, where sessions are chronologically ordered sequences of page visits. Finally, the time and memory required to perform clustering using SSM are less than with other similarity measures.
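    A hedged sketch of the overall SSM-DBSCAN idea: compute an alignment-style similarity between page-visit sequences, convert it to a distance matrix, and run DBSCAN with a precomputed metric. Python's difflib.SequenceMatcher stands in for the paper's SSM, and the toy sessions and DBSCAN parameters are assumptions.

```python
import numpy as np
from difflib import SequenceMatcher
from sklearn.cluster import DBSCAN

# Web sessions as chronologically ordered page-visit sequences (toy data).
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "cart"],
    ["home", "blog", "about"],
    ["home", "blog", "contact", "about"],
]

def sequence_similarity(a, b):
    """Alignment-style similarity between two sessions. SequenceMatcher stands in
    for the paper's SSM, which is based on explicit sequence alignment."""
    return SequenceMatcher(None, a, b).ratio()

n = len(sessions)
distance = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        distance[i, j] = 1.0 - sequence_similarity(sessions[i], sessions[j])

# Density-based clustering on the precomputed session distances (an SSM-DBSCAN analogue).
labels = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(distance)
print(labels)  # sessions sharing a common page-visit pattern end up in the same cluster
```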

  15. Early Seizure Detection Using Neuronal Potential Similarity: A Generalized Low-Complexity and Robust Measure.

    Bandarabadi, Mojtaba; Rasekhi, Jalil; Teixeira, Cesar A; Netoff, Theoden I; Parhi, Keshab K; Dourado, Antonio


    A novel approach using the neuronal potential similarity (NPS) of two intracranial electroencephalogram (iEEG) electrodes placed over the foci is proposed for automated early seizure detection in patients with refractory partial epilepsy. The NPS measure is obtained from the spectral analysis of space-differential iEEG signals. The ratio between the NPS values obtained from two specific frequency bands is then investigated as a robust generalized measure, and reveals invaluable information about seizure initiation trends. A threshold-based classifier is subsequently applied to the proposed measure to generate alarms. The performance of the method was evaluated using cross-validation on a large clinical dataset, involving 183 seizure onsets in 1785 h of long-term continuous iEEG recordings of 11 patients. On average, the results show a high sensitivity of 86.9% (159 out of 183), a very low false detection rate of 1.4 per day, and a mean detection latency of 13.1 s from electrographic seizure onsets, while on average preceding clinical onsets by 6.3 s. These high performance results, specifically the short detection latency, coupled with the very low computational cost of the proposed method, make it suitable for use in implantable closed-loop seizure suppression systems.

  16. Context Aware Similarity Measure Selection: Mining of Wearable Implantable Body Sensor Network Data with Logical Reasoning

    Y Indu


    Full Text Available Wireless sensor networks monitor the environment with various types of sensors. In its broader sense, the environment can be the geographic environment or the human body. One such type of network is the Wearable and Implantable Body Sensor Network (WIBSN). This paper focuses on processing of data generated by a WIBSN, which comprises a network of sensors that generate different types of values. This paper treats each sensor as a dimension in the whole dataset. In this case, data may have both continuous and discrete values; hence, the proposed work is applicable to both kinds of values. By identifying the nature of the sensor data model, an underlying similarity or dissimilarity measure is selected. A novel crisp clustering technique is used to simulate the proposed work.

  17. Cross similarity measurement for speaker adaptive test normalization in text-independent speaker verification

    ZHAO Jian; DONG Yuan; ZHAO Xian-yu; YANG Hao; WANG Hai-la


    Speaker adaptive test normalization (Atnorm) is the most effective of the widely used score normalization approaches in text-independent speaker verification; it selects speaker-adaptive impostor cohorts with an extra development corpus in order to enhance recognition performance. In this paper, an improved implementation of Atnorm that offers significant overall advantages over the original Atnorm is presented. This method adopts a novel cross similarity measurement for speaker-adaptive cohort model selection without an extra development corpus. It can achieve performance comparable to the original Atnorm while moderately reducing computational complexity. With full use of the saved extra development corpus, the overall system performance can be improved significantly. Results are presented on the NIST 2006 Speaker Recognition Evaluation corpora, where it is shown that this method provides significant improvements in system performance, with relative gains of 14.4% in equal error rate (EER) and 14.6% in the decision cost function (DCF) overall.

  18. Zeroing In on Mindfulness Facets: Similarities, Validity, and Dimensionality across Three Independent Measures.

    Siegling, Alex B; Petrides, K V


    The field of mindfulness has seen a proliferation of psychometric measures, characterised by differences in operationalisation and conceptualisation. To illuminate the scope of, and offer insights into, the diversity apparent in the burgeoning literature, two distinct samples were used to examine the similarities, validity, and dimensionality of mindfulness facets and subscales across three independent measures: the Five Facet Mindfulness Questionnaire (FFMQ), Philadelphia Mindfulness Scale (PHLMS), and Toronto Mindfulness Scale (TMS). Results revealed problematic associations of FFMQ Observe with the other FFMQ facets and supported a four-factor structure (omitting this facet), while disputing the originally envisaged five-factor model; thus, solidifying a pattern in the literature. Results also confirmed the bidimensional nature of the PHLMS and TMS subscales, respectively. A joint Confirmatory Factor Analysis showed that PHLMS Acceptance could be assimilated within the FFMQ's four-factor model (as a distinct factor). The study offers a way of understanding interrelationships between the available mindfulness scales, so as to help practitioners and researchers make a more informed choice when conceptualising and operationalising mindfulness.

  19. Fuzzy similarity measures for detection and classification of defects in CFRP.

    Pellicanó, Diego; Palamara, Isabella; Cacciola, Matteo; Calcagno, Salvatore; Versaci, Mario; Morabito, Francesco Carlo


    The systematic use of nondestructive testing assumes a remarkable importance where on-line manufacturing quality control is associated with the maintenance of complex equipment. For this reason, nondestructive testing and evaluation (NDT/NDE), together with accuracy and precision of measurements of the specimen, results as a strategic activity in many fields of industrial and civil interest. It is well known that nondestructive research methodologies are able to provide information on the state of a manufacturing process without compromising its integrity and functionality. Moreover, exploitation of algorithms with a low computational complexity for detecting the integrity of a specimen plays a crucial role in real-time work. In such a context, the production of carbon fiber resin epoxy (CFRP) is a complex process that is not free from defects and faults that could compromise the integrity of the manufactured specimen. Ultrasonic tests provide an effective contribution in identifying the presence of a defect. In this work, a fuzzy similarity approach is proposed with the goal of localizing and classifying defects in CFRP in terms of a sort of distance among signals (measure of ultrasonic echoes). A field-programmable gate array (FPGA)-based board will be also presented which implements the described algorithms on a hardware device. The good performance of the detection and classification achieved assures the comparability of the results with the results obtained using heuristic techniques with a higher computational load.

  20. Mapping dominant runoff processes: an evaluation of different approaches using similarity measures and synthetic runoff simulations

    Antonetti, Manuel; Buss, Rahel; Scherrer, Simon; Margreth, Michael; Zappa, Massimiliano


    The identification of landscapes with similar hydrological behaviour is useful for runoff and flood predictions in small ungauged catchments. An established method for landscape classification is based on the concept of dominant runoff process (DRP). The various DRP-mapping approaches differ with respect to the time and data required for mapping. Manual approaches based on expert knowledge are reliable but time-consuming, whereas automatic GIS-based approaches are easier to implement but rely on simplifications which restrict their application range. To what extent these simplifications are applicable in other catchments is unclear. More information is also needed on how the different complexities of automatic DRP-mapping approaches affect hydrological simulations. In this paper, three automatic approaches were used to map two catchments on the Swiss Plateau. The resulting maps were compared to reference maps obtained with manual mapping. Measures of agreement and association, a class comparison, and a deviation map were derived. The automatically derived DRP maps were used in synthetic runoff simulations with an adapted version of the PREVAH hydrological model, and simulation results compared with those from simulations using the reference maps. The DRP maps derived with the automatic approach with highest complexity and data requirement were the most similar to the reference maps, while those derived with simplified approaches without original soil information differed significantly in terms of both extent and distribution of the DRPs. The runoff simulations derived from the simpler DRP maps were more uncertain due to inaccuracies in the input data and their coarse resolution, but problems were also linked with the use of topography as a proxy for the storage capacity of soils. The perception of the intensity of the DRP classes also seems to vary among the different authors, and a standardised definition of DRPs is still lacking. Furthermore, we argue not to use

  1. Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure.

    Zhang, Wen; Xiao, Fan; Li, Bin; Zhang, Siguang


    Recently, LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) has been proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, it is often criticized as having low discriminative power for representing documents, although it has been validated as having good representative quality. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. Firstly, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Secondly, we propose SVD on clusters for LSI and theoretically explain that dimension expansion of document vectors and dimension projection using SVD are the two manipulations involved in SVD on clusters. Moreover, we develop updating processes to fold new documents and terms into a decomposed matrix by SVD on clusters. Thirdly, two corpora, a Chinese corpus and an English corpus, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of the interdocument similarity measure in comparison with other SVD-based LSI methods.

  2. Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure

    Wen Zhang


    Full Text Available Recently, LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) has been proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, it is often criticized as having low discriminative power for representing documents, although it has been validated as having good representative quality. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. Firstly, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Secondly, we propose SVD on clusters for LSI and theoretically explain that dimension expansion of document vectors and dimension projection using SVD are the two manipulations involved in SVD on clusters. Moreover, we develop updating processes to fold new documents and terms into a decomposed matrix by SVD on clusters. Thirdly, two corpora, a Chinese corpus and an English corpus, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of the interdocument similarity measure in comparison with other SVD-based LSI methods.
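    A minimal sketch of the general "SVD on clusters" idea under stated assumptions: partition a TF-IDF matrix into clusters, fit a truncated SVD within each cluster, and measure interdocument similarity in the reduced space. The dimension-expansion and folding-in steps of the paper are omitted, and the toy corpus and cluster count are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "latent semantic indexing maps terms and documents to concepts",
    "singular value decomposition reveals latent document structure",
    "truncated svd projects documents into a low rank concept space",
    "football players scored twice in the final match",
    "the match ended after the players scored a late goal",
    "the goalkeeper saved a penalty before the players scored",
]

# Term-document representation.
tfidf = TfidfVectorizer().fit_transform(docs)

# Step 1: partition the corpus (k-means here, purely for illustration).
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)

# Step 2: fit a truncated SVD separately within each cluster, so the latent space
# is adapted to locally similar documents rather than to the whole corpus at once.
reduced = {}
for c in set(cluster_ids):
    idx = [i for i, cid in enumerate(cluster_ids) if cid == c]
    svd = TruncatedSVD(n_components=min(2, len(idx)), random_state=0)
    for i, vec in zip(idx, svd.fit_transform(tfidf[idx])):
        reduced[i] = (c, vec)

# Step 3: interdocument similarity -- cosine in the reduced space within a cluster;
# documents from different clusters are simply treated as dissimilar here.
def interdoc_similarity(i, j):
    ci, vi = reduced[i]
    cj, vj = reduced[j]
    return 0.0 if ci != cj else float(cosine_similarity([vi], [vj])[0, 0])

print(interdoc_similarity(0, 1), interdoc_similarity(0, 3))
```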

  3. Gene selection and classification for cancer microarray data based on machine learning and similarity measures

    Liu Qingzhong


    Full Text Available Abstract Background Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. Results To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. Conclusions On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.

  4. An Information-Theoretic Measure for Face Recognition: Comparison with Structural Similarity

    Asmhan Flieh Hassan


    Full Text Available Automatic recognition of people's faces is a challenging problem that has received significant attention from signal processing researchers in recent years. This is due to its several applications in different fields, including security and forensic analysis. Despite this attention, face recognition is still one of the most challenging problems. Up to this moment, there is no technique that provides a reliable solution to all situations. In this paper a novel technique for face recognition is presented. This technique, which is called ISSIM, is derived from our recently published information-theoretic similarity measure HSSIM, which was based on the joint histogram. Face recognition with ISSIM is still based on the joint histogram of a test image and database images. Performance evaluation was performed in MATLAB using part of the well-known AT&T image database, consisting of 49 face images: seven subjects are chosen, and for each subject seven views (poses) with different facial expressions. The goal of this paper is to present a simplified approach for face recognition that may work in real-time environments. Performance of our information-theoretic face recognition method (ISSIM) has been demonstrated experimentally and is shown to outperform the well-known, statistical-based method (SSIM).

  5. A new method of measuring similarity between two neutrosophic soft sets and its application in pattern recognition problems

    Anjan Mukherjee


    Full Text Available Smarandache introduced the concept of the neutrosophic set in 1995, and in 2013 Maji introduced the notion of the neutrosophic soft set, which is a hybridization of the neutrosophic set and the soft set. Since their introduction, neutrosophic soft sets have become efficient tools for dealing with problems that involve uncertainty, such as problems in social and economic systems, medical diagnosis, pattern recognition, game theory, coding theory and so on. In this work, a new method of measuring the similarity and the weighted similarity between two neutrosophic soft sets (NSSs) is proposed. A comparative study with existing similarity measures for neutrosophic soft sets is also presented. A decision making method is established for the neutrosophic soft set setting using similarity measures. Lastly, a numerical example is given to demonstrate the possible application of similarity measures in pattern recognition problems.

  6. A New Similarity Measure of Interval-Valued Intuitionistic Fuzzy Sets Considering Its Hesitancy Degree and Applications in Expert Systems

    Chong Wu


    Full Text Available As an important topic in fuzzy mathematics, similarity measures are used to quantify the degree of similarity between two fuzzy sets. Most existing similarity measures do not consider the hesitancy degree, and those that do are based on intuitionistic fuzzy sets or intuitionistic fuzzy values, which may cause counterintuitive results in some cases. To make up for this drawback, we present a new approach to constructing the similarity measure between two interval-valued intuitionistic fuzzy sets using the entropy measure and taking the hesitancy degree into account. In particular, the proposed function is demonstrated to satisfy the properties of a similarity measure. Some examples are given to prove the practicality and effectiveness of the new measure. We also apply the similarity measure to expert systems to solve problems of pattern recognition and multicriteria group decision making. In these examples, we also compare it with existing methods such as other similarity measures and the ideal point method.


    Shaoyuan Xu; Weiyi Su; Zuoling Zhou


    In this paper, we provide a new effective method for computing the exact value of Hausdorff measures of a class of self-similar sets satisfying the open set condition (OSC). As applications, we discuss a self-similar Cantor set satisfying the OSC and give a simple method for computing its exact Hausdorff measure.

  8. How to compare movement? A review of physical movement similarity measures in geographic information science and beyond

    Ranacher, Peter; Tzavella, Katerina


    In geographic information science, a plethora of different approaches and methods is used to assess the similarity of movement. Some of these approaches term two moving objects similar if they share akin paths. Others require objects to move at similar speed and yet others consider movement similar if it occurs at the same time. We believe that a structured and comprehensive classification of movement comparison measures is missing. We argue that such a classification not only depicts the sta...

  9. Improved structural similarity metric for the visible quality measurement of images

    Lee, Daeho; Lim, Sungsoo


    The visible quality assessment of images is important for evaluating the performance of image processing methods such as image correction, compression, and enhancement. The structural similarity index is widely used to determine visible quality; however, existing structural similarity metrics cannot correctly assess the perceived human visibility of images that have been slightly geometrically transformed or images that have undergone significant regional distortion. We propose an improved structural similarity metric that is closer to human visual evaluation. Compared with the existing metrics, the proposed method can more correctly evaluate the similarity between an original image and various distorted images.
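    For reference, the baseline structural similarity index that this entry builds on can be computed with scikit-image as below. The example only illustrates the weakness under a small geometric transform that motivates the improved metric; it does not implement the proposed method, and the test images are synthetic.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(4)
original = rng.random((128, 128))

noisy   = np.clip(original + 0.05 * rng.standard_normal(original.shape), 0, 1)
shifted = np.roll(original, shift=2, axis=1)   # small geometric transform

# Standard SSIM: local comparisons of luminance, contrast and structure.
print("original vs noisy  :", round(ssim(original, noisy, data_range=1.0), 3))
print("original vs shifted:", round(ssim(original, shifted, data_range=1.0), 3))
# The shifted image is perceptually near-identical, yet plain SSIM scores it low --
# the kind of behaviour the improved metric described above aims to correct.
```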

  10. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language

    Resnik, P


    This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The article presents algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguity, along with experimental results demonstrating their effectiveness.
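    The measure in this entry can be stated compactly: the information content of a concept is IC(c) = -log p(c), with p estimated from occurrence counts propagated up the taxonomy, and the similarity of two concepts is the IC of their most informative common ancestor. A self-contained sketch on an invented toy taxonomy:

```python
import math

# Toy IS-A taxonomy: child -> parent (single inheritance for simplicity).
parent = {
    "dog": "mammal", "cat": "mammal",
    "mammal": "animal", "bird": "animal",
    "animal": "entity",
}

# Toy occurrence counts; each occurrence of a concept also counts for all its ancestors.
counts = {"dog": 30, "cat": 30, "bird": 40, "mammal": 0, "animal": 0, "entity": 0}

def ancestors(c):
    chain = [c]
    while c in parent:
        c = parent[c]
        chain.append(c)
    return chain

# Propagate counts upward, then turn frequencies into information content IC(c) = -log p(c).
total = sum(counts.values())
freq = dict(counts)
for c, n in counts.items():
    for a in ancestors(c)[1:]:
        freq[a] += n
ic = {c: -math.log(freq[c] / total) for c in freq if freq[c] > 0}

def resnik_similarity(c1, c2):
    """IC of the most informative common ancestor (Resnik's measure)."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(ic[a] for a in common)

print(round(resnik_similarity("dog", "cat"), 3))   # share 'mammal' -> higher similarity
print(round(resnik_similarity("dog", "bird"), 3))  # only share 'animal' -> lower similarity
```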

  11. Combining Hierarchical and Associative Gene Ontology Relations with Textual Evidence in Estimating Gene and Gene Product Similarity

    Sanfilippo, Antonio P.; Posse, Christian; Gopalan, Banu; Riensche, Roderick M.; Beagley, Nathaniel; Baddeley, Bob L.; Tratz, Stephen C.; Gregory, Michelle L.


    Gene and gene product similarity is a fundamental diagnostic measure in analyzing biological data and constructing predictive models for functional genomics. With the rising influence of the Gene Ontology, two complementary approaches have emerged where the similarity between two genes or gene products is obtained by comparing Gene Ontology (GO) annotations associated with the genes or gene products. One approach captures GO-based similarity in terms of hierarchical relations within each gene subontology. The other approach identifies GO-based similarity in terms of associative relations across the three gene subontologies. We propose a novel methodology where the two approaches can be merged with ensuing benefits in coverage and accuracy, and demonstrate that further improvements can be obtained by integrating textual evidence extracted from relevant biomedical literature.

  12. Similarity matrix analysis and divergence measures for statistical detection of unknown deterministic signals hidden in additive noise

    Le Bot, O., E-mail: [Univ. Grenoble Alpes, GIPSA-Lab, F-38000 Grenoble (France); CNRS, GIPSA-Lab, F-38000 Grenoble (France); Mars, J.I. [Univ. Grenoble Alpes, GIPSA-Lab, F-38000 Grenoble (France); CNRS, GIPSA-Lab, F-38000 Grenoble (France); Gervaise, C. [Univ. Grenoble Alpes, GIPSA-Lab, F-38000 Grenoble (France); CNRS, GIPSA-Lab, F-38000 Grenoble (France); Chaire CHORUS, Foundation of Grenoble Institute of Technology, 46 Avenue Félix Viallet, 38031 Grenoble Cedex 1 (France)


    This Letter proposes an algorithm to detect an unknown deterministic signal hidden in additive white Gaussian noise. The detector is based on recurrence analysis. It compares the distribution of the similarity matrix coefficients of the measured signal with an analytic expression of the distribution expected in the noise-only case. This comparison is achieved using divergence measures. Performance analysis based on the receiver operating characteristics shows that the proposed detector outperforms the energy detector, giving a probability of detection 10% to 50% higher, and has a similar performance to that of a sub-optimal filter detector. - Highlights: • We model the distribution of the similarity matrix coefficients of a Gaussian noise. • We use divergence measures for goodness-of-fit test between a model and measured data. • We distinguish deterministic signal and Gaussian noise with similarity matrix analysis. • Similarity matrix analysis outperforms energy detector.

  13. Application of ultrasonic in strength measurement of similar materials of limestone

    LIU Tie-xiong; PENG Zhen-bin; HAN Jin-tian


    Similar materials such as cement, gypsum and sand are options for simulating the characteristics of limestone. A series of reasonable mix proportions were chosen for similar-material experiments on a karst roof, based on indoor proportion tests of small samples. Using ultrasonics, the transverse-wave and vertical-wave velocities of the similar samples were measured with a sound wave instrument, and the dynamic modulus of elasticity and Poisson's ratio of the samples were derived. According to the test data, the relationships between the transverse and vertical wave velocities and the compressive and anti-bend strengths were analyzed. It is shown that the vertical wave velocity reflects the compressive strength and anti-bend strength of similar materials better than the transverse wave velocity, and that the vertical wave velocity increases with strength and dynamic modulus of elasticity.

  14. Knowledge-based method for determining the meaning of ambiguous biomedical terms using information content measures of similarity.

    McInnes, Bridget T; Pedersen, Ted; Liu, Ying; Melton, Genevieve B; Pakhomov, Serguei V


    In this paper, we introduce a novel knowledge-based word sense disambiguation method that determines the sense of an ambiguous word in biomedical text using semantic similarity or relatedness measures. These measures quantify the degree of similarity between concepts in the Unified Medical Language System (UMLS). The objective of this work was to develop a method that can disambiguate terms in biomedical text by exploiting similarity information extracted from the UMLS and to evaluate the efficacy of information content-based semantic similarity measures, which augment path-based information with probabilities derived from biomedical corpora. We show that information content-based measures obtain a higher disambiguation accuracy than path-based measures because they weight the path based on where it exists in the taxonomy coupled with the probability of the concepts occurring in a corpus of text.

  15. Similarity measure of spectral vectors based on set theory and its application in hyperspectral RS image retrieval

    Peijun Du (杜培军); Tao Fang (方涛); Hong Tang (唐宏); Pengfei Shi (施鹏飞)


    In this paper, two new similarity measure methods based on set theory were proposed. First, similarity measurement of two sets based on set theory and set operations was discussed. This principle was applied to spectral vectors, and two approaches were proposed. The first method creates a spectral polygon corresponding to the spectral curve, so that the similarity of two spectral vectors can be replaced by that of two polygons. The area of the spectral polygon was used as the quantification function, and some effective indexes of similarity and dissimilarity were computed. The second method transforms the original spectral vector into an encoding vector according to absorption or reflectance feature bands, and similarity measurement is conducted on the encoding vectors. It was shown that the spectral polygon-based approach is effective and can be used for hyperspectral RS image retrieval.

  16. Evaluation of technical measures for workflow similarity based on a pilot study

    Wombacher, Andreas


    Service discovery of state dependent services has to take workflow aspects into account. To increase the usability of a service discovery, the result list of services should be ordered with regard to the relevance of the services. Means of ordering a list of workflows due to their similarity with

  17. On Measuring Process Model Similarity Based on High-Level Change Operations

    Li, C.; Reichert, M.U.; Wombacher, A.


    For various applications there is the need to compare the similarity between two process models. For example, given the as-is and to-be models of a particular business process, we would like to know how much they differ from each other and how we can efficiently transform the as-is to the to-be mode

  18. On Measuring Process Model Similarity based on High-level Change Operations

    Li, C.; Reichert, M.U.; Wombacher, A.


    For various applications there is the need to compare the similarity between two process models. For example, given the as-is and to-be models of a particular business process, we would like to know how much they differ from each other and how we can efficiently transform the as-is to the to-be mode

  19. Zeroing In on Mindfulness Facets: Similarities, Validity, and Dimensionality across Three Independent Measures

    Siegling, Alex B; Petrides, K V


    ..., or trait, mindfulness (average or baseline states of mindfulness), rather than state mindfulness, or the particular mindful state at the time of measurement [2,6,7]. As described in the next section, eight measures have been salient in the literature [6], although newer ones are emerging [8,9]. Despite overall promising results in terms of cri...


    Zhiwei Zhu; Zuoling Zhou


    Let S ⊂ R² be the attractor of the iterated function system {f₁, f₂, f₃} iterating on the unit equilateral triangle S₀, where fᵢ(x) = λᵢx + bᵢ, i = 1, 2, 3, x = (x₁, x₂), b₁ = (0, 0), b₂ = (1 − λ₂, 0), b₃ = ((1 − λ₃)/2, (√3/2)(1 − λ₃)). This paper determines the exact Hausdorff measure, centred covering measure and packing measure of S under some conditions on the contraction parameters.

  1. An ontology-based measure to compute semantic similarity in biomedicine

    Batet, Montserrat; Sánchez, David; Valls, Aida

    ... (ontologies, thesauri, domain corpora, etc.) have been proposed. Some of these measures have been adapted to the biomedical field by incorporating domain information extracted from clinical data or from medical ontologies...

  2. Breast mass detection in tomosynthesis projection images using information-theoretic similarity measures

    Singh, Swatee; Tourassi, Georgia D.; Lo, Joseph Y.


    The purpose of this project is to study Computer Aided Detection (CADe) of breast masses for digital tomosynthesis. It is believed that tomosynthesis will show improvement over conventional mammography in detection and characterization of breast masses by removing overlapping dense fibroglandular tissue. This study used the 60 human subject cases collected as part of on-going clinical trials at Duke University. Raw projection images were used to identify suspicious regions in the algorithm's high-sensitivity, low-specificity stage using a Difference of Gaussian (DoG) filter. The filtered images were thresholded to yield initial CADe hits that were then shifted and added to yield a 3D distribution of suspicious regions. These were further summed in the depth direction to yield a flattened probability map of suspicious hits for ease of scoring. To reduce false positives, we developed an algorithm based on information theory where similarity metrics were calculated using knowledge databases consisting of tomosynthesis regions of interest (ROIs) obtained from projection images. We evaluated 5 similarity metrics to test the false positive reduction performance of our algorithm, specifically joint entropy, mutual information, Jensen difference divergence, symmetric Kullback-Leibler divergence, and conditional entropy. The best performance was achieved using the joint entropy similarity metric, resulting in an ROC A_z of 0.87 ± 0.01. As a whole, the CADe system can detect breast masses in this data set with 79% sensitivity and 6.8 false positives per scan. In comparison, the original radiologists performed with only 65% sensitivity when using mammography alone, and 91% sensitivity when using tomosynthesis alone.
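
    As an illustration of the best-performing metric, the sketch below computes the joint entropy of the intensity pair histogram of two regions of interest; a lower value indicates a tighter intensity relationship and hence a more similar ROI pair. The bin count, ROI size and matching rule are illustrative assumptions, not the parameters used in this study.

```python
import numpy as np

def joint_entropy(a, b, bins=32):
    """Shannon joint entropy (in bits) of the intensity pair histogram of two ROIs."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def best_match_score(query_roi, knowledge_rois):
    """Lower joint entropy = intensities co-vary more tightly = more similar ROI pair."""
    return min(joint_entropy(query_roi, roi) for roi in knowledge_rois)

rng = np.random.default_rng(1)
query = rng.random((64, 64))
database = [query + 0.05 * rng.standard_normal((64, 64)), rng.random((64, 64))]
print(best_match_score(query, database))
```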

  3. Measuring the self-similarity exponent in Lévy stable processes of financial time series

    Fernández-Martínez, M.; Sánchez-Granero, M. A.; Trinidad Segovia, J. E.


    Geometric method-based procedures, which will be called GM algorithms herein, were introduced in [M.A. Sánchez Granero, J.E. Trinidad Segovia, J. García Pérez, Some comments on Hurst exponent and the long memory processes on capital markets, Phys. A 387 (2008) 5543-5551], to efficiently calculate the self-similarity exponent of a time series. In that paper, the authors showed empirically that these algorithms, based on a geometrical approach, are more accurate than the classical algorithms, especially with short time series. The authors checked that GM algorithms perform well when working with (fractional) Brownian motions. Moreover, in [J.E. Trinidad Segovia, M. Fernández-Martínez, M.A. Sánchez-Granero, A note on geometric method-based procedures to calculate the Hurst exponent, Phys. A 391 (2012) 2209-2214], a mathematical background for the validity of such procedures to estimate the self-similarity index of any random process with stationary and self-affine increments was provided. In particular, they proved theoretically that GM algorithms are also valid to explore long memory in (fractional) Lévy stable motions. In this paper, we prove empirically by Monte Carlo simulation that GM algorithms are able to calculate the self-similarity index in Lévy stable motions accurately, and find empirical evidence that they are more precise than the absolute value exponent (denoted by AVE onwards) and the multifractal detrended fluctuation analysis (MF-DFA) algorithms, especially with short time series. We also compare them with the generalized Hurst exponent (GHE) algorithm and conclude that both the GM2 and GHE algorithms are the most accurate for studying financial series. In addition, we provide empirical evidence, based on the accuracy of GM algorithms in estimating the self-similarity index in Lévy motions, that the evolution of the stocks of some international market indices, such as U.S. Small Cap and Nasdaq100, cannot be modelled by means of a

  4. An Approach of System Similarity Measurement Based on Segmented-Digital-Fingerprint

    Liao Gen-Wei


    Full Text Available Analysis and identification of software infringement is time-consuming and complicated work that is usually done in the laboratory. However, software infringement cases require a quick check of whether suspect software infringes another party's copyright. This study presents a copyright-checking approach based on digital fingerprints: it computes system similarity by segmenting the files to be compared, locating block boundaries with a sliding window, and matching the digital fingerprints of data blocks using both a simple and a complex hash. The approach is suited to gathering preliminary evidence on the spot in law-enforcement actions against software infringement, and is therefore both efficient and reliable.
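
    A minimal sketch of the segment-and-fingerprint idea is given below: block boundaries are found with a simple polynomial rolling hash over a sliding window (the "simple hash"), each block is fingerprinted with SHA-256 (the "complex hash"), and system similarity is taken as the Jaccard overlap of the fingerprint sets. The window size, boundary mask and similarity formula are illustrative assumptions rather than the parameters of the published approach.

```python
import hashlib

WINDOW, MOD, BASE, MASK = 16, 1 << 31, 257, (1 << 11) - 1   # illustrative parameters

def chunk_fingerprints(data: bytes):
    """Slide a window, cut a block whenever the rolling hash hits a boundary pattern,
    and fingerprint each block with a cryptographic digest."""
    fingerprints, start, h = set(), 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            # Remove the byte leaving the window (h stays a hash of the last WINDOW bytes).
            h = (h - data[i - WINDOW] * pow(BASE, WINDOW, MOD)) % MOD
        if (h & MASK) == 0 or i == len(data) - 1:            # content-defined boundary
            fingerprints.add(hashlib.sha256(data[start:i + 1]).hexdigest())
            start = i + 1
    return fingerprints

def system_similarity(a: bytes, b: bytes):
    """Jaccard overlap of block fingerprints as a rough system-similarity score."""
    fa, fb = chunk_fingerprints(a), chunk_fingerprints(b)
    return len(fa & fb) / len(fa | fb) if fa | fb else 1.0

original = bytes(range(256)) * 200
modified = original[:20000] + b"patched" + original[20000:]
print(round(system_similarity(original, modified), 3))
```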

  5. Molecular and pedigree measures of relatedness provide similar estimates of inbreeding depression in a bottlenecked population.

    Townsend, S M; Jamieson, I G


    Individual-based estimates of the degree of inbreeding or parental relatedness from pedigrees provide a critical starting point for studies of inbreeding depression, but in practice wild pedigrees are difficult to obtain. Because inbreeding increases the proportion of genomewide loci that are identical by descent, inbreeding variation within populations has the potential to generate observable correlations between heterozygosity measured using molecular markers and a variety of fitness related traits. Termed heterozygosity-fitness correlations (HFCs), these correlations have been observed in a wide variety of taxa. The difficulty of obtaining wild pedigree data, however, means that empirical investigations of how pedigree inbreeding influences HFCs are rare. Here, we assess evidence for inbreeding depression in three life-history traits (hatching and fledging success and juvenile survival) in an isolated population of Stewart Island robins using both pedigree- and molecular-derived measures of relatedness. We found results from the two measures were highly correlated and supported evidence for significant but weak inbreeding depression. However, standardized effect sizes for inbreeding depression based on the pedigree-based kin coefficients (k) were greater and had smaller standard errors than those based on molecular genetic measures of relatedness (RI), particularly for hatching and fledging success. Nevertheless, the results presented here support the use of molecular-based measures of relatedness in bottlenecked populations when information regarding inbreeding depression is desired but pedigree data on relatedness are unavailable. © 2013 The Authors. Journal of Evolutionary Biology © 2013 European Society For Evolutionary Biology.

  6. Passage-Based Bibliographic Coupling: An Inter-Article Similarity Measure for Biomedical Articles.

    Rey-Long Liu

    Full Text Available Biomedical literature is an essential source of biomedical evidence. To translate the evidence for biomedicine study, researchers often need to carefully read multiple articles about specific biomedical issues. These articles thus need to be highly related to each other. They should share similar core contents, including research goals, methods, and findings. However, given an article r, it is challenging for search engines to retrieve highly related articles for r. In this paper, we present a technique PBC (Passage-based Bibliographic Coupling) that estimates inter-article similarity by seamlessly integrating bibliographic coupling with the information collected from context passages around important out-link citations (references) in each article. Empirical evaluation shows that PBC can significantly improve the retrieval of those articles that biomedical experts believe to be highly related to specific articles about gene-disease associations. PBC can thus be used to improve search engines in retrieving the highly related articles for any given article r, even when r is cited by very few (or even no) articles. The contribution is essential for those researchers and text mining systems that aim at cross-validating the evidence about specific gene-disease associations.

  7. Passage-Based Bibliographic Coupling: An Inter-Article Similarity Measure for Biomedical Articles.

    Liu, Rey-Long


    Biomedical literature is an essential source of biomedical evidence. To translate the evidence for biomedicine study, researchers often need to carefully read multiple articles about specific biomedical issues. These articles thus need to be highly related to each other. They should share similar core contents, including research goals, methods, and findings. However, given an article r, it is challenging for search engines to retrieve highly related articles for r. In this paper, we present a technique PBC (Passage-based Bibliographic Coupling) that estimates inter-article similarity by seamlessly integrating bibliographic coupling with the information collected from context passages around important out-link citations (references) in each article. Empirical evaluation shows that PBC can significantly improve the retrieval of those articles that biomedical experts believe to be highly related to specific articles about gene-disease associations. PBC can thus be used to improve search engines in retrieving the highly related articles for any given article r, even when r is cited by very few (or even no) articles. The contribution is essential for those researchers and text mining systems that aim at cross-validating the evidence about specific gene-disease associations.

  8. Measuring lexical similarity methods for textual mapping in nursing diagnoses in Spanish and SNOMED-CT.

    Cruanes, Jorge; Romá-Ferri, M Teresa; Lloret, Elena


    One of the current problems in the health domain is the reuse and sharing of clinical information between different professionals, as it is written in natural language using specific terminologies. To overcome this issue it is necessary to use a common terminology, such as SNOMED-CT, allowing an information reuse that offers health professionals the quickest access to quality information. In order to use this terminology, all the other terminologies have to be mapped to it. One way to perform that mapping is to use a lexical similarity approach. In this paper we analyze the appropriateness of 15 lexical similarity methods for mapping a set of NANDA-I labels to a set of SNOMED-CT descriptions in Spanish. Our aim is to establish how to choose the best algorithm in this domain, from the recall and precision points of view. After running six different tests, we established that the three best algorithms were those that maximize recall, because they always return the best solution.
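
    One of the simplest lexical similarity methods of this kind is a normalized edit distance; the sketch below maps a label to the candidate description with the highest such similarity. The normalization and the example strings are illustrative, and this is only one of the 15 methods the paper compares.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def lexical_similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical strings."""
    a, b = a.lower(), b.lower()
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

def map_label(label: str, candidates: list) -> str:
    """Return the candidate description with the highest lexical similarity."""
    return max(candidates, key=lambda c: lexical_similarity(label, c))

print(map_label("deterioro de la movilidad física",
                ["alteración de la movilidad física", "dolor agudo"]))
```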

  9. Assessing protein-protein interactions based on the semantic similarity of interacting proteins.

    Cui, Guangyu; Kim, Byungmin; Alguwaizani, Saud; Han, Kyungsook


    The Gene Ontology (GO) has been used in estimating the semantic similarity of proteins since it has the largest and reliable vocabulary of gene products and characteristics. We developed a new method which can assess Protein-Protein Interactions (PPI) using the branching factor and information content of the common ancestor of interacting proteins in the GO hierarchy. We performed a comparative evaluation of the measure with other GO-based similarity measures and evaluation results showed that our method outperformed others in most GO domains.
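
    To make the ingredients concrete, the sketch below computes a Resnik-style score (information content of the most informative common ancestor) over a toy GO-like DAG and aggregates term scores into a protein-level similarity with a best-match average. The DAG, annotation counts and aggregation rule are illustrative stand-ins; the paper's own formula additionally uses the branching factor of the common ancestor.

```python
import math

# Toy GO-like DAG: term -> set of parents, plus illustrative annotation counts.
parents = {"GO:root": set(), "GO:A": {"GO:root"}, "GO:B": {"GO:root"},
           "GO:A1": {"GO:A"}, "GO:A2": {"GO:A"}, "GO:B1": {"GO:B"}}
annotations = {"GO:root": 100, "GO:A": 40, "GO:B": 60, "GO:A1": 10, "GO:A2": 5, "GO:B1": 20}
TOTAL = annotations["GO:root"]

def ancestors(term):
    out, stack = {term}, [term]
    while stack:
        for p in parents[stack.pop()]:
            if p not in out:
                out.add(p)
                stack.append(p)
    return out

def ic(term):
    return -math.log(annotations[term] / TOTAL)

def term_sim(t1, t2):
    """Resnik-style score: IC of the most informative common ancestor (MICA)."""
    return max(ic(t) for t in ancestors(t1) & ancestors(t2))

def protein_sim(terms1, terms2):
    """Best-match average over the two proteins' GO annotation sets."""
    best1 = [max(term_sim(a, b) for b in terms2) for a in terms1]
    best2 = [max(term_sim(a, b) for a in terms1) for b in terms2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))

print(protein_sim({"GO:A1", "GO:A2"}, {"GO:A2", "GO:B1"}))
```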

  10. Sparse multivariate measures of similarity between intra-modal neuroimaging datasets

    Rosa, M.J.; Mehta, M.A.; Merlo Pich, E.; Risterucci, C.; Zelaya, F.; Reinders, A.A.T.S.; Williams, S.; Dazzan, P.; Doyle, O.M.; Marquand, A.F.


    An increasing number of neuroimaging studies are based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datase

  11. Generalization of Subpixel Analysis for Hyperspectral Data With Flexibility in Spectral Similarity Measures

    Chen, Jin; Jia, Xiuping; Yang, Wei; Matsushita, Bunkei


    Several spectral unmixing techniques have been developed for subpixel mapping using hyperspectral data in the past two decades, among which the fully constrained least squares method based on the linear spectral mixture model (LSMM) has been widely accepted. However, the shortage of this method is that the Euclidean spectral distance measure is used, and therefore, it is sensitive to the magnitude of the spectra. While other spectral matching criteria are available, such as spectral angle map...

  12. How to compare movement? A review of physical movement similarity measures in geographic information science and beyond.

    Ranacher, Peter; Tzavella, Katerina


    In geographic information science, a plethora of different approaches and methods is used to assess the similarity of movement. Some of these approaches term two moving objects similar if they share akin paths. Others require objects to move at similar speed and yet others consider movement similar if it occurs at the same time. We believe that a structured and comprehensive classification of movement comparison measures is missing. We argue that such a classification not only depicts the status quo of qualitative and quantitative movement analysis, but also allows for identifying those aspects of movement for which similarity measures are scarce or entirely missing. In this review paper we, first, decompose movement into its spatial, temporal, and spatiotemporal movement parameters. A movement parameter is a physical quantity of movement, such as speed, spatial path, or temporal duration. For each of these parameters we then review qualitative and quantitative methods of how to compare movement. Thus, we provide a systematic and comprehensive classification of different movement similarity measures used in geographic information science. This classification is a valuable first step toward a GIS toolbox comprising all relevant movement comparison methods.

  13. Indirect two-sided relative ranking: a robust similarity measure for gene expression data

    Licamele Louis


    Full Text Available Abstract Background There is a large amount of gene expression data that exists in the public domain. This data has been generated under a variety of experimental conditions. Unfortunately, these experimental variations have generally prevented researchers from accurately comparing and combining this wealth of data, which still hides many novel insights. Results In this paper we present a new method, which we refer to as indirect two-sided relative ranking, for comparing gene expression profiles that is robust to variations in experimental conditions. This method extends the current best approach, which is based on comparing the correlations of the up and down regulated genes, by introducing a comparison based on the correlations in rankings across the entire database. Because our method is robust to experimental variations, it allows a greater variety of gene expression data to be combined, which, as we show, leads to richer scientific discoveries. Conclusions We demonstrate the benefit of our proposed indirect method on several datasets. We first evaluate the ability of the indirect method to retrieve compounds with similar therapeutic effects across known experimental barriers, namely vehicle and batch effects, on two independent datasets (one private and one public). We show that our indirect method is able to significantly improve upon the previous state-of-the-art method with a substantial improvement in recall at rank 10 of 97.03% and 49.44%, on each dataset, respectively. Next, we demonstrate that our indirect method results in improved accuracy for classification in several additional datasets. These datasets demonstrate the use of our indirect method for classifying cancer subtypes, predicting drug sensitivity/resistance, and classifying (related) cell types. Even in the absence of a known (i.e., labeled) experimental barrier, the improvement of the indirect method in each of these datasets is statistically significant.

  14. Novel Agent Based-approach for Industrial Diagnosis: A Combined use Between Case-based Reasoning and Similarity Measure

    Fatima Zohra Benkaddour


    Full Text Available In the spunlace nonwovens industry, the maintenance task is very complex and requires collaboration between experts and operators. In this paper, we propose a new approach integrating agent-based modelling with case-based reasoning that utilizes similarity measures and a preferences module. The main purpose of our study is to compare and evaluate the most suitable similarity measure for our case. Furthermore, operators, who are usually geographically dispersed, have to collaborate and negotiate to achieve mutual agreements, especially when their proposals (diagnoses) lead to a conflicting situation. The experimentation shows that the suggested agent-based approach is very interesting and efficient for operators and experts who collaborate in the INOTIS enterprise.

  15. A path-based measurement for human miRNA functional similarities using miRNA-disease associations

    Ding, Pingjian; Luo, Jiawei; Xiao, Qiu; Chen, Xiangtao


    Compared with sequence and expression similarity, miRNA functional similarity is very important for biological research and for many applications such as miRNA clustering, miRNA function prediction, miRNA synergism identification and disease miRNA prioritization. However, existing methods usually rely on predicted miRNA targets, which suffer from high false-positive and false-negative rates, to calculate miRNA functional similarity. Meanwhile, it is difficult to achieve highly reliable miRNA functional similarity from miRNA-disease associations. Therefore, there is an increasing need to improve the measurement of miRNA functional similarity. In this study, we develop a novel path-based method for calculating miRNA functional similarity from miRNA-disease associations, called MFSP. Compared with other methods, our method obtains higher average functional similarity for selected intra-family and intra-cluster groups, and lower average functional similarity for inter-family and inter-cluster miRNA pairs. In addition, smaller p-values are achieved when applying the Wilcoxon rank-sum test and the Kruskal-Wallis test to different miRNA groups. The relationship between miRNA functional similarity and other information sources is exhibited. Furthermore, the miRNA functional network constructed based on MFSP is a scale-free and small-world network. Moreover, the higher AUC for miRNA-disease prediction indicates the ability of MFSP to uncover miRNA functional similarity.

  16. Similarity measure, distance measure and entropy of intuitionistic fuzzy soft sets

    刘雅雅; 秦克云; 陈明奎


    Similarity measures can be used to calculate the similarity between two intuitionistic fuzzy soft sets. We give counterexamples showing that the similarity measures presented in some previous works may be unreasonable, and we propose an improved similarity measure. The new similarity measure is then applied to the determination of the security situation of tunnels after earthquakes. Axiomatic definitions of distance measure and entropy for intuitionistic fuzzy soft sets are introduced, together with formulae to calculate these measures. We also give formulae to evaluate the similarity between two intuitionistic fuzzy soft sets with different parameter sets.

  17. A Novel Similarity Measure to Induce Semantic Classes and Its Application for Language Model Adaptation in a Dialogue System

    Ya-Li Li; Wei-Qun Xu; Yong-Hong Yan


    In this paper, we propose a novel co-occurrence-probability-based similarity measure for inducing semantic classes. Clustering with the new similarity measure outperforms the widely used distance based on the Kullback-Leibler divergence in precision, recall and F1 evaluation. In our experiments, we induced semantic classes from an unannotated in-domain corpus and then used the induced classes and structures to generate a large in-domain corpus, which was then used for language model adaptation. The character recognition rate was improved from 85.2% to 91%. We thus apply the new measure, through induction followed by generation, to address the lack of domain data for a dialogue system.

  18. A new method for similarity measures between Vague sets

    蔡正琦; 普措才仁; 田双亮; 曹永春


    According to the theory of similarity measures between intervals, it is shown that four factors influencing the similarity of Vague sets and Vague values should be taken into account when calculating the similarity degree. Some existing similarity measures are reviewed and compared, and their shortcomings are pointed out. A new method for measuring the similarity between Vague sets (values) is put forward and proved to satisfy a set of rules. The validity and advantages of this method are illustrated by an example.

  19. An Evaluation of a Knowledge Base of Words and Thesauruses on Measuring the Semantic Similarity between Words

    Kawashima, Takahiro; Ishikawa, Tsutomu

    We have developed a knowledge base of words as a tool to measure the semantic similarity between words. In this paper, we evaluate the knowledge base of words by comparing it with thesauruses, which are commonly used for measuring similarity. The thesauruses of NIHONGO-GOI-TAIKEI (NGT) and the Japan Electronic Dictionary (EDR) are selected for the evaluation. For similarity calculation using thesauruses, we adopt a newly proposed method, in which each word is represented as a vector using the structural features of the thesaurus and the degree of similarity between words is calculated as the inner product of their vectors, in addition to traditional methods based on the path length between categories or the depth of the subsumer. Evaluation is carried out with two methods: a traditional method based on human ratings, and a method we have previously proposed that allows automatic evaluation without human judgment. The evaluation results show that the knowledge base of words is superior to both thesauruses (with NGT outperforming EDR) as a measurement tool, and that the proposed calculation method outperforms the traditional ones. The results also show that our evaluation method is practical, as indicated by the correlation between the two methods.

  20. Biogenic volatile organic compound emissions during BEARPEX 2009 measured by eddy covariance and flux-gradient similarity methods

    J.-H. Park


    Full Text Available The Biosphere Effects on AeRosols and Photochemistry EXperiment (BEARPEX) took place in Blodgett Forest, a Ponderosa pine forest in the Sierra Nevada Mountains of California, during summer 2009. We deployed a Proton Transfer Reaction – Mass Spectrometer (PTR-MS) to measure fluxes and concentrations of biogenic volatile organic compounds (BVOCs). Eighteen ion species including the major BVOCs expected at the site were measured sequentially at 5 heights to observe their vertical gradient from the forest floor to above the canopy. Fluxes of the 3 dominant BVOCs, methanol, 2-methyl-3-buten-2-ol (MBO), and monoterpenes, were measured above the canopy by the eddy covariance method. Canopy-scale fluxes were also determined by the flux-gradient similarity method (K-theory). A universal K (Kuniv) was determined as the mean of the individual K's calculated from the measured fluxes divided by the vertical gradients for methanol, MBO, and monoterpenes. This Kuniv was then multiplied by the gradients of each observed ion species to compute their fluxes. The flux-gradient similarity method showed very good agreement with the eddy covariance method. Fluxes are presented for all measured species, compared to historical measurements from the same site, and used to test emission algorithms used to model fluxes at the regional scale. MBO was the dominant emission observed, followed by methanol, monoterpenes, acetone, and acetaldehyde. The flux-gradient similarity method is shown to be useful, and we recommend its use especially in experimental conditions where fast measurements of BVOC species are not available.
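
    The flux-gradient step reduces to a small calculation, sketched below: eddy diffusivities K are obtained from F = -K dC/dz for the three reference compounds, averaged into Kuniv, and Kuniv is then applied to the gradients of the remaining species. The numbers and names are illustrative and assume fluxes and gradients are expressed in consistent units.

```python
import numpy as np

# Illustrative eddy-covariance fluxes and vertical concentration gradients dC/dz
# (consistent units assumed) for the three reference BVOCs.
ec_flux  = {"methanol": 0.80, "MBO": 2.10, "monoterpenes": 0.45}
gradient = {"methanol": -0.020, "MBO": -0.055, "monoterpenes": -0.012}

# Individual eddy diffusivities from F = -K * dC/dz, then a universal K as their mean.
k_each = {c: -ec_flux[c] / gradient[c] for c in ec_flux}
k_univ = np.mean(list(k_each.values()))

# The flux of any other measured species follows from its own gradient and K_univ.
other_gradients = {"acetone": -0.008, "acetaldehyde": -0.005}
inferred_flux = {c: -k_univ * g for c, g in other_gradients.items()}
print(round(k_univ, 1), {c: round(f, 2) for c, f in inferred_flux.items()})
```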

  1. HPOSim: an R package for phenotypic similarity measure and enrichment analysis based on the human phenotype ontology.

    Deng, Yue; Gao, Lin; Wang, Bingbo; Guo, Xingli


    Phenotypic features associated with genes and diseases play an important role in disease-related studies, and most of the available methods focus solely on the Online Mendelian Inheritance in Man (OMIM) database without considering a controlled vocabulary. The Human Phenotype Ontology (HPO) provides a standardized and controlled vocabulary covering phenotypic abnormalities in human diseases and has become a comprehensive resource for computational analysis of human disease phenotypes. Most of the existing HPO-based software tools cannot be used offline and provide only a few similarity measures. Therefore, there is a critical need for a comprehensive, offline software package for phenotypic feature similarity based on HPO. HPOSim is an R package for analyzing phenotypic similarity for genes and diseases based on HPO data. Seven commonly used semantic similarity measures are implemented in HPOSim. Enrichment analysis of gene sets and disease sets is also implemented, including hypergeometric enrichment analysis and network ontology analysis (NOA). HPOSim can be used to predict disease genes and explore disease-related functions of gene modules. HPOSim is open source and freely available at SourceForge (

  2. Learning a similarity-based distance measure for image database organization from human partitionings of an image set

    Squire, David


    In this paper we employ human judgments of image similarity to improve the organization of an image database. We first derive a statistic, κ_B, which measures the agreement between two partitionings of an image set. κ_B is used to assess agreement both amongst and between human and machine partitionings. This provides a rigorous means of choosing between competing image database organization systems, and of assessing the performance of such systems with respect to human judgments...

  3. From the similarities between neutrons and radon to advanced radon-detection and improved cold fusion neutron-measurements

    Tommasino, L.; Espinosa, G.


    Neutrons and radon are both ubiquitous in the earth's crust. The neutrons of terrestrial origin are strongly related to radon since they originate mainly from the interactions between the alpha particles from the decays of radioactive gases (namely radon and thoron) and light nuclei. Since the early studies in the field of neutrons, radon gas was used to produce neutrons by (α, n) reactions in beryllium. Another important similarity between radon and neutrons is that they can be detected only through the radiations produced respectively by decays or by nuclear reactions. The charged particles from these two distinct nuclear processes are often the same (namely alpha particles). A typical neutron detector is based on a radiator facing an alpha-particle detector, such as in the case of a neutron film badge. Based on the similarity between neutrons and radon, a film badge for radon has recently been proposed. The radon film badge, in addition to being similar, may even be identical to the neutron film badge. For these reasons, neutron measurements can easily be affected by the presence of unpredictably large radon concentrations. In several cold fusion experiments, CR-39 plastic films (typically used in radon and neutron film badges) have been the detectors of choice for measuring neutrons. In this paper, attempts will be made to prove that most of these neutron measurements might have been affected by the presence of large radon concentrations.

  4. Robust recognition of degraded machine-printed characters using complementary similarity measure and error-correction learning

    Hagita, Norihiro; Sawaki, Minako


    Most conventional methods in character recognition extract geometrical features such as stroke direction, connectivity of strokes, etc., and compare them with reference patterns in a stored dictionary. Unfortunately, geometrical features are easily degraded by blurs, stains and the graphical background designs used in Japanese newspaper headlines. This noise must be removed before recognition commences, but no preprocessing method is completely accurate. This paper proposes a method for recognizing degraded characters and characters printed on graphical background designs. This method is based on the binary image feature method and uses binary images as features. A new similarity measure, called the complementary similarity measure, is used as a discriminant function. It compares the similarity and dissimilarity of binary patterns with reference dictionary patterns. Experiments are conducted using the standard character database ETL-2, which consists of machine-printed Kanji, Hiragana, Katakana, alphanumeric, and special characters. The results show that this method is much more robust against noise than the conventional geometrical feature method. It also achieves high recognition rates of over 92% for characters with textured foregrounds, over 98% for characters with textured backgrounds, over 98% for outline fonts, and over 99% for reverse contrast characters.

  5. Performance assessment of a remotely readable graphite oxide (GO)-based tamper-evident seal

    Cattaneo, Alessandro; Marchi, Alexandria N.; Bossert, Jason A.; Dumont, Joseph H.; Gupta, Gautam; Mascareñas, David D. L.


    Tamper-evident seals are commonly used for non-proliferation applications. A properly engineered tamper-evident seal enables the detection of unauthorized access to a protected item or a secured zone. Tamper-evident seals must respond to malicious attacks in such a way that the attacks cause irreversible and detectable damage to the seals. At the same time, tamper-evident seals must demonstrate robustness to environmental changes in order to minimize false-positive and false-negative rates under real operating conditions. The architecture of the tamper-evident seal presented in this paper features a compressive sampling (CS) acquisition scheme, which provides the seal with a means for self-authentication and self-state-of-health awareness. The CS acquisition scheme is implemented using a micro-controller unit (MCU) and an array of resistors engraved on a graphite oxide (GO) film. CS enables compression and encryption of messages sent from the seal to the remote reader in a non-bit-sensitive fashion. As already demonstrated in our previous work through the development of a simulation framework, the CS non-bit-sensitive property ensures satisfactory reconstruction of the encrypted messages sent back to the reader when the resistance values of the resistor array are simultaneously affected by modest changes. This work investigates the resistive behavior of the reduced GO film in response to changes in temperature and humidity when tested in an environmental chamber. The goal is to characterize the humidity and temperature range for reliable operation of a GO-based seal.

  6. Four-Dimensional Computerized Tomography (4D-CT) Reconstruction Based on the Similarity Measure of Spatial Adjacent Images

    ZHANG Shu-xu; ZHOU Ling-hong; CHEN Guang-jie; LIN Sheng-qu; YE Yu-sheng; ZHANG Hai-nan


    Objective: To investigate the feasibility of a 4D-CT reconstruction method based on the similarity principle of spatially adjacent images and a mutual information measure. Methods: A motor-driven sinusoidal motion platform made in house was used to create one-dimensional periodic motion along the longitudinal axis of the CT couch. The amplitude of the sinusoidal motion was set to ±1 cm. The period of the motion was adjustable and set to 3.5 s. Phantom objects of two eggs were placed in a Styrofoam block, which in turn was placed on the motion platform. These objects were used to simulate volumes of interest undergoing ideal periodic motion. CT data of the static phantom were acquired using a multi-slice General Electric (GE) LightSpeed 16-slice CT scanner in an axial mode, and CT data of the periodically moving phantom were acquired in an axial cine-mode scan. A software program was developed using VC++ and VTK software tools to re-sort the CT data and reconstruct the 4D-CT. All of the CT data with the same phase were then sorted by the program into the same series based on the similarity principle of spatially adjacent images and the mutual information measure among them, and 3D reconstructions of the different phase CT data were completed using the software. Results: All of the CT data were sorted accurately into different series based on the similarity principle of spatially adjacent images and the mutual information measures among them. Compared with the unsorted CT data, the motion artifacts in the 3D reconstruction of sorted CT data were reduced significantly, and all of the sorted CT series resulted in a 4D-CT that reflected the characteristics of the periodically moving phantom. Conclusion: Time-resolved 4D-CT reconstruction can be implemented with any general multi-slice CT scanner based on the similarity principle of spatially adjacent images and a mutual information measure. The process of the 4D-CT data acquisition and reconstruction was not restricted to the
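
    The similarity step can be illustrated with a short mutual-information calculation: among candidate cine images at a couch position, the one with the highest mutual information with the already-sorted spatially adjacent slice is assumed to belong to the same respiratory phase. The histogram bin count, image sizes and selection rule below are illustrative assumptions, not the authors' VC++/VTK implementation.

```python
import numpy as np

def mutual_information(img1, img2, bins=64):
    """MI of the joint intensity histogram of two spatially adjacent CT images."""
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz]))

def best_phase_match(adjacent_sorted_slice, candidate_slices):
    """Pick the candidate cine image most similar (highest MI) to the adjacent slice;
    it is assumed to belong to the same respiratory phase."""
    return int(np.argmax([mutual_information(adjacent_sorted_slice, c)
                          for c in candidate_slices]))

rng = np.random.default_rng(2)
ref = rng.random((128, 128))
candidates = [rng.random((128, 128)), ref * 0.9 + 0.05]    # second one is "same phase"
print(best_phase_match(ref, candidates))                    # -> 1
```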

  7. Similarity between partners in real and perceived personality traits as measured by the Myers-Briggs type indicator.

    Nordvik, H


    From 90 couples, 90 male and 90 female subjects, two sets of scores on the four personality dimensions measured by the Myers-Briggs Type Indicator (MBTI) were obtained by letting each person answer each item twice, first in the ordinary way and then as he or she believed the partner would answer the item. Correlations between partners' self-reported scores were all close to zero, whereas the correlations between the partner-reported scores and the self-reported scores were high for both males and females and for all the four dimensions measured by the MBTI, thus indicating that partners were not similar in personality traits, but they had a realistic perception of each other. The results support the hypothesis that mating is random in terms of personality traits.

  8. Incidental Learning: A Brief, Valid Measure of Memory Based on the WAIS-IV Vocabulary and Similarities Subtests.

    Spencer, Robert J; Reckow, Jaclyn; Drag, Lauren L; Bieliauskas, Linas A


    We assessed the validity of a brief incidental learning measure based on the Similarities and Vocabulary subtests of the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV). Most neuropsychological assessments for memory require intentional learning, but incidental learning occurs without explicit instruction. Incidental memory tests such as the WAIS-III Symbol Digit Coding subtest have existed for many years, but few memory studies have used a semantically processed incidental learning model. We conducted a retrospective analysis of 37 veterans with traumatic brain injury, referred for outpatient neuropsychological testing at a Veterans Affairs hospital. As part of their evaluation, the participants completed the incidental learning tasks. We compared their incidental learning performance to their performance on traditional memory measures. Incidental learning scores correlated strongly with scores on the California Verbal Learning Test-Second Edition (CVLT-II) and Brief Visuospatial Memory Test-Revised (BVMT-R). After we conducted a partial correlation that controlled for the effects of age, incidental learning correlated significantly with the CVLT-II Immediate Free Recall, CVLT-II Short-Delay Recall, CVLT-II Long-Delay Recall, and CVLT-II Yes/No Recognition Hits, and with the BVMT-R Delayed Recall and BVMT-R Recognition Discrimination Index. Our incidental learning procedures derived from subtests of the WAIS-IV Edition are an efficient and valid way of measuring memory. These tasks add minimally to testing time and capitalize on the semantic encoding that is inherent in completing the Similarities and Vocabulary subtests.

  9. Modelling expertise at different levels of granularity using semantic similarity measures in the context of collaborative knowledge-curation platforms.

    Ziaimatin, Hasti; Groza, Tudor; Tudorache, Tania; Hunter, Jane


    Collaboration platforms provide a dynamic environment where the content is subject to ongoing evolution through expert contributions. The knowledge embedded in such platforms is not static as it evolves through incremental refinements - or micro-contributions. Such refinements provide vast resources of tacit knowledge and experience. In our previous work, we proposed and evaluated a Semantic and Time-dependent Expertise Profiling (STEP) approach for capturing expertise from micro-contributions. In this paper we extend our investigation to structured micro-contributions that emerge from an ontology engineering environment, such as the one built for developing the International Classification of Diseases (ICD) revision 11. We take advantage of the semantically related nature of these structured micro-contributions to showcase two major aspects: (i) a novel semantic similarity metric, in addition to an approach for creating bottom-up baseline expertise profiles using expertise centroids; and (ii) the application of STEP in this new environment combined with the use of the same semantic similarity measure to both compare STEP against baseline profiles, as well as to investigate the coverage of these baseline profiles by STEP.

  10. Itakura Distance: A Useful Similarity Measure between EEG and EOG Signals in Computer-aided Classification of Sleep Stages.

    Estrada, E; Nava, P; Nazeran, H; Behbehani, K; Burk, J; Lucas, E


    Sleep is a natural periodic state of rest for the body, in which the eyes usually close and consciousness is completely or partially lost. Consequently, there is a decrease in bodily movements and responsiveness to external stimuli. Slow wave sleep is of immense interest as it is the most restorative sleep stage during which the body recovers from weariness. During this sleep stage, electroencephalographic (EEG) and electro-oculographic (EOG) signals interfere with each other and they share a temporal similarity. In this investigation we used the EEG and EOG signals acquired from 10 patients undergoing overnight polysomnography with their sleep stages determined by certified sleep specialists based on RK rules. In this pilot study, we performed spectral estimation of EEG signals by Autoregressive (AR) modeling, and then used Itakura Distance to measure the degree of similarity between EEG and EOG signals. We finally calculated the statistics of the results and displayed them in an easy to visualize fashion to observe tendencies for each sleep stage. We found that Itakura Distance is the smallest for sleep stages 3 and 4. We intend to deploy this feature as an important element in automatic classification of sleep stages.
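
    A minimal sketch of the similarity computation is given below, assuming AR coefficients are fitted by the Yule-Walker equations and using one common formulation of the Itakura distance between an EEG frame (reference) and an EOG frame (test). The sampling rate, model order and synthetic signals are illustrative assumptions.

```python
import numpy as np

def autocorr(x, order):
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    return r / len(x)

def ar_coeffs(x, order=6):
    """Yule-Walker AR fit; returns the vector a = [1, -a1, ..., -ap]."""
    r = autocorr(x, order)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))

def itakura_distance(x_ref, x_test, order=6):
    """One common form: d = log((a_test^T R_ref a_test) / (a_ref^T R_ref a_ref)),
    with R_ref the autocorrelation (Toeplitz) matrix of the reference frame."""
    a_ref, a_test = ar_coeffs(x_ref, order), ar_coeffs(x_test, order)
    r = autocorr(x_ref, order)
    R = np.array([[r[abs(i - j)] for j in range(order + 1)] for i in range(order + 1)])
    return float(np.log((a_test @ R @ a_test) / (a_ref @ R @ a_ref)))

rng = np.random.default_rng(3)
t = np.arange(3000) / 250.0                                  # 250 Hz, 12 s frames
eeg = np.sin(2 * np.pi * 2.0 * t) + 0.3 * rng.standard_normal(t.size)       # slow-wave-like
eog_similar = np.sin(2 * np.pi * 2.2 * t) + 0.3 * rng.standard_normal(t.size)
eog_different = rng.standard_normal(t.size)
print(itakura_distance(eeg, eog_similar), itakura_distance(eeg, eog_different))
```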

  11. Mapping Rice Cropping Systems in Vietnam Using an NDVI-Based Time-Series Similarity Measurement Based on DTW Distance

    Xudong Guan


    Full Text Available The Normalized Difference Vegetation Index (NDVI) derived from Moderate Resolution Imaging Spectroradiometer (MODIS) time-series data has been widely used in the fields of crop and rice classification. The cloudy and rainy weather characteristic of the monsoon season greatly reduces the likelihood of obtaining high-quality optical remote sensing images. In addition, the diverse crop-planting system in Vietnam also hinders the comparison of NDVI among different crop stages. To address these problems, we apply a Dynamic Time Warping (DTW) distance-based similarity measure approach and use the entire yearly NDVI time series to reduce the inaccuracy of classification using a single image. We first de-noise the NDVI time series using S-G filtering based on the TIMESAT software. Then, a standard NDVI time-series base for rice growth is established based on field survey data and Google Earth sample data. NDVI time-series data for each pixel are constructed and the DTW distance with the standard rice growth NDVI time series is calculated. Then, we apply thresholds to extract rice growth areas. A qualitative assessment using statistical data and a spatial assessment using sampled data from the rice-cropping map reveal a high agreement with the statistical data at the national scale, with the corresponding R2 being as high as 0.809; however, the mapped rice accuracy decreased at the provincial scale due to the reduced number of rice planting areas per province. An analysis of the results indicates that the 500-m resolution MODIS data are limited in terms of mapping scattered rice parcels. The results demonstrate that the DTW-based similarity measure of the NDVI time series can be effectively used to map large-area rice cropping systems with diverse cultivation processes.
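
    The core of the approach, the DTW distance between a pixel's NDVI time series and the standard rice-growth curve, can be sketched in a few lines; the reference curve, pixel series and threshold below are illustrative values, not the ones derived from the field and Google Earth samples.

```python
import numpy as np

def dtw_distance(s, t):
    """Classic dynamic-programming Dynamic Time Warping distance."""
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Standard rice-growth NDVI curve vs. a pixel series that follows the same
# phenology but is shifted by one composite period.
reference = np.array([0.2, 0.3, 0.5, 0.7, 0.8, 0.7, 0.5, 0.3, 0.2])
pixel     = np.array([0.2, 0.2, 0.3, 0.5, 0.7, 0.8, 0.7, 0.5, 0.3])
print(dtw_distance(reference, pixel))      # small despite the temporal shift

THRESHOLD = 0.5                            # illustrative; pixels below it are mapped as rice
print(dtw_distance(reference, pixel) < THRESHOLD)
```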

  12. Value Similarities among Fathers, Mothers, and Adolescents and the Role of a Cultural Stereotype: Different Measurement Strategies Reconsidered

    Roest, Annette M. C.; Dubas, Judith Semon; Gerris, Jan R. M.; Engels, Rutger C. M. E.


    In research on value similarity and transmission between parents and adolescents, no consensus exists on the level of value similarity. Reports of high-value similarities coexist with reports of low-value similarities within the family. The present study shows that different conclusions may be explained by the use of different measurement…

  13. Value Similarities Among Fathers, Mothers, and Adolescents and the Role of a Cultural Stereotype: Different Measurement Strategies Reconsidered

    Roest, A.M.C.; Dubas, J.S.; Gerris, J.R.M.; Engels, R.C.M.E.


    In research on value similarity and transmission between parents and adolescents, no consensus exists on the level of value similarity. Reports of high-value similarities coexist with reports of low-value similarities within the family. The present study shows that different conclusions may be expla

  14. Statistical power of intensity- and feature-based similarity measures for registration of multimodal remote sensing images

    Uss, M.; Vozel, B.; Lukin, V.; Chehdi, K.


    This paper investigates performance characteristics of similarity measures (SM) used in the image registration domain to discriminate between aligned and not-aligned reference and template image (RI and TI) fragments. The study emphasizes registration of multimodal remote sensing images, including optical-to-radar, optical-to-DEM, and radar-to-DEM scenarios. We compare well-known area-based SMs such as Mutual Information, Normalized Correlation Coefficient, and Phase Correlation, and feature-based SMs using SIFT and SIFT-OCT descriptors. In addition, a new SM called logLR, based on a log-likelihood ratio test and parametric modeling of a pair of RI and TI fragments by the Fractional Brownian Motion model, is proposed. While this new measure is restricted to a linear intensity change between RI and TI (an assumption somewhat restrictive for multimodal registration), it explicitly takes into account the noise properties of RI and TI and the multivariate mutual distribution of RI and TI pixels. Unlike other SMs, the distribution of the logLR measure under the null hypothesis does not depend on the registration scenario or fragment size and closely follows a chi-squared distribution according to Wilks's theorem. We demonstrate that the utility of a SM for image registration purposes can be naturally represented in (True Positive Rate, Positive Likelihood Rate) coordinates. Experiments on real images show that overall the logLR SM outperforms the other SMs in terms of area under the ROC curve, denoted AUC. It also provides the highest Positive Likelihood Rate for True Positive Rate values below 0.4-0.6. For certain registration problem types, however, logLR can be second or third best after the MI or SIFT SMs.

  15. Validation of INSAT-3D sounder data with in situ measurements and other similar satellite observations over India

    Venkat Ratnam, Madineni; Hemanth Kumar, Alladi; Jayaraman, Achuthan


    To date, several satellite measurements are available that can provide profiles of temperature and water vapour with reasonable accuracies. However, the temporal resolution has remained poor, particularly over the tropics, as most of these satellites are polar orbiting. At this juncture, the launch of INSAT-3D (Indian National Satellite System) by the Indian Space Research Organization (ISRO) on 26 July 2013, carrying a multi-spectral imager covering visible to long-wave infrared, made it possible to obtain profiles of temperature and water vapour over India with higher temporal and vertical resolutions and altitude coverage, besides other parameters. The initial validation of INSAT-3D data is made with the high temporal (3 h) resolution radiosonde observations launched over Gadanki (13.5° N, 79.2° E) during a special campaign and routine evening soundings obtained at 12:00 UTC (17:30 LT). We also compared INSAT-3D data with the radiosonde observations obtained from 34 India Meteorological Department stations. Comparisons were also made over India with data from other satellites like AIRS, MLS and SAPHIR and from ERA-Interim and NCEP reanalysis data sets. INSAT-3D is able to show better coverage over India with high spatial and temporal resolutions, as expected. Good correlation in temperature between INSAT-3D and in situ measurements is noticed except in the upper tropospheric and lower stratospheric regions (positive bias of 2-3 K). There is a mean dry bias of 20-30 % in the water vapour mixing ratio. Similar biases are noticed when compared to other satellites and reanalysis data sets. INSAT-3D shows a large positive bias in temperature above 25° N in the lower troposphere. Thus, caution is advised when using these data for tropospheric studies. Finally it is concluded that temperature data from INSAT-3D are of high quality and can be directly assimilated for better forecasts over India.

  16. Electromagnetic Fields (EMF) Measurement for Household and Similar Electrical Appliances



    The requirement on electromagnetic fields (EMF) has been added to the Low Voltage Directive. This paper introduces the current state of EMF measurement and focuses on EMF evaluation and measurement for household and similar electrical appliances.

  17. Similarity measure and topology evolution of foreign exchange markets using dynamic time warping method: Evidence from minimal spanning tree

    Wang, Gang-Jin; Xie, Chi; Han, Feng; Sun, Bo


    In this study, we employ a dynamic time warping method to study the topology of similarity networks among 35 major currencies in international foreign exchange (FX) markets, measured by the minimal spanning tree (MST) approach, which is expected to overcome the synchronous restriction of the Pearson correlation coefficient. In the empirical process, we first subdivide the analysis period from June 2005 to May 2011 into three sub-periods: before, during, and after the US sub-prime crisis. Second, we choose NZD (New Zealand dollar) as the numeraire and then analyze the topology evolution of FX markets in terms of the structural changes of MSTs during the above periods. We also present the hierarchical tree associated with the MST to study the currency clusters in each sub-period. Our results confirm that USD and EUR are the predominant world currencies. However, USD gradually loses its most central position while EUR acts as a stable center in the MST through the crisis. Furthermore, an interesting finding is that, after the crisis, SGD (Singapore dollar) becomes a new center currency for the network.

  18. A novel method for condition monitoring of rotating machinery based on statistical linguistic analysis and weighted similarity measures

    Lin, Jinshan; Dou, Chunhong


    Defective rotating machinery generally produces complex fluctuations due to the non-stationary and nonlinear properties of dynamical systems. Consequently, the dynamical structures of vibration data from rotating machinery are hard to disclose, and condition monitoring of rotating machinery is fairly challenging. In this paper, statistical linguistic analysis (SLA), a novel tool for time series analysis, is introduced to analyze the dynamical mechanisms hidden in vibration data of rotating machinery. SLA maps the original vibration data from rotating machinery to a binary symbolic sequence by encoding the increases and decreases over successive time intervals. Next, by sliding a window and identifying the elements in each window as a "word", a group of words is created. Then, by counting the occurrences of each word type, the binary symbolic sequence can be converted into a word frequency sequence. Next, a weighted similarity measure (WSM) defined in this paper serves to detect a change of running conditions of rotating machinery. As a result, this paper proposes a novel method for condition monitoring of rotating machinery based on SLA and the WSM. The performance of the proposed method was validated using vibration data from both gearboxes and rolling bearings, and the proposed method was compared with conventional temporal statistical parameters, Approximate Entropy and Sample Entropy. The results indicate that the proposed method performs better than the other methods in condition monitoring of rotating machinery. Also, compared with both Correlation Coefficients and Standardized Euclidean Distances, the WSM gives a somewhat better performance in reflecting a change of dynamical structures.
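
    A minimal sketch of the SLA pipeline is given below: differences of the signal are binarized, a sliding window turns the symbol string into words, word frequencies are counted, and two frequency vectors are compared. The weighting used here (larger weights for words that are rare under baseline conditions, inside a cosine similarity) is a generic stand-in for the paper's WSM, and all parameters and signals are illustrative.

```python
import numpy as np
from collections import Counter
from itertools import product

WORD_LEN = 4                                           # sliding-window "word" length

def word_frequencies(signal):
    """Binarise increases/decreases, slide a window, and count word occurrences."""
    symbols = "".join("1" if d > 0 else "0" for d in np.diff(signal))
    words = [symbols[i:i + WORD_LEN] for i in range(len(symbols) - WORD_LEN + 1)]
    counts = Counter(words)
    vocab = ["".join(w) for w in product("01", repeat=WORD_LEN)]
    freq = np.array([counts[w] for w in vocab], dtype=float)
    return freq / freq.sum()

def weighted_similarity(f1, f2, weights):
    """Weighted cosine similarity between two word-frequency vectors."""
    a, b = weights * f1, weights * f2
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(4)
baseline = np.sin(np.linspace(0, 40 * np.pi, 800)) + 0.02 * rng.standard_normal(800)
healthy  = np.sin(np.linspace(0, 40 * np.pi, 800)) + 0.02 * rng.standard_normal(800)
faulty   = healthy + 0.5 * rng.standard_normal(800)    # extra broadband vibration

f_base = word_frequencies(baseline)
weights = 1.0 / (f_base + 1e-3)                        # words rare in normal running weigh more
print(weighted_similarity(f_base, word_frequencies(healthy), weights))   # near 1
print(weighted_similarity(f_base, word_frequencies(faulty),  weights))   # noticeably lower
```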

  19. Web Similarity

    Cohen, A.R.; Vitányi, P.M.B.


    Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or any other large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For sets of search terms the NWD gives a similarity on a scale fr
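
    The NWD can be written down directly from aggregate page counts, as sketched below; the counts and index size are illustrative numbers, and a real implementation would obtain them from a search engine or database query.

```python
import math

def normalized_web_distance(fx, fy, fxy, n):
    """NWD(x, y) = (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy)),
    where fx, fy are page counts for the terms, fxy for the pair, n the index size."""
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))

# Illustrative counts: "horse" and "rider" co-occur far more often than "horse" and "carburetor".
N = 8e9
print(normalized_web_distance(46_700_000, 12_200_000, 2_630_000, N))   # semantically close (smaller)
print(normalized_web_distance(46_700_000, 3_100_000, 15_000, N))       # semantically distant (larger)
```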

  20. The Effects of Attitude Similarity and Utility on Liking for a Stranger: Measurement of Attraction with the IJS.

    Nesler, Mitchell S.; And Others

    Research has demonstrated that attraction to a stranger is a function of the proportion of similar attitudes reported by that stranger. Traditional theories of attraction do not usually differentiate between respect or esteem for another and liking. This study used a 2 x 2 factorial experiment to test the hypothesis that the desire to work with…

  1. A new protein binding pocket similarity measure based on comparison of clouds of atoms in 3D: application to ligand prediction

    Zaslavskiy Mikhail


    Full Text Available Abstract Background Predicting which molecules can bind to a given binding site of a protein with known 3D structure is important to decipher the protein function, and useful in drug design. A classical assumption in structural biology is that proteins with similar 3D structures have related molecular functions, and therefore may bind similar ligands. However, proteins that do not display any overall sequence or structure similarity may also bind similar ligands if they contain similar binding sites. Quantitatively assessing the similarity between binding sites may therefore be useful to propose new ligands for a given pocket, based on those known for similar pockets. Results We propose a new method to quantify the similarity between binding pockets, and explore its relevance for ligand prediction. We represent each pocket by a cloud of atoms, and assess the similarity between two pockets by aligning their atoms in the 3D space and comparing the resulting configurations with a convolution kernel. Pocket alignment and comparison is possible even when the corresponding proteins share no sequence or overall structure similarities. In order to predict ligands for a given target pocket, we compare it to an ensemble of pockets with known ligands to identify the most similar pockets. We discuss two criteria to evaluate the performance of a binding pocket similarity measure in the context of ligand prediction, namely, area under ROC curve (AUC) scores and classification-based scores. We show that the latter is better suited to evaluate the methods with respect to ligand prediction, and demonstrate the relevance of our new binding site similarity compared to existing similarity measures. Conclusions This study demonstrates the relevance of the proposed method to identify ligands binding to known binding pockets. We also provide a new benchmark for future work in this field. The new method and the benchmark are available at

  2. Measurement of Steroids in Rats after Exposure to an Endocrine Disruptor: Mass Spectrometry and Radioimmunoassay Demonstrate Similar Results

    Commercially available radioimmunoassays (RIAs) are frequently used in toxicological studies to evaluate effects of endocrine disrupting chemicals (EDCs) on steroidogenesis in rats. Currently there are limited data comparing steroid concentrations in rats as measured by RIAs to t...

  3. An improved method for functional similarity analysis of genes based on Gene Ontology.

    Tian, Zhen; Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Teng, Zhixia


    Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interactions, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate gene functional similarity reliably is still a challenging problem. We propose WIS, an effective method to measure gene functional similarity. First, WIS computes the IC of a term by employing its depth, the number of its ancestors and the topology of its descendants in the GO graph. Second, WIS calculates the IC of a term set by considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures in experiments on functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS fully takes into account the specificity of GO terms and the weighted inherited semantics between GO terms. The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at .

  4. Similarity Measure of Content Based Image Retrieval%基于内容图像检索的相似性度量研究



    The similarity measure between images is a key problem in content-based image retrieval. An ideal image similarity measure should match human visual perception: visually similar images should have a small distance between them, that is, the greater the similarity of two images, the smaller their distance. The choice of similarity measure therefore has a strong influence on retrieval results, and its quality directly affects retrieval performance. This paper analyzes the commonly used similarity measures and proposes directions for future research on similarity measurement.
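
    For concreteness, a small sketch of three similarity/distance functions commonly used in this context (Euclidean distance, cosine similarity, and histogram intersection), applied to hypothetical normalized color histograms; the specific measures compared in the paper are not listed in the abstract, so these are illustrative choices.

    import numpy as np

    def euclidean(x, y):
        return float(np.linalg.norm(x - y))

    def cosine_similarity(x, y):
        return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

    def histogram_intersection(x, y):
        # For normalized histograms, 1.0 means identical distributions.
        return float(np.minimum(x, y).sum())

    # Two hypothetical normalized color histograms of retrieved images.
    h1 = np.array([0.2, 0.3, 0.5])
    h2 = np.array([0.25, 0.25, 0.5])
    print(euclidean(h1, h2), cosine_similarity(h1, h2), histogram_intersection(h1, h2))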

  5. Similarity Scaling

    Schnack, Dalton D.

    In Lecture 10, we introduced a non-dimensional parameter called the Lundquist number, denoted by S. This is just one of many non-dimensional parameters that can appear in the formulations of both hydrodynamics and MHD. These generally express the ratio of the time scale associated with some dissipative process to the time scale associated with either wave propagation or transport by flow. These are important because they define regions in parameter space that separate flows with different physical characteristics. All flows that have the same non-dimensional parameters behave in the same way. This property is called similarity scaling.
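
    A small numerical sketch of this idea, using the Lundquist number as the ratio of the resistive diffusion time to the Alfven time; the symbols follow the standard definitions, and the input values are purely illustrative.

    def lundquist_number(L, v_alfven, eta):
        # Lundquist number S as a ratio of time scales: the resistive diffusion
        # time (L**2 / eta) over the Alfven time (L / v_alfven), where eta is
        # the magnetic diffusivity.  Flows sharing the same non-dimensional
        # parameters, S among them, scale similarly.
        tau_resistive = L ** 2 / eta
        tau_alfven = L / v_alfven
        return tau_resistive / tau_alfven      # equivalently L * v_alfven / eta

    # Illustrative values only: L in m, v_alfven in m/s, eta in m^2/s.
    print(lundquist_number(L=1.0e6, v_alfven=1.0e5, eta=1.0))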

  6. Effective similarity measure for XML retrieval results%有效的XML检索结果的相似性度量

    刘喜平; 万常选


    As XML has become a de facto standard for formatting and exchanging data on the web and in digital library and scientific applications, there is an increasing need for managing, clustering and retrieving XML data. XML information retrieval is one of the most active areas in database and information retrieval research. In information retrieval, retrieval results organization is an important aspect and effective technique. For example, results clustering has been studied and proved effective in improving retrieval quality. When information retrieval meets XML, it is natural to borrow and extend traditional techniques such as result clustering and apply these techniques to XML retrieval. Clustering XML retrieval results, however, is non-trivial and cannot employ traditional techniques built for traditional information retrieval directly. The core of clustering is the similarity measure between data objects, and the similarity measure for XML retrieval results is still open. In this paper, we study the similarity measures of XML retrieval results, and propose novel structural and content similarity measures. Firstly, to remove redundant information, we compute the structural summaries of document trees to reduce the original documents. The summary tree (i.e., structural summary) still retains a lot of structural information. In order to depict the summary tree in a comprehensive way, the paper proposes two feature sets, which reflect structural features of the summary tree from different perspectives and are complementary to each other. Corresponding to these feature sets, we present a two-dimensional structural similarity measure comprising two similarities: horizontal structural similarity and vertical structural similarity. Each of them represents the similarity from one particular perspective, and their combination gives rise to an accurate structural similarity measure. On the other hand, we propose a structural content model to describe the content. A content similarity

  7. Measuring similarity and improving stability in biomarker identification methods applied to Fourier-transform infrared (FTIR) spectroscopy.

    Trevisan, Júlio; Park, Juhyun; Angelov, Plamen P; Ahmadzai, Abdullah A; Gajjar, Ketan; Scott, Andrew D; Carmichael, Paul L; Martin, Francis L


    FTIR spectroscopy is a powerful diagnostic tool that can also derive biochemical signatures of a wide range of cellular materials, such as cytology, histology, live cells, and biofluids. However, while classification is a well-established subject, biomarker identification lacks standards and validation of its methods. Validation of biomarker identification methods is difficult because, unlike classification, there is usually no reference biomarker against which to test the biomarkers extracted by a method. In this paper, we propose a framework to assess and improve the stability of biomarkers derived by a method, and to compare biomarkers derived by different method set-ups and between different methods by means of a proposed "biomarkers similarity index".

  8. MATLAB code to estimate landslide volume from single remote sensed image using genetic algorithm and imagery similarity measurement

    Wang, Ting-Shiuan; Yu, Teng-To; Lee, Shing-Tsz; Peng, Wen-Fei; Lin, Wei-Ling; Li, Pei-Ling


    Information regarding the scale of a hazard is crucial for the evaluation of its associated impact. Quantitative analysis of landslide volume immediately following the event can offer better understanding and control of contributory factors and their relative importance. Such information cannot be gathered for each landslide event, owing to limitations in obtaining useable raw data and the necessary procedures of each applied technology. Empirical rules are often used to predict volume change, but the resulting accuracy is very low. Traditional methods use photogrammetry or light detection and ranging (LiDAR) to produce a post-event digital terrain model (DTM). These methods are both costly and time-intensive. This study presents a technique to estimate terrain change volumes quickly and easily, not only reducing waiting time but also offering results with less than 25% error. A genetic algorithm (GA) programmed in MATLAB is used to intelligently predict the elevation change for each pixel of an image. This deviation from the pre-event DTM becomes a candidate for the post-event DTM. Each candidate DTM is then converted into a shaded relief image and compared with a single post-event remotely sensed image for similarity ranking. The candidates ranked in the top two thirds are retained as parent chromosomes to produce offspring in the next generation according to the rules of GAs. When the highest similarity index reaches 0.75, the DTM corresponding to that hillshade image is taken as the calculated post-event DTM. As an example, a pit with known volume is removed from a flat, inclined plane to demonstrate the theoretical capability of the code. The method is able to rapidly estimate the volume of terrain change within an error of 25%, without the delays involved in obtaining stereo image pairs, or the need for ground control points (GCPs) or professional photogrammetry software.

  9. Similarity of Fibroglandular Breast Tissue Content Measured from Magnetic Resonance and Mammographic Images and by a Mathematical Algorithm

    Fatima Nayeem


    Full Text Available Women with high breast density (BD) have a 4- to 6-fold greater risk for breast cancer than women with low BD. We found that BD can be easily computed from a mathematical algorithm using routine mammographic imaging data or by a curve-fitting algorithm using fat and nonfat suppression magnetic resonance imaging (MRI) data. These BD measures in a strictly defined group of premenopausal women providing both mammographic and breast MRI images were predicted as well by the same set of strong predictor variables as were measures from a published laborious histogram segmentation method and a full-field digital mammographic unit in multivariate regression models. We also found that the number of completed pregnancies, C-reactive protein, aspartate aminotransferase, and progesterone were more strongly associated with amounts of glandular tissue than adipose tissue, while fat body mass, alanine aminotransferase, and insulin-like growth factor-II appear to be more associated with the amount of breast adipose tissue. Our results show that methods of breast imaging and modalities for estimating the amount of glandular tissue have no effects on the strength of these predictors of BD. Thus, the more convenient mathematical algorithm and the safer MRI protocols may facilitate prospective measurements of BD.


    Grygorczuk, J.; Czechowski, A.; Grzedzielski, S., E-mail: [Space Research Centre, Warsaw (Poland)


    The solar wind carves a cavity in the interstellar plasma bounded by a surface, called the heliopause (HP), that separates the plasma and magnetic field of solar origin from those of interstellar origin. It is now generally accepted that in 2012 August Voyager 1 (V1) crossed that boundary. Unexpectedly, the magnetic fields on both sides of the HP, although theoretically independent of each other, were found to be similar in direction. This delayed the identification of the boundary as the HP and led to many alternative explanations. Here, we show that the Voyager 1 observations can be readily explained and, after the Interstellar Boundary Explorer (IBEX) discovery of the ribbon, could even have been predicted. Our explanation relies on the fact that the Voyager 1 and undisturbed interstellar field directions (which we assume to be given by the IBEX ribbon center (RC)) share the same heliolatitude (∼34.°5) and are not far separated in longitude (difference ∼27°). Our result confirms that Voyager 1 has indeed crossed the HP and offers the first independent confirmation that the IBEX RC is in fact the direction of the undisturbed interstellar magnetic field. For Voyager 2, we predict that the difference between the inner and outer magnetic field directions at the HP will be significantly larger than that observed by Voyager 1 (∼30° instead of ∼20°), and that the outer field direction will be close to the RC.

  11. Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method.

    Xiaomei Wu

    Full Text Available BACKGROUND: Explicit comparisons based on the semantic similarity of Gene Ontology terms provide a quantitative way to measure the functional similarity between gene products and are widely applied in large-scale genomic research via integration with other models. Previously, we presented an edge-based method, Relative Specificity Similarity (RSS), which takes the global position of relevant terms into account. However, edge-based semantic similarity metrics are sensitive to the intrinsic structure of GO and simply consider terms at the same level in the ontology to be equally specific nodes, revealing weaknesses that could be complemented using information content (IC). RESULTS AND CONCLUSIONS: Here, we used the IC-based nodes to improve RSS and proposed a new method, Hybrid Relative Specificity Similarity (HRSS). HRSS outperformed other methods in distinguishing true protein-protein interactions from false ones. HRSS values were divided into four different levels of confidence for protein interactions. In addition, HRSS was statistically the best at obtaining the highest average functional similarity among human-mouse orthologs. Both HRSS and the groupwise measure, simGIC, are superior in correlation with sequence and Pfam similarities. Because different measures are best suited for different circumstances, we compared two pairwise strategies, the maximum and the best-match average, in the evaluation. The former was more effective at inferring physical protein-protein interactions, and the latter at estimating the functional conservation of orthologs and analyzing the CESSM datasets. In conclusion, HRSS can be applied to different biological problems by quantifying the functional similarity between gene products. The algorithm HRSS was implemented in the C programming language, which is freely available from

  12. Vague集中的分解定理与相似度量%Decomposition Theorem and Measures of Similarity in Vague Sets

    闫德勤; 迟忠先


    Based on fuzzy set theory, this paper gives a decomposition theorem for Vague sets and, by discussing measures of similarity between Vague sets, presents a new kind of similarity measure. The results can be used for further research and applications of Vague sets.

  13. Non-Metric Similarity Measures


    ...are numeric. Out of the 11 data sets used, six are from the text mining domain, two from the music classification and retrieval domain, and two from character... ...algorithm for discovering clusters in large spatial databases with noise. Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data...

  14. A Novel Relevance Feedback Approach Based on Similarity Measure Modification in an X-Ray Image Retrieval System Based on Fuzzy Representation Using Fuzzy Attributed Relational Graph

    Hossien Pourghassem


    Full Text Available Relevance feedback approaches are used to improve the performance of content-based image retrieval systems. In this paper, a novel relevance feedback approach based on similarity measure modification in an X-ray image retrieval system based on fuzzy representation using a fuzzy attributed relational graph (FARG) is presented. In this approach, the optimum weight of each feature in the feature vector is calculated using the similarity rate between the query image and the relevant and irrelevant images in the user feedback. The calculated weight is used to tune the fuzzy graph matching algorithm as a modifier parameter in the similarity measure. The standard deviation of the retrieved image features is applied to calculate the optimum weight. The proposed image retrieval system uses a FARG for representation of images, a fuzzy graph matching algorithm as the similarity measure, and a semantic classifier based on a merging scheme for determination of the search space in the image database. To evaluate the relevance feedback approach in the proposed system, a standard X-ray image database consisting of 10000 images in 57 classes is used. The improvement of the evaluation parameters shows the proficiency and efficiency of the proposed system.

  15. Intuitionistic fuzzy similarity measure approach based on orientation%基于倾向性的直觉模糊相似度量方法

    王毅; 刘三阳; 程月蒙; 余晓东


    To address shortcomings of existing intuitionistic fuzzy similarity measures, an orientation-based intuitionistic fuzzy similarity measure is proposed. First, under the assumption that the support and opposition contained in the neutral evidence represented by the intuitionistic index are in an equilibrium state, the internal relationships among the three interacting factors that determine the similarity of intuitionistic fuzzy sets are revealed, and a geometric representation of the similarity measure is given. Second, the cases that some existing similarity measures cannot express are analyzed, intuitive constraints that an intuitionistic fuzzy similarity should satisfy are defined, and an axiomatic definition of intuitionistic fuzzy similarity measures is put forward. Third, after revealing the influence of the intuitionistic index on the orientation of the evidence, an orientation-based intuitionistic fuzzy similarity measure is proposed. Finally, analysis and comparison on a set of worked examples verify that the proposed approach is correct, reasonable and effective.

  16. The semantic similarity ensemble

    Andrea Ballatore


    Full Text Available Computational measures of semantic similarity between geographic terms provide valuable support across geographic information retrieval, data mining, and information integration. To date, a wide variety of approaches to geo-semantic similarity have been devised. A judgment of similarity is not intrinsically right or wrong, but obtains a certain degree of cognitive plausibility, depending on how closely it mimics human behavior. Thus, selecting the most appropriate measure for a specific task is a significant challenge. To address this issue, we make an analogy between computational similarity measures and soliciting domain expert opinions, which incorporate a subjective set of beliefs, perceptions, hypotheses, and epistemic biases. Following this analogy, we define the semantic similarity ensemble (SSE) as a composition of different similarity measures, acting as a panel of experts having to reach a decision on the semantic similarity of a set of geographic terms. The approach is evaluated in comparison to human judgments, and results indicate that an SSE performs better than the average of its parts. Although the best member tends to outperform the ensemble, all ensembles outperform the average performance of each ensemble's member. Hence, in contexts where the best measure is unknown, the ensemble provides a more cognitively plausible approach.

  17. 面向Artifact的业务流程行为相似性度量方法%Behavior Similarity Measure Method for Artifact-oriented Business Process

    刘海滨; 刘国华; 王颖; 赵丹枫


    Artifact-oriented business processes are representative of data-centric business processes. As with traditional process-centric business processes, computing the similarity or distance between two process models is a key problem that enables better operations on the models, such as process retrieval and process mining. A behavior similarity measure for Artifact-oriented business processes is proposed. First, the method evaluates the similarity of the core business data handled by the processes by measuring the similarity of their key Artifacts. Second, according to the lifecycle characteristics of the key Artifacts, it measures the similarity of the task dependence relations along the task execution paths. Finally, it measures the similarity of the key Artifact attribute assignment sequences in those paths. Theoretical analysis and a worked example demonstrate that the method is an effective similarity measure.

  18. 一种基于相似度量的离群点检测方法%A Kind of Outlier Detection Algorithm Based on Similarity Measurement

    孙启林; 方宏彬; 张健; 刘明术


    Outlier detection is an important topic in data mining and is widely used in fields such as credit card fraud detection and network intrusion detection. Combining hierarchical clustering with similarity, this paper introduces a similarity measure function and the concept of class density for high-dimensional data; based on class density, outliers in high-dimensional data are redefined, and an outlier detection algorithm based on similarity measurement is then proposed. Experiments show that the algorithm has practical value for detecting outliers in high-dimensional data.

  19. Similarity and inclusion measures between IT2 FSs%区间二型模糊相似度与包含度

    郑高; 肖建; 蒋强; 张勇


    The similarity and inclusion measures between fuzzy sets are two important concepts in fuzzy set theory, but little work has addressed them for type-2 fuzzy sets. Therefore, a similarity measure and an inclusion measure between interval type-2 fuzzy sets (IT2 FSs) are proposed. First, axiomatic definitions of the two measures are selected. Then, based on these definitions, computation formulas are proposed, and four theorems showing that the two measures can be transformed into each other are demonstrated. Finally, examples are presented to validate their performance, and the proposed similarity measure is combined with Yang and Shih's clustering method for an application to clustering analysis of Gaussian IT2 FSs, yielding a reasonable hierarchical clustering tree at different α-levels. Simulation results show the practicability of the proposed measures.

  20. Protein-protein interaction inference based on semantic similarity of Gene Ontology terms.

    Zhang, Shu-Bo; Tang, Qiang-Rong


    Identifying protein-protein interactions is important in molecular biology. Experimental methods to this issue have their limitations, and computational approaches have attracted increasing attention from the biological community. The semantic similarity derived from the Gene Ontology (GO) annotation has been regarded as one of the most powerful indicators for protein interaction. However, conventional methods based on GO similarity fail to take advantage of the specificity of GO terms in the ontology graph. We proposed a GO-based method to predict protein-protein interaction by integrating different kinds of similarity measures derived from the intrinsic structure of the GO graph. We extended five existing methods to derive semantic similarity measures from the descending part of two GO terms in the GO graph, then adopted a feature integration strategy to combine both the ascending and the descending similarity scores derived from the three sub-ontologies to construct various kinds of features to characterize each protein pair. Support vector machines (SVM) were employed as discriminative classifiers, and five-fold cross-validation experiments were conducted on both human and yeast protein-protein interaction datasets to evaluate the performance of the different kinds of integrated features. The experimental results show that the best performance is obtained by the feature that combines information from both the ascending and the descending parts of the three ontologies. Our method is appealing for effective prediction of protein-protein interaction.

  1. Measurement Method of Source Code Similarity Based on Word%基于单词的源程序相似度度量方法

    朱红梅; 孙未; 王鲁; 张亮


    To help teachers quickly and accurately identify plagiarism in programming assignments, this paper develops a method for measuring the similarity of source code. Based on the word-level edit distance between programs and the length of their longest common subsequence, the similarity between each pair of submitted source programs is computed, and a reasonable dynamic threshold is then used to decide whether plagiarism exists between them. Experimental results show that the method identifies similar source programs submitted by students promptly, effectively, and accurately.
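
    A minimal sketch of the two ingredients named above, word-level edit distance and longest-common-subsequence length, combined into a similarity score. The equal weighting, the normalization by sequence length, and the fixed threshold are assumptions, since the paper's exact combination and dynamic threshold are not given in the abstract.

    def edit_distance(a, b):
        # Word-level Levenshtein distance between two token sequences.
        dp = list(range(len(b) + 1))
        for i in range(1, len(a) + 1):
            prev, dp[0] = dp[0], i
            for j in range(1, len(b) + 1):
                cur = dp[j]
                dp[j] = min(dp[j] + 1, dp[j - 1] + 1,
                            prev + (a[i - 1] != b[j - 1]))
                prev = cur
        return dp[-1]

    def lcs_length(a, b):
        # Length of the longest common subsequence of two token sequences.
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                dp[i][j] = dp[i-1][j-1] + 1 if a[i-1] == b[j-1] else max(dp[i-1][j], dp[i][j-1])
        return dp[-1][-1]

    def similarity(tokens_a, tokens_b, threshold=0.8):
        n = max(len(tokens_a), len(tokens_b)) or 1
        score = 0.5 * (1 - edit_distance(tokens_a, tokens_b) / n) + 0.5 * lcs_length(tokens_a, tokens_b) / n
        return score, score >= threshold      # True flags a possible plagiarism pair

    print(similarity("int a = b + c".split(), "int x = b + c".split()))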

  2. Introduction on background medium theory about celestial body motion orbit and foundation of fractional-dimension calculus about self-similar fractal measure calculation

    Yan, Kun


    In this paper, by discussing the basic hypotheses about continuous and discrete orbits in the two research directions of the background medium theory of celestial body motion, the concrete equation forms and a summary of the theoretical frame of celestial body motion are introduced. Furthermore, by discussing the general form of Binet's equation for celestial body motion orbits and its solution for the advance of the perihelion of planets, the relations and differences between the continuous orbit theory, Newton's theory of gravitation, and Einstein's general relativity are given. By discussing the fractional-dimension expanded equation for celestial body motion orbits, the concrete equations and predicted data for the discrete or stable orbits of celestial bodies, including the planets of the Solar system, the satellites of the Uranian system, the satellites of the Earth system, and satellites orbiting the Moon, obtained from the discrete orbit theory, are also given. In particular, as a preliminary exploration of the gravitation curve of celestial bodies over a broad range, the concept of an ideal black hole, whose mass density tends to infinity and which is difficult to form by gravitation alone, is explored. By discussing the position hypothesis of the fractional-dimension derivative of a general function and the formula form of the hypothesis of the fractional-dimension derivative of a power function, the concrete equation formulas of the fractional-dimension derivative, differential, and integral are described further, and the difference between the fractional-dimension derivative and the fractional-order derivative is given. Subsequently, the concrete forms of the measure calculation equations of self-similar fractals, obtained from the definition of general fractal measure in fractional-dimension calculus, are discussed again, and the differences from the Hausdorff measure method and the present covering method are given. By applying

  3. 基于本体结构的语义相似度计算%Semantic Similarity Measurement Based on Ontology

    杨方颖; 蒋正翔; 张姗姗


    Semantic similarity is an important topic in the fields of semantic networks and information retrieval. The structure of an ontology provides a new perspective for computing semantic similarity, but existing methods have deficiencies to varying degrees. To improve on existing methods, and building on an analysis of classical approaches to semantic similarity, a semantic similarity algorithm based on ontology structure is proposed that makes full use of the structural information of the ontology and jointly considers the positions of concepts in the ontology graph, semantic distance, the amount of shared attributes, and shared information content. In the experiments, the amino acid ontology publicly released on Wikipedia is used as an example, and comparison with the results of classical methods demonstrates the effectiveness of the algorithm.

  4. A similarity measuring-based discretization method%一种基于相似性度量的离散化方法

    丁剑; 白凤伟


    This paper describes a discretization method that uses similarity measure theory to address the shortcomings of the information entropy-based method. After numeric attributes are discretized, the amount of information in each interval is measured with a similarity formula called the algebra-geometry mean distance, so that the distribution of class values is fairly consistent within an interval; the number of intervals is determined dynamically by the size of the data set. The proposed method and the information entropy-based discretization are each used to discretize several data sets, and the Naive Bayes Simple classifier is then used to compare the classification accuracies on the discretized data. The results show that the proposed method achieves a better correct classification rate than the information entropy-based discretization.

  5. Radiometric Normalization of Temporal Images Combining Automatic Detection of Pseudo-Invariant Features from the Distance and Similarity Spectral Measures, Density Scatterplot Analysis, and Robust Regression

    Ana Paula Ferreira de Carvalho


    Full Text Available Radiometric precision is difficult to maintain in orbital images due to several factors (atmospheric conditions, Earth-sun distance, detector calibration, illumination, and viewing angles). These unwanted effects must be removed for radiometric consistency among temporal images, leaving only land-leaving radiances, for optimum change detection. A variety of relative radiometric correction techniques were developed for the correction or rectification of images, of the same area, through use of reference targets whose reflectance does not change significantly with time, i.e., pseudo-invariant features (PIFs). This paper proposes a new technique for radiometric normalization, which uses three sequential methods for an accurate PIF selection: spectral measures of temporal data (spectral distance and similarity), density scatter plot analysis (ridge method), and robust regression. The spectral measures used are the spectral angle (Spectral Angle Mapper, SAM), spectral correlation (Spectral Correlation Mapper, SCM), and Euclidean distance. The spectral measures between the spectra at times t1 and t2 are calculated for each pixel. After classification using threshold values, it is possible to define points with the same spectral behavior, including PIFs. The distance and similarity measures are complementary and can be calculated together. The ridge method uses a density plot generated from images acquired on different dates for the selection of PIFs. In a density plot, the invariant pixels, together, form a high-density ridge, while variant pixels (clouds and land cover changes) are spread, having low density, facilitating their exclusion. Finally, the selected PIFs are subjected to a robust regression (M-estimate) between pairs of temporal bands for the detection and elimination of outliers, and to obtain the optimal linear equation for a given set of target points. The robust regression is insensitive to outliers, i.e., observation that appears to deviate
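
    A minimal per-pixel sketch of the three spectral measures named above (SAM, SCM, and Euclidean distance) on hypothetical band reflectances at times t1 and t2; the threshold classification, ridge analysis, and robust regression steps are not reproduced here.

    import numpy as np

    def spectral_angle(x, y):
        # Spectral Angle Mapper (SAM): angle in radians between two spectra.
        cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
        return float(np.arccos(np.clip(cos, -1.0, 1.0)))

    def spectral_correlation(x, y):
        # Spectral Correlation Mapper (SCM): Pearson correlation of the spectra.
        return float(np.corrcoef(x, y)[0, 1])

    def euclidean_distance(x, y):
        return float(np.linalg.norm(x - y))

    # Hypothetical 4-band reflectances of one pixel at times t1 and t2.
    t1 = np.array([0.10, 0.15, 0.30, 0.45])
    t2 = np.array([0.11, 0.14, 0.29, 0.47])
    print(spectral_angle(t1, t2), spectral_correlation(t1, t2), euclidean_distance(t1, t2))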

  6. Analysis of Similarities & Differences between Tax Preservative Measures and Tax Enforcement Measures%税收保全与税收强制执行措施的比较分析



    Starting from the differences and connections between tax preservation measures and tax enforcement measures, this paper explains the similarities and differences between the two, which helps to distinguish them strictly in the enforcement process and to avoid confusion.

  7. Word Semantic Similarity Measurement Based on Evidence Theory%基于证据理论的单词语义相似度度量

    王俊华; 左祥麟; 左万利


    Measuring semantic similarity between words is a classical and active problem in natural language processing, and advances have a great impact on applications such as word sense disambiguation, machine translation, ontology mapping, and computational linguistics. This paper proposes a novel approach to measuring word semantic similarity by combining evidence theory with a knowledge base. First, we extract evidence based on WordNet; second, we analyze the reasonableness of the extracted evidence using scatter plots; third, we generate basic probability assignments by statistics and piecewise linear interpolation; fourth, we obtain a global basic probability assignment by integrating evidence conflict resolution, importance distribution, and the Dempster-Shafer combination rules; finally, we quantify word semantic similarity. On the R&G(65) data set, in a 5-fold cross-validation experiment, the correlation of our results with human judgment was 0.912, a 0.4% improvement over the existing best practice P&S and a 7%-13% improvement over classical methods (reLHS, distJC, simLC, simL, simR); the results on M&C(30) and WordSim353 were also good, with correlations of 0.915 and 0.941. The computational efficiency of our method is comparable to that of the classical methods, showing that using evidence theory to measure word semantic similarity is reasonable and effective.

  8. Similar head impact acceleration measured using instrumented ear patches in a junior rugby union team during matches in comparison with other sports.

    King, Doug A; Hume, Patria A; Gissane, Conor; Clark, Trevor N


    OBJECTIVE Direct impact with the head and the inertial loading of the head have been postulated as major mechanisms of head-related injuries, such as concussion. METHODS This descriptive observational study was conducted to quantify the head impact acceleration characteristics in under-9-year-old junior rugby union players in New Zealand. The impact magnitude, frequency, and location were collected with a wireless head impact sensor that was worn by 14 junior rugby players who participated in 4 matches. RESULTS A total of 721 impacts > 10g were recorded. The median (interquartile range [IQR]) number of impacts per player was 46 (IQR 37-58), resulting in 10 (IQR 4-18) impacts to the head per player per match. The median impact magnitudes recorded were 15g (IQR 12g-21g) for linear acceleration and 2296 rad/sec² (IQR 1352-4152 rad/sec²) for rotational acceleration. CONCLUSIONS There were 121 impacts (16.8%) above the rotational injury risk limit and 1 (0.1%) impact above the linear injury risk limit. The acceleration magnitude and number of head impacts in junior rugby union players were higher than those previously reported in similar age-group sports participants. The median linear acceleration for the under-9-year-old rugby players was similar to that of 7- to 8-year-old American football players, but lower than that of 9- to 12-year-old youth American football players. The median rotational accelerations measured were higher than the medians and 95th percentiles in youth, high school, and collegiate American football players.

  9. Improvement of Similarity Measure:Pearson Product-Moment Correlation Coefficient%相似度的评价指标相关系数的改进

    刘永锁; 孟庆华; 陈蓉; 王健松; 蒋淑敏; 胡育筑


    Aim: To study why the Pearson product-moment correlation coefficient can be insensitive as a similarity measure, and how to improve its sensitivity. Methods: Experimental and simulated data sets were used. Results: The distribution range of the data influences the sensitivity of the Pearson product-moment correlation coefficient; a weighted Pearson product-moment correlation coefficient is more sensitive when the distribution range of the data is large. Conclusion: Weighting is necessary when the distribution range of the data is large.
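
    A minimal sketch of a weighted Pearson product-moment correlation coefficient; the particular weighting scheme (here, optionally weighting by signal intensity) is an assumption, since the abstract does not specify the weights used.

    import numpy as np

    def weighted_pearson(x, y, w):
        # Weighted Pearson product-moment correlation coefficient.
        # w are non-negative weights; with equal weights this reduces to the
        # ordinary correlation coefficient.
        x, y, w = map(np.asarray, (x, y, w))
        w = w / w.sum()
        mx, my = np.sum(w * x), np.sum(w * y)
        cov = np.sum(w * (x - mx) * (y - my))
        return cov / np.sqrt(np.sum(w * (x - mx) ** 2) * np.sum(w * (y - my) ** 2))

    x = [1.0, 2.0, 3.0, 50.0]
    y = [1.1, 1.9, 3.2, 48.0]
    print(weighted_pearson(x, y, np.ones(4)))     # ordinary Pearson
    print(weighted_pearson(x, y, np.array(x)))    # weighted by signal intensity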

  10. Similarity measure, distance measure and entropy of interval-valued fuzzy soft sets%区间值模糊软集的相似度及距离和熵

    王玲; 秦克云; 刘雅雅


    Based on Cartesian coordinates, new similarity measures for interval-valued fuzzy soft sets are presented, and several distance measures between interval-valued fuzzy soft sets are derived from them. On this basis, an axiomatic definition of entropy for interval-valued fuzzy soft sets is introduced, formulas for computing this entropy are given, and the basic properties of these measures are analyzed.

  11. Conditioned Attraction, Similarity, and Evaluative Meaning. Language, Personality, Social, and Cross-Cultural Study and Measurement of the Human A-R-D (Motivational) System.

    Stalling, Richard B.

    Recent stimulus-response formulations have indicated that similarity between persons functions as an unconditioned stimulus (UCS) and that interpersonal attraction is a classically conditioned evaluative response. The thesis of this study is that similarity is a correlate of evaluative meaning and that the latter rather than the former is…

  12. SAR图像相干斑抑制中的像素相关性测量%The Pixel-similarity Measurement in SAR Image Despeckling

    李光廷; 杨亮; 黄平平; 禹卫东


    Pixel relativity (PR) measurement in SAR images, the key component of despeckling techniques based on weighted averaging, is studied from three aspects. First, the rationale for ratio-based PR models is explained, two new ratio PR models (a log-domain Gaussian model and a pixel similarity probability model) are proposed, and the probability density function (PDF) of the SAR image and the PDF of the ratio between pixels are transformed into ratio PR models. Then, to evaluate the four ratio PR models, weighted maximum likelihood filters based on PR are designed. Finally, a method that calibrates the location of the maximum of the PR model is introduced to improve the radiometric preservation of models whose maximum is not located at 1. Theoretical analysis and experimental comparison demonstrate the effectiveness of the two proposed PR models and of the maximum-location calibration method.

  13. Finding patients using similarity measures in a rare diseases-oriented clinical data warehouse: Dr. Warehouse and the needle in the needle stack.

    Garcelon, Nicolas; Neuraz, Antoine; Benoit, Vincent; Salomon, Rémi; Kracker, Sven; Suarez, Felipe; Bahi-Buisson, Nadia; Hadj-Rabia, Smail; Fischer, Alain; Munnich, Arnold; Burgun, Anita


    In the context of rare diseases, it may be helpful to detect patients with similar medical histories, diagnoses and outcomes from a large number of cases with automated methods. To reduce the time to find new cases, we developed a method to find patients similar to an index case, leveraging data from the electronic health records. We used the clinical data warehouse of a children's academic hospital in Paris, France (Necker-Enfants Malades), containing about 400,000 patients. Our model was based on a vector space model (VSM) to compute the similarity distance between an index patient and all the patients of the data warehouse. The dimensions of the VSM were built upon Unified Medical Language System concepts extracted from clinical narratives stored in the clinical data warehouse. The VSM was enhanced using three parameters: a pertinence score (TF-IDF of the concepts), the polarity of the concept (negated/not negated) and the minimum number of concepts in common. We evaluated this model by displaying the most similar patients for five different rare diseases: Lowe Syndrome (LOWE), Dystrophic Epidermolysis Bullosa (DEB), Activated PI3K delta Syndrome (APDS), Rett Syndrome (RETT) and Dowling Meara (EBS-DM), represented in the clinical data warehouse by 18, 103, 21, 84 and 7 patients respectively. The percentages of index patients returning at least one true positive similar patient in the Top30 similar patients were 94% for LOWE, 97% for DEB, 86% for APDS, 71% for EBS-DM and 99% for RETT. On average, 51% of the 30 returned patients had the exact same genetic disease as the index case. This tool offers new perspectives in a translational context to identify patients for genetic research. Moreover, when new molecular bases are discovered, our strategy will help to identify additional eligible patients for genetic screening. Copyright © 2017. Published by Elsevier Inc.
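
    A simplified sketch of a VSM of this kind: TF-IDF-weighted concept vectors with a crude polarity flag, cosine similarity, and a minimum number of shared concepts. The cosine choice, the concept identifiers, and the IDF values are assumptions for illustration and do not reproduce Dr. Warehouse itself.

    import math
    from collections import Counter

    def tfidf_vector(concepts, idf):
        # concepts: list of (cui, negated) pairs extracted from one patient's notes.
        # Negated concepts are kept as distinct dimensions, a crude form of polarity.
        tf = Counter((cui, neg) for cui, neg in concepts)
        return {k: v * idf.get(k[0], 1.0) for k, v in tf.items()}

    def cosine(u, v):
        shared = set(u) & set(v)
        num = sum(u[k] * v[k] for k in shared)
        den = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
        return num / den if den else 0.0

    def similar_patients(index, others, idf, min_shared=2, top=30):
        vi = tfidf_vector(index, idf)
        scored = []
        for pid, concepts in others.items():
            vj = tfidf_vector(concepts, idf)
            if len(set(vi) & set(vj)) >= min_shared:   # minimum concepts in common
                scored.append((cosine(vi, vj), pid))
        return sorted(scored, reverse=True)[:top]

    # Toy usage with hypothetical concept codes and IDF values.
    idf = {"C0011860": 2.3, "C0020538": 1.1}
    index_case = [("C0011860", False), ("C0020538", False)]
    cohort = {"p1": [("C0011860", False)], "p2": [("C0020538", True)]}
    print(similar_patients(index_case, cohort, idf, min_shared=1))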

  14. Evaluating the Applicability of Data-Driven Dietary Patterns to Independent Samples with a Focus on Measurement Tools for Pattern Similarity.

    Castelló, Adela; Buijsse, Brian; Martín, Miguel; Ruiz, Amparo; Casas, Ana M; Baena-Cañada, Jose M; Pastor-Barriuso, Roberto; Antolín, Silvia; Ramos, Manuel; Muñoz, Monserrat; Lluch, Ana; de Juan-Ferré, Ana; Jara, Carlos; Lope, Virginia; Jimeno, María A; Arriola-Arellano, Esperanza; Díaz, Elena; Guillem, Vicente; Carrasco, Eva; Pérez-Gómez, Beatriz; Vioque, Jesús; Pollán, Marina


    Diet is a key modifiable risk factor for many chronic diseases, but it remains unclear whether dietary patterns from one study sample are generalizable to other independent populations. The primary objective of this study was to assess whether data-driven dietary patterns from one study sample are applicable to other populations. The secondary objective was to assess the validity of two criteria of pattern similarity. Six dietary patterns (Western [n=3], Mediterranean, Prudent, and Healthy) from three published studies on breast cancer were reconstructed in a case-control study of 973 breast cancer patients and 973 controls. Three more internal patterns (Western, Prudent, and Mediterranean) were derived from this case-control study's own data. Applicability was assessed by comparing the six reconstructed patterns with the three internal dietary patterns, using the congruence coefficient (CC) between pattern loadings. In cases where any pair met either of two commonly used criteria for declaring patterns similar (CC ≥0.85 or a statistically significant correlation), the similarity of the patterns was double-checked by comparing their associations with risk for breast cancer, to assess whether those two criteria of similarity are actually reliable. Five of the six reconstructed dietary patterns showed high congruence (CC >0.9) to their corresponding dietary pattern derived from the case-control study's data. Similar associations with risk for breast cancer were found in all pairs of dietary patterns that had high CC but not in all pairs of dietary patterns with statistically significant correlations. Similar dietary patterns can be found in independent samples. The P value of a correlation coefficient is less reliable than the CC as a criterion for declaring two dietary patterns similar. This study shows that diet scores based on a particular study are generalizable to other populations. Copyright © 2016 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
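
    A minimal sketch of the congruence coefficient between two patterns' loading vectors; the food-group loadings below are hypothetical.

    import numpy as np

    def congruence_coefficient(a, b):
        # Tucker's congruence coefficient between two loading vectors:
        # CC = sum(a*b) / sqrt(sum(a^2) * sum(b^2)).  The abstract above uses
        # CC >= 0.85 as one criterion for declaring two patterns similar.
        a, b = np.asarray(a, float), np.asarray(b, float)
        return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

    # Hypothetical loadings of two "Western" patterns on the same food groups.
    western_study1 = [0.7, 0.6, -0.2, 0.5]
    western_study2 = [0.6, 0.7, -0.1, 0.4]
    print(congruence_coefficient(western_study1, western_study2))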

  15. Learning Multi-modal Similarity

    McFee, Brian


    In many applications involving multimedia data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning technique for integrating heterogeneous data into a single, unified similarity space. Our algorithm learns an optimal ensemble of kernel transformations which conform to measurements of human perceptual similarity, as expressed by relative comparisons. To cope with the ubiquitous problems of subjectivity and inconsistency in multimedia similarity, we develop graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure.


    Lutsenko Y. V.


    Full Text Available The article discusses the application of automated system-cognitive analysis (ASC-analysis), whose mathematical model is a systemic theory of information and whose software toolkit is the intellectual system called "Eidos", to solving one of the important tasks of ampelography: quantifying the similarities and differences of different clones of grapes using the contours of their leaves. To solve this task, the following steps are performed: (1) digitization of scanned images of the leaves and creation of their mathematical models; (2) formation of mathematical models of specific leaves with the application of information theory; (3) modeling of the generalized images of leaves of different clones on the basis of specific leaves (multiparameter typing); (4) verification of the model by identifying specific leaf images with the generalized clone images, i.e., classes (system identification); (5) quantification of the similarities and differences of the clones, i.e., cluster-constructive analysis of the generalized images of leaves of the various clones. The specific shape of a leaf contour is regarded as noisy information about the clone to which it belongs, containing information about the true leaf shape of that clone (the clean signal) plus noise that distorts the real shape owing to the random influence of the environment. The software toolkit of ASC-analysis, the intellectual system "Eidos", provides noise suppression and detection of the signal about the true leaf shape of each clone on the basis of a number of noisy concrete examples of leaves of that clone. This creates a single image of the leaf shape of each clone, independent of its specific realizations, i.e., the "Eidos" of these images (in the sense of Plato), the prototype or archetype (in the Jungian sense) of the images

  17. Similarities and Differences of the Soleus and Gastrocnemius H-reflexes during Varied Body Postures, Foot Positions, and Muscle Function: Multifactor Designs for Repeated Measures

    Sabbahi Mohamed A


    Full Text Available Abstract Background Although the soleus (Sol), medial gastrocnemius (MG), and lateral gastrocnemius (LG) muscles differ in function, composition, and innervation, it is common practice to investigate them as a single H-reflex recording. The purpose of this study was to compare H-reflex recordings between these three sections of the triceps surae muscle group of healthy participants while lying and standing during three different ankle positions. Methods The Sol, MG and LG muscles' H-reflexes were recorded from ten participants during prone lying and standing with the ankle in neutral, maximum dorsiflexion, and maximum plantarflexion positions. Four traces were averaged for each combination of conditions. Three-way ANOVAs (posture × ankle position × muscle) with planned comparisons were used for statistical comparisons. Results Although the H-reflex in the three muscle sections differed in latency and amplitude, its dependency on posture and ankle position was similar. The H-reflex amplitudes and maximum H-reflex to M-response (H/M) ratios were significantly (1) lower during standing compared to lying with the ankle in neutral, (2) greater during standing with the ankle in plantarflexion compared to neutral, and (3) less with the ankle in dorsiflexion compared to neutral during lying and standing for all muscles (p ≤ .05). Conclusion Varying demands are required for muscles activated during distinctly different postures and ankle movement tasks.

  18. Discrimination of membrane transporter protein types using K-nearest neighbor method derived from the similarity distance of total diversity measure.

    Zuo, Yong-Chun; Su, Wen-Xia; Zhang, Shi-Hua; Wang, Shan-Shan; Wu, Cheng-Yan; Yang, Lei; Li, Guang-Peng


    Membrane transporters play crucial roles in the fundamental cellular processes of living organisms, and computational techniques are needed to annotate transporter functions. In this study, a multi-class K-nearest-neighbor classifier based on the increment of diversity (KNN-ID) was developed to discriminate membrane transporter types, with the increment of diversity (ID) introduced as a novel similarity distance. Comparisons with multiple recently published methods showed that the proposed KNN-ID method outperformed the other methods, obtaining more than 20% improvement in overall accuracy. The overall prediction accuracy reached 83.1% when K was set to 2. The prediction sensitivities were 76.7%, 89.1%, and 80.1% for channels/pores, electrochemical potential-driven transporters, and primary active transporters, respectively. Discrimination and comparison between any two different classes of transporters further demonstrated that the proposed method is a promising classifier and will play a complementary role in facilitating the functional assignment of transporters.
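
    A generic sketch of a K-nearest-neighbor classifier with a pluggable distance function, standing in for KNN-ID; the increment-of-diversity distance itself is not reproduced here, and the Manhattan distance, feature values, and class labels are placeholders.

    from collections import Counter

    def knn_predict(query, training, distance, k=2):
        # training: list of (feature_vector, label); distance: any callable,
        # e.g. the increment-of-diversity distance used by KNN-ID (not shown).
        neighbors = sorted(training, key=lambda item: distance(query, item[0]))[:k]
        labels = Counter(label for _, label in neighbors)
        return labels.most_common(1)[0][0]

    def manhattan(x, y):
        # Simple placeholder distance for the demo.
        return sum(abs(a - b) for a, b in zip(x, y))

    train = [([0.9, 0.1], "channel/pore"), ([0.2, 0.8], "primary active"),
             ([0.85, 0.2], "channel/pore"), ([0.1, 0.9], "primary active")]
    print(knn_predict([0.8, 0.15], train, manhattan, k=2))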

  19. Temporary inhibition of dorsal or ventral hippocampus by muscimol: distinct effects on measures of innate anxiety on the elevated plus maze, but similar disruption of contextual fear conditioning.

    Zhang, Wei-Ning; Bast, Tobias; Xu, Yan; Feldon, Joram


    Studies in rats, involving hippocampal lesions and hippocampal drug infusions, have implicated the hippocampus in the modulation of anxiety-related behaviors and conditioned fear. The ventral hippocampus is considered to be more important for anxiety- and fear-related behaviors than the dorsal hippocampus. In the present study, we compared the role of dorsal and ventral hippocampus in innate anxiety and classical fear conditioning in Wistar rats, examining the effects of temporary pharmacological inhibition by the GABA-A agonist muscimol (0.5 μg/0.5 μl per side) in the elevated plus maze and on fear conditioning to a tone and the conditioning context. In the elevated plus maze, dorsal and ventral hippocampal muscimol caused distinct behavioral changes. The effects of ventral hippocampal muscimol were consistent with suppression of locomotion, possibly accompanied by anxiolytic effects, whereas the pattern of changes caused by dorsal hippocampal muscimol was consistent with anxiogenic effects. In contrast, dorsal and ventral hippocampal muscimol caused similar effects in the fear conditioning experiments, disrupting contextual, but not tone, fear conditioning.

  20. Equilibrium drug solubility measurements in 96-well plates reveal similar drug solubilities in phosphate buffer pH 6.8 and human intestinal fluid.

    Heikkilä, Tiina; Karjalainen, Milja; Ojala, Krista; Partola, Kirsi; Lammert, Frank; Augustijns, Patrick; Urtti, Arto; Yliperttula, Marjo; Peltonen, Leena; Hirvonen, Jouni


    This study was conducted to develop a high-throughput screening (HTS) method for the assessment of equilibrium solubility of drugs. Solid-state compounds were precipitated from methanol in 96-well plates, in order to eliminate the effect of co-solvent. Solubility of twenty model drugs was analyzed in water and aqueous solutions (pH 1.2 and 6.8) in 96-well plates and in shake-flasks (UV detection). The results obtained with the 96-well plate method correlated well (R² = 0.93) with the shake-flask results over the wide concentration range of 0.002-169.2 mg/ml. Thereafter, the solubility tests in 96-well plates were performed using fasted-state human intestinal fluid (HIF) from the duodenum of healthy volunteers. The values of solubility were similar in phosphate buffer solution (pH 6.8) and HIF over the solubility range of 10²-10⁵ μg/ml. The new 96-well plate method is useful for the screening of equilibrium drug solubility during the drug discovery process and it also allows the use of human intestinal fluid in solubility screening.

  1. Geodetic measurements reveal similarities between post–Last Glacial Maximum and present-day mass loss from the Greenland ice sheet

    Khan, Shfaqat Abbas; Sasgen, Ingo; Bevis, Michael;


    ...and ocean load changes occurring since the Last Glacial Maximum (LGM; ~21 thousand years ago) and may be used to constrain the GrIS deglaciation history. We use data from the Greenland Global Positioning System network to directly measure GIA and estimate basin-wide mass changes since the LGM. Unpredicted... ...sea level for centuries to come. Our new deglaciation history and GIA uplift estimates suggest that studies that use the Gravity Recovery and Climate Experiment satellite mission to infer present-day changes in the GrIS may have erroneously corrected for GIA and underestimated the mass loss by about 20...

  2. A comparison of similar aerosol measurements made on the NASA P3-B, DC-8, and NSF C-130 aircraft during TRACE-P and ACE-Asia

    Moore, K. G.; Clarke, A. D.; Kapustin, V. N.; McNaughton, C.; Anderson, B. E.; Winstead, E. L.; Weber, R.; Ma, Y.; Lee, Y. N.; Talbot, R.; Dibb, J.; Anderson, T.; Doherty, S.; Covert, D.; Rogers, D.


    Two major aircraft experiments occurred off the Pacific coast of Asia during spring 2001: the NASA-sponsored Transport and Chemical Evolution over the Pacific (TRACE-P) and the National Science Foundation (NSF)-sponsored Aerosol Characterization Experiment-Asia (ACE-Asia). Both experiments studied emissions from the Asian continent (biomass burning, urban/industrial pollution, and dust). TRACE-P focused on trace gases and aerosol during March/April and was based primarily in Hong Kong and Yokota Air Force Base, Japan, and involved two aircraft: the NASA DC-8 and the NASA P3-B. ACE-Asia focused on aerosol and radiation during April/May and was based in Iwakuni Marine Corps Air Station, Japan, and involved the NSF C-130. This paper compares aerosol measurements from these aircraft, including aerosol concentrations, size distributions (and integral properties), chemistry, and optical properties. Best overall agreement (generally within RMS instrumental uncertainty) was for physical properties of the submicron aerosol, including condensation nuclei concentrations, scattering coefficients, and differential mobility analyzer and optical particle counter (OPC) accumulation mode size distributions. Larger differences (typically outside of the RMS uncertainty) were often observed for parameters related to the supermicron aerosols (total scattering and absorption coefficients, coarse mode Forward Scattering Spectrometer Probe and OPC size distributions/integral properties, and soluble chemical species usually associated with the largest particles, e.g., Na+, Cl-, Ca2+, and Mg2+), where aircraft sampling is more demanding. Some of the observed differences reflect different inlets (e.g., low-turbulence inlet enhancement of coarse mode aerosol), differences in sampling lines, and instrument configuration and design. Means and variances of comparable measurements for horizontal legs were calculated, and regression analyses were performed for each platform and allow for an

  3. Distance learning for similarity estimation

    Yu, J.; Amores, J.; Sebe, N.; Radeva, P.; Tian, Q.


    In this paper, we present a general guideline to find a better distance measure for similarity estimation based on statistical analysis of distribution models and distance functions. A new set of distance measures are derived from the harmonic distance, the geometric distance, and their generalized variants.

  4. Inequalities between similarities for numerical data

    Warrens, Matthijs J.


    Similarity measures are entities that can be used to quantify the similarity between two vectors of real numbers. We present inequalities between seven well-known similarities. The inequalities are valid if the vectors contain non-negative real numbers.
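
    A small numerical check of one such inequality chain for non-negative vectors, Tanimoto (a real-valued Jaccard) ≤ Dice ≤ cosine; whether these three are among the seven similarities treated in the paper is an assumption.

    import numpy as np

    def cosine(x, y):
        return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

    def dice(x, y):
        return 2 * np.dot(x, y) / (np.dot(x, x) + np.dot(y, y))

    def tanimoto(x, y):
        # Jaccard coefficient generalized to real-valued vectors.
        return np.dot(x, y) / (np.dot(x, x) + np.dot(y, y) - np.dot(x, y))

    rng = np.random.default_rng(0)
    for _ in range(1000):
        x, y = rng.random(5), rng.random(5)       # non-negative vectors
        assert tanimoto(x, y) <= dice(x, y) <= cosine(x, y) + 1e-12
    print("Tanimoto <= Dice <= cosine held on all random non-negative pairs")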

  5. Geodetic measurements reveal similarities between post-Last Glacial Maximum and present-day mass loss from the Greenland ice sheet.

    Khan, Shfaqat A; Sasgen, Ingo; Bevis, Michael; van Dam, Tonie; Bamber, Jonathan L; Wahr, John; Willis, Michael; Kjær, Kurt H; Wouters, Bert; Helm, Veit; Csatho, Beata; Fleming, Kevin; Bjørk, Anders A; Aschwanden, Andy; Knudsen, Per; Munneke, Peter Kuipers


    Accurate quantification of the millennial-scale mass balance of the Greenland ice sheet (GrIS) and its contribution to global sea-level rise remain challenging because of sparse in situ observations in key regions. Glacial isostatic adjustment (GIA) is the ongoing response of the solid Earth to ice and ocean load changes occurring since the Last Glacial Maximum (LGM; ~21 thousand years ago) and may be used to constrain the GrIS deglaciation history. We use data from the Greenland Global Positioning System network to directly measure GIA and estimate basin-wide mass changes since the LGM. Unpredicted, large GIA uplift rates of +12 mm/year are found in southeast Greenland. These rates are due to low upper mantle viscosity in the region, from when Greenland passed over the Iceland hot spot about 40 million years ago. This region of concentrated soft rheology has a profound influence on reconstructing the deglaciation history of Greenland. We reevaluate the evolution of the GrIS since LGM and obtain a loss of 1.5-m sea-level equivalent from the northwest and southeast. These same sectors are dominating modern mass loss. We suggest that the present destabilization of these marine-based sectors may increase sea level for centuries to come. Our new deglaciation history and GIA uplift estimates suggest that studies that use the Gravity Recovery and Climate Experiment satellite mission to infer present-day changes in the GrIS may have erroneously corrected for GIA and underestimated the mass loss by about 20 gigatons/year.

  6. Geodetic measurements reveal similarities between post–Last Glacial Maximum and present-day mass loss from the Greenland ice sheet

    Khan, Shfaqat A.; Sasgen, Ingo; Bevis, Michael; van Dam, Tonie; Bamber, Jonathan L.; Wahr, John; Willis, Michael; Kjær, Kurt H.; Wouters, Bert; Helm, Veit; Csatho, Beata; Fleming, Kevin; Bjørk, Anders A.; Aschwanden, Andy; Knudsen, Per; Munneke, Peter Kuipers


    Accurate quantification of the millennial-scale mass balance of the Greenland ice sheet (GrIS) and its contribution to global sea-level rise remain challenging because of sparse in situ observations in key regions. Glacial isostatic adjustment (GIA) is the ongoing response of the solid Earth to ice and ocean load changes occurring since the Last Glacial Maximum (LGM; ~21 thousand years ago) and may be used to constrain the GrIS deglaciation history. We use data from the Greenland Global Positioning System network to directly measure GIA and estimate basin-wide mass changes since the LGM. Unpredicted, large GIA uplift rates of +12 mm/year are found in southeast Greenland. These rates are due to low upper mantle viscosity in the region, from when Greenland passed over the Iceland hot spot about 40 million years ago. This region of concentrated soft rheology has a profound influence on reconstructing the deglaciation history of Greenland. We reevaluate the evolution of the GrIS since LGM and obtain a loss of 1.5-m sea-level equivalent from the northwest and southeast. These same sectors are dominating modern mass loss. We suggest that the present destabilization of these marine-based sectors may increase sea level for centuries to come. Our new deglaciation history and GIA uplift estimates suggest that studies that use the Gravity Recovery and Climate Experiment satellite mission to infer present-day changes in the GrIS may have erroneously corrected for GIA and underestimated the mass loss by about 20 gigatons/year. PMID:27679819

  7. An interference-free and label-free sandwich-type magnetic silicon microsphere-rGO-based probe for fluorescence detection of microRNA.

    Li, Shiyu; He, Kui; Liao, Rong; Chen, Chunyan; Chen, Xiaoming; Cai, Changqun


    An interference-free and label-free sensing platform was developed for the highly sensitive detection of microRNA-21 (miRNA-21) in vitro by magnetic silicon microsphere (MNP)-reduced graphene oxide (rGO)-based sandwich probe. In this method, DNA capture probes (P1) were connected with MNPs at the 5' end and hybridized with completely complementary target miRNA. Subsequently, rGO was retained and induced the fluorescence quenching in the supernatant. Through the magnetic separation, the supernatant environment was simplified and the interference to analytical signal was eliminated. When DNA capture probe-modified magnetic silicon microspheres (MNP-P1) were adsorbed through rGO in the absence of a target and formed a sandwich structure, the formed nanostructure was easily removed from the solution by a magnetic field and the fluorescence intensity was maximally recovered. This proposed strategy, which both overcame the expensive and cumbersome fluorescent labeling, and eliminated interference to analytical signal for guaranteeing high signal-to-background ratio, exhibited high sensitivity with a detection limit as low as 0.098nM and special selectivity toward miRNA-21. The method was potentially applicable for not only detection of miRNA-21 but also various biomarker analyses just by changing capture probes. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Distance learning for similarity estimation.

    Yu, Jie; Amores, Jaume; Sebe, Nicu; Radeva, Petia; Tian, Qi


    In this paper, we present a general guideline to find a better distance measure for similarity estimation based on statistical analysis of distribution models and distance functions. A new set of distance measures is derived from the harmonic distance, the geometric distance, and their generalized variants according to maximum likelihood theory. These measures can provide a more accurate feature model than the classical Euclidean and Manhattan distances. We also find that the feature elements are often from heterogeneous sources that may have different influence on similarity estimation. Therefore, the assumption of a single isotropic distribution model is often inappropriate. To alleviate this problem, we use a boosted distance measure framework that finds multiple distance measures which best fit the distribution of selected feature elements for accurate similarity estimation. The new distance measures for similarity estimation are tested on two applications: stereo matching and motion tracking in video sequences. The performance of the boosted distance measure is further evaluated on several benchmark data sets from the UCI repository and two image retrieval applications. In all the experiments, robust results are obtained based on the proposed methods.
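
    As an illustration of the idea of replacing a single global metric with a weighted, feature-wise combination, the Python sketch below contrasts the classical Euclidean and Manhattan distances with a weighted per-feature distance. It is only a minimal sketch: the paper's maximum-likelihood-derived harmonic and geometric variants and its boosting procedure are not reproduced, and the weight values are assumptions made for the example.

      import numpy as np

      def euclidean(x, y):
          # Classical L2 distance.
          return float(np.sqrt(np.sum((x - y) ** 2)))

      def manhattan(x, y):
          # Classical L1 distance.
          return float(np.sum(np.abs(x - y)))

      def weighted_feature_distance(x, y, weights):
          # Per-feature absolute differences combined with weights; in a boosted
          # scheme the weights would be learned to favour discriminative features.
          return float(np.sum(weights * np.abs(x - y)))

      x = np.array([0.2, 1.5, 3.0])
      y = np.array([0.1, 1.0, 2.0])
      w = np.array([0.7, 0.2, 0.1])  # hypothetical learned weights
      print(euclidean(x, y), manhattan(x, y), weighted_feature_distance(x, y, w))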

  9. Similarity transformations of MAPs

    Andersen Allan T.


    Full Text Available We introduce the notion of similar Markovian Arrival Processes (MAPs) and show that the event stationary point processes related to two similar MAPs are stochastically equivalent. This holds true for the time stationary point processes too. We show that several well-known stochastic equivalences, e.g. that between the H2 renewal process and the Interrupted Poisson Process (IPP), can be expressed by the similarity transformations of MAPs. In the appendix the valid region of similarity transformations for two-state MAPs is characterized.

  10. Asthma and Rhinitis Are Associated with Less Objectively-Measured Moderate and Vigorous Physical Activity, but Similar Sport Participation, in Adolescent German Boys: GINIplus and LISAplus Cohorts

    Berdel, Dietrich; Bauer, Carl-Peter; Koletzko, Sibylle; Nowak, Dennis; Heinrich, Joachim; Schulz, Holger


    Introduction Physical activity (PA) protects against most noncommunicable diseases and has been associated with decreased risk of allergic phenotype, which is increasing worldwide. However, the association is not always present; furthermore it is not clear whether it is strongest for asthma, rhinitis, symptoms of these, or atopic sensitization; which sex is most affected; or whether it can be explained by either avoidance of sport or exacerbation of symptoms by exercise. Interventions are thus difficult to target. Methods PA was measured by one-week accelerometry in 1137 Germans (mean age 15.6 years, 47% boys) from the GINIplus and LISAplus birth cohorts, and modeled as a correlate of allergic symptoms, sensitization, or reported doctor-diagnosed asthma or rhinitis. Results 8.3% of children had asthma, of the remainder 7.9% had rhinitis, and of the remainder 32% were sensitized to aero-allergens (atopic). 52% were lung-healthy controls. Lung-healthy boys and girls averaged 46.4 min and 37.8 min moderate-to-vigorous PA per day, of which 14.6 and 11.4 min was vigorous. PA in allergic girls was not altered, but boys with asthma got 13% less moderate and 29% less vigorous PA, and those with rhinitis with 13% less moderate PA, than lung-healthy boys. Both sexes participated comparably in sport (70 to 84%). Adolescents with wheezing (up to 68%, in asthma) and/or nose/eye symptoms (up to 88%, in rhinitis) were no less active. Conclusions We found that asthma and rhinitis, but not atopy, were independently associated with low PA in boys, but not in girls. These results indicate that allergic boys remain a high-risk group for physical inactivity even if they participate comparably in sport. Research into the link between PA and allergy should consider population-specific and sex-specific effects, and clinicians, parents, and designers of PA interventions should specifically address PA in allergic boys to ensure full participation. PMID:27560942

  11. Analysis of an Effective Similarity Measurement Algorithm for Multivariate Time Series

    郭小芳; 李锋; 刘庆华


    In order to verify the validity of the Eros distance for similarity measurement on multivariate time series (MTS) datasets, similarity search experiments were performed on different MTS datasets. The experimental results show that, compared with other traditional multivariate time series similarity measures, the similarity measurement method based on the Eros distance has a clear advantage in terms of recall and precision.

  12. Clustering by Pattern Similarity

    Hai-xun Wang; Jian Pei


    The task of clustering is to identify classes of similar objects among a set of objects. The definition of similarity varies from one clustering model to another. However, in most of these models the concept of similarity is often based on such metrics as Manhattan distance, Euclidean distance or other Lp distances. In other words, similar objects must have close values in at least a set of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we proposed, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. The new similarity concept models a wide range of applications. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitude of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, because it is able to capture not only the closeness of values of certain leading indicators but also the closeness of (purchasing, browsing, etc.) patterns exhibited by the customers. In addition to the novel similarity model, this paper also introduces an effective and efficient algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its performance.
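
    The coherence test behind the pCluster idea can be made concrete with the commonly cited pScore of a 2 x 2 submatrix, |(x_a - x_b) - (y_a - y_b)|: a set of objects and attributes forms a delta-pCluster when every such score stays within a tolerance delta. The Python sketch below is illustrative only; the gene values and the threshold are invented, and the paper's actual mining algorithm is not reproduced.

      from itertools import combinations

      def p_score(x, y, a, b):
          # Deviation from a perfectly coherent (shifted) pattern on attributes a, b.
          return abs((x[a] - x[b]) - (y[a] - y[b]))

      def is_delta_pcluster(rows, attrs, delta):
          # True when every object pair and attribute pair stays within delta.
          return all(
              p_score(x, y, a, b) <= delta
              for x, y in combinations(rows, 2)
              for a, b in combinations(attrs, 2)
          )

      # Two genes whose expression rises and falls together at different magnitudes.
      g1 = [1.0, 3.0, 2.0, 5.0]
      g2 = [11.2, 13.1, 12.0, 15.1]
      print(is_delta_pcluster([g1, g2], attrs=range(4), delta=0.3))  # prints True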

  13. Judgments of brand similarity

    Bijmolt, THA; Wedel, M; Pieters, RGM; DeSarbo, WS

    This paper provides empirical insight into the way consumers make pairwise similarity judgments between brands, and how familiarity with the brands, serial position of the pair in a sequence, and the presentation format affect these judgments. Within the similarity judgment process both the

  14. New Similarity Functions

    Yazdani, Hossein; Ortiz-Arroyo, Daniel; Kwasnicka, Halina


    In data science, there are important parameters that affect the accuracy of the algorithms used. Some of these parameters are: the type of data objects, the membership assignments, and distance or similarity functions. This paper discusses similarity functions as fundamental elements in membership...

  15. Judgments of brand similarity

    Bijmolt, THA; Wedel, M; Pieters, RGM; DeSarbo, WS


    This paper provides empirical insight into the way consumers make pairwise similarity judgments between brands, and how familiarity with the brands, serial position of the pair in a sequence, and the presentation format affect these judgments. Within the similarity judgment process both the formatio

  16. New Similarity Functions

    Yazdani, Hossein; Ortiz-Arroyo, Daniel; Kwasnicka, Halina


    In data science, there are important parameters that affect the accuracy of the algorithms used. Some of these parameters are: the type of data objects, the membership assignments, and distance or similarity functions. This paper discusses similarity functions as fundamental elements in membership assignments. The paper introduces Weighted Feature Distance (WFD) and Prioritized Weighted Feature Distance (PWFD), two new distance functions that take into account the diversity in feature spaces. WFD functions perform better in supervised and unsupervised methods by comparing data objects on their feature spaces, in addition to their similarity in the vector space. Prioritized Weighted Feature Distance (PWFD) works similarly to WFD, but provides the ability to give priorities to desirable features. The accuracy of the proposed functions is compared with other similarity functions on several data sets...

  17. Domain similarity based orthology detection.

    Bitard-Feildel, Tristan; Kemena, Carsten; Greenwood, Jenny M; Bornberg-Bauer, Erich


    Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assert whether two proteins are orthologous or not. Accordingly, when the number of sequences for comparison increases, the number of comparisons to compute grows in a quadratic order. A current challenge of bioinformatic research, especially when taking into account the increasing number of sequenced organisms available, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins. We present two new protein similarity measures, a cosine and a maximal weight matching score based on domain content similarity, and new software, named porthoDom. The qualities of the cosine and the maximal weight matching similarity measures are compared against curated datasets. The measures show that domain content similarities are able to correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, the wrapper developed for proteinortho. porthoDom makes use of domain content similarity measures to group proteins together before searching for orthologs. By using domains instead of amino acid sequences, the reduction of the search space decreases the computational complexity of an all-against-all sequence comparison. We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, allows a drastic simplification of search space. porthoDom has the advantage of speeding up orthology detection while maintaining a degree of accuracy similar to proteinortho. The implementation of porthoDom is released using python and C++ languages and is available under the GNU GPL licence 3 at .
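
    The domain-content cosine score mentioned above can be illustrated with a short Python sketch: each protein is represented as a bag of domain identifiers and the cosine of the corresponding count vectors is computed. The Pfam-style identifiers are placeholders, and this is not the porthoDom implementation itself.

      from collections import Counter
      from math import sqrt

      def domain_cosine(domains_a, domains_b):
          # Cosine similarity between two proteins seen as bags of domain IDs.
          ca, cb = Counter(domains_a), Counter(domains_b)
          dot = sum(ca[d] * cb[d] for d in set(ca) & set(cb))
          norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
          return dot / norm if norm else 0.0

      # Hypothetical domain strings of two proteins.
      p1 = ["PF00069", "PF07714", "PF00069"]
      p2 = ["PF00069", "PF07714"]
      print(round(domain_cosine(p1, p2), 3))  # about 0.949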

  18. Similarity Measuring of Spatial Topological Relations Based on Topological Predication

    安晓亚; 杨云; 刘平芝


    Similarity measuring of spatial topological relations is an important part of the similarity measuring of spatial data, and is also a basic and key technology for spatial data retrieval and spatial scene query. Its purpose is to measure the similarity of the topological relationships between data entities that come from different sources and different scales but cover the same region. Common topological relations have been abstracted into nine topological predications. Current research mainly focuses on measuring the similarity of topological relations between two simple entities, and mostly does not address similarity measuring of topological relations for entire data sets or for complex line targets. In this paper we present a method of measuring simple topological relations based on the 9-intersection matrix: the distance between two 9-intersection matrices is taken as the simple topological relation distance, which measures the difference between two simple topological relations and yields a simple topological relation similarity. Then, considering the quantity similarity and the dimension similarity between entity sets, a simple topological relation similarity measuring model between entity sets is obtained. On this basis, a similarity measuring model for complex topological predications is established by using a decomposition-combination strategy. Firstly, the complex topological relationship is broken down into a number of local topological relationships; then, through a combination of the local topological relation similarities, the complex topological relationship similarity measuring model is obtained. Finally, the method is used to measure the similarity of data at different scales and from different sources. Experimental results show that the choice of cartographic generalization has the largest impact on the topological relation similarity between entity sets, while other factors have smaller impacts on the experimental...

  19. Similar component analysis

    ZHANG Hong; WANG Xin; LI Junwei; CAO Xianguang


    A new unsupervised feature extraction method called similar component analysis (SCA) is proposed in this paper. The SCA method has a self-aggregation property: theoretically, data objects move towards each other through SCA to form clusters, which can reveal the inherent pattern of similarity hidden in the dataset. The inputs of SCA are just the pairwise similarities of the dataset, which makes it well suited to time series analysis, where the series may have variable lengths. Our experimental results on many problems have verified the effectiveness of SCA in several engineering applications.

  20. Efficient Video Similarity Measurement and Search


    sequence matching techniques for video copy detection,” in Proceedings of SPIE – Storage and Retrieval for Media Databases 2002, San Jose, CA, January 2002... Proceedings of the Storage and Retrieval for Media Databases 2001, San Jose, USA, Jan 2001, vol. 4315, pp. 188–195. [19] D. Adjeroh, I. King, and M.C. Lee, “A... Vasconcelos, “On the complexity of probabilistic image retrieval,” in Proceedings Eighth IEEE International Conference on Computer Vision, Vancouver, B.C

  1. Gender similarities and differences.

    Hyde, Janet Shibley


    Whether men and women are fundamentally different or similar has been debated for more than a century. This review summarizes major theories designed to explain gender differences: evolutionary theories, cognitive social learning theory, sociocultural theory, and expectancy-value theory. The gender similarities hypothesis raises the possibility of theorizing gender similarities. Statistical methods for the analysis of gender differences and similarities are reviewed, including effect sizes, meta-analysis, taxometric analysis, and equivalence testing. Then, relying mainly on evidence from meta-analyses, gender differences are reviewed in cognitive performance (e.g., math performance), personality and social behaviors (e.g., temperament, emotions, aggression, and leadership), and psychological well-being. The evidence on gender differences in variance is summarized. The final sections explore applications of intersectionality and directions for future research.

  2. Compression-based Similarity

    Vitanyi, Paul M B


    First we consider pair-wise distances for literal objects consisting of finite binary files. These files are taken to contain all of their meaning, like genomes or books. The distances are based on compression of the objects concerned, normalized, and can be viewed as similarity distances. Second, we consider pair-wise distances between names of objects, like "red" or "christianity." In this case the distances are based on searches of the Internet. Such a search can be performed by any search engine that returns aggregate page counts. We can extract a code length from the numbers returned, use the same formula as before, and derive a similarity or relative semantics between names for objects. The theory is based on Kolmogorov complexity. We test both similarities extensively experimentally.
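
    The compression-based distance described above is commonly instantiated as the Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the length of the compressed input and a real compressor approximates the ideal Kolmogorov-complexity-based quantity. A minimal Python sketch using zlib, with invented example strings:

      import zlib

      def ncd(x: bytes, y: bytes) -> float:
          # Normalized Compression Distance with zlib as the stand-in compressor.
          cx = len(zlib.compress(x))
          cy = len(zlib.compress(y))
          cxy = len(zlib.compress(x + y))
          return (cxy - min(cx, cy)) / max(cx, cy)

      s1 = b"the quick brown fox jumps over the lazy dog " * 20
      s2 = b"the quick brown fox jumps over the lazy cat " * 20
      s3 = b"an entirely different passage about something else " * 20
      print(ncd(s1, s2) < ncd(s1, s3))  # similar strings compress better together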

  3. Similarity or difference?

    Villadsen, Anders Ryom


    While the organizational structures and strategies of public organizations have attracted substantial research attention among public management scholars, little research has explored how these organizational core dimensions are interconnected and influenced by pressures for similarity... In this paper I address this topic by exploring the relation between expenditure strategy isomorphism and structure isomorphism in Danish municipalities. Different literatures suggest that organizations exist in concurrent pressures for being similar to and different from other organizations in their field... -shaped relation exists between expenditure strategy isomorphism and structure isomorphism in a longitudinal quantitative study of Danish municipalities...

  4. Segmentation Similarity and Agreement

    Fournier, Chris


    We propose a new segmentation evaluation metric, called segmentation similarity (S), that quantifies the similarity between two segmentations as the proportion of boundaries that are not transformed when comparing them using edit distance, essentially using edit distance as a penalty function and scaling penalties by segmentation size. We propose several adapted inter-annotator agreement coefficients which use S that are suitable for segmentation. We show that S is configurable enough to suit a wide variety of segmentation evaluations, and is an improvement upon the state of the art. We also propose using inter-annotator agreement coefficients to evaluate automatic segmenters in terms of human performance.

  5. Incremental Similarity and Turbulence

    Barndorff-Nielsen, Ole E.; Hedevang, Emil; Schmiegel, Jürgen

    This paper discusses the mathematical representation of an empirically observed phenomenon, referred to as Incremental Similarity. We discuss this feature from the viewpoint of stochastic processes and present a variety of non-trivial examples, including those that are of relevance for turbulence...

  6. Roget's Thesaurus and Semantic Similarity

    Jarmasz, Mario


    We have implemented a system that measures semantic similarity using a computerized 1987 Roget's Thesaurus, and evaluated it by performing a few typical tests. We compare the results of these tests with those produced by WordNet-based similarity measures. One of the benchmarks is Miller and Charles' list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those computed by several NLP systems. The 30 pairs can be traced back to Rubenstein and Goodenough's 65 pairs, which we have also studied. Our Roget's-based system gets correlations of .878 for the smaller and .818 for the larger list of noun pairs; this is quite close to the .885 that Resnik obtained when he employed humans to replicate the Miller and Charles experiment. We further evaluate our measure by using Roget's and WordNet to answer 80 TOEFL, 50 ESL and 300 Reader's Digest questions: the correct synonym must be selected amongst a group of four words. Our system gets 78.75%, 82.00% and 74.33% of ...

  7. Similarity of atoms in molecules

    Cioslowski, J.; Nanayakkara, A. (Florida State Univ., Tallahassee, FL (United States))


    Similarity of atoms in molecules is quantitatively assessed with a measure that employs electron densities within respective atomic basins. This atomic similarity measure does not rely on arbitrary assumptions concerning basis functions or 'atomic orbitals', is relatively inexpensive to compute, and has straightforward interpretation. Inspection of similarities between pairs of carbon, hydrogen, and fluorine atoms in the CH4, CH3F, CH2F2, CHF3, CF4, C2H2, C2H4, and C2H6 molecules, calculated at the MP2/6-311G** level of theory, reveals that the atomic similarity is greatly reduced by a change in the number or the character of ligands (i.e. the atoms with nuclei linked through bond paths to the nucleus of the atom in question). On the other hand, atoms with formally identical (i.e. having the same nuclei and numbers of ligands) ligands resemble each other to a large degree, with the similarity indices greater than 0.95 for hydrogens and 0.99 for non-hydrogens. 19 refs., 6 tabs.

  8. A Text Similarity Measurement Combining Word Semantic Information with TF-IDF Method

    黄承慧; 印鉴; 侯昉


    Traditional text similarity measurements mostly use the TF-IDF method to model documents as term frequency vectors and compute the similarity between documents with measures such as cosine similarity. These methods ignore the semantic information of the terms in the documents, while improved semantics-based measures that extend the traditional term frequency vector with semantically similar terms further increase the dimensionality of the representation and still do not reflect the degree of similarity between two documents well. Building on the TF-IDF model, this paper analyzes the semantic information of important terms in the text and proposes a new text similarity measurement. The approach first pre-processes the text with natural language processing techniques and then uses the TF-IDF method to find the important terms with high TF-IDF values. With the help of an external dictionary, the semantic similarity between terms is analyzed, and the similarity between two documents is computed by combining the proposed Term Similarity Weight Tree (TSWT) with the definition of text semantic similarity. Finally, clustering experiments are carried out on benchmark text data sets using the resulting text similarity. The experimental results show that the proposed method outperforms TF-IDF and another method based on term semantic similarity under the F-measure criterion.
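
    For reference, the plain TF-IDF/cosine baseline that such semantically enhanced measures build on can be sketched in a few lines of Python; the TSWT weighting and the external dictionary are not reproduced, scikit-learn is assumed to be available, and the toy documents are invented.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      docs = [
          "the gene ontology provides a knowledge base for proteins",
          "gene ontology terms describe protein function",
          "stock markets fell sharply on tuesday",
      ]
      tfidf = TfidfVectorizer().fit_transform(docs)   # documents as TF-IDF vectors
      sims = cosine_similarity(tfidf)                 # pairwise cosine similarities
      print(sims.round(2))  # the first two documents score higher with each other than with the third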

  9. Parameter Correlation and Optimization in Text Similarity Measurement

    张祖平; 徐昕; 龙军; 袁鑫攀


    Parameters in text similarity measurement, such as the similarity threshold, precision, recall, shingle sliding-window size, shingle weighting coefficient and text attributes, are interrelated, and their relationships are complicated. Based on an analysis of the parameter correlations and of practical application requirements, we suggest optimized configurations for each parameter and design a similarity measurement algorithm in which the similarity threshold adapts to the text length attribute. The algorithm was applied to the comparative analysis of 7378 project proposals submitted to a fund in 2009. The results show that the proposed algorithm is not only suitable for large-scale text collections but also of high practical value for similarity measurement on short texts, with precision and recall both above 95%.
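
    A minimal illustration of the shingle-based comparison that these parameters govern: character shingles of window width w, a Jaccard similarity, and a decision against a similarity threshold. The window size, the threshold and the sample texts below are placeholders, not the tuned values reported in the paper.

      def shingles(text, w=4):
          # Character w-shingles of a text (sliding window of width w).
          text = " ".join(text.split()).lower()
          return {text[i:i + w] for i in range(max(len(text) - w + 1, 1))}

      def jaccard(a, b):
          return len(a & b) / len(a | b) if a | b else 0.0

      doc1 = "Parameters in text similarity measurement are interrelated."
      doc2 = "Parameters of text similarity measurement are interrelated and complex."
      sim = jaccard(shingles(doc1, w=4), shingles(doc2, w=4))
      print(sim, sim >= 0.5)  # compared against an assumed similarity threshold of 0.5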

  10. Similarity Measures for Tree-Based Networks: A Classification Perspective

    李雪琴; 李聪; 马丽; 梁昌勇


    Similarity measures for tree-based networks are widely used in areas such as information retrieval and data mining. A comparative study was conducted based on current research on similarity measures for tree-based networks. We first classify tree-based networks into two types, ordered trees and unordered trees. The similarity measures for ordered trees are then classified into four categories: operating-strategy-based, decomposition-strategy-based, path-comparison-based and node-comparison-based methods; the similarity measures for unordered trees are classified into two categories: the bilateral matching method and the largest common subtree method. For each of these similarity measures, the related classic algorithms and subsequent optimized algorithms are reviewed in detail, and their processing objects, principles, advantages, disadvantages, applicable scopes, application requirements and the reasons for their applicability are summarized. Finally, several future research topics are indicated.

  11. Interval-valued hesitant fuzzy entropy and interval-valued hesitant fuzzy similarity measures



    Based on the concept of hesitant fuzzy entropy, this paper introduces the concepts of entropy and similarity measures for interval-valued hesitant fuzzy information, and discusses their relationships. Firstly, an axiomatic definition of entropy for interval-valued hesitant fuzzy sets is proposed. Two entropy measure formulas are then developed, and it is shown that they satisfy the four axiomatic requirements of interval-valued hesitant fuzzy entropy. The paper then presents the concept of the interval-valued hesitant fuzzy weighted entropy, built on the interval-valued hesitant fuzzy entropy. Finally, the concept of interval-valued hesitant fuzzy similarity measures is given, and the relationships between interval-valued hesitant fuzzy similarity measures and interval-valued hesitant fuzzy entropy are studied.

  12. More Similar Than Different

    Pedersen, Mogens Jin


    What role do employee features play in the success of different personnel management practices for achieving high performance? Using data from a randomized survey experiment among 5,982 individuals of all ages, this article examines how gender conditions the compliance effects of different incentive treatments, each relating to the basic content of distinct types of personnel management practices. The findings show that males and females are more similar than different in terms of the incentive treatments' effects: significant average effects are found for three out of five incentive...

  13. Similar dissection of sets

    Akiyama, Shigeki; Okazaki, Ryotaro; Steiner, Wolfgang; Thuswaldner, Jörg


    In 1994, Martin Gardner stated a set of questions concerning the dissection of a square or an equilateral triangle in three similar parts. Meanwhile, Gardner's questions have been generalized and some of them are already solved. In the present paper, we solve more of his questions and treat them in a much more general context. Let $D \subset \mathbb{R}^d$ be a given set and let $f_1,\dots,f_k$ be injective continuous mappings. Does there exist a set $X$ such that $D = X \cup f_1(X) \cup \dots \cup f_k(X)$ is satisfied with a non-overlapping union? We prove that such a set $X$ exists for certain choices of $D$ and $\{f_1,\dots,f_k\}$. The solutions $X$ often turn out to be attractors of iterated function systems with condensation in the sense of Barnsley. Coming back to Gardner's setting, we use our theory to prove that an equilateral triangle can be dissected in three similar copies whose areas have ratio $1:1:a$ for $a \ge (3+\sqrt{5})/2$.

  14. Quantifying Similarity in Seismic Polarizations

    Eaton, D. W. S.; Jones, J. P.; Caffagni, E.


    Measuring similarity in seismic attributes can help identify tremor, low S/N signals, and converted or reflected phases, in addition to diagnosing site noise and sensor misalignment in arrays. Polarization analysis is a widely accepted method for studying the orientation and directional characteristics of seismic phases via computed attributes, but similarity is ordinarily discussed using qualitative comparisons with reference values. Here we introduce a technique for quantitative polarization similarity that uses weighted histograms computed in short, overlapping time windows, drawing on methods adapted from the image processing and computer vision literature. Our method accounts for ambiguity in azimuth and incidence angle and variations in signal-to-noise (S/N) ratio. Using records of the Mw=8.3 Sea of Okhotsk earthquake from CNSN broadband sensors in British Columbia and Yukon Territory, Canada, and vertical borehole array data from a monitoring experiment at Hoadley gas field, central Alberta, Canada, we demonstrate that our method is robust to station spacing. Discrete wavelet analysis extends polarization similarity to the time-frequency domain in a straightforward way. Because histogram distance metrics are bounded by [0, 1], clustering allows empirical time-frequency separation of seismic phase arrivals on single-station three-component records. Array processing for automatic seismic phase classification may be possible using subspace clustering of polarization similarity, but efficient algorithms are required to reduce the dimensionality.
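
    A simplified sketch of a histogram distance bounded by [0, 1] is given below, using histogram intersection of azimuth distributions; the authors' weighted-histogram construction and its time-frequency extension are not reproduced, and the synthetic azimuth samples are purely illustrative.

      import numpy as np

      def histogram_distance(samples_a, samples_b, bins=36, value_range=(0.0, 360.0)):
          # Distance in [0, 1] between two azimuth distributions via histogram intersection.
          ha, _ = np.histogram(samples_a, bins=bins, range=value_range)
          hb, _ = np.histogram(samples_b, bins=bins, range=value_range)
          ha = ha / ha.sum()
          hb = hb / hb.sum()
          return 1.0 - np.minimum(ha, hb).sum()

      rng = np.random.default_rng(0)
      az1 = rng.normal(120, 5, 500) % 360  # synthetic azimuths, station 1
      az2 = rng.normal(123, 5, 500) % 360  # similar polarization, station 2
      az3 = rng.normal(250, 5, 500) % 360  # different phase or misaligned sensor
      print(histogram_distance(az1, az2) < histogram_distance(az1, az3))  # prints True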

  15. Method of weighted similarity measurement and its application in spectral outlier determination

    唐天彪; 杨辉华; 梁晓智; 郭拓; 李灵巧; 罗国安


    In order to detect spectral outliers and discriminate their causes, this paper proposes a weighted similarity measure method based on similarity theory, using spectral similarity to identify abnormal spectra. Taking near-infrared spectroscopy (NIRS) analysis in the process of traditional Chinese medicine production as an example, spectral outliers are determined by combining the proposed method with the similarity of the spectra in the characteristic wavelength range. Experiments show that the proposed method is more sensitive than the commonly used methods (the correlation coefficient and vector cosine similarity), better reflects spectral changes in the critical wavelength range, and gives determination results that agree well with the actual situation, so the method is highly practical for online near-infrared monitoring.

  16. Similarity transformed semiclassical dynamics

    Van Voorhis, Troy; Heller, Eric J.


    In this article, we employ a recently discovered criterion for selecting important contributions to the semiclassical coherent state propagator [T. Van Voorhis and E. J. Heller, Phys. Rev. A 66, 050501 (2002)] to study the dynamics of many dimensional problems. We show that the dynamics are governed by a similarity transformed version of the standard classical Hamiltonian. In this light, our selection criterion amounts to using trajectories generated with the untransformed Hamiltonian as approximate initial conditions for the transformed boundary value problem. We apply the new selection scheme to some multidimensional Henon-Heiles problems and compare our results to those obtained with the more sophisticated Herman-Kluk approach. We find that the present technique gives near-quantitative agreement with the the standard results, but that the amount of computational effort is less than Herman-Kluk requires even when sophisticated integral smoothing techniques are employed in the latter.

  17. Comparison of similarity measures and optimization methods in 2D/3D image registration

    张冉; 王雷; 夏威; 高欣


    In image-guided surgery, 2D/3D medical image registration can provide the surgeon with the precise position of the patient. Accurate registration involves many aspects, such as similarity measures and optimization methods. In order to investigate the influence of similarity measures and optimization methods on 2D/3D image registration, a comparison of six similarity measures in combination with four optimization methods is performed on the publicly available gold-standard porcine skull phantom datasets from the Medical University of Vienna. The registration results are compared in terms of success rate, average number of iterations and average registration time. The results show that the most accurate registration is obtained by pattern intensity combined with the Powell method. Furthermore, for a fixed similarity measure, the Powell search strategy gives the best 2D/3D registration results among the optimization methods considered.

  18. A Similarity Measurement Method of Facial Expression Based on Geometric Features

    黄忠; 胡敏; 王晓华


    In facial animations such as performance-driven animation and expression cloning, the most similar expression needs to be found in order to enhance the reality and fidelity of the animation. A feature-weighted expression similarity measurement method is proposed based on facial geometric features. Firstly, on an active appearance model, chain codes are used to characterize the shape features of each local expression region, while deformation features are built from the topological relations among the regional feature points to reflect holistic expression information. Then, a feature-weighted scheme is adopted to measure the similarity of the fused geometric features, and the solving of the feature weights is transformed into the minimization of a weighted objective function. Finally, the solved weights and the feature weighting functions are used to measure the similarity between expressions and to find the image most similar to an input expression image. Experimental results on the BU-3DFE and FEEDTUM databases show that the proposed method achieves a clearly higher accuracy in finding similar expressions than existing measurement methods, and remains robust for expressions of different types and intensities, maintaining high similarity especially for expression details such as mouth shape, cheek contraction and the extent of mouth opening.

  19. Semantic Similarity Measures Based on Ontology Hierarchy Structure in Biomedical Domain



    Semantic similarity computation can improve the efficiency of information retrieval from biomedical source data and make the integration of heterogeneous clinical data easier. Various researchers have applied and developed classic similarity measures for assessing the similarity of medical terms over single biomedical ontologies. In this paper, we select the distance-based LCH measure and, using the Pedersen benchmark, compare the relatedness values it produces over the MeSH, SNOMED CT and UMLS ontologies; the results are then analyzed and explained.
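
    The distance-based LCH measure referred to above is usually written as sim(c1, c2) = -log(len(c1, c2) / (2 D)), where len is the shortest is-a path between the two concepts and D is the maximum depth of the taxonomy. The Python sketch below evaluates this formula on a tiny invented hierarchy standing in for an ontology such as MeSH or SNOMED CT; implementations differ in whether the path is counted in nodes or edges (edges are used here), and the depth value is an assumption of the example.

      import math
      from collections import deque

      # Toy is-a hierarchy (child -> list of parents).
      parents = {
          "renal failure": ["kidney disease"],
          "kidney disease": ["urologic disease"],
          "urologic disease": ["disease"],
          "heart disease": ["cardiovascular disease"],
          "cardiovascular disease": ["disease"],
          "disease": [],
      }

      def shortest_path_len(a, b):
          # Shortest number of is-a edges between two concepts (undirected BFS).
          graph = {c: set(ps) for c, ps in parents.items()}
          for c, ps in parents.items():
              for p in ps:
                  graph.setdefault(p, set()).add(c)
          seen, queue = {a}, deque([(a, 0)])
          while queue:
              node, dist = queue.popleft()
              if node == b:
                  return dist
              for nxt in graph[node]:
                  if nxt not in seen:
                      seen.add(nxt)
                      queue.append((nxt, dist + 1))
          return None

      def lch_similarity(a, b, max_depth):
          # Leacock-Chodorow: -log(path_length / (2 * max taxonomy depth)).
          return -math.log(shortest_path_len(a, b) / (2.0 * max_depth))

      print(lch_similarity("renal failure", "kidney disease", max_depth=4))  # closely related
      print(lch_similarity("renal failure", "heart disease", max_depth=4))   # more distant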

  20. Similarity and inclusion measures between general type-2 fuzzy sets and their relations

    郑高; 肖建; 蒋强; 王梦玲


    To overcome the problem that existing methods for simplifying the rule base of a type-2 fuzzy logic system cannot eliminate the harmful effects of redundant fuzzy sets, a similarity measure and an inclusion measure between general type-2 fuzzy sets are proposed. Firstly, based on the selected axiomatic definitions of the two fuzzy measures, computation formulas are given that consider both the footprint of uncertainty (FOU) and the secondary membership function. Then, several properties of the proposed inclusion measure are discussed, and theorems showing that the two fuzzy measures can be transformed into each other are presented. Finally, two examples are given to validate their performance, and the proposed similarity measure is applied to clustering analysis of general type-2 fuzzy data. The clustering results consist of hierarchical clustering trees at different α-levels and can reasonably differentiate the type-2 fuzzy sets.

  1. Word Semantic Similarity Measurement Based on Naïve Bayes Model

    王俊华; 左万利; 闫昭


    Measuring the semantic similarity between words is a classical and hot problem in natural language processing, and its achievements have a great impact on many applications such as word sense disambiguation, machine translation, ontology mapping and computational linguistics. A novel approach is proposed to measure word semantic similarity by combining a Naïve Bayes model with a knowledge base. First, attribute variables are extracted with the help of the general-purpose ontology WordNet; then, conditional probability distributions are generated by statistics and piecewise linear interpolation; after that, the posterior probability is obtained through Bayesian inference, and on this basis the word semantic similarity is quantified. The main contributions are the definition of the distance and depth between word pairs, which requires little computation while distinguishing word senses well, and the use of the Naïve Bayes model for word semantic similarity measurement. On the benchmark data set R&G(65), the correlation between the algorithm's judgments and human judgments is evaluated with 5-fold cross-validation: the sample Pearson correlation reaches 0.912, which is 0.4% higher than the current best method and 7% to 13% higher than classical algorithms; the Spearman correlation reaches 0.873, which is 10% to 20% higher than classical algorithms; and the running efficiency of the algorithm is comparable to that of the classical algorithms. The experimental results show that combining the Naïve Bayes model with a knowledge base is a reasonable and effective way to address the word semantic similarity problem.

  2. Multi-attribute decision making method based on improved similarity measure of intuitionistic fuzzy rough sets

    范成礼; 邢清华; 邹志刚; 范学渊


    Intuitionistic fuzzy rough sets (IFRS) are applied to multi-attribute decision making (MADM) problems, and an MADM method based on an improved similarity measure of IFRS is presented. Firstly, an improved similarity measure of IFRS is proposed; it overcomes the inaccuracy of existing measures, which ignore the hesitancy degree, by taking the hesitancy degree into account, and several of its important properties are revealed. Furthermore, with the attribute values expressed as IFRS, the new method ranks the alternatives by comparing each alternative's similarity to the positive and negative ideal solutions. Finally, a practical example shows the feasibility and effectiveness of the proposed method, which has promising applications in information fusion fields such as situation assessment and target recognition.

  3. Data similarity measurement method for shape-based distance in thermal vacuum test

    谢吉慧; 郄殿福


    To meet the need for automatic data monitoring in thermal vacuum tests, a data abnormality detection method based on data mining is studied, and an improved DTW shape-based distance similarity measurement method is proposed. The algorithm reduces the amount of computation through wavelet transformation and a search-width limit, and avoids the distortion of the shape symbols caused by data normalization by adjusting the way the shape symbols are calculated. Its parameters are universal for thermal vacuum test data; through statistical analysis of the similarity clustering accuracy on actual sample data, the best parameter ranges are obtained and a high clustering precision is achieved.
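
    For reference, the classic dynamic time warping (DTW) distance that the improved shape-based method builds on can be sketched as follows; the wavelet pre-processing, the search-width limit and the shape-symbol encoding described in the abstract are not included, and the sample telemetry-like sequences are invented.

      import numpy as np

      def dtw_distance(a, b):
          # Classic DTW distance between two 1-D sequences (full search, no window limit).
          n, m = len(a), len(b)
          cost = np.full((n + 1, m + 1), np.inf)
          cost[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  d = abs(a[i - 1] - b[j - 1])
                  cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
          return cost[n, m]

      t1 = [20.0, 20.5, 21.0, 22.0, 23.5, 23.0]  # a telemetry channel
      t2 = [20.0, 20.4, 21.1, 21.9, 23.4, 23.1]  # similar profile, slightly shifted
      t3 = [20.0, 25.0, 30.0, 35.0, 30.0, 25.0]  # abnormal profile
      print(dtw_distance(t1, t2) < dtw_distance(t1, t3))  # prints True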

  4. Performance Indexes: Similarities and Differences

    André Machado Caldeira


    Full Text Available The investor of today is more rigorous in monitoring a portfolio of financial assets. He no longer thinks only in terms of the expected return (one dimension), but in terms of risk and return (two dimensions). This new perception is more complex, since the risk measurement can vary according to one's perception; some use the standard deviation for that, while others disagree with this measure and propose alternatives. In addition to this difficulty, there is the problem of how to consider these two dimensions. The objective of this essay is to study the main performance indexes through an empirical study in order to verify the differences and similarities for some of the selected assets. One performance index proposed in Caldeira (2005) is included in this analysis.

  5. A New Fusion Method of Conflicting Interval Evidence Based on the Similarity Measure of Evidence

    冯海山; 徐晓滨; 文成林


    Based on the similarity measure of evidence, a new method for combining conflicting interval evidence is proposed. Firstly, interval evidence is transformed into interval-valued Pignistic probabilities using a defined extended Pignistic probability function. Using the normalized Euclidean distance of interval-valued fuzzy sets, the similarities between the Pignistic probabilities of the interval evidence are obtained and a pairwise similarity measure matrix is constructed, from which the credibility degrees (weights) of the interval evidence are derived. Secondly, new interval evidence is obtained by a weighted average of the original interval evidence based on these credibility degrees, and the fusion result is obtained by combining the new interval evidence with the Dempster interval evidence combination rule. The proposed method can effectively weaken the influence of highly conflicting interval evidence in the combination and thus reduce the width of the combined interval evidence, so that the uncertainty in decision making is decreased. Finally, several classical numerical examples verify that fusing interval evidence after conflict handling produces more reasonable and reliable results than direct fusion.
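
    The similarity-weighting step can be reduced to a short sketch: each body of interval evidence is scored by a normalized-Euclidean-style similarity to the others, and the resulting credibility weights down-weight the conflicting body. The extended Pignistic transformation and the Dempster combination itself are omitted, and the numerical bodies of evidence are invented for illustration.

      import numpy as np

      def interval_similarity(A, B):
          # Similarity of two interval-valued vectors ((n, 2) arrays of [lower, upper])
          # via one minus a normalized Euclidean distance over the endpoints.
          A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
          d = np.sqrt(np.mean(((A[:, 0] - B[:, 0]) ** 2 + (A[:, 1] - B[:, 1]) ** 2) / 2.0))
          return 1.0 - d

      def credibility_weights(bodies):
          # Credibility of each body of evidence from its total similarity to the others.
          n = len(bodies)
          support = np.array([
              sum(interval_similarity(bodies[i], bodies[j]) for j in range(n) if j != i)
              for i in range(n)
          ])
          return support / support.sum()

      # Three bodies of interval evidence over two hypotheses; the third conflicts.
      m1 = [[0.50, 0.60], [0.30, 0.40]]
      m2 = [[0.55, 0.65], [0.25, 0.35]]
      m3 = [[0.05, 0.10], [0.80, 0.90]]
      print(credibility_weights([m1, m2, m3]).round(3))  # the third body gets the smallest weight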

  6. On distributional assumptions and whitened cosine similarities

    Loog, Marco


    Recently, an interpretation of the whitened cosine similarity measure as a Bayes decision rule was proposed (C. Liu, "The Bayes Decision Rule Induced Similarity Measures," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1086-1090, June 2007). This communication makes the observation that some of the distributional assumptions made to derive this measure are very restrictive and, considered simultaneously, even inconsistent.

  7. GPU Real-time Graphics Tracking and Rendering Technology Based on Similarity Measurement



    A real-time graphics tracking and rendering algorithm based on similarity measurement of target distribution fields is proposed. The OpenFlight modeling environment is used to provide a GPU real-time three-dimensional graphics viewer, yielding a structure graph with a two-dimensional hierarchy on which the target distribution field is designed. Combined with the motion equations of static-viewpoint images, the images are layered naturally so that the basic information of the original image is retained. In order to allow the distribution field to adapt to various complex scenes during tracking, the original distribution field is smoothed with a Gaussian filter, and GPU real-time graphics tracking and rendering is achieved through similarity measurement of the target distribution fields. Simulation results show that real-time graphics rendering with this algorithm improves rendering and tracking efficiency, with a short search time, a low error rate and improved rendering realism.

  8. An Approach to Unsupervised Character Classification Based on Similarity Measure in Fuzzy Model

    卢达; 钱忆平; 谢铭培; 浦炜


    A fuzzy logic approach is presented that efficiently performs unsupervised character classification to improve the robustness, correctness and speed of a character recognition system. The characters are first split into eight typographical categories. The classification scheme then uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. Fuzzy unsupervised character classification, which is natural for representing prototypes for character matching, is developed, and a weighted fuzzy similarity measure is explored. The characteristics of the fuzzy model are discussed and used to speed up the classification process. After classification, character recognition, which then only needs to be applied to a smaller set of fuzzy prototypes, becomes much easier and less time-consuming.

  9. A Similarity Search Using Molecular Topological Graphs

    Yoshifumi Fukunishi


    Full Text Available A molecular similarity measure has been developed using molecular topological graphs and atomic partial charges. Two kinds of topological graphs were used. One is the ordinary adjacency matrix and the other is a matrix which represents the minimum path length between two atoms of the molecule. The ordinary adjacency matrix is suitable to compare the local structures of molecules such as functional groups, and the other matrix is suitable to compare the global structures of molecules. The combination of these two matrices gave a similarity measure. This method was applied to in silico drug screening, and the results showed that it was effective as a similarity measure.
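
    The two kinds of topological graphs can be illustrated with a short Python sketch: an adjacency matrix for local structure and an all-pairs minimum-path-length matrix (computed with Floyd-Warshall) for global structure, compared here with a crude matrix-difference score. Atomic partial charges and the paper's actual combination of the two matrices are omitted, and the toy graphs are invented.

      import numpy as np

      def shortest_path_matrix(adj):
          # All-pairs minimum path lengths (Floyd-Warshall) from an adjacency matrix.
          n = len(adj)
          d = np.where(np.asarray(adj) > 0, 1.0, np.inf)
          np.fill_diagonal(d, 0.0)
          for k in range(n):
              d = np.minimum(d, d[:, [k]] + d[[k], :])
          return d

      # Toy molecular graphs: a 4-atom chain versus a 4-atom ring (hydrogens suppressed).
      chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
      ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]

      d1, d2 = shortest_path_matrix(chain), shortest_path_matrix(ring)
      # A crude graph-level similarity: 1 / (1 + mean absolute difference of path lengths).
      print(1.0 / (1.0 + np.abs(d1 - d2).mean()))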

  10. Reputation Model Based on Similarity Measure of Trust Value in Wireless Sensor Networks

    姚放吾; 张文超


    信誉评估模型作为传统密钥安全机制的有效补充,对无线传感器网络的可靠运行和安全保障具有重要意义.文中提出了一种基于信任贴近度的无线传感器网络信誉模型RMSMTV.本模型建立在簇型网络拓扑结构上,利用模糊贴近度理论衡量邻居节点推荐的信任值的可信度,并运用矩阵论方法实现了信任值整合过程中自适应分配权重,最终得到了节点综合信任值.考虑到无线传感器网络节点能量有限的特性,还对簇头节点轮换策略进行了讨论.最后通过实验验证了RMSMTV具有良好的容错性和鲁棒性,能实时、准确地发现恶意节点的攻击,有效提高了无线传感器网络的安全性.%Reputation evaluation model,the supplement to the traditional key security mechanism,has significant nwming to guarantee reliable operation and security to the wireless sensor network. It proposes a reputation model based on similarity measure of trust value in wireless sensor networks. This model,constructed on the duster-type network topology,uses the fuzzy similarity measure theory to evaluate the reliability of die recommended trust values of neighbor nodes. Then use the matrix theory method to realize the adaptive value distribution weight in the integration process of trust value. Finally,get the comprehensive trust value of a node by combining two values. Taking the consideration of limitation of the energy in the wireless sensor network node, it also has the discussion of the node rotation strategy. Finally the experiment proves that RMSMTV has good fault tolerance and robustness,can find the attack of malicious node in real-time and accurately,and effectively improve the security of wireless sensor networks.

  11. Fuzzy classification algorithm based on fuzzy concept similarity and fuzzy entropy measure

    冯兴华; 刘晓东; 刘亚清


    A method to construct a classifier based on fuzzy concept similarity and a fuzzy entropy measure is developed within the framework of the axiomatic fuzzy set (AFS) theory. A selection index based on fuzzy concept similarity and fuzzy entropy is proposed. Guided by this selection index, the antecedents of the fuzzy classification rules are selected from the fuzzy concepts found by the aggregation algorithm. The obtained fuzzy rules are then pruned by a pruning algorithm, yielding the final classification rule set. The performance of the proposed classifier is compared with the results produced by seven classifiers commonly encountered in the literature on eight data sets taken from the UCI Machine Learning Repository. The accuracy on test data produced by the proposed classifier is found to be higher than that produced by the other classifiers.

  12. Relationship between Similarity Measure and Entropy of Interval-valued Fuzzy Sets Based on Interval-number Measurement

    曾文艺; 赵宜宾


    In this paper, we introduce the concepts of similarity measure and entropy of interval-valued fuzzy sets based on interval-number measurement, investigate the relationship between the similarity measure and the entropy of interval-valued fuzzy sets in detail, and prove that they can be transformed into each other based on their axiomatic definitions. Finally, we propose several formulas to calculate the similarity measure and the entropy of interval-valued fuzzy sets.
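
    One simple way to realize the similarity-entropy relationship discussed above is sketched below: a normalized endpoint-difference similarity between interval-valued fuzzy sets, with the entropy of a set defined as its similarity to its own complement. The particular formulas are common textbook choices, not necessarily the ones proposed in the paper.

      def similarity(A, B):
          # Similarity of two interval-valued fuzzy sets given as lists of
          # (lower, upper) membership intervals over the same universe.
          total = sum(abs(al - bl) + abs(au - bu) for (al, au), (bl, bu) in zip(A, B))
          return 1.0 - total / (2.0 * len(A))

      def complement(A):
          return [(1.0 - u, 1.0 - l) for (l, u) in A]

      def entropy(A):
          # One classical construction: the entropy of A is its similarity to its complement.
          return similarity(A, complement(A))

      A = [(0.2, 0.4), (0.6, 0.8), (0.5, 0.5)]
      print(similarity(A, A), entropy(A))  # 1.0 and a value strictly between 0 and 1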

  13. Functional Similarity and Interpersonal Attraction.

    Neimeyer, Greg J.; Neimeyer, Robert A.


    Students participated in dyadic disclosure exercises over a five-week period. Results indicated members of high functional similarity dyads evidenced greater attraction to one another than did members of low functional similarity dyads. "Friendship" pairs of male undergraduates displayed greater functional similarity than did…

  14. Functional Similarity and Interpersonal Attraction.

    Neimeyer, Greg J.; Neimeyer, Robert A.


    Students participated in dyadic disclosure exercises over a five-week period. Results indicated members of high functional similarity dyads evidenced greater attraction to one another than did members of low functional similarity dyads. "Friendship" pairs of male undergraduates displayed greater functional similarity than did "nominal" pairs from…


  15. A Comparison of Semantic Similarity Models in Evaluating Concept Similarity

    Q. X. Xu


    Full Text Available The semantic similarities are important in concept definition, recognition, categorization, interpretation, and integration. Many semantic similarity models have been established to evaluate semantic similarities of objects or/and concepts. To find out the suitability and performance of different models in evaluating concept similarities, we make a comparison of four main types of models in this paper: the geometric model, the feature model, the network model, and the transformational model. Fundamental principles and main characteristics of these models are introduced and compared firstly. Land use and land cover concepts of NLCD92 are employed as examples in the case study. The results demonstrate that correlations between these models are very high for a possible reason that all these models are designed to simulate the similarity judgement of human mind.

  16. A Comparison of Semantic Similarity Models in Evaluating Concept Similarity

    Xu, Q. X.; Shi, W. Z.


    The semantic similarities are important in concept definition, recognition, categorization, interpretation, and integration. Many semantic similarity models have been established to evaluate semantic similarities of objects or/and concepts. To find out the suitability and performance of different models in evaluating concept similarities, we make a comparison of four main types of models in this paper: the geometric model, the feature model, the network model, and the transformational model. Fundamental principles and main characteristics of these models are introduced and compared firstly. Land use and land cover concepts of NLCD92 are employed as examples in the case study. The results demonstrate that correlations between these models are very high for a possible reason that all these models are designed to simulate the similarity judgement of human mind.

  17. Renewing the Respect for Similarity

    Shimon Edelman


    Full Text Available In psychology, the concept of similarity has traditionally evoked a mixture of respect, stemming from its ubiquity and intuitive appeal, and concern, due to its dependence on the framing of the problem at hand and on its context. We argue for a renewed focus on similarity as an explanatory concept, by surveying established results and new developments in the theory and methods of similarity-preserving associative lookup and dimensionality reduction — critical components of many cognitive functions, as well as of intelligent data management in computer vision. We focus in particular on the growing family of algorithms that support associative memory by performing hashing that respects local similarity, and on the uses of similarity in representing structured objects and scenes. Insofar as these similarity-based ideas and methods are useful in cognitive modeling and in AI applications, they should be included in the core conceptual toolkit of computational neuroscience.

  18. Similarity Learning of Manifold Data.

    Chen, Si-Bao; Ding, Chris H Q; Luo, Bin


    Without constructing adjacency graph for neighborhood, we propose a method to learn similarity among sample points of manifold in Laplacian embedding (LE) based on adding constraints of linear reconstruction and least absolute shrinkage and selection operator type minimization. Two algorithms and corresponding analyses are presented to learn similarity for mix-signed and nonnegative data respectively. The similarity learning method is further extended to kernel spaces. The experiments on both synthetic and real world benchmark data sets demonstrate that the proposed LE with new similarity has better visualization and achieves higher accuracy in classification.

  19. Bilateral Trade Flows and Income Distribution Similarity.

    Martínez-Zarzoso, Inmaculada; Vollmer, Sebastian


    Current models of bilateral trade neglect the effects of income distribution. This paper addresses the issue by accounting for non-homothetic consumer preferences and hence investigating the role of income distribution in the context of the gravity model of trade. A theoretically justified gravity model is estimated for disaggregated trade data (Dollar volume is used as dependent variable) using a sample of 104 exporters and 108 importers for 1980-2003 to achieve two main goals. We define and calculate new measures of income distribution similarity and empirically confirm that greater similarity of income distribution between countries implies more trade. Using distribution-based measures as a proxy for demand similarities in gravity models, we find consistent and robust support for the hypothesis that countries with more similar income-distributions trade more with each other. The hypothesis is also confirmed at disaggregated level for differentiated product categories.

  20. Distances and similarities in intuitionistic fuzzy sets

    Szmidt, Eulalia


    This book presents the state-of-the-art in theory and practice regarding similarity and distance measures for intuitionistic fuzzy sets. Quantifying similarity and distances is crucial for many applications, e.g. data mining, machine learning, decision making, and control. The work provides readers with a comprehensive set of theoretical concepts and practical tools for both defining and determining similarity between intuitionistic fuzzy sets. It describes an automatic algorithm for deriving intuitionistic fuzzy sets from data, which can aid in the analysis of information in large databases. The book also discusses other important applications, e.g. the use of similarity measures to evaluate the extent of agreement between experts in the context of decision making.

  1. A direct assay of carboxyl-containing small molecules by SALDI-MS on a AgNP/rGO-based nanoporous hybrid film.

    Hong, Min; Xu, Lidan; Wang, Fangli; Geng, Zhirong; Li, Haibo; Wang, Huaisheng; Li, Chen-Zhong


    Silver nanoparticles (AgNPs) and reduced graphene oxide (rGO) hybrid nanoporous structures fabricated by the layer-by-layer (LBL) electrostatic self-assembly have been applied as a simple platform for the rapid analysis of carboxyl-containing small molecules by surface-assisted laser desorption/ionization (D/I) mass spectrometry (SALDI-MS). By the simple one-step deposition of analytes onto the (AgNP/rGO)9 multilayer film, the MS measurements of various carboxyl-containing small molecules (including amino acids, fatty acids and organic dicarboxylic acids) can be done. In contrast to other energy transfer materials relative to AgNPs, the signal interferences of a Ag cluster (Agn(+) or Agn(-)) and a C cluster (Cn(+) or Cn(-)) have been effectively reduced or eliminated. The effects of various factors, such as the pore structure and composition of the substrates, on the efficiency of D/I have been investigated by comparing with the (AgNP)9 LBL nanoporous structure, (AgNP/rGO)9/(SiO2NP)6 LBL multilayer film and AgNP/prGO nanocomposites.

  2. Dynamic similarity in erosional processes

    Scheidegger, A.E.


    A study is made of the dynamic similarity conditions obtaining in a variety of erosional processes. The pertinent equations for each type of process are written in dimensionless form; the similarity conditions can then easily be deduced. The processes treated are: raindrop action, slope evolution and river erosion. © 1963 Istituto Geofisico Italiano.

  3. Wavelet transform in similarity paradigm

    Z.R. Struzik; A.P.J.M. Siebes (Arno)


    [INS-R9802] Searching for similarity in time series finds ever broader applications in data mining. However, due to the very broad spectrum of data involved, there is no possibility of defining one single notion of similarity suitable to serve all applications. We present a powerful

  4. Similarity-based pattern analysis and recognition

    Pelillo, Marcello


    This accessible text/reference presents a coherent overview of the emerging field of non-Euclidean similarity learning. The book presents a broad range of perspectives on similarity-based pattern analysis and recognition methods, from purely theoretical challenges to practical, real-world applications. The coverage includes both supervised and unsupervised learning paradigms, as well as generative and discriminative models. Topics and features: explores the origination and causes of non-Euclidean (dis)similarity measures, and how they influence the performance of traditional classification alg

  5. Similarity of samples and trimming

    Álvarez-Esteban, Pedro C; Cuesta-Albertos, Juan A; Matrán, Carlos; 10.3150/11-BEJ351


    We say that two probabilities are similar at level $\alpha$ if they are contaminated versions (up to an $\alpha$ fraction) of the same common probability. We show how this model is related to minimal distances between sets of trimmed probabilities. Empirical versions turn out to present an overfitting effect in the sense that trimming beyond the similarity level results in trimmed samples that are closer than expected to each other. We show how this can be combined with a bootstrap approach to assess similarity from two data samples.

  6. Contextual Bandits with Similarity Information

    Slivkins, Aleksandrs


    In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well-understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature considered information on similarity between arms. We consider similarity information in the setting of "contextual bandits", a natural extension of the basic MAB problem where before each round an algorithm is given the "context" -- a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on webpages, one of the crucial problems in sponsored search. A particularly simple way to represent similarity information in the contextual bandit setting is via a "similarity distance...

  7. Self-similar aftershock rates

    Davidsen, Jörn; Baiesi, Marco


    In many important systems exhibiting crackling noise—an intermittent avalanchelike relaxation response with power-law and, thus, self-similar distributed event sizes—the "laws" for the rate of activity after large events are not consistent with the overall self-similar behavior expected on theoretical grounds. This is particularly true for the case of seismicity, and a satisfying solution to this paradox has remained outstanding. Here, we propose a generalized description of the aftershock rates which is both self-similar and consistent with all other known self-similar features. Comparing our theoretical predictions with high-resolution earthquake data from Southern California we find excellent agreement, providing particularly clear evidence for a unified description of aftershocks and foreshocks. This may offer an improved framework for time-dependent seismic hazard assessment and earthquake forecasting.

  8. Unmixing of spectrally similar minerals

    Debba, Pravesh


    Full Text Available -bearing oxide/hydroxide/sulfate minerals in complex mixtures be obtained using hyperspectral data? Method of spectral unmixing. Old method and its problem: Linear Spectral Mixture Analysis (LSMA...

  9. Self-similar aftershock rates

    Davidsen, Jörn


    In many important systems exhibiting crackling noise --- intermittent avalanche-like relaxation response with power-law and, thus, self-similar distributed event sizes --- the "laws" for the rate of activity after large events are not consistent with the overall self-similar behavior expected on theoretical grounds. This is particularly true for the case of seismicity, and a satisfying solution to this paradox has remained outstanding. Here, we propose a generalized description of the aftershock rates which is both self-similar and consistent with all other known self-similar features. Comparing our theoretical predictions with high-resolution earthquake data from Southern California, we find excellent agreement, providing particularly clear evidence for a unified description of aftershocks and foreshocks. This may offer an improved way of performing time-dependent seismic hazard assessment and earthquake forecasting.

  10. Community Detection by Neighborhood Similarity

    LIU Xu; XIE Zheng; YI Dong-Yun


    Detection of the community structure in a network is important for understanding the structure and dynamics of the network. By exploring the neighborhood of vertices, a local similarity metric is proposed, which can be quickly computed. The resulting similarity matrix retains the same support as the adjacency matrix. Based on local similarity, an agglomerative hierarchical clustering algorithm is proposed for community detection. The algorithm is implemented with an efficient max-heap data structure and runs in nearly linear time, and thus is capable of dealing with large sparse networks with tens of thousands of nodes. Experiments on synthesized and real-world networks demonstrate that our method is efficient at detecting community structures, and the proposed metric is the most suitable one among all the tested similarity indices.
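
    The sketch below illustrates the general idea of a neighborhood-based local similarity that keeps the support of the adjacency matrix, using a simple Jaccard coefficient of closed neighborhoods; it is a generic stand-in under those assumptions, not necessarily the exact metric proposed in the paper.

```python
import numpy as np

def neighborhood_jaccard(adj):
    """Jaccard similarity of closed neighborhoods, computed only for edges,
    so the similarity matrix keeps the same support as the adjacency matrix."""
    n = adj.shape[0]
    neigh = [set(np.flatnonzero(adj[i])) | {i} for i in range(n)]
    sim = np.zeros_like(adj, dtype=float)
    for i, j in zip(*np.nonzero(adj)):
        inter = len(neigh[i] & neigh[j])
        union = len(neigh[i] | neigh[j])
        sim[i, j] = inter / union
    return sim

# Two triangles joined by a single bridge edge (2-3).
adj = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
print(neighborhood_jaccard(adj))  # the bridge edge gets a lower similarity than triangle edges
```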

  11. Solid-state nuclear magnetic resonance measurements of HIV fusion peptide 13CO to lipid 31P proximities support similar partially inserted membrane locations of the α helical and β sheet peptide structures.

    Gabrys, Charles M; Qiang, Wei; Sun, Yan; Xie, Li; Schmick, Scott D; Weliky, David P


    Fusion of the human immunodeficiency virus (HIV) membrane and the host cell membrane is an initial step of infection of the host cell. Fusion is catalyzed by gp41, which is an integral membrane protein of HIV. The fusion peptide (FP) is the ∼25 N-terminal residues of gp41 and is a domain of gp41 that plays a key role in fusion catalysis likely through interaction with the host cell membrane. Much of our understanding of the FP domain has been accomplished with studies of "HFP", i.e., a ∼25-residue peptide composed of the FP sequence but lacking the rest of gp41. HFP catalyzes fusion between membrane vesicles and serves as a model system to understand fusion catalysis. HFP binds to membranes and the membrane location of HFP is likely a significant determinant of fusion catalysis perhaps because the consequent membrane perturbation reduces the fusion activation energy. In the present study, many HFPs were synthesized and differed in the residue position that was (13)CO backbone labeled. Samples were then prepared that each contained a singly (13)CO labeled HFP incorporated into membranes that lacked cholesterol. HFP had distinct molecular populations with either α helical or oligomeric β sheet structure. Proximity between the HFP (13)CO nuclei and (31)P nuclei in the membrane headgroups was probed by solid-state NMR (SSNMR) rotational-echo double-resonance (REDOR) measurements. For many samples, there were distinct (13)CO shifts for the α helical and β sheet structures so that the proximities to (31)P nuclei could be determined for each structure. Data from several differently labeled HFPs were then incorporated into a membrane location model for the particular structure. In addition to the (13)CO labeled residue position, the HFPs also differed in sequence and/or chemical structure. "HFPmn" was a linear peptide that contained the 23 N-terminal residues of gp41. "HFPmn_V2E" contained the V2E mutation that for HIV leads to greatly reduced extent of fusion and

  12. Discovering Music Structure via Similarity Fusion

    Arenas-García, Jerónimo; Parrado-Hernandez, Emilio; Meng, Anders;

    Automatic methods for music navigation and music recommendation exploit the structure in the music to carry out a meaningful exploration of the “song space”. To get a satisfactory performance from such systems, one should incorporate as much information about songs similarity as possible; however...... semantics”, in such a way that all observed similarities can be satisfactorily explained using the latent semantics. Therefore, one can think of these semantics as the real structure in music, in the sense that they can explain the observed similarities among songs. The suitability of the PLSA model...... for representing music structure is studied in a simplified scenario consisting of 4412 songs and two similarity measures among them. The results suggest that the PLSA model is a useful framework to combine different sources of information, and provides a reasonable space for song representation....

  13. Discovering Music Structure via Similarity Fusion

    Arenas-García, Jerónimo; Parrado-Hernandez, Emilio; Meng, Anders

    Automatic methods for music navigation and music recommendation exploit the structure in the music to carry out a meaningful exploration of the “song space”. To get a satisfactory performance from such systems, one should incorporate as much information about songs similarity as possible; however...... semantics”, in such a way that all observed similarities can be satisfactorily explained using the latent semantics. Therefore, one can think of these semantics as the real structure in music, in the sense that they can explain the observed similarities among songs. The suitability of the PLSA model...... for representing music structure is studied in a simplified scenario consisting of 4412 songs and two similarity measures among them. The results suggest that the PLSA model is a useful framework to combine different sources of information, and provides a reasonable space for song representation....

  14. Notions of similarity for computational biology models

    Waltemath, Dagmar


    Computational models used in biology are rapidly increasing in complexity, size, and numbers. To build such large models, researchers need to rely on software tools for model retrieval, model combination, and version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of similarity may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here, we introduce a general notion of quantitative model similarities, survey the use of existing model comparison methods in model building and management, and discuss potential applications of model comparison. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on different model aspects. Potentially relevant aspects of a model comprise its references to biological entities, network structure, mathematical equations and parameters, and dynamic behaviour. Future similarity measures could combine these model aspects in flexible, problem-specific ways in order to mimic users' intuition about model similarity, and to support complex model searches in databases.

  15. Semantic similarity between ontologies at different scales

    Zhang, Qingpeng; Haglin, David J.


    In the past decade, existing and new knowledge and datasets have been encoded in different ontologies for semantic web and biomedical research. The size of ontologies is often very large in terms of the number of concepts and relationships, which makes the analysis of ontologies and the represented knowledge graph computationally expensive and time consuming. As the ontologies of various semantic web and biomedical applications usually show explicit hierarchical structures, it is interesting to explore the trade-offs between ontological scales and preservation/precision of results when we analyze ontologies. This paper presents a first effort at examining the feasibility of this idea by studying the relationship between scaling biomedical ontologies at different levels and the resulting semantic similarity values. We evaluate the semantic similarity between three Gene Ontology slims (Plant, Yeast, and Candida, among which the latter two belong to the same kingdom—Fungi) using four popular measures commonly applied to biomedical ontologies (Resnik, Lin, Jiang-Conrath, and SimRel). The results of this study demonstrate that with proper selection of scaling levels and similarity measures, we can significantly reduce the size of ontologies without losing substantial detail. In particular, the performance of Jiang-Conrath and Lin is more reliable and stable than that of the other two in this experiment, as shown by (a) consistently indicating that Yeast and Candida are more similar to each other (as compared to Plant) at different scales, and (b) small deviations of the similarity values after excluding a majority of nodes from several lower scales. This study provides a deeper understanding of the application of semantic similarity to biomedical ontologies, and sheds light on how to choose appropriate semantic similarity measures for biomedical engineering.
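
    For reference, these measures reduce to simple functions of term information content (IC) once the most informative common ancestor (MICA) of two terms is known; the sketch below shows the standard Resnik, Lin, and Jiang-Conrath formulas with invented annotation probabilities (SimRel, a relevance-weighted variant of Lin, is omitted).

```python
import math

def resnik(ic_mica):
    """Resnik similarity: IC of the most informative common ancestor (MICA)."""
    return ic_mica

def lin(ic_a, ic_b, ic_mica):
    """Lin similarity: 2*IC(MICA) / (IC(a) + IC(b)), in [0, 1]."""
    return 2.0 * ic_mica / (ic_a + ic_b) if ic_a + ic_b > 0 else 0.0

def jiang_conrath_distance(ic_a, ic_b, ic_mica):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2*IC(MICA)."""
    return ic_a + ic_b - 2.0 * ic_mica

def ic_from_frequency(p):
    """Information content of a term whose annotation probability in the corpus is p."""
    return -math.log(p)

# Hypothetical annotation probabilities for two GO terms and their MICA.
ic_a = ic_from_frequency(0.001)
ic_b = ic_from_frequency(0.004)
ic_mica = ic_from_frequency(0.02)
print("Resnik:", resnik(ic_mica))
print("Lin:   ", lin(ic_a, ic_b, ic_mica))
print("J-C distance:", jiang_conrath_distance(ic_a, ic_b, ic_mica))
```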

  16. Music Retrieval based on Melodic Similarity

    Typke, R.


    This thesis introduces a method for measuring melodic similarity for notated music such as MIDI files. This music search algorithm views music as sets of notes that are represented as weighted points in the two-dimensional space of time and pitch. Two point sets can be compared by calculating how mu

  17. Efficient Similarity Retrieval in Music Databases

    Ruxanda, Maria Magdalena; Jensen, Christian Søndergaard


    object is modeled as a time sequence of high-dimensional feature vectors, and dynamic time warping (DTW) is used as the similarity measure. To accomplish this, the paper extends techniques for time-series-length reduction and lower bounding of DTW distance to the multi-dimensional case. Further...
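
    For context, the DTW similarity measure referred to above can be computed with the classic quadratic-time dynamic program sketched below; the paper's contributions (time-series-length reduction and lower bounding of DTW) are not reproduced here, and the toy sequences are invented.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic O(len(x)*len(y)) dynamic time warping distance between
    two sequences of feature vectors (one vector per row)."""
    x, y = np.atleast_2d(x), np.atleast_2d(y)
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = np.array([[0.0], [1.0], [2.0], [1.0], [0.0]])
b = np.array([[0.0], [0.0], [1.0], [2.0], [1.0], [0.0]])  # same shape, temporally shifted
print(dtw_distance(a, b))  # small despite the misalignment
```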

  18. Use of gonadotropin-releasing hormone agonist trigger during in vitro fertilization is associated with similar endocrine profiles and oocyte measures in women with and without polycystic ovary syndrome.

    O'Neill, Kathleen E; Senapati, Suneeta; Dokras, Anuja


    To compare endocrine profiles and IVF outcomes after using GnRH agonists (GnRHa) to trigger final oocyte maturation in women with polycystic ovary syndrome (PCOS) with other hyper-responders. Retrospective cohort study. Academic center. Forty women with PCOS and 74 hyper-responders without PCOS. GnRHa trigger. Number of oocytes. Serum E2, LH, and P levels on the day of GnRHa trigger and the day after trigger did not differ significantly between groups. There were no significant differences in total number of oocytes or percent mature oocytes obtained between groups after controlling for age, antral follicle count, and total days of stimulation. The overall rate of no retrieval of oocytes after trigger was low (2.6%). Fertilization, implantation, clinical pregnancy, and live-birth rates were similar in the two groups. No patients developed ovarian hyperstimulation syndrome (OHSS). The similar post-GnRHa trigger hormone profiles and mature oocyte yield support the routine use of GnRHa trigger to prevent OHSS in women with PCOS. Copyright © 2015 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.

  19. Revisiting Inter-Genre Similarity

    Sturm, Bob L.; Gouyon, Fabien


    We revisit the idea of ``inter-genre similarity'' (IGS) for machine learning in general, and music genre recognition in particular. We show analytically that the probability of error for IGS is higher than naive Bayes classification with zero-one loss (NB). We show empirically that IGS does...


    Ramshankar Vijayalakshmi


    Full Text Available Recently, biopharmaceuticals known as "biosimilars" or "follow-on protein products" have emerged as new chemotherapeutic agents; these terms are used by the European Medicines Agency (EMA) and the American regulatory agency (Food and Drug Administration), respectively. Biosimilars are extremely similar to the reference molecule but not identical, however close their similarities may be. A regulatory framework is therefore in place to assess applications for marketing authorisation of biosimilars. A biosimilar can be registered when it is similar to the reference biopharmaceutical in terms of safety, quality, and efficacy, and it is important to document data from clinical trials demonstrating similar safety and efficacy. Whereas the development time for a generic medicine is around 3 years, a biosimilar takes about 6-9 years; generic medicines need to demonstrate bioequivalence only, unlike biosimilars, which need to conduct phase I and phase III clinical trials. In this review, biosimilars that are already being used successfully in the field of oncology are discussed, along with their similarities, differences, and the guidelines to be followed before a clinically informed decision is taken. More importantly, the regulatory guidelines operational in India are discussed, with a workflow for making a biosimilar and the relevant dos and don'ts. For a large, populous country like India, where the ageing population is increasing alongside improved treatments in all sectors including oncology, more new, cheaper and effective biosimilars are needed in the market. It is therefore important to understand the regulatory guidelines and the steps required to develop more biosimilars for the existing population, and more information is essential for practicing clinicians to translate these effectively into clinical practice.

  1. Partial order similarity based on mutual information

    Tibély, Gergely; Palla, Gergely


    Comparing the ranking of candidates by different voters is an important topic in social and information science with a high relevance from the point of view of practical applications. In general, ties and pairs of incomparable candidates may occur, thus, the alternative rankings are described by partial orders. Various distance measures between partial orders have already been introduced, where zero distance corresponds to a perfect match between a pair of partial orders, and larger values signal greater differences. Here we take a different approach and propose a similarity measure based on adjusted mutual information. In general, a similarity value of unity corresponds to exactly matching partial orders, while a low similarity is associated with a pair of independent partial orders. The time complexity of the computation of this similarity measure is $\mathcal{O}(\left|{\mathcal C}\right|^3)$ in the worst case, and $\mathcal{O}(\left|{\mathcal C}\right|^2\ln \left|{\mathcal C}\right|)$ in the typi...

  2. Statistical techniques to find similar objects in images

    Fodor, I K


    One problem in similarity-based object retrieval (SBOR) is how to define and estimate the similarity between two objects. In this paper we present a shape similarity measure based on thin-plate splines, and compare its performance with several other measures used in SBOR. We evaluate the methods on both artificial and real images.

  3. Similarity searching in large combinatorial chemistry spaces

    Rarey, Matthias; Stahl, Martin


    We present a novel algorithm, called Ftrees-FS, for similarity searching in large chemistry spaces based on dynamic programming. Given a query compound, the algorithm generates sets of compounds from a given chemistry space that are similar to the query. The similarity search is based on the feature tree similarity measure representing molecules by tree structures. This descriptor allows handling combinatorial chemistry spaces as a whole instead of looking at subsets of enumerated compounds. Within a few minutes of computing time, the algorithm is able to find the most similar compound in very large spaces as well as sets of compounds at an arbitrary similarity level. In addition, the diversity among the generated compounds can be controlled. A set of 17 000 fragments of known drugs, generated by the RECAP procedure from the World Drug Index, was used as the search chemistry space. These fragments can be combined to more than 10^18 compounds of reasonable size. For validation, known antagonists/inhibitors of several targets including dopamine D4, histamine H1, and COX2 are used as queries. Comparison of the compounds created by Ftrees-FS to other known actives demonstrates the ability of the method to jump between structurally unrelated molecule classes.

  4. Self-similarity Driven Demosaicking

    Antoni Buades


    Full Text Available Digital cameras record only one color component per pixel, red, green or blue. Demosaicking is the process by which one can infer a whole color matrix from such a matrix of values, thus interpolating the two missing color values per pixel. In this article we propose a demosaicking method based on the property of non-local self-similarity of images.

  5. Sparse Similarity-Based Fisherfaces

    Fagertun, Jens; Gomez, David Delgado; Hansen, Mads Fogtmann;


    In this work, the effect of introducing Sparse Principal Component Analysis within the Similarity-based Fisherfaces algorithm is examined. The technique aims at mimicking the human ability to discriminate faces by projecting the faces in a highly discriminative and easy interpretative way. Pixel...... obtain the same recognition results as the technique in a dense version using only a fraction of the input data. Furthermore, the presented results suggest that using SPCA in the technique offers robustness to occlusions....

  6. Active browsing using similarity pyramids

    Chen, Jau-Yuen; Bouman, Charles A.; Dalton, John C.


    In this paper, we describe a new approach to managing large image databases, which we call active browsing. Active browsing integrates relevance feedback into the browsing environment, so that users can modify the database's organization to suit the desired task. Our method is based on a similarity pyramid data structure, which hierarchically organizes the database, so that it can be efficiently browsed. At coarse levels, the similarity pyramid allows users to view the database as large clusters of similar images. Alternatively, users can 'zoom into' finer levels to view individual images. We discuss relevance feedback for the browsing process, and argue that it is fundamentally different from relevance feedback for more traditional search-by-query tasks. We propose two fundamental operations for active browsing: pruning and reorganization. Both of these operations depend on a user-defined relevance set, which represents the image or set of images desired by the user. We present statistical methods for accurately pruning the database, and we propose a new 'worm hole' distance metric for reorganizing the database, so that members of the relevance set are grouped together.

  7. Self-Similar Collisionless Shocks

    Katz, B; Waxman, E; Katz, Boaz; Keshet, Uri; Waxman, Eli


    Observations of gamma-ray burst afterglows suggest that the correlation length of magnetic field fluctuations downstream of relativistic non-magnetized collisionless shocks grows with distance from the shock to scales much larger than the plasma skin depth. We argue that this indicates that the plasma properties are described by a self-similar solution, and derive constraints on the scaling properties of the solution. For example, we find that the scaling of the characteristic magnetic field amplitude with distance from the shock is B \propto D^{s_B}, with -1 \propto x^{2s_B} (for x >> D). We show that the plasma may be approximated as a combination of two self-similar components: a kinetic component of energetic particles and an MHD-like component representing "thermal" particles. We argue that the latter may be considered as infinitely conducting, in which case s_B = 0 and the scalings are completely determined (e.g. dn/dE \propto E^{-2} and B \propto D^0). Similar claims apply to non-relativistic shocks such a...

  8. A Comparison of Similar Aerosol Measurements made on the NASA P-3B, DC-8 and NSF C-130 Aircraft during TRACE-P and ACE-ASIA, An Overview

    Moore, K. G.; Clarke, A. D.; Kapustin, V.


    Two major aircraft experiments occurred off the Pacific coast of Asia during spring 2001: the NASA-sponsored Transport and Chemical Evolution over the Pacific (TRACE-P) and the NSF-sponsored Aerosol Characterization Experiment-Asia (ACE-ASIA). Both experiments studied emissions from the Asian continent (biomass burning, urban/industrial pollution, and dust). TRACE-P focussed on trace gases and aerosol during March/April, was based primarily in Hong Kong and Yokota AFB, Japan, and involved two aircraft: the NASA DC-8 and the NASA P3-B. ACE-ASIA focussed on aerosol and radiation during April/May, was based in Iwakuni MCAS, Japan, and involved the NSF C-130. This paper will compare aerosol measurements from these aircraft including aerosol concentrations, size distributions (and integral properties), chemistry, and optical properties. Interagency cooperation helped coordinate five flights (three between the P3-B and DC-8, two between the P3-B and the C-130) where time was devoted to flying "wingtip-to-wingtip" (within 500 m, typically less) for inter-comparison of measurements. These inter-comparisons included 12 horizontal legs and 13 vertical profiles, allowing for comparison of data at numerous altitudes and conditions. Time series of various parameters for the inter-comparison portion of each flight showed that even when there was disagreement between the absolute values of a particular measurement, trends in the data were usually duplicated. Best overall agreement was for the CN concentrations, scattering and absorption coefficients (especially for the C-130 and P3-B), DMA and OPC size distributions, and NH4 concentrations. Largest differences were often for parameters related to the super-micron aerosols, where aircraft sampling has difficulties (inlet losses-each plane had different inlets, losses in plumbing, etc.). Means and variances of comparable measurements for horizontal legs were calculated for each platform and allow for an assessment of

  9. Universal self-similarity of propagating populations

    Eliazar, Iddo; Klafter, Joseph


    This paper explores the universal self-similarity of propagating populations. The following general propagation model is considered: particles are randomly emitted from the origin of a d -dimensional Euclidean space and propagate randomly and independently of each other in space; all particles share a statistically common—yet arbitrary—motion pattern; each particle has its own random propagation parameters—emission epoch, motion frequency, and motion amplitude. The universally self-similar statistics of the particles’ displacements and first passage times (FPTs) are analyzed: statistics which are invariant with respect to the details of the displacement and FPT measurements and with respect to the particles’ underlying motion pattern. Analysis concludes that the universally self-similar statistics are governed by Poisson processes with power-law intensities and by the Fréchet and Weibull extreme-value laws.

  10. Assessing protein kinase target similarity

    Gani, Osman A; Thakkar, Balmukund; Narayanan, Dilip


    : focussed chemical libraries, drug repurposing, polypharmacological design, to name a few. Protein kinase target similarity is easily quantified by sequence, and its relevance to ligand design includes broad classification by key binding sites, evaluation of resistance mutations, and the use of surrogate......" of sequence and crystal structure information, with statistical methods able to identify key correlates to activity but also here, "the devil is in the details." Examples from specific repurposing and polypharmacology applications illustrate these points. This article is part of a Special Issue entitled...

  11. The NuSTAR Extragalactic Survey: First Direct Measurements of the ≳10 keV X-Ray Luminosity Function for Active Galactic Nuclei at z > 0.1

    Aird, J.; Alexander, D. M.; Ballantyne, D. R.;


    number of sources in our sample, leading to small, systematic differences in our binned estimates of the XLF. Adopting a model with a lower intrinsic fraction of Compton-thick sources and a larger population of sources with column densities N_H ∼ 10^23-10^24 cm^-2, or a model with stronger Compton reflection......We present the first direct measurements of the rest-frame 10-40 keV X-ray luminosity function (XLF) of active galactic nuclei (AGNs) based on a sample of 94 sources at 0.1 sources in the Nuclear Spectroscopic Telescope Array (NuSTAR) extragalactic survey...... component (with a relative normalization of R ∼ 2 at all luminosities) can bring extrapolations of the XLF from 2-10 keV into agreement with our NuSTAR sample. Ultimately, X-ray spectral analysis of the NuSTAR sources is required to break this degeneracy between the distribution of absorbing column...

  12. Perceived and actual similarities in biological and adoptive families: does perceived similarity bias genetic inferences?

    Scarr, S; Scarf, E; Weinberg, R A


    Critics of the adoption method to estimate the relative effects of genetic and environmental differences on behavioral development claim that important biases are created by the knowledge of biological relatedness or adoptive status. Since the 1950s, agency policy has led to nearly all adopted children knowing that they are adopted. To test the hypothesis that knowledge of biological or adoptive status influences actual similarity, we correlated absolute differences in objective test scores with ratings of similarity by adolescents and their parents in adoptive and biological families. Although biological family members see themselves as more similar than adoptive family members, there are also important generational and gender differences in perceived similarity that cut across family type. There is moderate agreement among family members on the degree of perceived similarity, but there is no correlation between perceived and actual similarity in intelligence or temperament. However, family members are more accurate about shared social attitudes. Knowledge of adoptive or biological relatedness is related to the degree of perceived similarity, but perceptions of similarity are not related to objective similarities and thus do not constitute a bias in comparisons of measured differences in intelligence or temperament in adoptive and biological families.

  13. Mechanisms for similarity based cooperation

    Traulsen, A.


    Cooperation based on similarity has been discussed since Richard Dawkins introduced the term "green beard" effect. In these models, individuals cooperate based on an arbitrary signal (or tag) such as the famous green beard. Here, two different models for such tag-based cooperation are analysed. As neutral drift is important in both models, a finite population framework is applied. The first model, which we term "cooperative tags", considers a situation in which groups of cooperators are formed by some joint signal. Defectors adopting the signal and exploiting the group can lead to a breakdown of cooperation. In this case, conditions are derived under which the average abundance of the more cooperative strategy exceeds 50%. The second model considers a situation in which individuals start defecting towards others that are not similar to them. This situation is termed "defective tags". It is shown that in this case, individuals using tags to cooperate exclusively with their own kind dominate over unconditional cooperators.

  14. Learning Style Similarity for Searching Infographics

    Saleh, Babak; Dontcheva, Mira; Hertzmann, Aaron; Liu, Zhicheng


    Infographics are complex graphic designs integrating text, images, charts and sketches. Despite the increasing popularity of infographics and the rapid growth of online design portfolios, little research investigates how we can take advantage of these design resources. In this paper we present a method for measuring the style similarity between infographics. Based on human perception data collected from crowdsourced experiments, we use computer vision and machine learning algorithms to learn ...

  15. Interneurons targeting similar layers receive synaptic inputs with similar kinetics.

    Cossart, Rosa; Petanjek, Zdravko; Dumitriu, Dani; Hirsch, June C; Ben-Ari, Yehezkel; Esclapez, Monique; Bernard, Christophe


    GABAergic interneurons play diverse and important roles in controlling neuronal network dynamics. They are characterized by an extreme heterogeneity morphologically, neurochemically, and physiologically, but a functionally relevant classification is still lacking. Present taxonomy is essentially based on their postsynaptic targets, but a physiological counterpart to this classification has not yet been determined. Using a quantitative analysis based on multidimensional clustering of morphological and physiological variables, we now demonstrate a strong correlation between the kinetics of glutamate and GABA miniature synaptic currents received by CA1 hippocampal interneurons and the laminar distribution of their axons: neurons that project to the same layer(s) receive synaptic inputs with similar kinetics distributions. In contrast, the kinetics distributions of GABAergic and glutamatergic synaptic events received by a given interneuron do not depend upon its somatic location or dendritic arborization. Although the mechanisms responsible for this unexpected observation are still unclear, our results suggest that interneurons may be programmed to receive synaptic currents with specific temporal dynamics depending on their targets and the local networks in which they operate.

  16. Semantically enabled image similarity search

    Casterline, May V.; Emerick, Timothy; Sadeghi, Kolia; Gosse, C. A.; Bartlett, Brent; Casey, Jason


    Georeferenced data of various modalities are increasingly available for intelligence and commercial use; however, effectively exploiting these sources demands a unified data space capable of capturing the unique contribution of each input. This work presents a suite of software tools for representing geospatial vector data and overhead imagery in a shared high-dimensional vector or "embedding" space that supports fused learning and similarity search across dissimilar modalities. While the approach is suitable for fusing arbitrary input types, including free text, the present work exploits the obvious but computationally difficult relationship between GIS and overhead imagery. GIS provides temporally smoothed but information-limited content, while overhead imagery provides an information-rich but temporally limited perspective. This processing framework includes some important extensions of concepts in the literature but, more critically, presents a means to accomplish them as a unified framework at scale on commodity cloud architectures.
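
    The core retrieval step in such an embedding space is typically a cosine-similarity nearest-neighbour search; the sketch below shows a minimal brute-force version with random vectors standing in for the fused GIS/imagery embeddings (the dimensions and data are placeholders, not from this work).

```python
import numpy as np

def cosine_search(query, embeddings, k=3):
    """Return indices and scores of the k embeddings most similar to the query (cosine)."""
    q = query / np.linalg.norm(query)
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = E @ q
    top = np.argsort(-scores)[:k]
    return top, scores[top]

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 128))   # stand-in for fused embedding vectors
query = embeddings[42] + 0.1 * rng.standard_normal(128)
idx, scores = cosine_search(query, embeddings)
print(idx, scores)   # index 42 should rank first
```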




    Full Text Available Abstract: Similarities between the accounting of companies and the accounting of territorial administrative units are the following: organizing double-entry accounting; the accounting method, both in terms of fundamental theoretical principles and specific practical tools. The differences between the accounting of companies and that of territorial administrative units refer to: the accounting of territorial administrative units includes, besides general (financial) accounting, also budgetary accounting, and the accounts system of budgetary accounting is completely different from that of companies; financial statements of territorial administrative units whose leaders are not main authorizing officers are submitted to the hierarchically superior body (not to the MPF); the accounts of territorial administrative units are opened at the treasury and financial institutions, accounts at commercial banks being prohibited; equity accounts in territorial administrative units are structured into groups of funds; long-term debts have a specific structure in territorial administrative units (internal local public debt and external local public debt).

  18. Features Based Text Similarity Detection

    Kent, Chow Kok


    As the Internet helps us cross cultural borders by providing different information, the issue of plagiarism is bound to arise, and plagiarism detection becomes more demanding in overcoming this issue. Different plagiarism detection tools have been developed based on various detection techniques. Nowadays, the fingerprint matching technique plays an important role in those detection tools. However, in handling some large-content articles, the fingerprint matching technique has some weaknesses, especially in space and time consumption. In this paper, we propose a new approach to detect plagiarism which integrates the use of the fingerprint matching technique with four key features to assist in the detection process. These proposed features are capable of choosing the main points or key sentences in the articles to be compared. The selected sentences then undergo the fingerprint matching process in order to detect the similarity between the sentences. Hence, time and space usage for the comparison process is r...
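
    As background, a basic fingerprint-matching baseline hashes overlapping word n-grams ("shingles") of each document and compares the resulting fingerprint sets, e.g. with the Jaccard coefficient; the sketch below shows this generic baseline only and does not implement the paper's four key features.

```python
import hashlib

def fingerprints(text, n=5):
    """Hash every word n-gram ('shingle') of a document into a fingerprint set."""
    words = text.lower().split()
    shingles = (" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1)))
    return {hashlib.md5(s.encode()).hexdigest()[:8] for s in shingles}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

doc1 = "we propose a new approach to detect plagiarism using fingerprint matching"
doc2 = "a new approach to detect plagiarism using fingerprint matching is proposed here"
print(jaccard(fingerprints(doc1), fingerprints(doc2)))
```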

  19. Relationship among the Normalized Distance, the Similarity Measure, the Entropy and the Inclusion Measure of Interval-valued Fuzzy Sets Based on Interval-number Measurement

    曾文艺; 赵宜宾


    The relationship among the normalized distance, the similarity measure, the entropy and the inclusion measure of interval-valued fuzzy sets is a hot research topic. Considering that interval-valued fuzzy sets convey richer information than ordinary fuzzy sets, this paper uses interval numbers rather than real numbers to characterize the distance between interval-valued fuzzy sets. We first introduce an axiomatic definition of the normalized distance of interval-valued fuzzy sets based on interval-number measurement, then investigate in detail the relationships among the normalized distance, the similarity measure, the entropy and the inclusion measure of interval-valued fuzzy sets, proving five theorems showing that, on the basis of their axiomatic definitions, these quantities can be transformed into one another, and finally propose several formulas to calculate the similarity measure, the entropy and the inclusion measure of interval-valued fuzzy sets. These conclusions enrich the information measures (distance, similarity, entropy and inclusion) of interval-valued fuzzy sets, and also provide new methods and theory for applications of interval-valued fuzzy sets in fields such as approximate reasoning, decision-making analysis and pattern recognition.

  20. Compressive Sequential Learning for Action Similarity Labeling.

    Qin, Jie; Liu, Li; Zhang, Zhaoxiang; Wang, Yunhong; Shao, Ling


    Human action recognition in videos has been extensively studied in recent years due to its wide range of applications. Instead of classifying video sequences into a number of action categories, in this paper, we focus on a particular problem of action similarity labeling (ASLAN), which aims at verifying whether a pair of videos contain the same type of action or not. To address this challenge, a novel approach called compressive sequential learning (CSL) is proposed by leveraging the compressive sensing theory and sequential learning. We first project data points to a low-dimensional space by effectively exploring an important property in compressive sensing: the restricted isometry property. In particular, a very sparse measurement matrix is adopted to reduce the dimensionality efficiently. We then learn an ensemble classifier for measuring similarities between pairwise videos by iteratively minimizing its empirical risk with the AdaBoost strategy on the training set. Unlike conventional AdaBoost, the weak learner for each iteration is not explicitly defined and its parameters are learned through greedy optimization. Furthermore, an alternative of CSL named compressive sequential encoding is developed as an encoding technique and followed by a linear classifier to address the similarity-labeling problem. Our method has been systematically evaluated on four action data sets: ASLAN, KTH, HMDB51, and Hollywood2, and the results show the effectiveness and superiority of our method for ASLAN.
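
    The dimensionality-reduction step described above can be illustrated with a very sparse random measurement matrix in the style of Achlioptas/Li, which approximately preserves pairwise distances; this sketch shows only that projection ingredient, not the full CSL ensemble learner, and all sizes are placeholders.

```python
import numpy as np

def sparse_measurement_matrix(d_in, d_out, s=3, seed=0):
    """Very sparse random projection matrix: entries are +sqrt(s), 0, -sqrt(s)
    with probabilities 1/(2s), 1 - 1/s, 1/(2s), scaled so distances are preserved
    in expectation."""
    rng = np.random.default_rng(seed)
    vals = rng.choice([np.sqrt(s), 0.0, -np.sqrt(s)],
                      size=(d_in, d_out),
                      p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])
    return vals / np.sqrt(d_out)

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 1000))          # high-dimensional descriptors
R = sparse_measurement_matrix(1000, 64)
Z = X @ R                                   # compressed 64-dimensional points
# Pairwise distances are approximately preserved after projection.
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Z[0] - Z[1]))
```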

  1. Gene functional similarity search tool (GFSST)

    Russo James J


    Full Text Available Abstract Background With the completion of the genome sequences of human, mouse, and other species and the advent of high throughput functional genomic research technologies such as biomicroarray chips, more and more genes and their products have been discovered and their functions have begun to be understood. Increasing amounts of data about genes, gene products and their functions have been stored in databases. To facilitate selection of candidate genes for gene-disease research, genetic association studies, biomarker and drug target selection, and animal models of human diseases, it is essential to have search engines that can retrieve genes by their functions from proteome databases. In recent years, the development of Gene Ontology (GO has established structured, controlled vocabularies describing gene functions, which makes it possible to develop novel tools to search genes by functional similarity. Results By using a statistical model to measure the functional similarity of genes based on the Gene Ontology directed acyclic graph, we developed a novel Gene Functional Similarity Search Tool (GFSST to identify genes with related functions from annotated proteome databases. This search engine lets users design their search targets by gene functions. Conclusion An implementation of GFSST which works on the UniProt (Universal Protein Resource for the human and mouse proteomes is available at GFSST Web Server. GFSST provides functions not only for similar gene retrieval but also for gene search by one or more GO terms. This represents a powerful new approach for selecting similar genes and gene products from proteome databases according to their functions.




    Full Text Available Shopping over the Internet through virtual companies is attracting increasing interest. Customers want to purchase products they may like without wasting their time and/or money. To assist their customers in this process, many virtual companies make use of recommender systems and present top-N recommendations to them. The similarity measures used to identify the most similar entities can affect the overall performance of the top-N recommendation service. Although there are many similarity measures that operate on binary values, their effect on the accuracy and online performance of top-N recommendations has not been studied in detail. In this study, seven well-known binary vote-based similarity measures were examined for top-N recommendations with respect to both accuracy and online performance criteria. To compare these measures in terms of accuracy and efficiency, numerous experiments were conducted on two well-known real data sets. In addition, the top-N recommendation algorithm was modified so that the data of the most similar users is used when generating recommendations, and the effect of varying control parameters on performance was investigated. The experimental results were analyzed in terms of accuracy and performance, and some recommendations were presented.
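
    For illustration, several classic binary vote-based similarity measures can be computed from the 2x2 co-occurrence counts of two users' binary rating vectors, as sketched below; these are generic textbook measures and not necessarily the seven examined in the study.

```python
def binary_similarities(u, v):
    """A few classic similarity measures for binary rating vectors.
    a = co-rated 1s, b/c = mismatches, d = co-rated 0s."""
    a = sum(1 for x, y in zip(u, v) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(u, v) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(u, v) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(u, v) if x == 0 and y == 0)
    return {
        "jaccard": a / (a + b + c) if a + b + c else 0.0,
        "dice": 2 * a / (2 * a + b + c) if a + b + c else 0.0,
        "simple_matching": (a + d) / (a + b + c + d),
        "cosine": a / ((a + b) * (a + c)) ** 0.5 if (a + b) and (a + c) else 0.0,
    }

u = [1, 0, 1, 1, 0, 0, 1]   # binary "votes" (e.g., purchased / not purchased)
v = [1, 0, 0, 1, 0, 1, 1]
print(binary_similarities(u, v))
```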

  3. Parallel Implementation of Similarity Measures on GPU Architecture using CUDA

    Kuldeep Yadav


    Full Text Available Image processing and pattern recognition algorithms take more time to execute on a single-core processor. The Graphics Processing Unit (GPU) is popular nowadays due to its speed, programmability, low cost and the large number of execution cores built into it. Many researchers have therefore begun to use GPUs alongside a single-core computer system to speed up the execution of algorithms; in the field of content-based medical image retrieval (CBMIR), the Euclidean and Mahalanobis distances play an important role, because the distance formula is what matches the images. In this research work, we parallelized the Euclidean distance algorithm on CUDA. The CPU used was an Intel® Dual-Core E5500 @ 2.80 GHz with 2.0 GB of main memory, running Windows XP (SP2). The next step was to convert this code to GPU format, i.e. to run the program on an NVIDIA GeForce 9500GT with 1023 MB of DDR2 video memory and a 64-bit bus width; the graphics driver used was from the NVIDIA 270.81 series. In this paper both the CPU and GPU versions of the algorithm are implemented in MATLAB R2010. The CPU version of the algorithm is analyzed in plain MATLAB, while the GPU version is implemented with the help of the intermediate software Jacket (win-1.3.0). To use Jacket, we have to make some changes in the source code so that the CPU and GPU work simultaneously, thus reducing the overall computation time. Our work employs extensive usage of the highly multithreaded architecture of the multicore GPU, and an efficient use of shared memory is required to optimize parallel reduction in the Compute Unified Device Architecture (CUDA); GPUs are emerging as powerful parallel systems at a cheap cost of a few thousand rupees.
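
    The distance computation that the paper offloads to the GPU can be expressed as one large matrix operation, which is what makes it map well onto CUDA; the sketch below shows the same pairwise Euclidean distance computation in vectorized (CPU) form, with invented descriptor sizes.

```python
import numpy as np

def pairwise_euclidean(Q, D):
    """Distances between every query descriptor (rows of Q) and every database
    descriptor (rows of D), using ||q - d||^2 = ||q||^2 + ||d||^2 - 2 q.d,
    so the bulk of the work is a single matrix multiply."""
    q2 = np.sum(Q ** 2, axis=1, keepdims=True)        # (nq, 1)
    d2 = np.sum(D ** 2, axis=1, keepdims=True).T      # (1, nd)
    sq = np.maximum(q2 + d2 - 2.0 * Q @ D.T, 0.0)     # guard against small negatives
    return np.sqrt(sq)

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 256))     # query image descriptors
D = rng.standard_normal((1000, 256))  # database descriptors
dist = pairwise_euclidean(Q, D)
print(dist.shape, dist.min())
```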

  4. Relation based Measuring of Semantic Similarity for Web Documents

    Poonam Chahal; Manjeet Singh; Suresh Kumar


    .... The reason for this is that the information exists on web is in natural language. The layered architecture semantic web is given by Tim Berner Lee to overcome the issues of information retrieval...

  5. Inducing a measure of phonetic similarity from pronunciation variation

    Wieling, Martijn; Margaretha, Eliza; Nerbonne, John


    Structuralists famously observed that language is "un système où tout se tient" ("a system in which everything holds together"; Meillet, 1903, p. 407), insisting that the system of relations of linguistic units was more important than their concrete content. This study attempts to derive content from relations, in particular phonetic (acoustic)

  6. An IFS-based similarity measure to index electroencephalograms

    Berrada, Ghita; Zhexue Huang, Joshua; de Keijzer, Ander; Cao, Longbing; Srivastava, Jaideep

    EEG is a very useful neurological diagnosis tool, inasmuch as the EEG exam is easy to perform and relatively cheap. However, it generates large amounts of data, not easily interpreted by a clinician. Several methods have been tried to automate the interpretation of EEG recordings. However, their

  7. Spousal similarity in coping and depressive symptoms over 10 years.

    Holahan, Charles J; Moos, Rudolf H; Moerkbak, Marie L; Cronkite, Ruth C; Holahan, Carole K; Kenney, Brent A


    Following a baseline sample of 184 married couples over 10 years, the present study develops a broadened conceptualization of linkages in spouses' functioning by examining similarity in coping as well as in depressive symptoms. Consistent with hypotheses, results demonstrated (a) similarity in depressive symptoms within couples across 10 years, (b) similarity in coping within couples over 10 years, and (c) the role of coping similarity in strengthening depressive similarity between spouses. Spousal similarity in coping was evident for a composite measure of percent approach coping as well as for component measures of approach and avoidance coping. The role of coping similarity in strengthening depressive symptom similarity was observed for percent approach coping and for avoidance coping. These findings support social contextual models of psychological adjustment that emphasize the importance of dynamic interdependencies between individuals in close relationships.

  8. The Haar Wavelet Transform in the Time Series Similarity Paradigm

    Z.R. Struzik; A.P.J.M. Siebes (Arno)


    Similarity measures play an important role in many data mining algorithms. To allow the use of such algorithms on non-standard databases, such as databases of financial time series, their similarity measure has to be defined. We present a simple and powerful technique which allows for
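
    For reference, the (unnormalized, averaging) Haar transform repeatedly replaces a series by pairwise averages and detail coefficients, so coarse approximations can be compared before finer ones; the sketch below is a minimal illustration with an invented series and is independent of the specific indexing technique presented in the paper.

```python
def haar_transform(x):
    """One-dimensional averaging Haar wavelet transform of a series whose length
    is a power of two; returns [overall average, coarsest details, ..., finest details]."""
    coeffs = []
    current = list(x)
    while len(current) > 1:
        averages = [(current[i] + current[i + 1]) / 2.0 for i in range(0, len(current), 2)]
        details = [(current[i] - current[i + 1]) / 2.0 for i in range(0, len(current), 2)]
        coeffs = details + coeffs   # prepend, so coarser details end up before finer ones
        current = averages
    return current + coeffs

print(haar_transform([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]))
```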

  9. Measuring $\

    Mitchell, Jessica Sarah [Univ. of Cambridge (United Kingdom)


    The MINOS Experiment consists of two steel-scintillator calorimeters, sampling the long-baseline NuMI muon neutrino beam. It was designed to make a precise measurement of the 'atmospheric' neutrino mixing parameters, Δm²_atm and sin²(2θ_atm). The Near Detector measures the initial spectrum of the neutrino beam 1 km from the production target, and the Far Detector, at a distance of 735 km, measures the impact of oscillations on the neutrino energy spectrum. Work performed to validate the quality of the data collected by the Near Detector is presented as part of this thesis. This thesis primarily details the results of a νμ disappearance analysis, and presents a new sophisticated fitting software framework, which employs a maximum likelihood method to extract the best-fit oscillation parameters. The software is entirely decoupled from the extrapolation procedure between the detectors, and is capable of fitting multiple event samples (defined by the selections applied) in parallel, and any combination of energy-dependent and independent sources of systematic error. Two techniques to improve the sensitivity of the oscillation measurement were also developed. The inclusion of information on the energy resolution of the neutrino events results in a significant improvement in the allowed region for the oscillation parameters. The degree to which sin²(2θ) = 1.0 could be disfavoured with the exposure of the current dataset, if the true mixing angle was non-maximal, was also investigated, with an improved neutrino energy reconstruction for very low energy events. The best-fit oscillation parameters, obtained by the fitting software and incorporating resolution information, were: |Δm²| = 2.32 (+0.12 / −0.08) × 10⁻³ eV² and sin²(2θ) > 0.90 (90% C.L.). The analysis provides the current world-best measurement of the atmospheric neutrino mass

  10. Determination of subjective and objective similarity for pairs of masses on mammograms for selection of similar images

    Muramatsu, Chisako; Li, Qiang; Schmidt, Robert A.; Shiraishi, Junji; Suzuki, Kenji; Newstead, Gillian M.; Doi, Kunio


    Presentation of images with known pathology similar to that of a new unknown lesion would be helpful for radiologists in their diagnosis of breast cancer. In order to find images that are really similar and useful to radiologists, we determined the radiologists' subjective similarity ratings for pairs of masses, and investigated objective similarity measures that would agree well with the subjective ratings. Fifty sets of images, each of which included one image in the center and six other images to be compared with the center image, were selected; thus, 300 pairs of images were prepared. Ten breast radiologists provided the subjective similarity ratings for each image pair in terms of the overall impression for diagnosis. Objective similarity measures based on cross-correlation of the images, differences in feature values, and psychophysical measures obtained with an artificial neural network were determined. The objective measures based on the cross-correlation were found not to be correlated with the subjective similarity ratings (r < 0.40). When several image features were used, the differences-based objective measure was moderately correlated (r = 0.59) with the subjective ratings. A relatively high correlation coefficient (r = 0.74) was obtained for the psychophysical similarity measure. Similar images selected by use of the psychophysical measure can be useful to radiologists in the diagnosis of breast cancer.

  11. Similarly shaped letters evoke similar colors in grapheme-color synesthesia.

    Brang, David; Rouw, Romke; Ramachandran, V S; Coulson, Seana


    Grapheme-color synesthesia is a neurological condition in which viewing numbers or letters (graphemes) results in the concurrent sensation of color. While the anatomical substrates underlying this experience are well understood, little research to date has investigated factors influencing the particular colors associated with particular graphemes or how synesthesia occurs developmentally. A recent suggestion of such an interaction has been proposed in the cascaded cross-tuning (CCT) model of synesthesia, which posits that in synesthetes connections between grapheme regions and color area V4 participate in a competitive activation process, with synesthetic colors arising during the component-stage of grapheme processing. This model more directly suggests that graphemes sharing similar component features (lines, curves, etc.) should accordingly activate more similar synesthetic colors. To test this proposal, we created and regressed synesthetic color-similarity matrices for each of 52 synesthetes against a letter-confusability matrix, an unbiased measure of visual similarity among graphemes. Results of synesthetes' grapheme-color correspondences indeed revealed that more similarly shaped graphemes corresponded with more similar synesthetic colors, with stronger effects observed in individuals with more intense synesthetic experiences (projector synesthetes). These results support the CCT model of synesthesia, implicate early perceptual mechanisms as driving factors in the elicitation of synesthetic hues, and further highlight the relationship between conceptual and perceptual factors in this phenomenon. Copyright © 2011 Elsevier Ltd. All rights reserved.

  12. On quasi-similarity of subnormal operators


    For a compact subset K in the complex plane, let Rat(K) denote the set of the rational functions with poles off K. Given a finite positive measure ν with support contained in K, let R2(K, ν) denote the closure of Rat(K) in L2(ν) and let Sν denote the operator of multiplication by the independent variable z on R2(K, ν), that is, Sνf = zf for every f ∈ R2(K, ν). Suppose Ω is a bounded open subset in the complex plane whose complement has finitely many components and suppose Rat(Ω) is dense in the Hardy space H2(Ω). Let σ denote a harmonic measure for Ω. In this work, we characterize all subnormal operators quasi-similar to Sσ, the operator of multiplication by z on R2(Ω, σ). We show that for a given ν supported on Ω, Sν is quasi-similar to Sσ if and only if ν|∂Ω ≪ σ and log(dν/dσ) ∈ L1(σ). Our result extends a well-known result of Clary on the unit disk.

  13. On quasi-similarity of subnormal operators

    Zhi-jian QIU


    For a compact subset K in the complex plane, let Rat(K) denote the set of the rational functions with poles off K. Given a finite positive measure ν with support contained in K, let R2(K, ν) denote the closure of Rat(K) in L2(ν) and let Sν denote the operator of multiplication by the independent variable z on R2(K, ν), that is, Sνf = zf for every f ∈ R2(K, ν). Suppose Ω is a bounded open subset in the complex plane whose complement has finitely many components and suppose Rat(Ω) is dense in the Hardy space H2(Ω). Let σ denote a harmonic measure for Ω. In this work, we characterize all subnormal operators quasi-similar to Sσ, the operator of multiplication by z on R2(Ω̄, σ). We show that for a given ν supported on Ω̄, Sν is quasi-similar to Sσ if and only if ν|∂Ω ≪ σ and log(dν/dσ) ∈ L1(σ). Our result extends a well-known result of Clary on the unit disk.

  14. A toolbox for representational similarity analysis.

    Hamed Nili


    Full Text Available Neuronal population codes are increasingly being investigated with multivariate pattern-information analyses. A key challenge is to use measured brain-activity patterns to test computational models of brain information processing. One approach to this problem is representational similarity analysis (RSA), which characterizes a representation in a brain or computational model by the distance matrix of the response patterns elicited by a set of stimuli. The representational distance matrix encapsulates what distinctions between stimuli are emphasized and what distinctions are de-emphasized in the representation. A model is tested by comparing the representational distance matrix it predicts to that of a measured brain region. RSA also enables us to compare representations between stages of processing within a given brain or model, between brain and behavioral data, and between individuals and species. Here, we introduce a Matlab toolbox for RSA. The toolbox supports an analysis approach that is simultaneously data- and hypothesis-driven. It is designed to help integrate a wide range of computational models into the analysis of multichannel brain-activity measurements as provided by modern functional imaging and neuronal recording techniques. Tools for visualization and inference enable the user to relate sets of models to sets of brain regions and to statistically test and compare the models using nonparametric inference methods. The toolbox supports searchlight-based RSA, to continuously map a measured brain volume in search of a neuronal population code with a specific geometry. Finally, we introduce the linear-discriminant t value as a measure of representational discriminability that bridges the gap between linear decoding analyses and RSA. In order to demonstrate the capabilities of the toolbox, we apply it to both simulated and real fMRI data. The key functions are equally applicable to other modalities of brain-activity measurement. The
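
    The original toolbox is written in Matlab; the following is only a minimal Python sketch of the core RSA comparison under simulated data: compute a representational dissimilarity matrix (RDM) for a "brain" and a "model" pattern set, then compare the two RDMs with rank correlation. The data and noise level are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli, n_channels = 12, 50

# Simulated response patterns: one row per stimulus, one column per channel/voxel.
brain_patterns = rng.normal(size=(n_stimuli, n_channels))
model_patterns = brain_patterns + rng.normal(scale=2.0, size=(n_stimuli, n_channels))

# Representational dissimilarity matrices: correlation distance for every stimulus pair.
rdm_brain = pdist(brain_patterns, metric="correlation")
rdm_model = pdist(model_patterns, metric="correlation")

# Compare the two RDMs with rank correlation, as is common in RSA model comparison.
rho, p = spearmanr(rdm_brain, rdm_model)
print(f"Spearman rho = {rho:.2f} (p = {p:.3g})")
```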

  15. A fingerprint identification algorithm by clustering similarity

    TIAN Jie; HE Yuliang; CHEN Hong; YANG Xin


    This paper introduces a fingerprint identification algorithm based on clustering similarity, with a view to overcoming the dilemmas encountered in fingerprint identification. To decrease multi-spectrum noise in a fingerprint, we first use a dyadic scale space (DSS) method for image enhancement. The second step describes the relative features among minutiae by building a minutia-simplex, which contains a pair of minutiae and their locally associated ridge information, with its transformation-variant and transformation-invariant relative features applied for comprehensive similarity measurement and for parameter estimation, respectively. A clustering method is employed to estimate the transformation space. Finally, a multi-resolution technique is used to find an optimal transformation model yielding the maximal mutual information between the input and the template features. The experimental results, including the performance evaluation in the 2nd International Fingerprint Verification Competition in 2002 (FVC2002) over its four fingerprint databases, indicate that our method is promising for an automatic fingerprint identification system (AFIS).

  16. Gait Recognition Using Image Self-Similarity

    Cutler Ross G


    Full Text Available Gait is one of the few biometrics that can be measured at a distance, and is hence useful for passive surveillance as well as biometric applications. Gait recognition research is still in its infancy, however, and we have yet to solve the fundamental issue of finding gait features which at once have sufficient discrimination power and can be extracted robustly and accurately from low-resolution video. This paper describes a novel gait recognition technique based on the image self-similarity of a walking person. We contend that the similarity plot encodes a projection of gait dynamics. It is also correspondence-free, robust to segmentation noise, and works well with low-resolution video. The method is tested on multiple data sets of varying sizes and degrees of difficulty. Performance is best for fronto-parallel viewpoints, whereby a recognition rate of 98% is achieved for a data set of 6 people, and 70% for a data set of 54 people.
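
    A self-similarity plot of the kind described above is simply the matrix of pairwise similarities between (normalised) frames of the walking sequence; its periodic structure encodes the gait. The sketch below uses a synthetic periodic "silhouette" signal rather than real video, so the frame data and the period are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "video": 60 frames of a periodic signal (period ~15 frames) plus noise,
# standing in for size-normalised silhouettes of a walking person.
t = np.arange(60)
frames = np.array([np.sin(2 * np.pi * ti / 15.0 + np.linspace(0, np.pi, 64)) for ti in t])
frames += 0.1 * rng.normal(size=frames.shape)

# Self-similarity plot: correlation between every pair of (mean-centred) frames.
centred = frames - frames.mean(axis=1, keepdims=True)
centred /= np.linalg.norm(centred, axis=1, keepdims=True)
ssp = centred @ centred.T                      # 60 x 60, periodic structure = gait dynamics

# The first strong off-diagonal peak of row 0 recovers the gait period.
lag = int(np.argmax(ssp[0, 5:25])) + 5
print("estimated gait period (frames):", lag)  # expected: about 15
```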

  17. Local Community Detection Using Link Similarity

    Ying-Jun Wu; Han Huang; Zhi-Feng Hao; Feng Chen


    Exploring local community structure is an appealing problem that has drawn much recent attention in the area of social network analysis. As the complete information of a network is often difficult to obtain, such as in networks of web pages, research papers and Facebook users, people can only detect community structure from a certain source vertex with limited knowledge of the entire graph. The existing approaches do well in measuring community quality, but they are largely dependent on the source vertex and apply overly strict policies when agglomerating new vertices. Moreover, they have predefined parameters which are difficult to obtain. This paper proposes a method to find local community structure by analyzing the link similarity between the community and the vertex. Inspired by the fact that elements in the same community are more likely to share common links, we explore community structure heuristically by giving priority to vertices which have a high link similarity with the community. A three-phase process is also used for the sake of improving the quality of the community structure. Experimental results show that our method performs effectively not only on computer-generated graphs but also on real-world graphs.
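
    A simplified stand-in for the idea (not the authors' exact formulation): grow a community from a source vertex, at each step admitting the frontier vertex whose neighbourhood overlaps most with the community's, and stop when the best link similarity falls below a threshold. The toy graph and the threshold below are illustrative.

```python
# Toy undirected graph: two small groups joined by a single bridge edge.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "x"),                       # bridge between the groups
         ("x", "y"), ("x", "z"), ("y", "z")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def link_similarity(vertex, community):
    """Jaccard-style overlap between a vertex's neighbourhood and the community's."""
    community_nbrs = set().union(*(adj[c] for c in community)) | community
    return len(adj[vertex] & community_nbrs) / len(adj[vertex] | community_nbrs)

def local_community(source, threshold=0.3):
    community = {source}
    while True:
        frontier = set().union(*(adj[c] for c in community)) - community
        if not frontier:
            break
        best = max(frontier, key=lambda v: link_similarity(v, community))
        if link_similarity(best, community) < threshold:
            break
        community.add(best)
    return community

print(sorted(local_community("a")))   # expected to stay within ['a', 'b', 'c', 'd']
```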

  18. Cultural similarity, cultural competence, and nurse workforce diversity.

    McGinnis, Sandra L; Brush, Barbara L; Moore, Jean


    Proponents of health workforce diversity argue that increasing the number of minority health care providers will enhance cultural similarity between patients and providers as well as the health system's capacity to provide culturally competent care. Measuring cultural similarity has been difficult, however, given that current benchmarks of workforce diversity categorize health workers by major racial/ethnic classifications rather than by cultural measures. This study examined the use of national racial/ethnic categories in both patient and registered nurse (RN) populations and found them to be a poor indicator of cultural similarity. Rather, we found that cultural similarity between RN and patient populations needs to be established at the level of local labor markets and broadened to include other cultural parameters such as country of origin, primary language, and self-identified ancestry. Only then can the relationship between cultural similarity and cultural competence be accurately determined and its outcomes measured.

  19. A Minimum Spanning Tree Representation of Anime Similarities

    Wibowo, Canggih Puspo


    In this work, a new way to represent Japanese animation (anime) is presented. We applied a minimum spanning tree to show the relations between anime. The distance between anime is calculated through three similarity measurements, namely crew, score histogram, and topic similarities. Finally, the centralities are also computed to reveal the most significant anime. The result shows that the minimum spanning tree can be used to determine the similarity between anime. Furthermore, by using centralities ca...
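
    The construction can be sketched with a tiny example: turn pairwise similarities (here a single hypothetical matrix standing in for the combined crew, score-histogram and topic similarities) into distances and run Prim's algorithm to obtain the minimum spanning tree.

```python
import numpy as np

titles = ["Anime A", "Anime B", "Anime C", "Anime D"]
# Hypothetical combined similarities in [0, 1] (e.g. an average of crew, score-histogram
# and topic similarities); the distance between two anime is 1 - similarity.
sim = np.array([[1.0, 0.8, 0.2, 0.1],
                [0.8, 1.0, 0.3, 0.2],
                [0.2, 0.3, 1.0, 0.7],
                [0.1, 0.2, 0.7, 1.0]])
dist = 1.0 - sim

# Prim's algorithm: grow the minimum spanning tree one cheapest edge at a time.
in_tree, mst_edges = {0}, []
while len(in_tree) < len(titles):
    i, j = min(((i, j) for i in in_tree for j in range(len(titles)) if j not in in_tree),
               key=lambda e: dist[e])
    mst_edges.append((titles[i], titles[j], float(dist[i, j])))
    in_tree.add(j)

print(mst_edges)   # A-B and C-D stay close; a single longer edge bridges the two groups
```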

  20. Similarity Predicts Liking in 3-Year-Old Children

    Fawcett, Christine A.; Markson, Lori


    Two studies examined the influence of similarity on 3-year-old children's initial liking of their peers. Children were presented with pairs of childlike puppets who were either similar or dissimilar to them on a specified dimension and then were asked to choose one of the puppets to play with as a measure of liking. Children selected the puppet…

  1. Similarity Predicts Liking in 3-Year-Old Children

    Fawcett, Christine A.; Markson, Lori


    Two studies examined the influence of similarity on 3-year-old children's initial liking of their peers. Children were presented with pairs of childlike puppets who were either similar or dissimilar to them on a specified dimension and then were asked to choose one of the puppets to play with as a measure of liking. Children selected the puppet…

  2. The integration of similar clinical research data collection instruments.

    Cohen, Dorothy B; Frawley, Sandra J; Shifman, Mark A; Miller, Perry L; Brandt, Cynthia


    We devised an algorithm for integrating similar clinical research data collection instruments to create a common measurement instrument. We tested this algorithm using questions from several similar surveys. We encountered differing levels of granularity among questions and responses across surveys resulting in either the loss of granularity or data. This algorithm may make survey integration more systematic and efficient.

  3. Using Information Content to Evaluate Semantic Similarity in a Taxonomy

    Resnik, P


    This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditional edge counting approach (r = 0.66).
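
    Resnik's measure takes the information content of the most informative common ancestor, sim(c1, c2) = max over shared ancestors a of −log p(a), with p estimated from corpus frequencies propagated up the taxonomy. The toy taxonomy and counts below are invented purely to show the computation.

```python
import math

# Toy IS-A taxonomy: child -> parent (None marks the root).
parent = {"dog": "mammal", "cat": "mammal", "trout": "fish",
          "mammal": "animal", "fish": "animal", "animal": None}
# Hypothetical corpus frequencies for each concept.
freq = {"dog": 40, "cat": 40, "trout": 20, "mammal": 0, "fish": 0, "animal": 0}

def ancestors(concept):
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = parent[concept]
    return chain

# p(c) = (frequency of c and everything it subsumes) / total; IC(c) = -log p(c).
total = sum(freq.values())
subsumed = {c: sum(f for w, f in freq.items() if c in ancestors(w)) for c in parent}
ic = {c: -math.log(subsumed[c] / total) for c in parent if subsumed[c] > 0}

def resnik_similarity(c1, c2):
    """Information content of the most informative common ancestor."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(ic[a] for a in common)

print(round(resnik_similarity("dog", "cat"), 3))    # share "mammal": ~0.223
print(round(resnik_similarity("dog", "trout"), 3))  # only share the root: 0.0
```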

  4. Reconstructing propagation networks with temporal similarity metrics

    Liao, Hao


    Node similarity is a significant property driving the growth of real networks. In this paper, based on the observed spreading results we apply the node similarity metrics to reconstruct propagation networks. We find that the reconstruction accuracy of the similarity metrics is strongly influenced by the infection rate of the spreading process. Moreover, there is a range of infection rate in which the reconstruction accuracy of some similarity metrics drops to nearly zero. In order to improve the similarity-based reconstruction method, we finally propose a temporal similarity metric to take into account the time information of the spreading. The reconstruction results are remarkably improved with the new method.

  5. Conditional Similarity Solutions of the Boussinesq Equation

    TANG Xiao-Yan; LIN Ji; LOU Sen-Yue


    The direct method proposed by Clarkson and Kruskal is modified to obtain some conditional similarity solutions of a nonlinear physics model. Taking the (1+1)-dimensional Boussinesq equation as a simple example, six types of conditional similarity reductions are obtained.

  6. Average is Boring: How Similarity Kills a Meme's Success

    Coscia, Michele


    Every day we are exposed to different ideas, or memes, competing with each other for our attention. Previous research explained popularity and persistence heterogeneity of memes by assuming them in competition for limited attention resources, distributed in a heterogeneous social network. Little has been said about what characteristics make a specific meme more likely to be successful. We propose a similarity-based explanation: memes with higher similarity to other memes have a significant disadvantage in their potential popularity. We employ a meme similarity measure based on semantic text analysis and computer vision to prove that a meme is more likely to be successful and to thrive if its characteristics make it unique. Our results show that indeed successful memes are located in the periphery of the meme similarity space and that our similarity measure is a promising predictor of a meme's success.

  7. Average is Boring: How Similarity Kills a Meme's Success

    Coscia, Michele


    Every day we are exposed to different ideas, or memes, competing with each other for our attention. Previous research explained popularity and persistence heterogeneity of memes by assuming them in competition for limited attention resources, distributed in a heterogeneous social network. Little has been said about what characteristics make a specific meme more likely to be successful. We propose a similarity-based explanation: memes with higher similarity to other memes have a significant disadvantage in their potential popularity. We employ a meme similarity measure based on semantic text analysis and computer vision to prove that a meme is more likely to be successful and to thrive if its characteristics make it unique. Our results show that indeed successful memes are located in the periphery of the meme similarity space and that our similarity measure is a promising predictor of a meme's success. PMID:25257730

  8. Average is boring: how similarity kills a meme's success.

    Coscia, Michele


    Every day we are exposed to different ideas, or memes, competing with each other for our attention. Previous research explained popularity and persistence heterogeneity of memes by assuming them in competition for limited attention resources, distributed in a heterogeneous social network. Little has been said about what characteristics make a specific meme more likely to be successful. We propose a similarity-based explanation: memes with higher similarity to other memes have a significant disadvantage in their potential popularity. We employ a meme similarity measure based on semantic text analysis and computer vision to prove that a meme is more likely to be successful and to thrive if its characteristics make it unique. Our results show that indeed successful memes are located in the periphery of the meme similarity space and that our similarity measure is a promising predictor of a meme's success.

  9. Homology, similarity, and identity in peptide epitope immunodefinition.

    Kanduc, Darja


    The tendency to use the terms homology, similarity, and identity interchangeably persists in comparative biology. When translated to immunology, overlapping the concepts of homology, similarity, and identity complicates the exact definition of the self-nonself dichotomy and, in particular, affects immunopeptidomics, an emerging field aimed at cataloging and distinguishing immunoreactive peptide epitopes from silent nonreactive amino acid sequences. The definition of similar/dissimilar peptides in immunology is discussed with special attention to the analysis of immunological (dis)similarity between two or more protein sequences that equates to measuring sequence similarity with the use of a proper measurement unit such as a length determinant. Copyright © 2012 European Peptide Society and John Wiley & Sons, Ltd.

  10. On the Relationship between the Posterior and Optimal Similarity

    Breuel, Thomas M


    For a classification problem described by the joint density $P(\omega,x)$, models of $P(\omega=\omega'|x,x')$ (the "Bayesian similarity measure") have been shown to be an optimal similarity measure for nearest neighbor classification. This paper demonstrates several additional properties of that conditional distribution. The paper first shows that we can reconstruct, up to class labels, the class posterior distribution $P(\omega|x)$ given $P(\omega=\omega'|x,x')$, gives a procedure for recovering the class labels, and gives an asymptotically Bayes-optimal classification procedure. It also shows, given such an optimal similarity measure, how to construct a classifier that outperforms the nearest neighbor classifier and achieves Bayes-optimal classification rates. The paper then analyzes Bayesian similarity in a framework where a classifier faces a number of related classification tasks (multitask learning) and illustrates that reconstruction of the class posterior distribution is not possible in...

  11. A Signal Processing Method to Explore Similarity in Protein Flexibility

    Simina Vasilache


    Full Text Available Understanding mechanisms of protein flexibility is of great importance to structural biology. The ability to detect similarities between proteins and their patterns is vital in discovering new information about unknown protein functions. A Distance Constraint Model (DCM) provides a means to generate a variety of flexibility measures based on a given protein structure. Although information about mechanical properties of flexibility is critical for understanding protein function for a given protein, the question of whether certain characteristics are shared across homologous proteins is difficult to assess. For a proper assessment, a quantified measure of similarity is necessary. This paper begins to explore image processing techniques to quantify similarities in signals and images that characterize protein flexibility. The dataset considered here consists of three different families of proteins, with three proteins in each family. The similarities and differences found within flexibility measures across homologous proteins do not align with sequence-based evolutionary methods.

  12. Synthetic and Biopolymer Gels - Similarities and Difference.

    Horkay, Ferenc


    Ion exchange plays a central role in a variety of physiological processes, such as nerve excitation, muscle contraction and cell locomotion. Hydrogels can be used as model systems for identifying fundamental chemical and physical interactions that govern structure formation, phase transition, etc. in biopolymer systems. Polyelectrolyte gels are particularly well-suited to study ion-polymer interactions because their structure and physical-chemical properties (charge density, crosslink density, etc) can be carefully controlled. They are sensitive to different external stimuli such as temperature, ionic composition and pH. Surprisingly few investigations have been made on polyelectrolyte gels in salt solutions containing both monovalent and multivalent cations. We have developed an experimental approach that combines small angle neutron scattering and osmotic swelling pressure measurements. The osmotic pressure exerted on a macroscopic scale is a consequence of changes occurring at a molecular level. The intensity of the neutron scattering signal, which provides structural information as a function of spatial resolution, is directly related to the osmotic pressure. We have found a striking similarity in the scattering and osmotic behavior of polyacrylic acid gels and DNA gels swollen in nearly physiological salt solutions. Addition of calcium ions to both systems causes a sudden volume change. This volume transition, which occurs when the majority of the sodium counterions are replaced by calcium ions, is reversible. Such reversibility implies that the calcium ions are not strongly bound by the polyanion, but are free to move along the polymer chain, which allows these ions to form temporary bridges between negative charges on adjacent chains. Mechanical measurements reveal that the elastic modulus is practically unchanged in the calcium-containing gels, i.e., ion bridging is qualitatively different from covalent crosslinks.

  13. Testing Self-Similarity Through Lamperti Transformations

    Lee, Myoungji


    Self-similar processes have been widely used in modeling real-world phenomena occurring in environmetrics, network traffic, image processing, and stock pricing, to name but a few. The estimation of the degree of self-similarity has been studied extensively, while statistical tests for self-similarity are scarce and limited to processes indexed in one dimension. This paper proposes a statistical hypothesis test procedure for self-similarity of a stochastic process indexed in one dimension and multi-self-similarity for a random field indexed in higher dimensions. If self-similarity is not rejected, our test provides a set of estimated self-similarity indexes. The key is to test stationarity of the inverse Lamperti transformations of the process. The inverse Lamperti transformation of a self-similar process is a strongly stationary process, revealing a theoretical connection between the two processes. To demonstrate the capability of our test, we test self-similarity of fractional Brownian motions and sheets, their time deformations and mixtures with Gaussian white noise, and the generalized Cauchy family. We also apply the self-similarity test to real data: annual minimum water levels of the Nile River, network traffic records, and surface heights of food wrappings. © 2016, International Biometric Society.

  14. Similarity spectra analysis of high-performance jet aircraft noise.

    Neilsen, Tracianne B; Gee, Kent L; Wall, Alan T; James, Michael M


    Noise measured in the vicinity of an F-22A Raptor has been compared to similarity spectra found previously to represent mixing noise from large-scale and fine-scale turbulent structures in laboratory-scale jet plumes. Comparisons have been made for three engine conditions using ground-based sideline microphones, which covered a large angular aperture. Even though the nozzle geometry is complex and the jet is nonideally expanded, the similarity spectra do agree with large portions of the measured spectra. Toward the sideline, the fine-scale similarity spectrum is used, while the large-scale similarity spectrum provides a good fit to the area of maximum radiation. Combinations of the two similarity spectra are shown to match the data in between those regions. Surprisingly, a combination of the two is also shown to match the data at the farthest aft angle. However, at high frequencies the degree of congruity between the similarity and the measured spectra changes with engine condition and angle. At the higher engine conditions, there is a systematically shallower measured high-frequency slope, with the largest discrepancy occurring in the regions of maximum radiation.

  15. Molecular quantum similarity using conceptual DFT descriptors

    Patrick Bultinck; Ramon Carbó-Dorca


    This paper reports a Molecular Quantum Similarity study for a set of congeneric steroid molecules, using as basic similarity descriptors the electron density ρ(r), the shape function σ(r), the Fukui functions f⁺(r) and f⁻(r), and the local softnesses s⁺(r) and s⁻(r). Correlations are investigated between similarity indices for each couple of descriptors used and compared to assess whether these different descriptors sample different information and to investigate what information is revealed by each descriptor.

  16. Similarity effects in visual working memory.

    Jiang, Yuhong V; Lee, Hyejin J; Asaad, Anthony; Remington, Roger


    Perceptual similarity is an important property of multiple stimuli. Its computation supports a wide range of cognitive functions, including reasoning, categorization, and memory recognition. It is important, therefore, to determine why previous research has found conflicting effects of inter-item similarity on visual working memory. Studies reporting a similarity advantage have used simple stimuli whose similarity varied along a featural continuum. Studies reporting a similarity disadvantage have used complex stimuli from either a single or multiple categories. To elucidate stimulus conditions for similarity effects in visual working memory, we tested memory for complex stimuli (faces) whose similarity varied along a morph continuum. Participants encoded 3 morphs generated from a single face identity in the similar condition, or 3 morphs generated from different face identities in the dissimilar condition. After a brief delay, a test face appeared at one of the encoding locations for participants to make a same/different judgment. Two experiments showed that similarity enhanced memory accuracy without changing the response criterion. These findings support previous computational models that incorporate featural variance as a component of working memory load. They delineate limitations of models that emphasize cortical resources or response decisions.

  17. A Measurement-track Association Algorithm Based on Similarity to Ideal Gray Correlation Projection of Multiple Criteria Decision-making

    孙立炜; 王杰贵


    Measurement-track data association is an important component of multi-target tracking. This paper introduces multiple criteria decision-making and grey relation analysis into measurement-track association, and proposes a measurement-track association algorithm based on similarity to ideal gray correlation projection of multiple criteria decision-making. The simulation results demonstrate that the algorithm has practical application value in dense multi-target situations.

  18. Common neighbour structure and similarity intensity in complex networks

    Hou, Lei; Liu, Kecheng


    Complex systems as networks always exhibit strong regularities, implying underlying mechanisms governing their evolution. In addition to the degree preference, the similarity has been argued to be another driver for networks. Assuming a network is randomly organised without similarity preference, the present paper studies the expected number of common neighbours between vertices. A symmetrical similarity index is accordingly developed by removing such expected number from the observed common neighbours. The developed index can not only describe the similarities between vertices, but also the dissimilarities. We further apply the proposed index to measure the influence of similarity on the wiring patterns of networks. Fifteen empirical networks as well as artificial networks are examined in terms of similarity intensity and degree heterogeneity. Results on real networks indicate that, social networks are strongly governed by the similarity as well as the degree preference, while the biological networks and infrastructure networks show no apparent similarity governance. Particularly, classical network models, such as the Barabási-Albert model, the Erdös-Rényi model and the Ring Lattice, cannot well describe the social networks in terms of the degree heterogeneity and similarity intensity. The findings may shed some light on the modelling and link prediction of different classes of networks.
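
    The idea of discounting the chance-level overlap can be sketched as follows: compare the observed number of common neighbours of each vertex pair with the number expected if edges were wired at random given the degrees. The null model used here (kᵢ·kⱼ/(N−2)) is a simple illustrative choice, not necessarily the exact expectation derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 0.1

# Random undirected graph (Erdős–Rényi) standing in for an empirical network.
upper = np.triu(rng.random((n, n)) < p, k=1).astype(int)
A = upper + upper.T

degree = A.sum(axis=1)
observed = A @ A                                  # observed common-neighbour counts
expected = np.outer(degree, degree) / (n - 2)     # assumed chance-level count (illustrative null)

similarity = observed - expected                  # >0: more overlap than chance, <0: less
np.fill_diagonal(similarity, 0)
i, j = np.unravel_index(np.argmax(similarity), similarity.shape)
print(f"most similar pair: ({i}, {j}), score = {similarity[i, j]:.2f}")
```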

  19. On finding similar items in a stream of transactions

    Campagna, Andrea; Pagh, Rasmus


    While there has been a lot of work on finding frequent itemsets in transaction data streams, none of these solve the problem of finding similar pairs according to standard similarity measures. This paper is a first attempt at dealing with this, arguably more important, problem. We start out... $(\min\{mb, n^k, (mb/\varphi)^k\})$ bits, where $mb$ is the number of items in the stream so far, $n$ is the number of distinct items and $\varphi$ is a support threshold. To achieve any non-trivial space upper bound we must thus abandon a worst-case assumption on the data stream. We work under the model that the transactions come... in random order, and show that surprisingly, not only is small-space similarity mining possible for the most common similarity measures, but the mining accuracy improves with the length of the stream for any fixed support threshold...

  20. Salient object detection: manifold-based similarity adaptation approach

    Zhou, Jingbo; Ren, Yongfeng; Yan, Yunyang; Gao, Shangbing


    A saliency detection algorithm based on manifold-based similarity adaptation is proposed. The proposed algorithm is divided into three steps. First, we segment an input image into superpixels, which are represented as the nodes in a graph. Second, a new similarity measurement is used in the proposed algorithm. The weight matrix of the graph, which indicates the similarities between the nodes, uses a similarity-based method. It also captures the manifold structure of the image patches, in which the graph edges are determined in a data adaptive manner in terms of both similarity and manifold structure. Then, we use local reconstruction method as a diffusion method to obtain the saliency maps. The objective function in the proposed method is based on local reconstruction, with which estimated weights capture the manifold structure. Experiments on four bench-mark databases demonstrate the accuracy and robustness of the proposed method.

  1. Protein structure similarity from principle component correlation analysis

    Chou James


    Full Text Available Abstract Background Owing to rapid expansion of protein structure databases in recent years, methods of structure comparison are becoming increasingly effective and important in revealing novel information on functional properties of proteins and their roles in the grand scheme of evolutionary biology. Currently, the structural similarity between two proteins is measured by the root-mean-square-deviation (RMSD) in their best-superimposed atomic coordinates. RMSD is the golden rule of measuring structural similarity when the structures are nearly identical; it, however, fails to detect the higher order topological similarities in proteins evolved into different shapes. We propose new algorithms for extracting geometrical invariants of proteins that can be effectively used to identify homologous protein structures or topologies in order to quantify both close and remote structural similarities. Results We measure structural similarity between proteins by correlating the principle components of their secondary structure interaction matrix. In our approach, the Principle Component Correlation (PCC) analysis, a symmetric interaction matrix for a protein structure is constructed with relationship parameters between secondary elements that can take the form of distance, orientation, or other relevant structural invariants. When using a distance-based construction in the presence or absence of encoded N to C terminal sense, there are strong correlations between the principle components of interaction matrices of structurally or topologically similar proteins. Conclusion The PCC method is extensively tested for protein structures that belong to the same topological class but are significantly different by RMSD measure. The PCC analysis can also differentiate proteins having similar shapes but different topological arrangements. Additionally, we demonstrate that when using two independently defined interaction matrices, comparison of their maximum
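
    A minimal numerical sketch of the principle: represent each structure by a symmetric interaction matrix of pairwise distances between secondary-structure element centroids, extract its leading principal component, and correlate the components between structures. The synthetic centroids below (a structure and a rotated, noisy copy of it) stand in for real secondary-structure elements; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def interaction_matrix(centroids):
    """Symmetric matrix of pairwise distances between secondary-structure centroids."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def leading_component(matrix):
    """Leading principal component (eigenvector of the largest-magnitude eigenvalue)."""
    vals, vecs = np.linalg.eigh(matrix)
    return vecs[:, np.argmax(np.abs(vals))]

# Structure A: 8 synthetic element centroids; structure B: a rotated, noisy copy of A.
a = rng.normal(size=(8, 3))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
b = a @ rot.T + 0.05 * rng.normal(size=a.shape)

pc_a = leading_component(interaction_matrix(a))
pc_b = leading_component(interaction_matrix(b))

# Distances are rotation invariant, so the correlation should be close to 1.
print("|PC correlation| =", round(abs(float(np.corrcoef(pc_a, pc_b)[0, 1])), 3))
```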

  2. Conceptual similarity promotes generalization of higher order fear learning

    Dunsmoor, Joseph E.; White, Allison J.; LaBar, Kevin S.


    We tested the hypothesis that conceptual similarity promotes generalization of conditioned fear. Using a sensory preconditioning procedure, three groups of subjects learned an association between two cues that were conceptually similar, unrelated, or mismatched. Next, one of the cues was paired with a shock. The other cue was then reintroduced to test for fear generalization, as measured by the skin conductance response. Results showed enhanced fear generalization that correlated with trait a...

  3. Mining Diagnostic Assessment Data for Concept Similarity

    Madhyastha, Tara; Hunt, Earl


    This paper introduces a method for mining multiple-choice assessment data for similarity of the concepts represented by the multiple choice responses. The resulting similarity matrix can be used to visualize the distance between concepts in a lower-dimensional space. This gives an instructor a visualization of the relative difficulty of concepts…

  4. Similar methodological analysis involving the user experience.

    Almeida e Silva, Caio Márcio; Okimoto, Maria Lúcia R L; Tanure, Raffaela Leane Zenni


    This article deals with the use of a protocol for the analysis of similar methodologies involving the user experience. For this purpose, articles reporting experiments in the area were selected. They were analyzed based on the similar-analysis protocol and, finally, synthesized and associated.

  5. Outsourced Similarity Search on Metric Data Assets

    Yiu, Man Lung; Assent, Ira; Jensen, Christian S.


    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example...

  6. Interleaving Helps Students Distinguish among Similar Concepts

    Rohrer, Doug


    When students encounter a set of concepts (or terms or principles) that are similar in some way, they often confuse one with another. For instance, they might mistake one word for another word with a similar spelling (e.g., allusion instead of illusion) or choose the wrong strategy for a mathematics problem because it resembles a different kind of…

  7. Perceived Similarity, Proactive Adjustment, and Organizational Socialization

    Kammeyer-Mueller, John D.; Livingston, Beth A.; Liao, Hui


    The present study explores how perceived demographic and attitudinal similarity can influence proactive behavior among organizational newcomers. We propose that newcomers who perceive themselves as similar to their co-workers will be more willing to seek new information or build relationships, which in turn will lead to better long-term…

  8. Self-Similar Traffic In Wireless Networks

    Jerjomins, R.; Petersons, E.


    Many studies have shown that traffic in Ethernet and other wired networks is self-similar. This paper reveals that wireless network traffic is also self-similar and long-range dependent, as shown by analyzing a large amount of data captured from the wireless router.

  9. and Models: A Self-Similar Approach

    José Antonio Belinchón


    equations (FEs) admit self-similar solutions. The methods employed allow us to obtain general results that are valid not only for the FRW metric, but also for all the Bianchi types as well as for the Kantowski-Sachs model (under the self-similarity hypothesis and the power-law hypothesis for the scale factors).

  10. Similarity Structure of Wave-Collapse

    Rypdal, Kristoffer; Juul Rasmussen, Jens; Thomsen, Kenneth


    Similarity transformations of the cubic Schrödinger equation (CSE) are investigated. The transformations are used to remove the explicit time variation in the CSE and reduce it to differential equations in the spatial variables only. Two different methods for similarity reduction are employed and...

  11. Some Effects of Similarity Self-Disclosure

    Murphy, Kevin C.; Strong, Stanley R.


    College males were interviewed about how college had altered their friendships, values, and plans. The interviewers disclosed experiences and feelings similar to those revealed by the students. Results support Byrne's Law of Similarity in generating interpersonal attraction in the interview and suggest that the timing of self-disclosures is…

  12. Learning deep similarity in fundus photography

    Chudzik, Piotr; Al-Diri, Bashir; Caliva, Francesco; Ometto, Giovanni; Hunter, Andrew


    Similarity learning is one of the most fundamental tasks in image analysis. The ability to extract similar images in the medical domain as part of content-based image retrieval (CBIR) systems has been researched for many years. The vast majority of methods used in CBIR systems are based on hand-crafted feature descriptors. The approximation of a similarity mapping for medical images is difficult due to the big variety of pixel-level structures of interest. In fundus photography (FP) analysis, a subtle difference in, e.g., lesion and vessel shape and size can result in a different diagnosis. In this work, we demonstrated how to learn a similarity function for image patches derived directly from FP image data without the need of manually designed feature descriptors. We used a convolutional neural network (CNN) with a novel architecture adapted for similarity learning to accomplish this task. Furthermore, we explored and studied multiple CNN architectures. We show that our method can approximate the similarity between FP patches more efficiently and accurately than the state-of-the-art feature descriptors, including SIFT and SURF, using a publicly available dataset. Finally, we observe that our approach, which is purely data-driven, learns that features such as vessel calibre and orientation are important discriminative factors, which resembles the way humans reason about similarity. To the best of the authors' knowledge, this is the first attempt to approximate a visual similarity mapping in FP.


    尹永成; 姜海益; 孙业顺


    The authors show that the self-similar set for a finite family of contractive similitudes (similarities, i.e., |fᵢ(x) − fᵢ(y)| = aᵢ|x − y|, x, y ∈ ℝᴺ, where 0 < aᵢ < 1) is uniformly perfect except the case that it is a singleton. As a corollary, it is proved that this self-similar set has positive Hausdorff dimension provided that it is not a singleton. And a lower bound of the upper box dimension of the uniformly perfect sets is given. Meanwhile the uniformly perfect set with Hausdorff measure zero in its Hausdorff dimension is given.

  14. Asymmetric similarity-weighted ensembles for image segmentation

    Cheplygina, V.; Van Opbroek, A.; Ikram, M. A.


    the images, thus representative data might not be available. Transfer learning techniques can be used to account for these differences, thus taking advantage of all the available data acquired with different protocols. We investigate the use of classifier ensembles, where each classifier is weighted... according to the similarity between the data it is trained on, and the data it needs to segment. We examine 3 asymmetric similarity measures that can be used in scenarios where no labeled data from a newly introduced scanner or scanning protocol is available. We show that the asymmetry is informative... and the direction of measurement needs to be chosen carefully. We also show that a point set similarity measure is robust across different studies, and outperforms state-of-the-art results on a multi-center brain tissue segmentation task...

  15. Efficient Privacy Preserving Protocols for Similarity Join

    Bilal Hawashin


    Full Text Available During the similarity join process, one or more sources may not allow sharing its data with other sources. In this case, a privacy preserving similarity join is required. We showed in our previous work [4] that using long attributes, such as paper abstracts, movie summaries, product descriptions, and user feedbacks, could improve the similarity join accuracy using supervised learning. However, the existing secure protocols for similarity join methods can not be used to join sources using these long attributes. Moreover, the majority of the existing privacy‐preserving protocols do not consider the semantic similarities during the similarity join process. In this paper, we introduce a secure efficient protocol to semantically join sources when the join attributes are long attributes. We provide two secure protocols for both scenarios when a training set exists and when there is no available training set. Furthermore, we introduced the multi‐label supervised secure protocol and the expandable supervised secure protocol. Results show that our protocols can efficiently join sources using the long attributes by considering the semantic relationships among the long string values. Therefore, it improves the overall secure similarity join performance.

  16. Mining Object Similarity for Predicting Next Locations

    Meng Chen; Xiaohui Yu; Yang Liu


    Next location prediction is of great importance for many location-based applications. With the virtue of solid theoretical foundations, Markov-based approaches have gained success along this direction. In this paper, we seek to enhance the prediction performance by understanding the similarity between objects. In particular, we propose a novel method, called weighted Markov model (weighted-MM), which exploits both the sequence of just-passed locations and the object similarity in mining the mobility patterns. To this end, we first train a Markov model for each object with its own trajectory records, and then quantify the similarities between different objects from two aspects: spatial locality similarity and trajectory similarity. Finally, we incorporate the object similarity into the Markov model by considering the similarity as the weight of the probability of reaching each possible next location, and return the top-rankings as results. We have conducted extensive experiments on a real dataset, and the results demonstrate significant improvements in prediction accuracy over existing solutions.
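
    A minimal sketch of the weighting idea, with made-up trajectories and a simple Jaccard overlap of visited locations standing in for the paper's spatial-locality and trajectory similarities: each object's Markov transition counts contribute to the prediction in proportion to that object's similarity to the query object. The data, weighting rule and helper names are illustrative assumptions, not the paper's exact weighted-MM formulation.

```python
from collections import Counter, defaultdict

# Hypothetical trajectories (sequences of visited locations) for three objects.
trajectories = {
    "u1": ["home", "cafe", "office", "cafe", "home"],
    "u2": ["home", "cafe", "office", "gym", "home"],
    "u3": ["mall", "cinema", "mall", "cinema", "mall"],
}

def transition_counts(seq):
    counts = defaultdict(Counter)
    for here, there in zip(seq, seq[1:]):
        counts[here][there] += 1
    return counts

def object_similarity(seq_a, seq_b):
    """Jaccard overlap of visited locations, standing in for the paper's similarities."""
    a, b = set(seq_a), set(seq_b)
    return len(a & b) / len(a | b)

models = {u: transition_counts(seq) for u, seq in trajectories.items()}

def predict_next(user, current):
    scores = Counter()
    for other, model in models.items():
        weight = 1.0 if other == user else object_similarity(trajectories[user], trajectories[other])
        total = sum(model[current].values())
        for loc, count in model[current].items():
            scores[loc] += weight * count / total
    return scores.most_common(1)[0][0] if scores else None

print(predict_next("u1", "office"))   # u1's own history suggests "cafe"; similar u2 adds "gym"
```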

  17. Trajectory similarity join in spatial networks

    Shang, Shuo


    The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider the case of trajectory similarity join (TS-Join), where the objects are trajectories of vehicles moving in road networks. Thus, given two sets of trajectories and a threshold θ, the TS-Join returns all pairs of trajectories from the two sets with similarity above θ. This join targets applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide a purposeful definition of similarity. To enable efficient TS-Join processing on large sets of trajectories, we develop search space pruning techniques and take into account the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer algorithm. For each trajectory, the algorithm first finds similar trajectories. Then it merges the results to achieve a final result. The algorithm exploits an upper bound on the spatiotemporal similarity and a heuristic scheduling strategy for search space pruning. The algorithm's per-trajectory searches are independent of each other and can be performed in parallel, and the merging has constant cost. An empirical study with real data offers insight in the performance of the algorithm and demonstrates that it is capable of outperforming a well-designed baseline algorithm by an order of magnitude.

  18. Pollinators show flower colour preferences but flowers with similar colours do not attract similar pollinators.

    Reverté, Sara; Retana, Javier; Gómez, José M; Bosch, Jordi


    Colour is one of the main floral traits used by pollinators to locate flowers. Although pollinators show innate colour preferences, the view that the colour of a flower may be considered an important predictor of its main pollinators is highly controversial because flower choice is highly context-dependent, and initial innate preferences may be overridden by subsequent associative learning. Our objective is to establish whether there is a relationship between flower colour and pollinator composition in natural communities. We measured the flower reflectance spectrum and pollinator composition in four plant communities (85 plant species represented by 109 populations, and 32 305 plant-pollinator interactions in total). Pollinators were divided into six taxonomic groups: bees, ants, wasps, coleopterans, dipterans and lepidopterans. We found consistent associations between pollinator groups and certain colours. These associations matched innate preferences experimentally established for several pollinators and predictions of the pollination syndrome theory. However, flowers with similar colours did not attract similar pollinator assemblages. The explanation for this paradoxical result is that most flower species are pollination generalists. We conclude that although pollinator colour preferences seem to condition plant-pollinator interactions, the selective force behind these preferences has not been strong enough to mediate the appearance and maintenance of tight colour-based plant-pollinator associations. © The Author 2016. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email:

  19. Integrated Semantic Similarity Model Based on Ontology

    LIU Ya-Jun; ZHAO Yun


    To solve the problem of inadequate semantic processing in intelligent question answering systems, an integrated semantic similarity model which calculates the semantic similarity using geometric distance and information content is presented in this paper. With the help of the interrelationship between concepts, the information content of concepts and the strength of the edges in the ontology network, we can calculate the semantic similarity between two concepts and provide information for the further calculation of the semantic similarity between a user's question and the answers in the knowledge base. The results of the experiments on the prototype have shown that the semantic problem in natural language processing can also be solved with the help of the knowledge and the abundant semantic information in the ontology. More than 90% accuracy with less than 50 ms average searching time has been reached in the intelligent question answering prototype system based on ontology. The results are very satisfactory.

  20. Interpersonal Congruency, Attitude Similarity, and Interpersonal Attraction

    Touhey, John C.


    As no experimental study has examined the effects of congruency on attraction, the present investigation orthogonally varied attitude similarity and interpersonal congruency in order to compare the two independent variables as determinants of interpersonal attraction. (Author/RK)

  1. Interpersonal Congruency, Attitude Similarity, and Interpersonal Attraction

    Touhey, John C.


    As no experimental study has examined the effects of congruency on attraction, the present investigation orthogonally varied attitude similarity and interpersonal congruency in order to compare the two independent variables as determinants of interpersonal attraction. (Author/RK)

  2. Correlation between social proximity and mobility similarity

    Fan, Chao; Huang, Junming; Rong, Zhihai; Zhou, Tao


    Human behaviors exhibit ubiquitous correlations in many aspects, such as individual and collective levels, temporal and spatial dimensions, content, social and geographical layers. With rich Internet data of online behaviors becoming available, it attracts academic interest to explore human mobility similarity from the perspective of social network proximity. Existing analysis shows a strong correlation between online social proximity and offline mobility similarity, namely, mobile records between friends are significantly more similar than between strangers, and those between friends with common neighbors are even more similar. We argue the importance of the number and diversity of common friends, with a counter-intuitive finding that the number of common friends has no positive impact on mobility similarity while the diversity plays a key role, disagreeing with previous studies. Our analysis provides a novel view for better understanding the coupling between human online and offline behaviors, and will...

  3. Similarity Theory of Withdrawn Water Temperature Experiment

    Yunpeng Han


    Full Text Available Selective withdrawal from a thermally stratified reservoir has been widely utilized in managing reservoir water withdrawal. Besides theoretical analysis and numerical simulation, model tests are also necessary in studying the temperature of withdrawn water. However, information on the similarity theory of the withdrawn water temperature model remains lacking. Considering the flow features of selective withdrawal, the similarity theory of the withdrawn water temperature model was analyzed theoretically based on the modification of the governing equations, the Boussinesq approximation, and some simplifications. The similarity conditions between the model and the prototype are suggested. The conversion of withdrawn water temperature between the model and the prototype is proposed. Meanwhile, the fundamental theory of temperature distribution conversion is proposed for the first time, which could significantly improve experiment efficiency when the basic temperature of the model differs from that of the prototype. Based on the similarity theory, an experiment on the withdrawn water temperature was performed and verified by a numerical method.

  4. Outsourced similarity search on metric data assets

    Yiu, Man Lung


    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low-initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.

  5. Spherically Symmetric, Self-Similar Spacetimes

    Wagh, S M; Wagh, Sanjay M.; Govinder, Keshlan S.


    Self-similar spacetimes are of importance to cosmology and to gravitational collapse problems. We show that self-similarity or the existence of a homothetic Killing vector field for spherically symmetric spacetimes implies the separability of the spacetime metric in terms of the co-moving coordinates and that the metric is, uniquely, the one recently reported in [cqg1]. The spacetime, in general, has non-vanishing energy-flux and shear. The spacetime admits matter with any equation of state.

  6. Protein structural similarity search by Ramachandran codes

    Chang Chih-Hung


    Full Text Available Abstract Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary for structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE) and it works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools should be applicable to automated and high-throughput functional annotations or predictions for the ever-increasing number of published protein structures in this post-genomic era.

  7. Some more similarities between Peirce and Skinner

    Moxley, Roy A.


    C. S. Peirce is noted for pioneering a variety of views, and the case is made here for the similarities and parallels between his views and B. F. Skinner's radical behaviorism. In addition to parallels previously noted, these similarities include an advancement of experimental science, a behavioral psychology, a shift from nominalism to realism, an opposition to positivism, a selectionist account for strengthening behavior, the importance of a community of selves, a recursive approach to meth...

  8. Web Search Results Summarization Using Similarity Assessment

    Sawant V.V.


    Full Text Available Nowadays the internet has become part of our life; the WWW is the most important service of the internet because it allows presenting information such as documents, images, etc. The WWW grows rapidly and caters to diversified levels and categories of users. Web search results are extracted for user-specified queries. With millions of pieces of information pouring online, users have no time to surf the contents completely; moreover, the information available is often repeated or duplicated. This issue has created the necessity to restructure search results so that they can be summarized. The proposed approach comprises the extraction of different features of web pages. Web page visual similarity assessment has been employed to address problems in different fields including phishing, web archiving, web search engines, etc. In this approach, initially, the search results for a user query are stored. The Earth Mover's Distance (EMD) is used for the assessment of web page visual similarity: the web page is taken as a low-resolution image, a signature of that web page image is created from color and coordinate features, and the distance between web pages is calculated by applying the EMD. The layout similarity value is computed by using a tag comparison algorithm and a template comparison algorithm. Textual similarity is computed by using cosine similarity, and hyperlink analysis is performed to compute outward links. The final similarity value is calculated by fusion of the layout, text, hyperlink and EMD values. Once the similarity matrix is found, clustering is employed with the help of connected components. Finally, groups of similar web pages, i.e., summarized results, are displayed to the user. Experiments were conducted to demonstrate the effectiveness of the four methods for generating summarized results on different web pages and user queries.
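
    The textual-similarity and fusion steps can be sketched as below: cosine similarity over term-count vectors, combined with hypothetical scores for the visual (EMD), layout and hyperlink components using illustrative fusion weights. The weights and the component scores are assumptions, not values from the paper.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between two bag-of-words term-count vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

page_a = "cheap flights to rome book cheap tickets online"
page_b = "book cheap flights and hotel deals to rome"
text_sim = cosine_similarity(page_a, page_b)

# Hypothetical scores from the other components (visual/EMD, layout, hyperlink analysis).
emd_sim, layout_sim, link_sim = 0.6, 0.7, 0.5
weights = {"text": 0.4, "emd": 0.2, "layout": 0.2, "link": 0.2}   # illustrative fusion weights

final_sim = (weights["text"] * text_sim + weights["emd"] * emd_sim
             + weights["layout"] * layout_sim + weights["link"] * link_sim)
print(f"text similarity = {text_sim:.2f}, fused similarity = {final_sim:.2f}")
```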

  9. Interlinguistic similarity and language death dynamics

    Mira, J


    We analyze the time evolution of a system of two coexisting languages (Castilian Spanish and Galician, both spoken in northwest Spain) in the framework of a model given by Abrams and Strogatz [Nature 424, 900 (2003)]. It is shown that, contrary to the model's initial prediction, a stable bilingual situation is possible if the languages in competition are similar enough. Similarity is described with a simple parameter, whose value can be estimated from fits of the data.
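
    For reference, the baseline dynamics the paper modifies are the Abrams-Strogatz equation dx/dt = c[(1−x)·s·x^a − x·(1−s)·(1−x)^a], where x is the fraction speaking language X and s its relative status. The published extension adds an interlinguistic-similarity parameter and a bilingual group, which this minimal sketch (with illustrative parameter values) does not reproduce.

```python
def abrams_strogatz(x0, s, a=1.31, c=1.0, dt=0.01, steps=5000):
    """Euler integration of dx/dt = c[(1-x)*s*x^a - x*(1-s)*(1-x)^a]."""
    x = x0
    for _ in range(steps):
        x += dt * c * ((1 - x) * s * x**a - x * (1 - s) * (1 - x)**a)
    return x

# With unequal status the lower-status language heads towards extinction (x -> 0 or 1);
# the similarity-modified model in the paper can instead stabilise coexistence.
print(round(abrams_strogatz(x0=0.4, s=0.45), 3))   # status favours the other language: x -> 0
print(round(abrams_strogatz(x0=0.4, s=0.55), 3))   # status favours language X: x -> 1
```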

  10. Exploiting Data Similarity to Reduce Memory Footprints


    Fragmentary abstract: the record examines data similarity across MPI tasks as a way to reduce memory footprints with SBLLmalloc, using benchmark applications including 127.leslie3d (a Fortran computational fluid dynamics code), 128.GAPgeofem (C and Fortran), LAMMPS (which shows moderate similarity, consisting primarily of zero pages), and 122.tachyon (a C parallel ray-tracing and image-rendering application whose data show similarity across MPI tasks, mostly zero pages with a small fraction, roughly 10%, of non-zero pages).

  11. Similarity search processing. Parallelization and indexing technologies.

    Eder Dos Santos


    This Scientific-Technical Report addresses similarity search and the implementation of metric structures in parallel environments. It also presents the state of the art in similarity search over metric structures and in parallelism technologies. Comparative analyses are also proposed, seeking to characterize the behavior of a set of metric spaces and metric structures on multicore-based and GPU-based processing platforms.

  12. Interpersonal attraction and personality: what is attractive--self similarity, ideal similarity, complementarity or attachment security?

    Klohnen, Eva C; Luo, Shanhong


    Little is known about whether personality characteristics influence initial attraction. Because adult attachment differences influence a broad range of relationship processes, the authors examined their role in 3 experimental attraction studies. The authors tested four major attraction hypotheses--self similarity, ideal-self similarity, complementarity, and attachment security--and examined both actual and perceptual factors. Replicated analyses across samples, designs, and manipulations showed that actual security and self similarity predicted attraction. With regard to perceptual factors, ideal similarity, self similarity, and security all were significant predictors. Whereas perceptual ideal and self similarity had incremental predictive power, perceptual security's effects were subsumed by perceptual ideal similarity. Perceptual self similarity fully mediated actual attachment similarity effects, whereas ideal similarity was only a partial mediator.

  13. Online multiple kernel similarity learning for visual search.

    Xia, Hao; Hoi, Steven C H; Jin, Rong; Zhao, Peilin


    Recent years have witnessed a number of studies on distance metric learning to improve visual similarity search in content-based image retrieval (CBIR). Despite their successes, most existing methods on distance metric learning are limited in two aspects. First, they usually assume the target proximity function follows the family of Mahalanobis distances, which limits their capacity to measure the similarity of complex patterns in real applications. Second, they often cannot effectively handle the similarity measure of multimodal data that may originate from multiple sources. To overcome these limitations, this paper investigates an online kernel similarity learning framework for learning kernel-based proximity functions, which goes beyond conventional linear distance metric learning approaches. Based on this framework, we propose a novel online multiple kernel similarity (OMKS) learning method which learns a flexible nonlinear proximity function with multiple kernels to improve visual similarity search in CBIR. We evaluate the proposed technique for CBIR on a variety of image data sets, with encouraging results showing that OMKS outperforms state-of-the-art techniques significantly.
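
    The notion of a multiple-kernel similarity, a nonnegative weighted combination of base kernels evaluated between a query and database items, can be sketched as follows; the kernels, features and fixed weights here are illustrative stand-ins and do not reproduce the OMKS online update rules.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) base kernel."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def linear_kernel(x, y):
    """Linear base kernel."""
    return float(np.dot(x, y))

def multi_kernel_similarity(x, y, kernels, weights):
    """Nonnegative weighted combination of base kernels (weights sum to 1)."""
    return sum(w * k(x, y) for k, w in zip(kernels, weights))

rng = np.random.default_rng(0)
query = rng.random(8)            # e.g. a color-histogram feature of a query image
database = rng.random((5, 8))    # five candidate images

kernels = [rbf_kernel, linear_kernel]
weights = [0.7, 0.3]             # fixed here; in OMKS these would be learned online

scores = [multi_kernel_similarity(query, img, kernels, weights) for img in database]
print(int(np.argmax(scores)))    # index of the most similar database image
```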

  14. Hydrological Catchment Similarity Assessment in Geum River Catchments, Korea

    Ko, Ara; Park, Kisoon; Lee, Hyosang


    Similarity measures between catchments are essential for regionalisation studies, which provide in-depth analysis of hydrological response and flood estimation at ungauged catchments. However, such a similarity measure is often biased towards the selected catchments and is not clearly explained in a hydrological sense. This study applied a Flood Estimation Handbook-type hydrological similarity distance measure to 25 Geum river catchments, Korea. Three catchment characteristics, area (A), annual precipitation (SAAR) and SCS Curve Number (CN), are used in Euclidean distance measures. Furthermore, six flow duration curve (FDC) indices (ILow: Q275/Q185, IDrought: Q355/Q185, IFlood: Qmax/Q185, IAbundant: Q95/Q185, IFloodDuration: Q10/Q355 and IRiverRegime: Qmax/Qmin) are applied in a clustering analysis in SPSS. The grouping of catchments by the hydrological similarity measure suggests three groups: H1 (Cheongseong, Gidae, Bukil, Oksan, Seockhwa, Habgang and Sangyeogyo), H2 (Cheongju, Guryong, Ugon, Boksu, Useong and Seokdong) and H3 (Muju, Yangganggyo and YongdamDam). Four catchments (Cheoncheon, Donghyang, DaecheongDam and Indong) are not grouped in this study. The clustering analysis of FDC indices provides four groups: CFDC1 (Muju, YongdamDam, Yangganggyo, DaecheongDam, Cheongseong, Gidae, Seokhwa, Bukil, Habgang, Cheongju, Oksan, Yuseong and Guryong), CFDC2 (Cheoncheon, Donghyang, Boksu, Indong, Nonsan, Seokdong, Ugon, Simcheon, Useong and Sangyeogyo), CFDC3 (Songcheon) and CFDC4 (Tanbu). Six of the seven catchments of H1 are grouped in CFDC1, while Sangyeogyo is grouped in CFDC2. Four of the six catchments of H2 are grouped in CFDC2, while Cheongju and Guryong are grouped in CFDC1. The catchments of H3 are categorized in CFDC1. The authors examine the results (H1, H2 and H3) of the similarity measure based on catchment physical descriptors against the results (CFDC1 and CFDC2) of clustering based on catchment hydrological response. The results of the hydrological similarity measures are supported by
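
    The descriptor-based part of such a similarity measure can be illustrated with a short sketch: standardize the catchment characteristics (A, SAAR, CN) and compute Euclidean distances between catchments. The numbers below are made-up placeholders, not the Geum river data.

```python
import math

# Hypothetical catchment descriptors: area (km^2), SAAR (mm), SCS CN.
catchments = {
    "A": (150.0, 1300.0, 72.0),
    "B": (180.0, 1250.0, 70.0),
    "C": (950.0, 1100.0, 60.0),
}

def standardize(values):
    """Z-score a column of descriptor values."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / std for v in values]

# Standardize each descriptor across catchments so no single unit dominates.
names = list(catchments)
columns = list(zip(*(catchments[n] for n in names)))
zscores = list(zip(*(standardize(col) for col in columns)))

def distance(i, j):
    """Euclidean distance between two catchments in standardized descriptor space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(zscores[i], zscores[j])))

print(round(distance(0, 1), 2), round(distance(0, 2), 2))  # A is closer to B than to C
```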

  15. Identifying mechanistic similarities in drug responses

    Zhao, C.


    Motivation: In early drug development, it would be beneficial to be able to identify those dynamic patterns of gene response that indicate that drugs targeting a particular gene will be likely or not to elicit the desired response. One approach would be to quantitate the degree of similarity between the responses that cells show when exposed to drugs, so that consistencies in the regulation of cellular response processes that produce success or failure can be more readily identified. Results: We track drug response using fluorescent proteins as transcription activity reporters. Our basic assumption is that drugs inducing very similar alterations in transcriptional regulation will produce similar temporal trajectories on many of the reporter proteins and hence be identified as having similarities in their mechanisms of action (MOA). The main body of this work is devoted to characterizing similarity in temporal trajectories/signals. To do so, we must first identify the key points that determine mechanistic similarity between two drug responses. Directly comparing points on the two signals is unrealistic, as it cannot handle delays and speed variations on the time axis. Hence, to capture the similarities between reporter responses, we develop an alignment algorithm that is robust to noise and time delays and is able to find all the contiguous parts of signals centered about a core alignment (reflecting a core mechanism in drug response). Applying the proposed algorithm to a range of real drug experiments shows that the results agree well with prior drug MOA knowledge. © The Author 2012. Published by Oxford University Press. All rights reserved.

  16. Exact score distribution computation for ontological similarity searches

    Schulz Marcel H


    Full Text Available Abstract Background Semantic similarity searches in ontologies are an important component of many bioinformatic algorithms, e.g., finding functionally related proteins with the Gene Ontology or phenotypically similar diseases with the Human Phenotype Ontology (HPO). We have recently shown that the performance of semantic similarity searches can be improved by ranking results according to the probability of obtaining a given score at random rather than by the scores themselves. However, to date, there are no algorithms for computing the exact distribution of semantic similarity scores, which is necessary for computing the exact P-value of a given score. Results In this paper we consider the exact computation of score distributions for similarity searches in ontologies, and introduce a simple null hypothesis which can be used to compute a P-value for the statistical significance of similarity scores. We concentrate on measures based on Resnik's definition of ontological similarity. A new algorithm is proposed that collapses subgraphs of the ontology graph and thereby allows fast score distribution computation. The new algorithm is several orders of magnitude faster than the naive approach, as we demonstrate by computing score distributions for similarity searches in the HPO. It is shown that exact P-value calculation improves clinical diagnosis using the HPO compared to approaches based on sampling. Conclusions The new algorithm enables, for the first time, exact P-value calculation via exact score distribution computation for ontology similarity searches. The approach is applicable to any ontology for which the annotation-propagation rule holds and can improve any bioinformatic method that makes use only of the raw similarity scores. The algorithm was implemented in Java, supports any ontology in OBO format, and is available for non-commercial and academic usage under:
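
    Resnik's measure scores two ontology terms by the information content of their most informative common ancestor. A minimal sketch over a toy DAG is shown below; the term hierarchy and annotation probabilities are assumptions for illustration.

```python
import math

# Toy ontology: child -> parents (a tiny DAG standing in for GO/HPO).
parents = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
# Fraction of annotated objects reaching each term (propagated counts).
probability = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.2, "A2": 0.3}

def ancestors(term):
    """All ancestors of a term, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen

def resnik(t1, t2):
    """IC of the most informative common ancestor: max(-log p) over shared ancestors."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(probability[t]) for t in common)

print(round(resnik("A1", "A2"), 3))  # share "A" (p = 0.5) -> IC = ln 2
print(round(resnik("A1", "B"), 3))   # share only "root" -> IC = 0
```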

  17. Characteristics of the similarity index in a Korean medical journal


    Background Journal editors have exercised their control over submitted papers having a high similarity index. Despite widespread suspicion of possible plagiarism when the similarity index is high, our study focused on the real effect of the similarity index on the value of a scientific paper. Methods This research examined the percent values of the similarity index from 978 submitted (420 published) papers in the Korean Journal of Anesthesiology since 2012. This study aimed to identify the correlation between the similarity index and the value of a paper. The value of a paper was evaluated in two distinct phases (during the peer-review process vs. after publication), and the value of a published paper was evaluated in two aspects (academic citation vs. social media appearance). Results Yearly mean values of the similarity index ranged from 16% to 19%. There were 254 papers cited at least once and 179 papers appearing at least once in social media. The similarity index affected the acceptance/rejection of a paper in various ways; although the influence was not linear and the cutoff measures differed among the types of papers, both extremes were related to a high rate of rejection. After publication, the similarity index had no effect on a paper's academic citation or social media appearance. Conclusions The findings suggested that, after publication, the similarity index no longer had an influence on a paper's academic citation or social media appearance, while the similarity index affected the acceptance/rejection of a submitted paper. Proofreading and intervention by the editors in finalizing the draft might play a role in achieving uniform quality of the publication. PMID:28580084

  18. Similarity-based denoising of point-sampled surface

    Ren-fang WANG; Wen-zhi CHEN; San-yuan ZHANG; Yin ZHANG; Xiu-zi YE


    A non-local denoising (NLD) algorithm for point-sampled surfaces (PSSs) is presented based on similarities, including geometry intensity and features of sample points. By using the trilateral filtering operator, the differential signal of each sample point is determined and called "geometry intensity". Based on covariance analysis, a regular grid of geometry intensity of a sample point is constructed, and the geometry-intensity similarity of two points is measured according to their grids. Based on mean shift clustering, the PSSs are clustered in terms of the local geometry-features similarity. The smoothed geometry intensity, i.e., offset distance, of the sample point is estimated according to the two similarities. Using the resulting intensity, the noise component from PSSs is finally removed by adjusting the position of each sample point along its own normal direction. Experimental results demonstrate that the algorithm is robust and can produce a more accurate denoising result while having better feature preservation.

  19. Personalized Predictive Modeling and Risk Factor Identification using Patient Similarity.

    Ng, Kenney; Sun, Jimeng; Hu, Jianying; Wang, Fei


    Personalized predictive models are customized for an individual patient and trained using information from similar patients. Compared to global models trained on all patients, they have the potential to produce more accurate risk scores and capture more relevant risk factors for individual patients. This paper presents an approach for building personalized predictive models and generating personalized risk factor profiles. A locally supervised metric learning (LSML) similarity measure is trained for diabetes onset and used to find clinically similar patients. Personalized risk profiles are created by analyzing the parameters of the trained personalized logistic regression models. A 15,000-patient data set, derived from electronic health records, is used to evaluate the approach. The predictive results show that the personalized models can outperform the global model. Cluster analysis of the risk profiles shows groups of patients with similar risk factors, differences in the top risk factors for different groups of patients, and differences between the individual and global risk factors.

  20. Earthquake detection through computationally efficient similarity search

    Yoon, Clara E.; O’Reilly, Ossian; Bergen, Karianne J.; Beroza, Gregory C.


    Seismology is experiencing rapid growth in the quantity of data, which has outpaced the development of processing algorithms. Earthquake detection—identification of seismic events in continuous data—is a fundamental operation for observational seismology. We developed an efficient method to detect earthquakes using waveform similarity that overcomes the disadvantages of existing detection methods. Our method, called Fingerprint And Similarity Thresholding (FAST), can analyze a week of continuous seismic waveform data in less than 2 hours, or 140 times faster than autocorrelation. FAST adapts a data mining algorithm, originally designed to identify similar audio clips within large databases; it first creates compact “fingerprints” of waveforms by extracting key discriminative features, then groups similar fingerprints together within a database to facilitate fast, scalable search for similar fingerprint pairs, and finally generates a list of earthquake detections. FAST detected most (21 of 24) cataloged earthquakes and 68 uncataloged earthquakes in 1 week of continuous data from a station located near the Calaveras Fault in central California, achieving detection performance comparable to that of autocorrelation, with some additional false detections. FAST is expected to realize its full potential when applied to extremely long duration data sets over a distributed network of seismic stations. The widespread application of FAST has the potential to aid in the discovery of unexpected seismic signals, improve seismic monitoring, and promote a greater understanding of a variety of earthquake processes. PMID:26665176
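
    The fingerprint-and-group step can be illustrated with a hedged sketch: hash compact binary fingerprints into buckets so that candidate similar pairs are found without all-pairs comparison. The scheme below (locality-sensitive hashing by random bit sampling) is an illustrative stand-in for FAST's actual fingerprinting and database layout.

```python
import random
from collections import defaultdict
from itertools import combinations

random.seed(1)

def lsh_candidate_pairs(fingerprints, n_tables=4, bits_per_table=6):
    """Group binary fingerprints by sampled bit positions; return candidate pairs."""
    n_bits = len(next(iter(fingerprints.values())))
    candidates = set()
    for _ in range(n_tables):
        positions = random.sample(range(n_bits), bits_per_table)
        buckets = defaultdict(list)
        for key, fp in fingerprints.items():
            buckets[tuple(fp[p] for p in positions)].append(key)
        for members in buckets.values():
            candidates.update(combinations(sorted(members), 2))
    return candidates

# Toy fingerprints of waveform windows; w1 and w2 are near-duplicates.
fps = {
    "w1": [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1],
    "w2": [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
    "w3": [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0],
}
print(lsh_candidate_pairs(fps))  # ("w1", "w2") is very likely to appear
```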

  1. Multicriteria Similarity-Based Anomaly Detection Using Pareto Depth Analysis.

    Hsiao, Ko-Jen; Xu, Kevin S; Calder, Jeff; Hero, Alfred O


    We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. Similarity-based anomaly detection algorithms detect abnormally large amounts of similarity or dissimilarity, e.g., as measured by the nearest neighbor Euclidean distances between a test sample and the training samples. In many application domains, there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such cases, multiple dissimilarity measures can be defined, including nonmetric measures, and one can test for anomalies by scalarizing using a nonnegative linear combination of them. If the relative importance of the different dissimilarity measures is not known in advance, as in many anomaly detection applications, the anomaly detection algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we propose a method for similarity-based anomaly detection using a novel multicriteria dissimilarity measure, the Pareto depth. The proposed Pareto depth analysis (PDA) anomaly detection algorithm uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach is provably better than using linear combinations of the criteria, and shows superior performance in experiments with synthetic and real data sets.
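
    The role of Pareto optimality can be sketched briefly: dissimilarity vectors computed under several criteria are compared by Pareto dominance rather than by a fixed weighted sum. The scoring below is a simplified illustration, not the PDA statistic from the paper.

```python
def dominates(u, v):
    """u Pareto-dominates v if it is <= in every criterion and < in at least one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def pareto_front(points):
    """Points not dominated by any other point (first Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Dissimilarity vectors (criterion 1, criterion 2) from one test sample to
# its nearest training samples under two different dissimilarity measures.
dissims = [(0.2, 0.9), (0.8, 0.1), (0.5, 0.5), (0.9, 0.9)]
front = pareto_front(dissims)

# A crude anomaly indicator: how dissimilar even the best (non-dominated) matches are.
score = min(sum(p) for p in front)
print(front, round(score, 2))
```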

  2. Category-based induction from similarity of neural activation.

    Weber, Matthew J; Osherson, Daniel


    The idea that similarity might be an engine of inductive inference dates back at least as far as David Hume. However, Hume's thesis is difficult to test without begging the question, since judgments of similarity may be infected by inferential processes. We present a one-parameter model of category-based induction that generates predictions about arbitrary statements of conditional probability over a predicate and a set of items. The prediction is based on the unconditional probabilities and similarities that characterize that predicate and those items. To test Hume's thesis, we collected brain activation from various regions of the ventral visual stream during a categorization task that did not invite comparison of categories. We then calculated the similarity of those activation patterns using a simple measure of vectorwise similarity and supplied those similarities to the model. The model's outputs correlated well with subjects' judgments of conditional probability. Our results represent a promising first step toward confirming Hume's thesis; similarity, assessed without reference to induction, may well drive inductive inference.

  3. Similarity in romantic couples' drinking motivations and drinking behaviors.

    Kehayes, Ivy-Lee L; Mackinnon, Sean P; Sherry, Simon B; Leonard, Kenneth E; Stewart, Sherry H


    Research suggests that enhancement, conformity, social, coping-with-anxiety, and coping-with-depression drinking motives are linked to specific drinking outcomes in a theoretically expected manner. Social learning theory suggests that people who spend more time together emulate each other's behavior to acquire reinforcing outcomes. The present study sought to integrate drinking motives theory and social learning theory to investigate similarity in drinking behaviors and drinking motives in romantic couples. We hypothesized that couples would be more similar than chance in their drinking behaviors and motives. We also hypothesized that demographics reflecting time around and interactions with romantic partners (e.g., days spent drinking together) would positively correlate with similarity in drinking behaviors and motivations. The present study tested hypotheses in 203 romantic couples. Participants completed a Timeline Follow-Back measure and the Modified Drinking Motives Questionnaire-Revised to track their alcohol use and drinking motives. Similarity profiles were calculated using McCrae's (J Pers Assess. 2008;90:105-109) coefficient of profile agreement, rpa. Couples were more similar in their drinking behavioral and motivational profiles than could be explained by chance. Days spent drinking together and days with face-to-face contact predicted increased similarity in drinking behavior profiles, but not similarity in drinking motives profiles. Results are partially consistent with social learning theory and suggest that social influences within couples could be important intervention targets to prevent escalations in drinking.

  4. Autoencoding beyond pixels using a learned similarity metric

    Larsen, Anders Boesen Lindbo; Sønderby, Søren Kaae; Larochelle, Hugo;


    reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance towards e.g. translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual...

  5. Visualization of semantic indexing similarity over MeSH.

    Du, Haixia; Yoo, Terry S


    We present an interactive visualization system for the evaluation of indexing results of the MEDLINE database over the Medical Subject Headings (MeSH) structure in a graphical radial-tree layout. It displays indexing similarity measurements with 2D color coding and a 3D height field permitting the evaluation of the automatic Medical Text Indexer (MTI), compared with human indexers.

  6. Multifractal Decomposition of Statistically Self-Similar Sets

    Jing Hu YU; Di He HU


    Let K be a statistically self-similar set defined by Graf. In this paper, we construct a random measure p which is supported by K and study the multifractal decomposition for K with p. Under such a decomposition, we obtain the expression of the spectrum function f(α).

  7. Electroconvective instability of self-similar equilibria

    Demekhin, E; Shtemler, Yury


    Stability of electro-hydrodynamic processes between ion-exchange membranes is investigated. Solutions of the equilibrium problem, which represents the balance between diffusion and electro-migration, are commonly described in a one-dimensional (1D) steady-state approximation. In the present work a novel class of 1D unsteady self-similar equilibrium solutions is developed asymptotically in small Debye length, epsilon, and large distance between membranes (both made dimensionless with the diffusion-layer thickness). First, the 1D unsteady family of self-similar equilibrium solutions is developed. Then, the linear stability of the self-similar solutions slowly varying with time is investigated in the limit of small epsilon, and the marginal stability curves are obtained. The method of matched asymptotics is applied, provided that only the outer solution is considered, ignoring the inner solutions. The success of the analysis is ensured by transforming the equations to divergence form (nabla G=0) with the patching co...

  8. Visual Similarity Based Document Layout Analysis

    Di Wen; Xiao-Qing Ding


    In this paper, a visual similarity based document layout analysis (DLA) scheme is proposed which, by using a clustering strategy, can adaptively deal with documents in different languages, with different layout structures and skew angles. Aiming at a robust and adaptive DLA approach, the authors first find a set of representative filters and statistics to characterize typical texture patterns in document images through a visual similarity testing process. Texture features are then extracted from these filters and passed into a dynamic clustering procedure, called visual similarity clustering. Finally, text contents are located from the clustered results. Benefiting from this scheme, the algorithm demonstrates strong robustness and adaptability across a wide variety of documents, which previous traditional DLA approaches do not possess.



    Protein fold structure is more conserved than amino acid sequence and is closely associated with biological function, so calculating the similarity of protein structures is a fundamental problem in structural biology and plays a key role in protein fold classification, fold function inference, and protein structure prediction. Large progress has been made in this field in recent years, and many methods for assessing structural similarity have been proposed, including methods for protein structure comparison, retrieval of protein structures from databases, and ligand binding site comparison. Most of these methods are available on the World Wide Web, but evaluating all of them is still a hard problem. This paper summarizes some popular methods and the latest methods for structure similarity, including structure alignment, protein structure retrieval, and ligand binding site alignment.

  10. Statistical energy analysis of similarly coupled systems

    ZHANG Jian


    Based on the principle of Statistical Energy Analysis (SEA) for non-conservatively coupled dynamical systems under non-correlated or correlated excitations, an energy relationship between two similar SEA systems is established in this paper. The energy relationship is verified theoretically and experimentally on two similar SEA systems, i.e., a coupled panel-beam structure and a coupled panel-sideframe structure, in the cases of conservative coupling and non-conservative coupling respectively. As an application of the method, the relationship between the noise power radiated from two similar cutting systems is studied. Results show good agreement between theory and experiment, and the method is valuable for analyzing dynamical problems of a complicated system on the basis of a similar, simpler one.

  11. Query Language for Complex Similarity Queries

    Budikova, Petra; Zezula, Pavel


    For complex data types such as multimedia, traditional data management methods are not suitable. Instead of attribute matching approaches, access methods based on object similarity are becoming popular. Recently, this has resulted in intensive research on indexing and searching methods for similarity-based retrieval. Nowadays, many efficient methods are already available, but using them to build an actual search system still requires specialists who tune the methods and build the system manually. Several attempts have already been made to provide a more convenient high-level interface in the form of query languages for such systems, but these are limited to supporting only basic similarity queries. In this paper, we propose a new language that allows content-based queries to be formulated in a flexible way, taking into account the functionality offered by the particular search engine in use. To ensure this, the language is based on a general data model with an abstract set of operations. Consequently, the language s...

  12. Structural similarity and category-specificity

    Gerlach, Christian; Law, Ian; Paulson, Olaf B


    It has been suggested that category-specific recognition disorders for natural objects may reflect that natural objects are more structurally (visually) similar than artefacts and therefore more difficult to recognize following brain damage. On this account one might expect a positive relationship...... between blood flow and structural similarity in areas involved in visual object recognition. Contrary to this expectation we report a negative relationship in that identification of articles of clothing cause more extensive activation than identification of vegetables/fruit and animals even though items...... from the categories of animals and vegetables/fruit are rated as more structurally similar than items from the category of articles of clothing. Given that this pattern cannot be explained in terms of a tradeoff between activation and accuracy, we interpret these findings within a model where...

  13. Large margin classification with indefinite similarities

    Alabdulmohsin, Ibrahim


    Classification with indefinite similarities has attracted attention in the machine learning community. This is partly due to the fact that many similarity functions that arise in practice are not symmetric positive semidefinite, i.e. the Mercer condition is not satisfied, or the Mercer condition is difficult to verify. Examples of such indefinite similarities in machine learning applications are ample including, for instance, the BLAST similarity score between protein sequences, human-judged similarities between concepts and words, and the tangent distance or the shape matching distance in computer vision. Nevertheless, previous works on classification with indefinite similarities are not fully satisfactory. They have either introduced sources of inconsistency in handling past and future examples using kernel approximation, settled for local-minimum solutions using non-convex optimization, or produced non-sparse solutions by learning in Krein spaces. Despite the large volume of research devoted to this subject lately, we demonstrate in this paper how an old idea, namely the 1-norm support vector machine (SVM) proposed more than 15 years ago, has several advantages over more recent work. In particular, the 1-norm SVM method is conceptually simpler, which makes it easier to implement and maintain. It is competitive, if not superior to, all other methods in terms of predictive accuracy. Moreover, it produces solutions that are often sparser than more recent methods by several orders of magnitude. In addition, we provide various theoretical justifications by relating 1-norm SVM to well-established learning algorithms such as neural networks, SVM, and nearest neighbor classifiers. Finally, we conduct a thorough experimental evaluation, which reveals that the evidence in favor of 1-norm SVM is statistically significant.

  14. Inferring Trust Based on Similarity with TILLIT

    Tavakolifard, Mozhgan; Herrmann, Peter; Knapskog, Svein J.

    A network of people having established trust relations and a model for propagation of related trust scores are fundamental building blocks in many of today's most successful e-commerce and recommendation systems. However, the web of trust is often too sparse to predict trust values between non-familiar people with high accuracy. Trust inferences are transitive associations among users in the context of an underlying social network and may provide additional information to alleviate the consequences of the sparsity and possible cold-start problems. Such approaches are helpful, provided that a complete trust path exists between the two users. An alternative approach to the problem is advocated in this paper. Based on collaborative filtering, one can exploit the like-mindedness or similarity of individuals to infer trust to yet unknown parties, which increases the trust relations in the web. For instance, if one knows that, with respect to a specific property, two parties are trusted alike by a large number of different trusters, one can assume that they are similar. Thus, if one has a certain degree of trust in the one party, one can safely assume a very similar trustworthiness of the other one. In an attempt to provide high-quality recommendations and proper initial trust values even when no complete trust propagation path or user profile exists, we propose TILLIT, a model based on a combination of trust inferences and user similarity. The similarity is derived from the structure of the trust graph and users' trust behavior, as opposed to other collaborative-filtering based approaches which use ratings of items or the user's profile. We describe an algorithm realizing the approach based on a combination of trust inferences and user similarity, and validate the algorithm using a real large-scale data set.

  15. Similarity Based Semantic Web Service Match

    Peng, Hui; Niu, Wenjia; Huang, Ronghuai

    Semantic web service discovery aims at returning the most closely matching advertised services to the service requester by comparing the semantics of the requested service with those of advertised services. The semantics of a web service are described in terms of inputs, outputs, preconditions and results in the Ontology Web Language for Services (OWL-S), formalized by the W3C. In this paper we propose an algorithm to calculate the semantic similarity of two services by taking a weighted average of their input and output similarities. A case study and applications show the effectiveness of our algorithm in service matching.
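
    The matching rule described, a weighted average of input and output similarities, can be sketched as below; the concept-to-concept similarity table and the weights are hypothetical.

```python
# Hypothetical concept-to-concept similarities (e.g. derived from an ontology).
concept_sim = {
    ("City", "Location"): 0.8,
    ("Date", "Time"): 0.7,
    ("Temperature", "Weather"): 0.6,
}

def sim(a, b):
    """Similarity of two concepts: 1 for identical, table lookup otherwise."""
    return 1.0 if a == b else concept_sim.get((a, b), concept_sim.get((b, a), 0.0))

def best_average(request_concepts, advertised_concepts):
    """Average of best-match similarities for each requested concept."""
    if not request_concepts:
        return 1.0
    return sum(max(sim(r, a) for a in advertised_concepts) for r in request_concepts) \
        / len(request_concepts)

def service_match(req_inputs, req_outputs, adv_inputs, adv_outputs, w_in=0.5, w_out=0.5):
    """Weighted average of input and output similarities between two services."""
    return w_in * best_average(req_inputs, adv_inputs) + \
           w_out * best_average(req_outputs, adv_outputs)

print(round(service_match(["City", "Date"], ["Temperature"],
                          ["Location", "Time"], ["Weather"]), 2))
```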

  16. Molecular fingerprint similarity search in virtual screening.

    Cereto-Massagué, Adrià; Ojeda, María José; Valls, Cristina; Mulero, Miquel; Garcia-Vallvé, Santiago; Pujadas, Gerard


    Molecular fingerprints have been used for a long time now in drug discovery and virtual screening. Their ease of use (requiring little to no configuration) and the speed at which substructure and similarity searches can be performed with them - paired with a virtual screening performance similar to other more complex methods - is the reason for their popularity. However, there are many types of fingerprints, each representing a different aspect of the molecule, which can greatly affect search performance. This review focuses on commonly used fingerprint algorithms, their usage in virtual screening, and the software packages and online tools that provide these algorithms.
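
    Fingerprint similarity searches most commonly use the Tanimoto coefficient over bit vectors, sketched below on toy fingerprints (the bit patterns are made up).

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two binary fingerprints."""
    on_a = {i for i, bit in enumerate(fp_a) if bit}
    on_b = {i for i, bit in enumerate(fp_b) if bit}
    union = on_a | on_b
    return len(on_a & on_b) / len(union) if union else 1.0

# Toy 12-bit structural fingerprints of a query molecule and two candidates.
query      = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
candidate1 = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1]   # close analogue
candidate2 = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0]   # unrelated scaffold

print(round(tanimoto(query, candidate1), 2))  # high similarity
print(round(tanimoto(query, candidate2), 2))  # low similarity
```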

  17. PubChem3D: Similar conformers

    Bolton Evan E


    Full Text Available Abstract Background PubChem is a free and open public resource for the biological activities of small molecules. With many tens of millions of both chemical structures and biological test results, PubChem is a sizeable system with an uneven degree of available information. Some chemical structures in PubChem include a great deal of biological annotation, while others have little to none. To help users, PubChem pre-computes "neighboring" relationships to relate similar chemical structures, which may have similar biological function. In this work, we introduce a "Similar Conformers" neighboring relationship to identify compounds with similar 3-D shape and similar 3-D orientation of functional groups typically used to define pharmacophore features. Results The first two diverse 3-D conformers of 26.1 million PubChem Compound records were compared to each other, using a shape Tanimoto (ST) of 0.8 or greater and a color Tanimoto (CT) of 0.5 or greater, yielding 8.16 billion conformer neighbor pairs and 6.62 billion compound neighbor pairs, with an average of 253 "Similar Conformers" compound neighbors per compound. Comparing the 3-D neighboring relationship to the corresponding 2-D neighboring relationship ("Similar Compounds") for molecules such as caffeine, aspirin, and morphine, one finds unique sets of related chemical structures, providing additional significant biological annotation. The PubChem 3-D neighboring relationship is also shown to be able to group a set of non-steroidal anti-inflammatory drugs (NSAIDs), despite limited PubChem 2-D similarity. In a study of 4,218 chemical structures of biomedical interest, consisting of many known drugs, using more diverse conformers per compound results in more 3-D compound neighbors per compound; however, the compound neighbor lists per conformer also increasingly resemble each other, being 38% identical at three conformers and 68% at ten conformers. Perhaps surprising is that the average

  18. Nuclear markers reveal that inter-lake cichlids' similar morphologies do not reflect similar genealogy.

    Kassam, Daud; Seki, Shingo; Horic, Michio; Yamaoka, Kosaku


    The apparent inter-lake morphological similarity among cichlid species/genera of the East African Great Lakes has left evolutionary biologists asking whether such similarity is due to a shared common ancestor or to mere convergent evolution. In order to answer this question, we first used geometric morphometrics (GM) to quantify morphological similarity, and subsequently used Amplified Fragment Length Polymorphism (AFLP) to determine whether similar morphologies imply shared ancestry or convergent evolution. GM revealed that not all presumed morphologically similar pairs were indeed similar, and the dendrogram generated from the AFLP data indicated distinct clusters corresponding to each lake rather than to inter-lake morphologically similar pairs. These results imply that the morphological similarity is due to convergent evolution and not shared ancestry. The congruency of the GM- and AFLP-generated dendrograms implies that GM is capable of picking up phylogenetic signal, and thus GM can be a potential tool in phylogenetic systematics.

  19. A new similarity computing method based on concept similarity in Chinese text processing

    PENG Jing; YANG DongQing; TANG ShiWei; WANG TengJiao; GAO Jun


    The paper proposes a new text similarity computing method based on concept similarity in Chinese text processing. The new method first converts text to a word vector space model, and then splits words into a set of concepts. By computing the inner products between concepts, it obtains the similarity between words. Finally, the method computes the similarity of texts based on the similarity of words. The contributions of the paper include: 1) a new similarity computing formula between words; 2) a new text similarity computing method based on word similarity; 3) successful use of the method in similarity computing for web news; and 4) proof of the validity of the method through extensive experiments.
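
    A rough sketch of the described pipeline, word similarity from shared concepts followed by text similarity from word similarities, is given below; the concept assignments and the averaging rule are illustrative assumptions, not the paper's formulas.

```python
# Hypothetical word -> concept assignments (in practice taken from a thesaurus).
concepts = {
    "car":     {"vehicle", "transport"},
    "bus":     {"vehicle", "transport", "public"},
    "ticket":  {"document", "transport"},
    "invoice": {"document", "finance"},
}

def word_similarity(w1, w2):
    """Jaccard overlap of the concept sets of two words."""
    c1, c2 = concepts.get(w1, set()), concepts.get(w2, set())
    return len(c1 & c2) / len(c1 | c2) if c1 | c2 else 0.0

def text_similarity(text_a, text_b):
    """Average best-match word similarity, symmetrized over both texts."""
    words_a, words_b = text_a.split(), text_b.split()
    def directed(ws, wt):
        return sum(max(word_similarity(w, v) for v in wt) for w in ws) / len(ws)
    return 0.5 * (directed(words_a, words_b) + directed(words_b, words_a))

print(round(text_similarity("car ticket", "bus invoice"), 2))
```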

  20. Unveiling Music Structure Via PLSA Similarity Fusion

    Arenas-García, Jerónimo; Meng, Anders; Petersen, Kaare Brandt


    observed similarities can be satisfactorily explained using the latent semantics. Additionally, this approach significantly simplifies the song retrieval phase, leading to a more practical system implementation. The suitability of the PLSA model for representing music structure is studied in a simplified...

  1. Structural similarity of genetically interacting proteins

    Nussinov Ruth


    Full Text Available Abstract Background The study of gene mutants and their interactions is fundamental to understanding gene function and backup mechanisms within the cell. The recent availability of large scale genetic interaction networks in yeast and worm allows the investigation of the biological mechanisms underlying these interactions at a global scale. To date, less than 2% of the known genetic interactions in yeast or worm can be accounted for by sequence similarity. Results Here, we perform a genome-scale structural comparison among protein pairs in the two species. We show that significant fractions of genetic interactions involve structurally similar proteins, spanning 7–10% and 14% of all known interactions in yeast and worm, respectively. We identify several structural features that are predictive of genetic interactions and show their superiority over sequence-based features. Conclusion Structural similarity is an important property that can explain and predict genetic interactions. According to the available data, the most abundant mechanism for genetic interactions among structurally similar proteins is a common interacting partner shared by two genetically interacting proteins.

  2. The Case of the Similar Trees.

    Meyer, Rochelle Wilson


    A possible logical flaw based on similar triangles is discussed in connection with the Sherlock Holmes mystery "The Musgrave Ritual." The possible flaw has to do with the need for two trees to have equal growth rates over a 250-year period in order for the solution presented to work. (MP)

  3. Cultural Similarities and Differences on Idiom Translation

    黄频频; 陈于全


    Both English and Chinese are abound with idioms. Idioms are an important part of the hnguage and culture of a society. English and Chinese idioms carved with cultural characteristics account for a great part in the tramlation. This paper studies the translation of idioms concerning their cultural similarities, cultural differences and transhtion principles.

  4. Recognizing Similarities between Fraction Word Problems.

    Hardiman, Pamela Thibodeau

    Deciding how to approach a word problem for solution is a critical stage of problem solving, and is the stage which frequently presents considerable difficulty for novices. Do novices use the same information that experts do in deciding that two problems would be solved similarly? This set of four studies indicates that novices rely more on…

  5. Cultural similarity and adjustment of expatriate academics

    Selmer, Jan; Lauring, Jakob


    The findings of a number of recent empirical studies of business expatriates, using different samples and methodologies, seem to support the counter-intuitive proposition that cultural similarity may be as difficult to adjust to as cultural dissimilarity. However, it is not obvious that these res...

  6. Cross-kingdom similarities in microbiome functions

    Mendes, R.; Raaijmakers, J.M.


    Recent advances in medical research have revealed how humans rely on their microbiome for diverse traits and functions. Similarly, microbiomes of other higher organisms play key roles in disease, health, growth and development of their host. Exploring microbiome functions across kingdoms holds enorm

  7. Similarity, trust in institutions, affect, and populism

    Scholderer, Joachim; Finucane, Melissa L.

    on affect is a quicker, easier, and a more efficient way of navigating in a complex and uncertain world. Hence, many theorists give affect a direct and primary role in motivating behavior. Taken together, the results provide uncannily strong support for the value-similarity hypothesis, strengthening...

  8. Some Similarity between Contractions and Kannan Mappings

    Tomonari Suzuki


    Full Text Available Contractions are always continuous and Kannan mappings are not necessarily continuous. This is a very big difference between the two kinds of mappings. However, we know that relaxed versions of both mappings are quite similar. In this paper, we discuss both mappings from a new point of view.

  9. Cross-kingdom similarities in microbiome functions

    Mendes, R.; Raaijmakers, J.M.


    Recent advances in medical research have revealed how humans rely on their microbiome for diverse traits and functions. Similarly, microbiomes of other higher organisms play key roles in disease, health, growth and development of their host. Exploring microbiome functions across kingdoms holds enorm

  10. Similarities in Aegyptopithecus and Afropithecus facial morphology.

    Leakey, M G; Leakey, R E; Richtsmeier, J T; Simons, E L; Walker, A C


    Recently discovered cranial fossils from the Oligocene deposits of the Fayum depression in Egypt provide many details of the facial morphology of Aegyptopithecus zeuxis. Similar features are found in the Miocene hominoid Afropithecus turkanensis. Their presence is the first good evidence of a strong phenetic link between the Oligocene and Miocene hominoids of Africa. A comparison of trait lists emphasizes the similarities of the two fossil species, and leads us to conclude that the two fossil genera share many primitive facial features. In addition, we studied facial morphology using finite-element scaling analysis and found that the two genera show similarities in morphological integration, or the way in which biological landmarks relate to one another in three dimensions to define the form of the organism. Size differences between the two genera are much greater than the relatively minor shape differences. Analysis of variability in landmark location among the four Aegyptopithecus specimens indicates that variability within the sample is not different from that found within two samples of modern macaques. We propose that the shape differences found among the four Aegyptopithecus specimens simply reflect individual variation in facial characteristics, and that the similarities in facial morphology between Aegyptopithecus and Afropithecus probably represent a complex of primitive facial features retained over millions of years.

  11. SEAL: Spatio-Textual Similarity Search

    Fan, Ju; Zhou, Lizhu; Chen, Shanshan; Hu, Jun


    Location-based services (LBS) have become more and more ubiquitous recently. Existing methods focus on finding relevant points-of-interest (POIs) based on users' locations and query keywords. Nowadays, modern LBS applications generate a new kind of spatio-textual data, regions-of-interest (ROIs), containing region-based spatial information and textual description, e.g., mobile user profiles with active regions and interest tags. To satisfy search requirements on ROIs, we study a new research problem, called spatio-textual similarity search: Given a set of ROIs and a query ROI, we find the similar ROIs by considering spatial overlap and textual similarity. Spatio-textual similarity search has many important applications, e.g., social marketing in location-aware social networks. It calls for an efficient search method to support large scales of spatio-textual data in LBS systems. To this end, we introduce a filter-and-verification framework to compute the answers. In the filter step, we generate signatures for ...

  12. Cross-kingdom similarities in microbiome functions

    Mendes, R.; Raaijmakers, J.M.


    Recent advances in medical research have revealed how humans rely on their microbiome for diverse traits and functions. Similarly, microbiomes of other higher organisms play key roles in disease, health, growth and development of their host. Exploring microbiome functions across kingdoms holds

  13. Large-Scale Similarity Joins With Guarantees

    Pagh, Rasmus


    The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the database community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering te...

  14. Mental Institutions and Similar Phenomena Called Schools

    Fischer, Ronald W.


    Mental institutions and public schools appear to have many similarities, and they often operate in ways that would seem contradictory to their philosophy. This article explores certain "atrocities to the self" that result from programs that are intended to be beneficial but, in reality, often result in dehumanization. (Author)

  15. Efficient Similarity Retrieval in Music Databases

    Ruxanda, Maria Magdalena; Jensen, Christian Søndergaard


    Audio music is increasingly becoming available in digital form, and the digital music collections of individuals continue to grow. Addressing the need for effective means of retrieving music from such collections, this paper proposes new techniques for content-based similarity search. Each music ...

  16. Self-similar parabolic plasmonic beams.

    Davoyan, Arthur R; Turitsyn, Sergei K; Kivshar, Yuri S


    We demonstrate that an interplay between diffraction and defocusing nonlinearity can support stable self-similar plasmonic waves with a parabolic profile. Simplicity of a parabolic shape combined with the corresponding parabolic spatial phase distribution creates opportunities for controllable manipulation of plasmons through a combined action of diffraction and nonlinearity.

  17. Large-Scale Similarity Joins With Guarantees

    Pagh, Rasmus


    The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the database community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering te...

  18. Extending the Similarity-Attraction Effect : The effects of When-Similarity in mediated communication

    Kaptein, M.C.; Castaneda, D.; Fernandez, N.; Nass, C.


    The feeling of connectedness experienced in computer-mediated relationships can be explained by the similarity-attraction effect (SAE). Though SAE is well established in psychology, the effects of some types of similarity have not yet been explored. In 2 studies, we demonstrate similarity-attraction

  19. New similarity search based glioma grading

    Haegler, Katrin; Brueckmann, Hartmut; Linn, Jennifer [Ludwig-Maximilians-University of Munich, Department of Neuroradiology, Munich (Germany); Wiesmann, Martin; Freiherr, Jessica [RWTH Aachen University, Department of Neuroradiology, Aachen (Germany); Boehm, Christian [Ludwig-Maximilians-University of Munich, Department of Computer Science, Munich (Germany); Schnell, Oliver; Tonn, Joerg-Christian [Ludwig-Maximilians-University of Munich, Department of Neurosurgery, Munich (Germany)


    MR-based differentiation between low- and high-grade gliomas is predominantly based on contrast-enhanced T1-weighted images (CE-T1w). However, functional MR sequences such as perfusion- and diffusion-weighted sequences can provide additional information on tumor grade. Here, we tested the potential of a recently developed similarity search based method that integrates information from CE-T1w images and perfusion maps for non-invasive MR-based glioma grading. We prospectively included 37 untreated glioma patients (23 grade I/II, 14 grade III gliomas), in whom 3T MRI with FLAIR, pre- and post-contrast T1-weighted, and perfusion sequences was performed. Cerebral blood volume, cerebral blood flow, and mean transit time maps as well as CE-T1w images were used as input for the similarity search. Data sets were preprocessed and converted to four-dimensional Gaussian Mixture Models that considered correlations between the different MR sequences. For each patient, a so-called tumor feature vector (i.e., a probability-based classifier) was defined and used for grading. Biopsy was used as the gold standard, and similarity based grading was compared to grading solely based on CE-T1w. Accuracy, sensitivity, and specificity of pure CE-T1w based glioma grading were 64.9%, 78.6%, and 56.5%, respectively. Similarity search based tumor grading allowed differentiation between low-grade (I or II) and high-grade (III) gliomas with an accuracy, sensitivity, and specificity of 83.8%, 78.6%, and 87.0%. Our findings indicate that integration of perfusion parameters and CE-T1w information in a semi-automatic similarity search based analysis improves the potential of MR-based glioma grading compared to CE-T1w data alone. (orig.)




    Extinction events may profoundly disturb palaeobiogeographic patterns. There have been few palaeobiogeographic studies of the particular survival-recovery interval after the Late Ordovician mass extinction. Here we analyse global brachiopod occurrences, based on the revision of published information, including new material of brachiopods from South China, for the early and late Rhuddanian, the basal stage of the Silurian immediately following the end-Ordovician mass extinction. The data set consists of 137 occurrences, 72 genera, and 13 localities in the early Rhuddanian (survival interval), and 272 occurrences, 91 genera, and 26 localities in the late Rhuddanian (early recovery interval). The data are analysed using Cluster Analysis, Nonmetric Multidimensional Scaling and Minimum Spanning Tree methods with Yule's Y coefficient and the RC coefficient (a probabilistic index of similarity). The results display palaeolatitudinal distribution patterns for brachiopod survival and recovery. Frequency analysis of the data indicates that taxa that were cosmopolitan before the mass extinction showed reduced distributions in the survival interval and expanded temporarily in the recovery interval. Furthermore, discussion of six similarity measures based on data frequency analysis indicates that there is a relationship between the structure of the data and the applicability of particular similarity measures. We suggest that Cluster Analysis be supplemented with other statistical methods (e.g. NMDS) in palaeobiogeographic studies to improve objectivity and accuracy. (From the Chinese abstract:) The impact of extinction events on palaeobiogeographic patterns has attracted attention, and recent studies indicate that diversity after the end-Ordovician mass extinction was notably higher than traditionally recognized, yet the global palaeobiogeographic distribution of brachiopods during this interval has not previously been reported. Based on the compilation of published information, the latest material, and newly acquired data, we establish, for the global early Rhuddanian (survival interval) at the beginning of the Silurian, 13 localities, 72 genera, and 137 occurrences