WorldWideScience

Sample records for greatest sequence similarity

  1. FRESCO: Referential compression of highly similar sequences.

    Science.gov (United States)

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitudes faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition,we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance,4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.

  2. Exploring the relationship between sequence similarity and accurate phylogenetic trees.

    Science.gov (United States)

    Cantarel, Brandi L; Morrison, Hilary G; Pearson, William

    2006-11-01

    We have characterized the relationship between accurate phylogenetic reconstruction and sequence similarity, testing whether high levels of sequence similarity can consistently produce accurate evolutionary trees. We generated protein families with known phylogenies using a modified version of the PAML/EVOLVER program that produces insertions and deletions as well as substitutions. Protein families were evolved over a range of 100-400 point accepted mutations; at these distances 63% of the families shared significant sequence similarity. Protein families were evolved using balanced and unbalanced trees, with ancient or recent radiations. In families sharing statistically significant similarity, about 60% of multiple sequence alignments were 95% identical to true alignments. To compare recovered topologies with true topologies, we used a score that reflects the fraction of clades that were correctly clustered. As expected, the accuracy of the phylogenies was greatest in the least divergent families. About 88% of phylogenies clustered over 80% of clades in families that shared significant sequence similarity, using Bayesian, parsimony, distance, and maximum likelihood methods. However, for protein families with short ancient branches (ancient radiation), only 30% of the most divergent (but statistically significant) families produced accurate phylogenies, and only about 70% of the second most highly conserved families, with median expectation values better than 10(-60), produced accurate trees. These values represent upper bounds on expected tree accuracy for sequences with a simple divergence history; proteins from 700 Giardia families, with a similar range of sequence similarities but considerably more gaps, produced much less accurate trees. For our simulated insertions and deletions, correct multiple sequence alignments did not perform much better than those produced by T-COFFEE, and including sequences with expressed sequence tag-like sequencing errors did not

  3. Interference effects in learning similar sequences of discrete movements

    NARCIS (Netherlands)

    Koedijker, J.M.; Oudejans, R.R.D.; Beek, P.J.

    2010-01-01

    Three experiments were conducted to examine proactive and retroactive interference effects in learning two similar sequences of discrete movements. In each experiment, the participants in the experimental group practiced two movement sequences on consecutive days (1 on each day, order

  4. BLAST and FASTA similarity searching for multiple sequence alignment.

    Science.gov (United States)

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry-homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies target for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.

  5. Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations.

    Science.gov (United States)

    Ikram, Najmul; Qadir, Muhammad Abdul; Afzal, Muhammad Tanvir

    2018-01-01

    Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is getting importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity. We observe that Pearson's correlation coefficient is not sufficient to explain the nature of this relationship. Interestingly, the term semantic similarity values above 0 and below 1 do not seem to play a role in improving the correlation. That is, the correlation coefficient depends only on the number of common GO terms in proteins under comparison, and the semantic similarity measurement method does not influence it. Semantic similarity and sequence similarity have a distinct behavior. These findings are of significant effect for future works on protein comparison, and will help understand the semantic similarity between proteins in a better way.

  6. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  7. The HMMER Web Server for Protein Sequence Similarity Search.

    Science.gov (United States)

    Prakash, Ananth; Jeffryes, Matt; Bateman, Alex; Finn, Robert D

    2017-12-08

    Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.

  8. Sequence Similarity Presenter: a tool for the graphic display of similarities of long sequences for use in presentations.

    Science.gov (United States)

    Fröhlich, K U

    1994-04-01

    A new method for the presentation of alignments of long sequences is described. The degree of identity for the aligned sequences is averaged for sections of a fixed number of residues. The resulting values are converted to shades of gray, with white corresponding to lack of identity and black corresponding to perfect identity. A sequence alignment is represented as a bar filled with varying shades of gray. The display is compact and allows for a fast and intuitive recognition of the distribution of regions with a high similarity. It is well suited for the presentation of alignments of long sequences, e.g. of protein superfamilies, in plenary lectures. The method is implemented as a HyperCard stack for Apple Macintosh computers. Several options for the modification of the output are available (e.g. background reduction, size of the summation window, consideration of amino acid similarity, inclusion of graphic markers to indicate specific domains). The output is a PostScript file which can be printed, imported as EPS or processed further with Adobe Illustrator.

  9. The Greatest Mathematical Discovery?

    Energy Technology Data Exchange (ETDEWEB)

    Bailey, David H.; Borwein, Jonathan M.

    2010-05-12

    What mathematical discovery more than 1500 years ago: (1) Is one of the greatest, if not the greatest, single discovery in the field of mathematics? (2) Involved three subtle ideas that eluded the greatest minds of antiquity, even geniuses such as Archimedes? (3) Was fiercely resisted in Europe for hundreds of years after its discovery? (4) Even today, in historical treatments of mathematics, is often dismissed with scant mention, or else is ascribed to the wrong source? Answer: Our modern system of positional decimal notation with zero, together with the basic arithmetic computational schemes, which were discovered in India about 500 CE.

  10. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Lyngsø, Rune B.; Stormo, Gary D.

    2005-01-01

    detect two genes with low sequence similarity, where the genes are part of a larger genomic region. Results: Here we present such an approach for pairwise local alignment which is based on FILDALIGN and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include...... the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy....... The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. Availability...

  11. Nature's Greatest Puzzles

    International Nuclear Information System (INIS)

    Quigg, Chris

    2005-01-01

    It is a pleasure to be part of the SLAC Summer Institute again, not simply because it is one of the great traditions in our field, but because this is a moment of great promise for particle physics. I look forward to exploring many opportunities with you over the course of our two weeks together. My first task in talking about Nature's Greatest Puzzles, the title of this year's Summer Institute, is to deconstruct the premise a little bit

  12. Similar Ratios of Introns to Intergenic Sequence across Animal Genomes.

    Science.gov (United States)

    Francis, Warren R; Wörheide, Gert

    2017-06-01

    One central goal of genome biology is to understand how the usage of the genome differs between organisms. Our knowledge of genome composition, needed for downstream inferences, is critically dependent on gene annotations, yet problems associated with gene annotation and assembly errors are usually ignored in comparative genomics. Here, we analyze the genomes of 68 species across 12 animal phyla and some single-cell eukaryotes for general trends in genome composition and transcription, taking into account problems of gene annotation. We show that, regardless of genome size, the ratio of introns to intergenic sequence is comparable across essentially all animals, with nearly all deviations dominated by increased intergenic sequence. Genomes of model organisms have ratios much closer to 1:1, suggesting that the majority of published genomes of nonmodel organisms are underannotated and consequently omit substantial numbers of genes, with likely negative impact on evolutionary interpretations. Finally, our results also indicate that most animals transcribe half or more of their genomes arguing against differences in genome usage between animal groups, and also suggesting that the transcribed portion is more dependent on genome size than previously thought. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  13. Computational Physics' Greatest Hits

    Science.gov (United States)

    Bug, Amy

    2011-03-01

    The digital computer, has worked its way so effectively into our profession that now, roughly 65 years after its invention, it is virtually impossible to find a field of experimental or theoretical physics unaided by computational innovation. It is tough to think of another device about which one can make that claim. In the session ``What is computational physics?'' speakers will distinguish computation within the field of computational physics from this ubiquitous importance across all subfields of physics. This talk will recap the invited session ``Great Advances...Past, Present and Future'' in which five dramatic areas of discovery (five of our ``greatest hits'') are chronicled: The physics of many-boson systems via Path Integral Monte Carlo, the thermodynamic behavior of a huge number of diverse systems via Monte Carlo Methods, the discovery of new pharmaceutical agents via molecular dynamics, predictive simulations of global climate change via detailed, cross-disciplinary earth system models, and an understanding of the formation of the first structures in our universe via galaxy formation simulations. The talk will also identify ``greatest hits'' in our field from the teaching and research perspectives of other members of DCOMP, including its Executive Committee.

  14. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    Science.gov (United States)

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.

  15. Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity.

    Science.gov (United States)

    King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach

    2014-01-01

    Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.

  16. Correlation between protein sequence similarity and x-ray diffraction quality in the protein data bank.

    Science.gov (United States)

    Lu, Hui-Meng; Yin, Da-Chuan; Ye, Ya-Jing; Luo, Hui-Min; Geng, Li-Qiang; Li, Hai-Sheng; Guo, Wei-Hong; Shang, Peng

    2009-01-01

    As the most widely utilized technique to determine the 3-dimensional structure of protein molecules, X-ray crystallography can provide structure of the highest resolution among the developed techniques. The resolution obtained via X-ray crystallography is known to be influenced by many factors, such as the crystal quality, diffraction techniques, and X-ray sources, etc. In this paper, the authors found that the protein sequence could also be one of the factors. We extracted information of the resolution and the sequence of proteins from the Protein Data Bank (PDB), classified the proteins into different clusters according to the sequence similarity, and statistically analyzed the relationship between the sequence similarity and the best resolution obtained. The results showed that there was a pronounced correlation between the sequence similarity and the obtained resolution. These results indicate that protein structure itself is one variable that may affect resolution when X-ray crystallography is used.

  17. Sequence diversity, cytotoxicity and antigenic similarities of the leukotoxin of isolates of Mannheimia species from mastitis in domestic sheep.

    Science.gov (United States)

    Omaleki, Lida; Browning, Glenn F; Barber, Stuart R; Allen, Joanne L; Srikumaran, Subramaniam; Markham, Philip F

    2014-11-07

    Species within the genus Mannheimia are among the most important causes of ovine mastitis. Isolates of these species can express leukotoxin A (LktA), a primary virulence factor of these bacteria. To examine the significance of variation in the LktA, the sequences of the lktA genes in a panel of isolates from cases of ovine mastitis were compared. The cross-neutralising capacities of rat antisera raised against LktA of one Mannheimia glucosida, one haemolytic Mannheimia ruminalis, and two Mannheimia haemolytica isolates were also examined to assess the effect that variation in the lktA gene can have on protective immunity against leukotoxins with differing sequences. The lktA nucleotide distance between the M. haemolytica isolates was greater than between the M. glucosida isolates, with the M. haemolytica isolates divisible into two groups based on their lktA sequences. Comparison of the topology of phylogenetic trees of 16S rDNA and lktA sequences revealed differences in the relationships between some isolates, suggesting horizontal gene transfer. Cross neutralisation data obtained with monospecific anti-LktA rat sera were used to derive antigenic similarity coefficients for LktA from the four Mannheimia species isolates. Similarity coefficients indicated that LktA of the two M. haemolytica isolates were least similar, while LktA from M. glucosida was most similar to those for one of the M. haemolytica isolates and the haemolytic M. ruminalis isolate. The results suggested that vaccination with the M. glucosida leukotoxin would generate the greatest cross-protection against ovine mastitis caused by Mannheimia species with these alleles. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. Model-free aftershock forecasts constructed from similar sequences in the past

    Science.gov (United States)

    van der Elst, N.; Page, M. T.

    2017-12-01

    The basic premise behind aftershock forecasting is that sequences in the future will be similar to those in the past. Forecast models typically use empirically tuned parametric distributions to approximate past sequences, and project those distributions into the future to make a forecast. While parametric models do a good job of describing average outcomes, they are not explicitly designed to capture the full range of variability between sequences, and can suffer from over-tuning of the parameters. In particular, parametric forecasts may produce a high rate of "surprises" - sequences that land outside the forecast range. Here we present a non-parametric forecast method that cuts out the parametric "middleman" between training data and forecast. The method is based on finding past sequences that are similar to the target sequence, and evaluating their outcomes. We quantify similarity as the Poisson probability that the observed event count in a past sequence reflects the same underlying intensity as the observed event count in the target sequence. Event counts are defined in terms of differential magnitude relative to the mainshock. The forecast is then constructed from the distribution of past sequences outcomes, weighted by their similarity. We compare the similarity forecast with the Reasenberg and Jones (RJ95) method, for a set of 2807 global aftershock sequences of M≥6 mainshocks. We implement a sequence-specific RJ95 forecast using a global average prior and Bayesian updating, but do not propagate epistemic uncertainty. The RJ95 forecast is somewhat more precise than the similarity forecast: 90% of observed sequences fall within a factor of two of the median RJ95 forecast value, whereas the fraction is 85% for the similarity forecast. However, the surprise rate is much higher for the RJ95 forecast; 10% of observed sequences fall in the upper 2.5% of the (Poissonian) forecast range. The surprise rate is less than 3% for the similarity forecast. The similarity

  19. Using sequence similarity networks for visualization of relationships across diverse protein superfamilies.

    Directory of Open Access Journals (Sweden)

    Holly J Atkinson

    Full Text Available The dramatic increase in heterogeneous types of biological data--in particular, the abundance of new protein sequences--requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity--GPCRs and kinases from humans, and the crotonase superfamily of enzymes--we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.

  20. Using sequence similarity networks for visualization of relationships across diverse protein superfamilies.

    Science.gov (United States)

    Atkinson, Holly J; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C

    2009-01-01

    The dramatic increase in heterogeneous types of biological data--in particular, the abundance of new protein sequences--requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity--GPCRs and kinases from humans, and the crotonase superfamily of enzymes--we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.

  1. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    Science.gov (United States)

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Comparing to the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length, or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package and is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). yueljiang@163.com. Supplementary data are available at Bioinformatics online.

  2. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM

    Directory of Open Access Journals (Sweden)

    Yunyun Liang

    2015-01-01

    Full Text Available Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM. Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS, segmented PsePSSM, and segmented autocovariance transformation (ACT based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640 are adopted in this paper. Then a 700-dimensional (700D feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA. To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  3. Adhesive proteins of stalked and acorn barnacles display homology with low sequence similarities.

    Directory of Open Access Journals (Sweden)

    Jaimie-Leigh Jonker

    Full Text Available Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins 'sticky' has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes. It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa. Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7-16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes. Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18-26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa are more conserved within barnacles than others (20 kDa.

  4. Detecting atypical examples of known domain types by sequence similarity searching: the SBASE domain library approach.

    Science.gov (United States)

    Dhir, Somdutta; Pacurar, Mircea; Franklin, Dino; Gáspári, Zoltán; Kertész-Farkas, Attila; Kocsor, András; Eisenhaber, Frank; Pongor, Sándor

    2010-11-01

    SBASE is a project initiated to detect known domain types and predicting domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992, Pongor et al, Nucl. Acids. Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences - the SBASE domain library - and standard similarity search algorithms, followed by postprocessing which is based on a simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful in detecting rare, atypical examples of known domain types which are sometimes missed even by more sophisticated methodologies. This approach does not require multiple alignment or machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within this the project.

  5. Similar representations of sequence knowledge in young and older adults: A study of effector independent transfer

    Directory of Open Access Journals (Sweden)

    Jonathan Sebastiaan Barnhoorn

    2016-08-01

    Full Text Available Older adults show reduced motor performance and changes in motor skill development. To better understand these changes, we studied differences in sequence knowledge representations between young and older adults using a transfer task. Transfer, or the ability to apply motor skills flexibly, is highly relevant in day-to-day motor activity and facilitates generalization of learning to new contexts. By using movement types that are completely unrelated in terms of muscle activation and response location, we focused on transfer facilitated by the early, visuospatial system.We tested 32 right-handed older adults (65 – 74 and 32 young adults (18 – 30. During practice of a discrete sequence production task, participants learned two 6-element sequences using either unimanual key-presses (KPs or by moving a lever with lower arm flexion-extension (FE movements. Each sequence was performed 144 times. They then performed a test phase consisting of familiar and random sequences performed with the type of movements not used during practice. Both age groups displayed transfer from FE to KP movements as indicated by faster performance on the familiar sequences in the test phase. Only young adults transferred their sequence knowledge from KP to FE movements. In both directions, the young showed higher transfer than older adults. These results suggest that the older participants, like the young, represented their sequences in an abstract visuospatial manner. Transfer was asymmetric in both age groups: there was more transfer from FE to KP movements than vice versa. This similar asymmetry is a further indication that the types of representations that older adults develop are comparable to those that young adults develop. We furthermore found that older adults improved less during FE practice, gained less explicit knowledge, displayed a smaller visuospatial working memory capacity and had lower processing speed than young adults. Despite the many differences

  6. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Full Text Available Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, development of a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  7. On the Power and Limits of Sequence Similarity Based Clustering of Proteins Into Families

    DEFF Research Database (Denmark)

    Wiwie, Christian; Röttger, Richard

    2017-01-01

    Over the last decades, we have observed an ongoing tremendous growth of available sequencing data fueled by the advancements in wet-lab technology. The sequencing information is only the beginning of the actual understanding of how organisms survive and prosper. It is, for instance, equally...... important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We...... used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown data set and aimed to also suggest optimal parameters...

  8. Bidirectional gene sequences with similar homology to functional proteins of alkane degrading bacterium pseudomonas fredriksbergensis DNA

    International Nuclear Information System (INIS)

    Megeed, A.A.

    2011-01-01

    The potential for two overlapping fragments of DNA from a clone of newly isolated alkanes degrading bacterium Pseudomonas frederiksbergensis encoding sequences with similar homology to two parts of functional proteins is described. One strand contains a sequence with high homology to alkanes monooxygenase (alkB), a member of the alkanes hydroxylase family, and the other strand contains a sequence with some homology to alcohol dehydrogenase gene (alkJ). Overlapping of the genes on opposite strands has been reported in eukaryotic species, and is now reported in a bacterial species. The sequence comparisons and ORFS results revealed that the regulation and the genes organization involved in alkane oxidation represented in Pseudomonas frederiksberghensis varies among the different known alkane degrading bacteria. The alk gene cluster containing homologues to the known alkane monooxygenase (alkB), and rubredoxin (alkG) are oriented in the same direction, whereas alcohol dehydrogenase (alkJ) is oriented in the opposite direction. Such genomes encode messages on both strands of the DNA, or in an overlapping but different reading frames, of the same strand of DNA. The possibility of creating novel genes from pre-existing sequences, known as overprinting, which is a widespread phenomenon in small viruses. Here, the origin and evolution of the gene overlap to bacteriophages belonging to the family Microviridae have been investigated. Such a phenomenon is most widely described in extremely small genomes such as those of viruses or small plasmids, yet here is a unique phenomenon. (author)

  9. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    Energy Technology Data Exchange (ETDEWEB)

    Ovacik, Meric A. [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Androulakis, Ioannis P., E-mail: yannis@rci.rutgers.edu [Chemical and Biochemical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States); Biomedical Engineering Department, Rutgers University, Piscataway, NJ 08854 (United States)

    2013-09-15

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy.

  10. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    International Nuclear Information System (INIS)

    Ovacik, Meric A.; Androulakis, Ioannis P.

    2013-01-01

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy

  11. Testing statistical significance scores of sequence comparison methods with structure similarity

    Directory of Open Access Journals (Sweden)

    Leunissen Jack AM

    2006-10-01

    Full Text Available Abstract Background In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested if this could be validated when applied to existing, evolutionary related protein sequences. Results All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. Conclusion The compute intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.

  12. Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

    Science.gov (United States)

    Tan, Yen Hock; Huang, He; Kihara, Daisuke

    2006-08-15

    Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself; thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.

  13. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment

    Directory of Open Access Journals (Sweden)

    Manzini Giovanni

    2007-07-01

    Full Text Available Abstract Background Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity, NCD (Normalized Compression Dissimilarity and CD (Compression Dissimilarity. Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. Results We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. The first one, performed via ROC

  14. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.

    Science.gov (United States)

    Ferragina, Paolo; Giancarlo, Raffaele; Greco, Valentina; Manzini, Giovanni; Valiente, Gabriel

    2007-07-13

    Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. The first one, performed via ROC (Receiver Operating Curve) analysis, aims at

  15. Scaling Relations of Local Magnitude versus Moment Magnitude for Sequences of Similar Earthquakes in Switzerland

    KAUST Repository

    Bethmann, F.

    2011-03-22

    Theoretical considerations and empirical regressions show that, in the magnitude range between 3 and 5, local magnitude, ML, and moment magnitude, Mw, scale 1:1. Previous studies suggest that for smaller magnitudes this 1:1 scaling breaks down. However, the scatter between ML and Mw at small magnitudes is usually large and the resulting scaling relations are therefore uncertain. In an attempt to reduce these uncertainties, we first analyze the ML versus Mw relation based on 195 events, induced by the stimulation of a geothermal reservoir below the city of Basel, Switzerland. Values of ML range from 0.7 to 3.4. From these data we derive a scaling of ML ~ 1:5Mw over the given magnitude range. We then compare peak Wood-Anderson amplitudes to the low-frequency plateau of the displacement spectra for six sequences of similar earthquakes in Switzerland in the range of 0:5 ≤ ML ≤ 4:1. Because effects due to the radiation pattern and to the propagation path between source and receiver are nearly identical at a particular station for all events in a given sequence, the scatter in the data is substantially reduced. Again we obtain a scaling equivalent to ML ~ 1:5Mw. Based on simulations using synthetic source time functions for different magnitudes and Q values estimated from spectral ratios between downhole and surface recordings, we conclude that the observed scaling can be explained by attenuation and scattering along the path. Other effects that could explain the observed magnitude scaling, such as a possible systematic increase of stress drop or rupture velocity with moment magnitude, are masked by attenuation along the path.

  16. Defining reference sequences for Nocardia species by similarity and clustering analyses of 16S rRNA gene sequence data.

    Directory of Open Access Journals (Sweden)

    Manal Helal

    Full Text Available BACKGROUND: The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia. METHODS: A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization. RESULTS: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52% corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as 'centroids' in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578. CONCLUSION: The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra

  17. The genomic sequence of cowpea aphid-borne mosaic virus and its similarities with other potyviruses

    NARCIS (Netherlands)

    Mlotshwa, S.; Verver, J.; Sithole-Niang, I.; Kampen, van T.; Kammen, van A.; Wellink, J.

    2002-01-01

    The genomic sequence of a Zimbabwe isolate of Cowpea aphid-borne mosaic virus (CABMV-Z) was determined by sequencing overlapping viral cDNA clones generated by RT-PCR using degenerate and/or specific primers. The sequence is 9465 nucleotides in length excluding the 3' terminal poly (A) tail and

  18. Characterization of CG6178 gene product with high sequence similarity to firefly luciferase in Drosophila melanogaster.

    Science.gov (United States)

    Oba, Yuichi; Ojika, Makoto; Inouye, Satoshi

    2004-03-31

    This is the first identification of a long-chain fatty acyl-CoA synthetase in Drosophila by enzymatic characterization. The gene product of CG6178 (CG6178) in Drosophila melanogaster genome, which has a high sequence similarity to firefly luciferase, has been expressed and characterized. CG6178 showed long-chain fatty acyl-CoA synthetic activity in the presence of ATP, CoA and Mg(2+), suggesting a fatty acyl adenylate is an intermediate. Recently, it was revealed that firefly luciferase has two catalytic functions, monooxygenase (luciferase) and AMP-mediated CoA ligase (fatty acyl-CoA synthetase). However, unlike firefly luciferase, CG6178 did not show luminescence activity in the presence of firefly luciferin, ATP, CoA and Mg(2+). The enzymatic properties of CG6178 including substrate specificity, pH dependency and optimal temperature were close to those of firefly luciferase and rat fatty acyl-CoA synthetase. Further, phylogenic analyses strongly suggest that the firefly luciferase gene may have evolved from a fatty acyl-CoA synthetase gene as a common ancestral gene.

  19. CLONING AND SEQUENCING OF THE GENE FOR A LACTOCOCCAL ENDOPEPTIDASE, AN ENZYME WITH SEQUENCE SIMILARITY TO MAMMALIAN ENKEPHALINASE

    NARCIS (Netherlands)

    Mierau, Igor; Tan, Paris S.T.; Haandrikman, Alfred J.; Kok, Jan; Leenhouts, Kees J.; Konings, Wil N.; Venema, Gerard

    The gene specifying an endopeptidase of Lactococcus lactis, named pepO, was cloned from a genomic library of L. lactis subsp. cremoris P8-247 in lambdaEMBL3 and was subsequently sequenced. pepO is probably the last gene of an operon encoding the binding-protein-dependent oligopeptide transport

  20. Simultaneous identification of long similar substrings in large sets of sequences

    Directory of Open Access Journals (Sweden)

    Wittig Burghardt

    2007-05-01

    Full Text Available Abstract Background Sequence comparison faces new challenges today, with many complete genomes and large libraries of transcripts known. Gene annotation pipelines match these sequences in order to identify genes and their alternative splice forms. However, the software currently available cannot simultaneously compare sets of sequences as large as necessary especially if errors must be considered. Results We therefore present a new algorithm for the identification of almost perfectly matching substrings in very large sets of sequences. Its implementation, called ClustDB, is considerably faster and can handle 16 times more data than VMATCH, the most memory efficient exact program known today. ClustDB simultaneously generates large sets of exactly matching substrings of a given minimum length as seeds for a novel method of match extension with errors. It generates alignments of maximum length with a considered maximum number of errors within each overlapping window of a given size. Such alignments are not optimal in the usual sense but faster to calculate and often more appropriate than traditional alignments for genomic sequence comparisons, EST and full-length cDNA matching, and genomic sequence assembly. The method is used to check the overlaps and to reveal possible assembly errors for 1377 Medicago truncatula BAC-size sequences published at http://www.medicago.org/genome/assembly_table.php?chr=1. Conclusion The program ClustDB proves that window alignment is an efficient way to find long sequence sections of homogenous alignment quality, as expected in case of random errors, and to detect systematic errors resulting from sequence contaminations. Such inserts are systematically overlooked in long alignments controlled by only tuning penalties for mismatches and gaps. ClustDB is freely available for academic use.

  1. Ultra-fast sequence clustering from similarity networks with SiLiX

    Directory of Open Access Journals (Sweden)

    Duret Laurent

    2011-04-01

    Full Text Available Abstract Background The number of gene sequences that are available for comparative genomics approaches is increasing extremely quickly. A current challenge is to be able to handle this huge amount of sequences in order to build families of homologous sequences in a reasonable time. Results We present the software package SiLiX that implements a novel method which reconsiders single linkage clustering with a graph theoretical approach. A parallel version of the algorithms is also presented. As a demonstration of the ability of our software, we clustered more than 3 millions sequences from about 2 billion BLAST hits in 7 minutes, with a high clustering quality, both in terms of sensitivity and specificity. Conclusions Comparing state-of-the-art software, SiLiX presents the best up-to-date capabilities to face the problem of clustering large collections of sequences. SiLiX is freely available at http://lbbe.univ-lyon1.fr/SiLiX.

  2. Similar Representations of Sequence Knowledge in Young and Older Adults: A Study of Effector Independent Transfer

    NARCIS (Netherlands)

    Barnhoorn, Jonathan Sebastiaan; Döhring, Falko R.; van Asseldonk, Edwin H.F.; Verwey, Willem B.

    2016-01-01

    Older adults show reduced motor performance and changes in motor skill development. To better understand these changes, we studied differences in sequence knowledge representations between young and older adults using a transfer task. Transfer, or the ability to apply motor skills flexibly, is

  3. A behavioral similarity measure between labeled Petri nets based on principal transition sequences

    NARCIS (Netherlands)

    Wang, J.; He, T.; Wen, L.; Wu, N.; Hofstede, ter A.H.M.; Su, J.; Meersman, R.; Dillon, T.S.; Herrero, P.

    2010-01-01

    Being able to determine the degree of similarity between process models is important for management, reuse, and analysis of business process models. In this paper we propose a novel method to determine the degree of similarity between process models, which exploits their semantics. Our approach is

  4. Single nucleus genome sequencing reveals high similarity among nuclei of an endomycorrhizal fungus.

    Directory of Open Access Journals (Sweden)

    Kui Lin

    2014-01-01

    Full Text Available Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. Although this concept has become textbook knowledge, it is only based on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbps, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to its postulated sister Dikarya.

  5. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    Science.gov (United States)

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Identification of similar regions of protein structures using integrated sequence and structure analysis tools

    Directory of Open Access Journals (Sweden)

    Heiland Randy

    2006-03-01

    Full Text Available Abstract Background Understanding protein function from its structure is a challenging problem. Sequence based approaches for finding homology have broad use for annotation of both structure and function. 3D structural information of protein domains and their interactions provide a complementary view to structure function relationships to sequence information. We have developed a web site http://www.sblest.org/ and an API of web services that enables users to submit protein structures and identify statistically significant neighbors and the underlying structural environments that make that match using a suite of sequence and structure analysis tools. To do this, we have integrated S-BLEST, PSI-BLAST and HMMer based superfamily predictions to give a unique integrated view to prediction of SCOP superfamilies, EC number, and GO term, as well as identification of the protein structural environments that are associated with that prediction. Additionally, we have extended UCSF Chimera and PyMOL to support our web services, so that users can characterize their own proteins of interest. Results Users are able to submit their own queries or use a structure already in the PDB. Currently the databases that a user can query include the popular structural datasets ASTRAL 40 v1.69, ASTRAL 95 v1.69, CLUSTER50, CLUSTER70 and CLUSTER90 and PDBSELECT25. The results can be downloaded directly from the site and include function prediction, analysis of the most conserved environments and automated annotation of query proteins. These results reflect both the hits found with PSI-BLAST, HMMer and with S-BLEST. We have evaluated how well annotation transfer can be performed on SCOP ID's, Gene Ontology (GO ID's and EC Numbers. The method is very efficient and totally automated, generally taking around fifteen minutes for a 400 residue protein. Conclusion With structural genomics initiatives determining structures with little, if any, functional characterization

  7. When Does Between-Sequence Phonological Similarity Promote Irrelevant Sound Disruption?

    Science.gov (United States)

    Marsh, John E.; Vachon, Francois; Jones, Dylan M.

    2008-01-01

    Typically, the phonological similarity between to-be-recalled items and TBI auditory stimuli has no impact if recall in serial order is required. However, in the present study, the authors have shown that the free recall, but not serial recall, of lists of phonologically related to-be-remembered items was disrupted by an irrelevant sound stream…

  8. Protein sequences and redox titrations indicate that the electron acceptors in reaction centers from heliobacteria are similar to Photosystem I

    Science.gov (United States)

    Trost, J. T.; Brune, D. C.; Blankenship, R. E.

    1992-01-01

    Photosynthetic reaction centers isolated from Heliobacillus mobilis exhibit a single major protein on SDS-PAGE of 47 000 Mr. Attempts to sequence the reaction center polypeptide indicated that the N-terminus is blocked. After enzymatic and chemical cleavage, four peptide fragments were sequenced from the Heliobacillus mobilis apoprotein. Only one of these sequences showed significant specific similarity to any of the protein and deduced protein sequences in the GenBank data base. This fragment is identical with 56% of the residues, including both cysteines, found in highly conserved region that is proposed to bind iron-sulfur center Fx in the Photosystem I reaction center peptide that is the psaB gene product. The similarity to the psaA gene product in this region is 48%. Redox titrations of laser-flash-induced photobleaching with millisecond decay kinetics on isolated reaction centers from Heliobacterium gestii indicate a midpoint potential of -414 mV with n = 2 titration behavior. In membranes, the behavior is intermediate between n = 1 and n = 2, and the apparent midpoint potential is -444 mV. This is compared to the behavior in Photosystem I, where the intermediate electron acceptor A1, thought to be a phylloquinone molecule, has been proposed to undergo a double reduction at low redox potentials in the presence of viologen redox mediators. These results strongly suggest that the acceptor side electron transfer system in reaction centers from heliobacteria is indeed analogous to that found in Photosystem I. The sequence similarities indicate that the divergence of the heliobacteria from the Photosystem I line occurred before the gene duplication and subsequent divergence that lead to the heterodimeric protein core of the Photosystem I reaction center.

  9. Musicians' and nonmusicians' short-term memory for verbal and musical sequences: comparing phonological similarity and pitch proximity.

    Science.gov (United States)

    Williamson, Victoria J; Baddeley, Alan D; Hitch, Graham J

    2010-03-01

    Language-music comparative studies have highlighted the potential for shared resources or neural overlap in auditory short-term memory. However, there is a lack of behavioral methodologies for comparing verbal and musical serial recall. We developed a visual grid response that allowed both musicians and nonmusicians to perform serial recall of letter and tone sequences. The new method was used to compare the phonological similarity effect with the impact of an operationalized musical equivalent-pitch proximity. Over the course of three experiments, we found that short-term memory for tones had several similarities to verbal memory, including limited capacity and a significant effect of pitch proximity in nonmusicians. Despite being vulnerable to phonological similarity when recalling letters, however, musicians showed no effect of pitch proximity, a result that we suggest might reflect strategy differences. Overall, the findings support a limited degree of correspondence in the way that verbal and musical sounds are processed in auditory short-term memory.

  10. A Novel Phytase with Sequence Similarity to Purple Acid Phosphatases Is Expressed in Cotyledons of Germinating Soybean Seedlings 1

    Science.gov (United States)

    Hegeman, Carla E.; Grabau, Elizabeth A.

    2001-01-01

    Phytic acid (myo-inositol hexakisphosphate) is the major storage form of phosphorus in plant seeds. During germination, stored reserves are used as a source of nutrients by the plant seedling. Phytic acid is degraded by the activity of phytases to yield inositol and free phosphate. Due to the lack of phytases in the non-ruminant digestive tract, monogastric animals cannot utilize dietary phytic acid and it is excreted into manure. High phytic acid content in manure results in elevated phosphorus levels in soil and water and accompanying environmental concerns. The use of phytases to degrade seed phytic acid has potential for reducing the negative environmental impact of livestock production. A phytase was purified to electrophoretic homogeneity from cotyledons of germinated soybeans (Glycine max L. Merr.). Peptide sequence data generated from the purified enzyme facilitated the cloning of the phytase sequence (GmPhy) employing a polymerase chain reaction strategy. The introduction of GmPhy into soybean tissue culture resulted in increased phytase activity in transformed cells, which confirmed the identity of the phytase gene. It is surprising that the soybean phytase was unrelated to previously characterized microbial or maize (Zea mays) phytases, which were classified as histidine acid phosphatases. The soybean phytase sequence exhibited a high degree of similarity to purple acid phosphatases, a class of metallophosphoesterases. PMID:11500558

  11. "Venom" of the slow loris: sequence similarity of prosimian skin gland protein and Fel d 1 cat allergen.

    Science.gov (United States)

    Krane, Sonja; Itagaki, Yasuhiro; Nakanishi, Koji; Weldon, Paul J

    2003-02-01

    Bites inflicted on humans by the slow loris (Nycticebus coucang), a prosimian from Indonesia, are painful and elicit anaphylaxis. Toxins from N. coucang are thought to originate in the brachial organ, a naked, gland-laden area of skin situated on the flexor surface of the arm that is licked during grooming. We isolated a major component of the brachial organ secretions from N. coucang, an approximately 18 kDa protein composed of two 70-90 amino-acid chains linked by one or more disulfide bonds. The N-termini of these peptide chains exhibit nearly 70% sequence similarity (37% identity, chain 1; 54% identity, chain 2) with the two chains of Fel d 1, the major allergen from the domestic cat (Felis catus). The extensive sequence similarity between the brachial organ component of N. coucang and the cat allergen suggests that they exhibit immunogenic cross-reactivity. This work clarifies the chemical nature of the brachial organ exudate and suggests a possible mode of action underlying the noxious effects of slow loris bites.

  12. Location of the redox-active thiols of ribonucleotide reductase: sequences similarity between the Escherichia coli and Lactobacillus leichmannii enzymes

    International Nuclear Information System (INIS)

    Lin, A.N.I.; Ashley, G.W.; Stubbe, J.

    1987-01-01

    The redox-active thiols of Escherichia coli ribonucleoside diphosphate reductase and of Lactobacillus leichmannii ribonucleoside triphosphate reductase have been located by a procedure involving (1) prereduction of enzyme with dithiothreitol, (2) specific oxidation of the redox-active thiols by treatment with substrate in the absence of exogenous reductant, (3) alkylation of other thiols with iodoacetamide, and (4) reduction of the disulfides with dithiothreitol and alkylation with [1- 14 C]iodoacetamide. The dithiothreitol-reduce E. coli B1 subunit is able to convert 3 equiv of CDP to dCDP and is labeled with 5.4 equiv of 14 C. Sequencing of tryptic peptides shows that 2.8 equiv of 14 C is on cysteines-752 and -757 at the C-terminus of B1, while 1.0-1.5 equiv of 14 C is on cysteines-222 and -227. It thus appears that two sets of redox-active dithiols are involved in substrate reduction. The L. leichmannii reductase is able to convert 1.1 equiv of CTP to dCTP and is labeled with 2.1 equiv of 14 C. Sequencing of tryptic peptides shows that 1.4 equiv of 14 C is located on the two cysteines of C-E-G-G-A-C-P-I-K. This peptide shows remarkable and unexpected similarity to the thiol-containing region of the C-terminal peptide of E. coli B1, C-E-S-G-A-C-K-I

  13. Nature's Greatest Puzzles

    Energy Technology Data Exchange (ETDEWEB)

    Quigg, Chris; /Fermilab

    2005-02-01

    It is a pleasure to be part of the SLAC Summer Institute again, not simply because it is one of the great traditions in our field, but because this is a moment of great promise for particle physics. I look forward to exploring many opportunities with you over the course of our two weeks together. My first task in talking about Nature's Greatest Puzzles, the title of this year's Summer Institute, is to deconstruct the premise a little bit.

  14. Structural and Sequence Similarities of Hydra Xeroderma Pigmentosum A Protein to Human Homolog Suggest Early Evolution and Conservation

    Directory of Open Access Journals (Sweden)

    Apurva Barve

    2013-01-01

    Full Text Available Xeroderma pigmentosum group A (XPA is a protein that binds to damaged DNA, verifies presence of a lesion, and recruits other proteins of the nucleotide excision repair (NER pathway to the site. Though its homologs from yeast, Drosophila, humans, and so forth are well studied, XPA has not so far been reported from protozoa and lower animal phyla. Hydra is a fresh-water cnidarian with a remarkable capacity for regeneration and apparent lack of organismal ageing. Cnidarians are among the first metazoa with a defined body axis, tissue grade organisation, and nervous system. We report here for the first time presence of XPA gene in hydra. Putative protein sequence of hydra XPA contains nuclear localization signal and bears the zinc-finger motif. It contains two conserved Pfam domains and various characterized features of XPA proteins like regions for binding to excision repair cross-complementing protein-1 (ERCC1 and replication protein A 70 kDa subunit (RPA70 proteins. Hydra XPA shows a high degree of similarity with vertebrate homologs and clusters with deuterostomes in phylogenetic analysis. Homology modelling corroborates the very close similarity between hydra and human XPA. The protein thus most likely functions in hydra in the same manner as in other animals, indicating that it arose early in evolution and has been conserved across animal phyla.

  15. In vitro identification and in silico utilization of interspecies sequence similarities using GeneChip® technology

    Directory of Open Access Journals (Sweden)

    Ye Shui Q

    2005-05-01

    Full Text Available Abstract Background Genomic approaches in large animal models (canine, ovine etc are challenging due to insufficient genomic information for these species and the lack of availability of corresponding microarray platforms. To address this problem, we speculated that conserved interspecies genetic sequences can be experimentally detected by cross-species hybridization. The Affymetrix platform probe redundancy offers flexibility in selecting individual probes with high sequence similarities between related species for gene expression analysis. Results Gene expression profiles of 40 canine samples were generated using the human HG-U133A GeneChip (U133A. Due to interspecies genetic differences, only 14 ± 2% of canine transcripts were detected by U133A probe sets whereas profiling of 40 human samples detected 49 ± 6% of human transcripts. However, when these probe sets were deconstructed into individual probes and examined performance of each probe, we found that 47% of human probes were able to find their targets in canine tissues and generate a detectable hybridization signal. Therefore, we restricted gene expression analysis to these probes and observed the 60% increase in the number of identified canine transcripts. These results were validated by comparison of transcripts identified by our restricted analysis of cross-species hybridization with transcripts identified by hybridization of total lung canine mRNA to new Affymetrix Canine GeneChip®. Conclusion The experimental identification and restriction of gene expression analysis to probes with detectable hybridization signal drastically increases transcript detection of canine-human hybridization suggesting the possibility of broad utilization of cross-hybridizations of related species using GeneChip technology.

  16. Pulmonary parenchyma segmentation in thin CT image sequences with spectral clustering and geodesic active contour model based on similarity

    Science.gov (United States)

    He, Nana; Zhang, Xiaolong; Zhao, Juanjuan; Zhao, Huilan; Qiang, Yan

    2017-07-01

    While the popular thin layer scanning technology of spiral CT has helped to improve diagnoses of lung diseases, the large volumes of scanning images produced by the technology also dramatically increase the load of physicians in lesion detection. Computer-aided diagnosis techniques like lesions segmentation in thin CT sequences have been developed to address this issue, but it remains a challenge to achieve high segmentation efficiency and accuracy without much involvement of human manual intervention. In this paper, we present our research on automated segmentation of lung parenchyma with an improved geodesic active contour model that is geodesic active contour model based on similarity (GACBS). Combining spectral clustering algorithm based on Nystrom (SCN) with GACBS, this algorithm first extracts key image slices, then uses these slices to generate an initial contour of pulmonary parenchyma of un-segmented slices with an interpolation algorithm, and finally segments lung parenchyma of un-segmented slices. Experimental results show that the segmentation results generated by our method are close to what manual segmentation can produce, with an average volume overlap ratio of 91.48%.

  17. On universal common ancestry, sequence similarity, and phylogenetic structure: the sins of P-values and the virtues of Bayesian evidence

    Directory of Open Access Journals (Sweden)

    Theobald Douglas L

    2011-11-01

    Full Text Available Abstract Background The universal common ancestry (UCA of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, recently both the status and nature of UCA has been questioned. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in opposition to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. In a recent comment, Koonin and Wolf (K&W claim that the model preference for UCA is "a trivial consequence of significant sequence similarity". They support this claim with a computational simulation, derived from universally conserved proteins, which produces similar sequences lacking phylogenetic structure. The model selection tests prefer common ancestry for this artificial data set. Results For the real universal protein sequences, hierarchical phylogenetic structure (induced by genealogical history is the overriding reason for why the tests choose UCA; sequence similarity is a relatively minor factor. First, for cases of conflicting phylogenetic structure, the tests choose independent ancestry even with highly similar sequences. Second, certain models, like star trees and K&W's profile model (corresponding to their simulation, readily explain sequence similarity yet lack phylogenetic structure. However, these are extremely poor models for the real proteins, even worse than independent ancestry models, though they explain K&W's artificial data well. Finally, K&W's simulation is an implementation of a well-known phylogenetic model, and it produces sequences that mimic homologous proteins. Therefore the model selection tests work appropriately with the artificial

  18. Phylogenetic similarity of the canine parvovirus wild-type isolates on the basis of VP1/VP2 gene fragment sequence analysis.

    Science.gov (United States)

    Rypul, K; Chmielewski, R; Smielewska-Loś, E; Klimentowski, S

    2002-04-01

    Biological material was taken from dogs with diarrhoea. Faecal samples were taken from within live animals and intestinal tract fragments (i.e. small intestine, and stomach) were taken from dead animals. In total, 18 specimens were investigated from dogs housed alone or in large groups. To test for the presence of the virus, latex (On Site Biotech, Uppsala, Sweden) and direct immunofluorescence tests were performed. At the same time, polymerase chain reaction (PCR) with primers complementary to a conservative region of VP1/VP2 was carried out. The products of amplification were analysed on 2% agarose gel. The purified products were cloned with the Template Generation System (Finnzymes, Espoo, Finland) using a transposition reaction and positive clones were searched using the 'colony screening by PCR' method. The sequencing gave 12 sequences of VP1/VP2 gene fragments that were of high similarity. Among the 12 analysed sequences, six exhibited 88% similarity, four exhibited 100% similarity and two exhibited 71% similarity.

  19. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids.

    Science.gov (United States)

    Li, Yushuang; Song, Tian; Yang, Jiasheng; Zhang, Yi; Yang, Jialiang

    2016-01-01

    In this paper, we have proposed a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440 dimensional feature vector consisting of a 400 dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20 dimensional content ratio vector, and a 20 dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method into the ND5 dataset consisting of the ND5 protein sequences of 9 species, and the F10 and G11 datasets representing two of the xylanases containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW in the ND5 dataset, much higher than those of other 5 popular alignment-free methods. In addition, we successfully separate the xylanases sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acids content ratio vector.

  20. Remarkable sequence similarity between the dinoflagellate-infecting marine girus and the terrestrial pathogen African swine fever virus

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2009-10-01

    Full Text Available Abstract Heterocapsa circularisquama DNA virus (HcDNAV; previously designated as HcV is a giant virus (girus with a ~356-kbp double-stranded DNA (dsDNA genome. HcDNAV lytically infects the bivalve-killing marine dinoflagellate H. circularisquama, and currently represents the sole DNA virus isolated from dinoflagellates, one of the most abundant protists in marine ecosystems. Its morphological features, genome type, and host range previously suggested that HcDNAV might be a member of the family Phycodnaviridae of Nucleo-Cytoplasmic Large DNA Viruses (NCLDVs, though no supporting sequence data was available. NCLDVs currently include two families found in aquatic environments (Phycodnaviridae, Mimiviridae, one mostly infecting terrestrial animals (Poxviridae, another isolated from fish, amphibians and insects (Iridoviridae, and the last one (Asfarviridae exclusively represented by the animal pathogen African swine fever virus (ASFV, the agent of a fatal hemorrhagic disease in domestic swine. In this study, we determined the complete sequence of the type B DNA polymerase (PolB gene of HcDNAV. The viral PolB was transcribed at least from 6 h post inoculation (hpi, suggesting its crucial function for viral replication. Most unexpectedly, the HcDNAV PolB sequence was found to be closely related to the PolB sequence of ASFV. In addition, the amino acid sequence of HcDNAV PolB showed a rare amino acid substitution within a motif containing highly conserved motif: YSDTDS was found in HcDNAV PolB instead of YGDTDS in most dsDNA viruses. Together with the previous observation of ASFV-like sequences in the Sorcerer II Global Ocean Sampling metagenomic datasets, our results further reinforce the ideas that the terrestrial ASFV has its evolutionary origin in marine environments.

  1. Structural and sequence variants in patients with Silver-Russell syndrome or similar features-Curation of a disease database

    DEFF Research Database (Denmark)

    Tümer, Zeynep; López-Hernández, Julia Angélica; Netchine, Irène

    2018-01-01

    data of these patients. The clinical features are scored according to the Netchine-Harbison clinical scoring system (NH-CSS), which has recently been accepted as standard by consensus. The structural and sequence variations are reviewed and where necessary redescribed according to recent...

  2. Einstein's greatest mistake abandonment of the aether

    CERN Document Server

    Deutsch, Sid

    2006-01-01

    If a child wants proof, we can think of 10 different ways to show that we are surrounded by air, but we are, of course, normally unaware that we live at the bottom of an “ocean” of air. It is claimed, in this book, that we are unaware, similarly, that we are surrounded by an atmosphere of aether. There is one major difference, however: We have not been able to detect the aether. Nevertheless, the aether provides a solution to the following mystery: How can light, or any electromagnetic wave, travel for billions of years across the vastness of the Universe, without losing any energy? The answer is that the Universe is filled with a light-transmitting medium, The Aether. The proof that there is an aether is the subject of the present book. An intriguing…exploration of a fringe scientific theory. Luminiferous aether—or "light-bearing aether," a theory first postulated by Isaac Newton in the 18th century, later refined by James Clerk Maxwell in the 19th century and ultimately replaced by Albert Einstein'...

  3. Analysis of HIV-1 intersubtype recombination breakpoints suggests region with high pairing probability may be a more fundamental factor than sequence similarity affecting HIV-1 recombination.

    Science.gov (United States)

    Jia, Lei; Li, Lin; Gui, Tao; Liu, Siyang; Li, Hanping; Han, Jingwan; Guo, Wei; Liu, Yongjian; Li, Jingyun

    2016-09-21

    With increasing data on HIV-1, a more relevant molecular model describing mechanism details of HIV-1 genetic recombination usually requires upgrades. Currently an incomplete structural understanding of the copy choice mechanism along with several other issues in the field that lack elucidation led us to perform an analysis of the correlation between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarity to further explore structural mechanisms. Near full length sequences of URFs from Asia, Europe, and Africa (one sequence/patient), and representative sequences of worldwide CRFs were retrieved from the Los Alamos HIV database. Their recombination patterns were analyzed by jpHMM in detail. Then the relationships between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarities were investigated. Pearson correlation test showed that all URF groups and the CRF group exhibit the same breakpoint distribution pattern. Additionally, the Wilcoxon two-sample test indicated a significant and inexplicable limitation of recombination in regions with high pairing probability. These regions have been found to be strongly conserved across distinct biological states (i.e., strong intersubtype similarity), and genetic similarity has been determined to be a very important factor promoting recombination. Thus, the results revealed an unexpected disagreement between intersubtype similarity and breakpoint distribution, which were further confirmed by genetic similarity analysis. Our analysis reveals a critical conflict between results from natural HIV-1 isolates and those from HIV-1-based assay vectors in which genetic similarity has been shown to be a very critical factor promoting recombination. These results indicate the region with high-pairing probabilities may be a more fundamental factor affecting HIV-1 recombination than sequence similarity in natural HIV-1 infections. Our

  4. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

    Energy Technology Data Exchange (ETDEWEB)

    Zemla, A; Lang, D; Kostova, T; Andino, R; Zhou, C

    2010-11-29

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory - still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitate the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that shared structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected

  5. An alignment-free method to find similarity among protein sequences via the general form of Chou's pseudo amino acid composition.

    Science.gov (United States)

    Gupta, M K; Niyogi, R; Misra, M

    2013-01-01

    In this paper, we propose a method to create the 60-dimensional feature vector for protein sequences via the general form of pseudo amino acid composition. The construction of the feature vector is based on the contents of amino acids, total distance of each amino acid from the first amino acid in the protein sequence and the distribution of 20 amino acids. The obtained cosine distance metric (also called the similarity matrix) is used to construct the phylogenetic tree by the neighbour joining method. In order to show the applicability of our approach, we tested it on three proteins: 1) ND5 protein sequences from nine species, 2) ND6 protein sequences from eight species, and 3) 50 coronavirus spike proteins. The results are in agreement with known history and the output from the multiple sequence alignment program ClustalW, which is widely used. We have also compared our phylogenetic results with six other recently proposed alignment-free methods. These comparisons show that our proposed method gives a more consistent biological relationship than the others. In addition, the time complexity is linear and space required is less as compared with other alignment-free methods that use graphical representation. It should be noted that the multiple sequence alignment method has exponential time complexity.

  6. An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

    Science.gov (United States)

    Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

    2016-02-18

    The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding-VD and posterior decoding-PD) and the here described PD/VD protocol were successfully applied on 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii. Finally, we provide a general motif-HMM scanner which is easily accessible through

  7. A protein-tyrosine phosphatase with sequence similarity to the SH2 domain of the protein-tyrosine kinases.

    Science.gov (United States)

    Shen, S H; Bastien, L; Posner, B I; Chrétien, P

    1991-08-22

    The phosphorylation of proteins at tyrosine residues is critical in cellular signal transduction, neoplastic transformation and control of the mitotic cycle. These mechanisms are regulated by the activities of both protein-tyrosine kinases (PTKs) and protein-tyrosine phosphatases (PTPases). As in the PTKs, there are two classes of PTPases: membrane associated, receptor-like enzymes and soluble proteins. Here we report the isolation of a complementary DNA clone encoding a new form of soluble PTPase, PTP1C. The enzyme possesses a large noncatalytic region at the N terminus which unexpectedly contains two adjacent copies of the Src homology region 2 (the SH2 domain) found in various nonreceptor PTKs and other cytoplasmic signalling proteins. As with other SH2 sequences, the SH2 domains of PTP1C formed high-affinity complexes with the activated epidermal growth factor receptor and other phosphotyrosine-containing proteins. These results suggest that the SH2 regions in PTP1C may interact with other cellular components to modulate its own phosphatase activity against interacting substrates. PTPase activity may thus directly link growth factor receptors and other signalling proteins through protein-tyrosine phosphorylation.

  8. Fold-recognition and comparative modeling of human α2,3-sialyltransferases reveal their sequence and structural similarities to CstII from Campylobacter jejuni

    Directory of Open Access Journals (Sweden)

    Balaji Petety V

    2006-04-01

    Full Text Available Abstract Background The 3-D structure of none of the eukaryotic sialyltransferases (SiaTs has been determined so far. Sequence alignment algorithms such as BLAST and PSI-BLAST could not detect a homolog of these enzymes from the protein databank. SiaTs, thus, belong to the hard/medium target category in the CASP experiments. The objective of the current work is to model the 3-D structures of human SiaTs which transfer the sialic acid in α2,3-linkage viz., ST3Gal I, II, III, IV, V, and VI, using fold-recognition and comparative modeling methods. The pair-wise sequence similarity among these six enzymes ranges from 41 to 63%. Results Unlike the sequence similarity servers, fold-recognition servers identified CstII, a α2,3/8 dual-activity SiaT from Campylobacter jejuni as the homolog of all the six ST3Gals; the level of sequence similarity between CstII and ST3Gals is only 15–20% and the similarity is restricted to well-characterized motif regions of ST3Gals. Deriving template-target sequence alignments for the entire ST3Gal sequence was not straightforward: the fold-recognition servers could not find a template for the region preceding the L-motif and that between the L- and S-motifs. Multiple structural templates were identified to model these regions and template identification-modeling-evaluation had to be performed iteratively to choose the most appropriate templates. The modeled structures have acceptable stereochemical properties and are also able to provide qualitative rationalizations for some of the site-directed mutagenesis results reported in literature. Apart from the predicted models, an unexpected but valuable finding from this study is the sequential and structural relatedness of family GT42 and family GT29 SiaTs. Conclusion The modeled 3-D structures can be used for docking and other modeling studies and for the rational identification of residues to be mutated to impart desired properties such as altered stability, substrate

  9. Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L. reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms

    Directory of Open Access Journals (Sweden)

    Chen Jun

    2012-11-01

    Full Text Available Abstract Background A detailed knowledge about spatial and temporal gene expression is important for understanding both the function of genes and their evolution. For the vast majority of species, transcriptomes are still largely uncharacterized and even in those where substantial information is available it is often in the form of partially sequenced transcriptomes. With the development of next generation sequencing, a single experiment can now simultaneously identify the transcribed part of a species genome and estimate levels of gene expression. Results mRNA from actively growing needles of Norway spruce (Picea abies was sequenced using next generation sequencing technology. In total, close to 70 million fragments with a length of 76 bp were sequenced resulting in 5 Gbp of raw data. A de novo assembly of these reads, together with publicly available expressed sequence tag (EST data from Norway spruce, was used to create a reference transcriptome. Of the 38,419 PUTs (putative unique transcripts longer than 150 bp in this reference assembly, 83.5% show similarity to ESTs from other spruce species and of the remaining PUTs, 3,704 show similarity to protein sequences from other plant species, leaving 4,167 PUTs with limited similarity to currently available plant proteins. By predicting coding frames and comparing not only the Norway spruce PUTs, but also PUTs from the close relatives Picea glauca and Picea sitchensis to both Pinus taeda and Taxus mairei, we obtained estimates of synonymous and non-synonymous divergence among conifer species. In addition, we detected close to 15,000 SNPs of high quality and estimated gene expression differences between samples collected under dark and light conditions. Conclusions Our study yielded a large number of single nucleotide polymorphisms as well as estimates of gene expression on transcriptome scale. In agreement with a recent study we find that the synonymous substitution rate per year (0.6 × 10

  10. Characterization of human MMTV-like (HML) elements similar to a sequence that was highly expressed in a human breast cancer: further definition of the HML-6 group.

    Science.gov (United States)

    Yin, H; Medstrand, P; Kristofferson, A; Dietrich, U; Aman, P; Blomberg, J

    1999-03-30

    Previously, we found a retroviral sequence, HML-6.2BC1, to be expressed at high levels in a multifocal ductal breast cancer from a 41-year-old woman who also developed ovarian carcinoma. The sequence of a human genomic clone (HML-6.28) selected by high-stringency hybridization with HML-6.2BC1 is reported here. It was 99% identical to HML-6.2BC1 and gave the same restriction fragments as total DNA. HML-6.28 is a 4.7-kb provirus with a 5'LTR, truncated in RT. Data from two similar genomic clones and sequences found in GenBank are also reported. Overlaps between them gave a rather complete picture of the HML-6.2BC1-like human endogenous retroviral elements. Work with somatic cell hybrids and FISH localized HML-6.28 to chromosome 6, band p21, close to the MHC region. The causal role of HML-6.28 in breast cancer remains unclear. Nevertheless, the ca. 20 Myr old HML-6 sequences enabled the definition of common and unique features of type A, B, and D (ABD) retroviruses. In Gag, HML-6 has no intervening sequences between matrix and capsid proteins, unlike extant exogenous ABD viruses, possibly an ancestral feature. Alignment of the dUTPase showed it to be present in all ABD viruses, but gave a phylogenetic tree different from trees made from other ABD genes, indicating a distinct phylogeny of dUTPase. A conserved 24-mer sequence in the amino terminus of some ABD envelope genes suggested a conserved function. Copyright 1999 Academic Press.

  11. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity.

    Science.gov (United States)

    Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

    2016-01-01

    Knowledge of protein function is important for biological, medical and therapeutic studies, but many proteins are still unknown in function. There is a need for more improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented those similarity-based and other methods in predicting diverse classes of proteins including the distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors protein representation, (3) improved predictive performances due to the use of more enriched training datasets and more variety of protein descriptors, (4) newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.

  12. Molecular phylogeny and species separation of five morphologically similar Holosticha-complex ciliates (Protozoa, Ciliophora) using ARDRA riboprinting and multigene sequence data

    Science.gov (United States)

    Gao, Feng; Yi, Zhenzhen; Gong, Jun; Al-Rasheid Khaled, A. S.; Song, Weibo

    2010-05-01

    To separate and redefine the ambiguous Holosticha-complex, a confusing group of hypotrichous ciliates, six strains belonging to five morphospecies of three genera, Holosticha heterofoissneri, Anteholosticha sp. pop1, Anteholosticha sp. pop2, A. manca, A. gracilis and Nothoholosticha fasciola, were analyzed using 12 restriction enzymes on the basis of amplified ribosomal DNA restriction analysis. Nine of the 12 enzymes could digest the DNA products, four ( Hinf I, Hind III, Msp I, Taq I) yielded species-specific restriction patterns, and Hind III and Taq I produced different patterns for two Anteholosticha sp. populations. Distinctly different restriction digestion haplotypes and similarity indices can be used to separate the species. The secondary structures of the five species were predicted based on the ITS2 transcripts and there were several minor differences among species, while two Anteholosticha sp. populations were identical. In addition, phylogenies based on the SSrRNA gene sequences were reconstructed using multiple algorithms, which grouped them generally into four clades, and exhibited that the genus Anteholosticha should be a convergent assemblage. The fact that Holosticha species clustered with the oligotrichs and choreotrichs, though with very low support values, indicated that the topology may be very divergent and unreliable when the number of sequence data used in the analyses is too low.

  13. Was ocean acidification responsible for history's greatest extinction?

    Science.gov (United States)

    Schultz, Colin

    2011-11-01

    Two hundred fifty million years ago, the world suffered the greatest recorded extinction of all time. More than 90% of marine animals and a majority of terrestrial species disappeared, yet the cause of the Permian-Triassic boundary (PTB) dieoff remains unknown. Various theories abound, with most focusing on rampant Siberian volcanism and its potential consequences: global warming, carbon dioxide poisoning, ocean acidification, or the severe drawdown of oceanic dissolved oxygen levels, also known as anoxia. To narrow the range of possible causes, Montenegro et al. ran climate simulations for PTB using the University of Victoria Earth System Climate Model, a carbon cycle-climate coupled general circulation model.

  14. Massive the Higgs boson and the greatest hunt in science

    CERN Document Server

    Sample, Ian

    2013-01-01

    Now fully updated -- this is the dramatic and gripping account of the greatest scientific discovery of our time. In the early 1960s, three groups of physicists, working independently in different countries, stumbled upon an idea that would change physics and fuel the imagination of scientists for decades. That idea was the Higgs boson -- to find it would be to finally understand the origins of mass -- the last building block of life itself. Now, almost 50 years later, that particle has finally been discovered.

  15. Human Treponema pallidum 11q/j isolate belongs to subsp. endemicum but contains two loci with a sequence in TP0548 and TP0488 similar to subsp. pertenue and subsp. pallidum, respectively.

    Directory of Open Access Journals (Sweden)

    Lenka Mikalová

    2017-03-01

    Full Text Available Treponema pallidum subsp. endemicum (TEN is the causative agent of endemic syphilis (bejel. An unusual human TEN 11q/j isolate was obtained from a syphilis-like primary genital lesion from a patient that returned to France from Pakistan.The TEN 11q/j isolate was characterized using nested PCR followed by Sanger sequencing and/or direct Illumina sequencing. Altogether, 44 chromosomal regions were analyzed. Overall, the 11q/j isolate clustered with TEN strains Bosnia A and Iraq B as expected from previous TEN classification of the 11q/j isolate. However, the 11q/j sequence in a 505 bp-long region at the TP0488 locus was similar to Treponema pallidum subsp. pallidum (TPA strains, but not to TEN Bosnia A and Iraq B sequences, suggesting a recombination event at this locus. Similarly, the 11q/j sequence in a 613 bp-long region at the TP0548 locus was similar to Treponema pallidum subsp. pertenue (TPE strains, but not to TEN sequences.A detailed analysis of two recombinant loci found in the 11q/j clinical isolate revealed that the recombination event occurred just once, in the TP0488, with the donor sequence originating from a TPA strain. Since TEN Bosnia A and Iraq B were found to contain TPA-like sequences at the TP0548 locus, the recombination at TP0548 took place in a treponeme that was an ancestor to both TEN Bosnia A and Iraq B. The sequence of 11q/j isolate in TP0548 represents an ancestral TEN sequence that is similar to yaws-causing treponemes. In addition to the importance of the 11q/j isolate for reconstruction of the TEN phylogeny, this case emphasizes the possible role of TEN strains in development of syphilis-like lesions.

  16. A putative carbohydrate-binding domain of the lactose-binding Cytisus sessilifolius anti-H(O) lectin has a similar amino acid sequence to that of the L-fucose-binding Ulex europaeus anti-H(O) lectin.

    Science.gov (United States)

    Konami, Y; Yamamoto, K; Osawa, T; Irimura, T

    1995-04-01

    The complete amino acid sequence of a lactose-binding Cytisus sessilifolius anti-H(O) lectin II (CSA-II) was determined using a protein sequencer. After digestion of CSA-II with endoproteinase Lys-C or Asp-N, the resulting peptides were purified by reversed-phase high performance liquid chromatography (HPLC) and then subjected to sequence analysis. Comparison of the complete amino acid sequence of CSA-II with the sequences of other leguminous seed lectins revealed regions of extensive homology. The amino acid sequence of a putative carbohydrate-binding domain of CSA-II was found to be similar to those of several anti-H(O) leguminous lectins, especially to that of the L-fucose-binding Ulex europaeus lectin I (UEA-I).

  17. Penicillin: the medicine with the greatest impact on therapeutic outcomes.

    Science.gov (United States)

    Kardos, Nelson; Demain, Arnold L

    2011-11-01

    The principal point of this paper is that the discovery of penicillin and the development of the supporting technologies in microbiology and chemical engineering leading to its commercial scale production represent it as the medicine with the greatest impact on therapeutic outcomes. Our nomination of penicillin for the top therapeutic molecule rests on two lines of evidence concerning the impact of this event: (1) the magnitude of the therapeutic outcomes resulting from the clinical application of penicillin and the subsequent widespread use of antibiotics and (2) the technologies developed for production of penicillin, including both microbial strain selection and improvement plus chemical engineering methods responsible for successful submerged fermentation production. These became the basis for production of all subsequent antibiotics in use today. These same technologies became the model for the development and production of new types of bioproducts (i.e., anticancer agents, monoclonal antibodies, and industrial enzymes). The clinical impact of penicillin was large and immediate. By ushering in the widespread clinical use of antibiotics, penicillin was responsible for enabling the control of many infectious diseases that had previously burdened mankind, with subsequent impact on global population demographics. Moreover, the large cumulative public effect of the many new antibiotics and new bioproducts that were developed and commercialized on the basis of the science and technology after penicillin demonstrates that penicillin had the greatest therapeutic impact event of all times. © Springer-Verlag 2011

  18. Greatest Happiness Principle in a Complex System Approach

    Directory of Open Access Journals (Sweden)

    Katalin Martinás

    2012-06-01

    Full Text Available The principle of greatest happiness was the basis of ethics in Plato’s and Aristotle’s work, it served as the basis of utility principle in economics, and the happiness research has become a hot topic in social sciences in Western countries in particular in economics recently. Nevertheless there is a considerable scientific pessimism over whether it is even possible to affect sustainable increases in happiness.In this paper we outline an economic theory of decision based on the greatest happiness principle (GHP. Modern equilibrium economics is a simple system simplification of the GHP, the complex approach outlines a non-equilibrium economic theory. The comparison of the approaches reveals the fact that the part of the results – laws of modern economics – follow from the simplifications and they are against the economic nature. The most important consequence is that within the free market economy one cannot be sure that the path found by it leads to a beneficial economic system.

  19. Effects of circulating member B of the family with sequence similarity 3 on the risk of developing metabolic syndrome and its components: A 5-year prospective study.

    Science.gov (United States)

    Wang, Haoyu; Yu, Fadong; Zhang, Zhuo; Hou, Yuanyuan; Teng, Weiping; Shan, Zhongyan; Lai, Yaxin

    2017-11-27

    Member B of the family with sequence similarity 3 (FAM3B), also known as pancreatic-derived factor, is mainly synthesized and secreted by islet β-cells, and plays a role in abnormal metabolism of glucose and lipids. However, the prospective association of FAM3B with metabolic disorders remains unclear. The present study aimed to reveal the predictive relationship between pancreas-specific cytokine and metabolic syndrome (MetS). A total of 210 adults (88 men and 122 women) without MetS, aged between 40 and 65 years, were recruited and received a comprehensive health examination. Baseline serum FAM3B levels were determined by sandwich enzyme-linked immunosorbent assay. Subsequently, all participants underwent a follow-up examination after 5 years. MetS was identified in accordance with the International Diabetes Federation criteria. During follow up, 35.7% participants developed MetS. In comparison with the non-MetS group, participants with MetS had an increased serum FAM3B at baseline (21.85 ng/mL [19.38, 24.17 ng/mL] vs 28.56 ng/mL [25.32, 38.10 ng/mL], P < 0.001). Moreover, serum FAM3B was significantly associated with variations in fasting plasma insulin (r = -0.306, P < 0.001), homeostasis model assessment of β-cell function (r = -0.328, P < 0.001) and homeostasis model assessment of insulin resistance (r = -0.191, P = 0.006). Furthermore, a positive correlation between baseline FAM3B and the incidence of MetS was observed, even after multivariable adjustment (relative risk 1.23 [1.15, 1.31], P < 0.001). Furthermore, the optimal cut-off values of FAM3B was 23.98 ng/mL for predicting MetS based on the Youden Index. Elevated circulating FAM3B might be considered as a predictor of newly-onset MetS and its progression. © 2017 The Authors. Journal of Diabetes Investigation published by Asian Association for the Study of Diabetes (AASD) and John Wiley & Sons Australia, Ltd.

  20. Masses of galaxies and the greatest redshifts of quasars

    Energy Technology Data Exchange (ETDEWEB)

    Hills, J G [Illinois Univ., Urbana (USA)

    1977-04-01

    The outer parts of a typical galaxy follows an R/sup -2/ density distribution which results in the collapse time of its protogalaxy being proportional to its mass. Since quasars probably occur in the nuclei of galaxies which can only form after the collapse of their parent galaxies, their greatest observed redshift, Zsub(max), is largely determined by the mass, Msub(t), of a typical protogalaxy. The observed Zsub(max) of quasars indicates that Msub(t) = 1 x 10/sup 12/ solar masses. This mass is consistent with the masses of galaxies found in recent dynamical studies. It indicates that most of the mass in a typical galaxy is in the halo lying beyond the familiar optically-bright core, but the mass of a standard galaxy is still only 0.3 of that required for galaxies alone to close the universe.

  1. Sequence similarity between the erythrocyte binding domain 1 of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals binding residues for the Duffy Antigen Receptor for Chemokines

    OpenAIRE

    Bolton, Michael J; Garry, Robert F

    2011-01-01

    Abstract Background The surface glycoprotein (SU, gp120) of the human immunodeficiency virus (HIV) must bind to a chemokine receptor, CCR5 or CXCR4, to invade CD4+ cells. Plasmodium vivax uses the Duffy Binding Protein (DBP) to bind the Duffy Antigen Receptor for Chemokines (DARC) and invade reticulocytes. Results Variable loop 3 (V3) of HIV-1 SU and domain 1 of the Plasmodium vivax DBP share a sequence similarity. The site of amino acid sequence similarity was necessary, but not sufficient, ...

  2. Fusion protein gene nucleotide sequence similarities, shared antigenic sites and phylogenetic analysis suggest that phocid distemper virus 2 and canine distemper virus belong to the same virus entity.

    NARCIS (Netherlands)

    I.K.G. Visser (Ilona); R.W.J. van der Heijden (Roger); M.W.G. van de Bildt (Marco); M.J.H. Kenter (Marcel); C. Örvell; A.D.M.E. Osterhaus (Albert)

    1993-01-01

    textabstractNucleotide sequencing of the fusion protein (F) gene of phocid distemper virus-2 (PDV-2), recently isolated from Baikal seals (Phoca sibirica), revealed an open reading frame (nucleotides 84 to 2075) with two potential in-frame ATG translation initiation codons. We suggest that the

  3. Intraspecific sequence comparisons reveal similar rates of non-collinear gene insertion in the B and D genomes of bread wheat

    Czech Academy of Sciences Publication Activity Database

    Bartoš, Jan; Vlček, Čestmír; Choulet, F.; Džunková, Mária; Cviková, Kateřina; Šafář, Jan; Šimková, Hana; Pačes, Jan; Strnad, Hynek; Sourdille, P.; Berges, H.; Cattonaro, F.; Feuillet, C.; Doležel, Jaroslav

    2012-01-01

    Roč. 12, č. 155 (2012), s. 1-10 ISSN 1471-2229 R&D Projects: GA ČR GAP501/10/1778 Grant - others:GA MŠk(CZ) ED0007/01/01 Program:ED Institutional research plan: CEZ:AV0Z50380511; CEZ:AV0Z50520514 Keywords : Wheat * BAC sequencing * Homoeologous genomes Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 4.354, year: 2012

  4. Sequence similarity between the cp gene and the transgene in transgenic papayas = Similaridade de seqüência entre o gene cp do vírus e do transgene presente em mamoeiros transgênicos

    NARCIS (Netherlands)

    Souza, M.T.; Teixeira, M.; Gonsalves, D.

    2005-01-01

    The Papaya ringspot virus (PRSV) coat protein transgene present in 'Rainbow' and 'SunUp' papayas disclose high sequence similarity (>89%) to the cp gene from PRSV BR and TH. Despite this, both isolates are able to break down the resistance in 'Rainbow', while only the latter is able to do so in

  5. Judgments of brand similarity

    NARCIS (Netherlands)

    Bijmolt, THA; Wedel, M; Pieters, RGM; DeSarbo, WS

    This paper provides empirical insight into the way consumers make pairwise similarity judgments between brands, and how familiarity with the brands, serial position of the pair in a sequence, and the presentation format affect these judgments. Within the similarity judgment process both the

  6. Whole-exome sequencing identified a homozygous FNBP4 mutation in a family with a condition similar to microphthalmia with limb anomalies.

    Science.gov (United States)

    Kondo, Yukiko; Koshimizu, Eriko; Megarbane, Andre; Hamanoue, Haruka; Okada, Ippei; Nishiyama, Kiyomi; Kodera, Hirofumi; Miyatake, Satoko; Tsurusaki, Yoshinori; Nakashima, Mitsuko; Doi, Hiroshi; Miyake, Noriko; Saitsu, Hirotomo; Matsumoto, Naomichi

    2013-07-01

    Microphthalmia with limb anomalies (MLA), also known as Waardenburg anophthalmia syndrome or ophthalmoacromelic syndrome, is a rare autosomal recessive disorder. Recently, we and others successfully identified SMOC1 as the causative gene for MLA. However, there are several MLA families without SMOC1 abnormality, suggesting locus heterogeneity in MLA. We aimed to identify a pathogenic mutation in one Lebanese family having an MLA-like condition without SMOC1 mutation by whole-exome sequencing (WES) combined with homozygosity mapping. A c.683C>T (p.Thr228Met) in FNBP4 was found as a primary candidate, drawing the attention that FNBP4 and SMOC1 may potentially modulate BMP signaling. Copyright © 2013 Wiley Periodicals, Inc.

  7. The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction.

    Science.gov (United States)

    Li, Hongjian; Peng, Jiangjun; Leung, Yee; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Ballester, Pedro J

    2018-03-14

    It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future.

  8. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    OpenAIRE

    Bolton, Michael J; Garry, Robert F

    2011-01-01

    Abstract Background The HIV surface glycoprotein gp120 (SU, gp120) and the Plasmodium vivax Duffy binding protein (PvDBP) bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM). Infection by either pathogen has been found to be inhibited by polyanions. Results Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infectio...

  9. Sequence similarity between the erythrocyte binding domain 1 of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals binding residues for the Duffy Antigen Receptor for Chemokines

    Directory of Open Access Journals (Sweden)

    Garry Robert F

    2011-01-01

    Full Text Available Abstract Background The surface glycoprotein (SU, gp120 of the human immunodeficiency virus (HIV must bind to a chemokine receptor, CCR5 or CXCR4, to invade CD4+ cells. Plasmodium vivax uses the Duffy Binding Protein (DBP to bind the Duffy Antigen Receptor for Chemokines (DARC and invade reticulocytes. Results Variable loop 3 (V3 of HIV-1 SU and domain 1 of the Plasmodium vivax DBP share a sequence similarity. The site of amino acid sequence similarity was necessary, but not sufficient, for DARC binding and contained a consensus heparin binding site essential for DARC binding. Both HIV-1 and P. vivax can be blocked from binding to their chemokine receptors by the chemokine, RANTES and its analog AOP-RANTES. Site directed mutagenesis of the heparin binding motif in members of the DBP family, the P. knowlesi alpha, beta and gamma proteins abrogated their binding to erythrocytes. Positively charged residues within domain 1 are required for binding of P. vivax and P. knowlesi erythrocyte binding proteins. Conclusion A heparin binding site motif in members of the DBP family may form part of a conserved erythrocyte receptor binding pocket.

  10. Sequence similarity between the erythrocyte binding domain of the Plasmodium vivax Duffy binding protein and the V3 loop of HIV-1 strain MN reveals a functional heparin binding motif involved in binding to the Duffy antigen receptor for chemokines

    Directory of Open Access Journals (Sweden)

    Bolton Michael J

    2011-11-01

    Full Text Available Abstract Background The HIV surface glycoprotein gp120 (SU, gp120 and the Plasmodium vivax Duffy binding protein (PvDBP bind to chemokine receptors during infection and have a site of amino acid sequence similarity in their binding domains that often includes a heparin binding motif (HBM. Infection by either pathogen has been found to be inhibited by polyanions. Results Specific polyanions that inhibit HIV infection and bind to the V3 loop of X4 strains also inhibited DBP-mediated infection of erythrocytes and DBP binding to the Duffy Antigen Receptor for Chemokines (DARC. A peptide including the HBM of PvDBP had similar affinity for heparin as RANTES and V3 loop peptides, and could be specifically inhibited from heparin binding by the same polyanions that inhibit DBP binding to DARC. However, some V3 peptides can competitively inhibit RANTES binding to heparin, but not the PvDBP HBM peptide. Three other members of the DBP family have an HBM sequence that is necessary for erythrocyte binding, however only the protein which binds to DARC, the P. knowlesi alpha protein, is inhibited by heparin from binding to erythrocytes. Heparitinase digestion does not affect the binding of DBP to erythrocytes. Conclusion The HBMs of DBPs that bind to DARC have similar heparin binding affinities as some V3 loop peptides and chemokines, are responsible for specific sulfated polysaccharide inhibition of parasite binding and invasion of red blood cells, and are more likely to bind to negative charges on the receptor than cell surface glycosaminoglycans.

  11. Sequence similarity between the viral cp gene and the transgene in transgenic papayas Similaridade de seqüência entre o gene cp do vírus e do transgene presente em mamoeiros transgênicos

    Directory of Open Access Journals (Sweden)

    Manoel Teixeira Souza Júnior

    2005-05-01

    Full Text Available The Papaya ringspot virus (PRSV coat protein transgene present in 'Rainbow' and 'SunUp' papayas disclose high sequence similarity (>89% to the cp gene from PRSV BR and TH. Despite this, both isolates are able to break down the resistance in 'Rainbow', while only the latter is able to do so in 'SunUp'. The objective of this work was to evaluate the degree of sequence similarity between the cp gene in the challenge isolate and the cp transgene in transgenic papayas resistant to PRSV. The production of a hybrid virus containing the genome backbone of PRSV HA up to the Apa I site in the NIb gene, and downstream from there, the sequence of PRSV TH was undertaken. This hybrid virus, PRSV HA/TH, was obtained and used to challenge 'Rainbow', 'SunUp', and an R2 population derived from line 63-1, all resistant to PRSV HA. PRSV HA/TH broke down the resistance in both papaya varieties and in the 63-1 population, demonstrating that sequence similarity is a major factor in the mechanism of resistance used by transgenic papayas expressing the cp gene. A comparative analysis of the cp gene present in line 55-1 and 63-1-derived transgenic plants and in PRSV HA, BR, and TH was also performed.O gene da capa protéica (cp do vírus da mancha anelar do mamoeiro (Papaya ringspot virus, PRSV, presente nos mamoeiros 'Rainbow' e 'SunUp', tem alta similaridade de seqüência (>89% com o gene cp dos isolados PRSV BR e TH. Apesar deste alto grau de similaridade, ambos isolados são capazes de quebrar a resistência observada em 'Rainbow', ao passo que TH quebra a resistência em 'SunUp'. O objetivo deste trabalho foi avaliar o grau de similaridade de seqüência entre o gene cp do vírus desafiante e do transgene em mamoeiros transgênicos resistentes a PRSV. Produziu-se um vírus híbrido contendo o genoma do isolado PRSV HA até o sítio de restrição Apa I no gene NIb, e, a partir deste ponto, este vírus continha o genoma do isolado PRSV TH. PRSV HA/TH foi utilizado

  12. The greatest hydroelectric power plant in the world. Itaipu Hydroelectric Power Plant

    International Nuclear Information System (INIS)

    Andonov - Chento, Ilija

    2004-01-01

    Details to demonstrate the size and engineering achievements of one of the world's greatest hydroelectric power plant are given. Principal technical features of construction and operation of the Itaipu Dam are tabulated and discussed

  13. Family with sequence similarity 83, member B is a predictor of poor prognosis and a potential therapeutic target for lung adenocarcinoma expressing wild-type epidermal growth factor receptor.

    Science.gov (United States)

    Yamaura, Takumi; Ezaki, Junji; Okabe, Naoyuki; Takagi, Hironori; Ozaki, Yuki; Inoue, Takuya; Watanabe, Yuzuru; Fukuhara, Mitsuro; Muto, Satoshi; Matsumura, Yuki; Hasegawa, Takeo; Hoshino, Mika; Osugi, Jun; Shio, Yutaka; Waguri, Satoshi; Tamura, Hirosumi; Imai, Jun-Ichi; Ito, Emi; Yanagisawa, Yuka; Honma, Reiko; Watanabe, Shinya; Suzuki, Hiroyuki

    2018-02-01

    Lung adenocarcinoma (ADC) patients with tumors that harbor no targetable driver gene mutation, such as epidermal growth factor receptor ( EGFR ) gene mutations, have unfavorable prognosis, and thus, novel therapeutic targets are required. Family with sequence similarity 83, member B ( FAM83B ) is a biomarker for squamous cell lung cancer. FAM83B has also recently been shown to serve an important role in the EGFR signaling pathway. In the present study, the molecular and clinical impact of FAM83B in lung ADC was investigated. Matched tumor and adjacent normal tissue samples were obtained from 216 patients who underwent complete lung resection for primary lung ADC and were examined for FAM83B expression using cDNA microarray analysis. The associations between FAM83B expression and clinicopathological parameters, including patient survival, were examined. FAM83B was highly expressed in tumors from males, smokers and in tumors with wild-type EGFR . Multivariate analyses further confirmed that wild-type EGFR tumors were significantly positively associated with FAM83B expression. In survival analysis, FAM83B expression was associated with poor outcomes in disease-free survival and overall survival, particularly when stratified against tumors with wild-type EGFR . Furthermore, FAM83B knockdown was performed to investigate its phenotypic effect on lung ADC cell lines. Gene silencing by FAM83B RNA interference induced growth suppression in the HLC-1 and H1975 lung ADC cell lines. FAM83B may be involved in lung ADC tumor proliferation and can be a predictor of poor survival. FAM83B is also a potential novel therapeutic target for ADC with wild-type EGFR .

  14. School Issues Under [Section] 504 and the ADA: The Latest and Greatest.

    Science.gov (United States)

    Aleman, Steven R.

    This paper highlights recent guidance and rulings from the Office of Civil Rights (OCR) of interest to administrators, advocates, and attorneys. It is a companion piece to Student Issues on SectionNB504/ADA: The Latest and Greatest. Compliance with SectionNB504 and the Americans with Disabilities Act (ADA) continues to involve debate and dialog on…

  15. Stigma and Discrimination in HIV/AIDS; The greatest Challenge to ...

    African Journals Online (AJOL)

    The greatest challenge to the efforts of the various agencies and governments in the care, support and treatment of people living with HIV/AIDS, appears to be stigma and discrimination. Stigma and discrimination has to be addressed through public education, legislation to protect people living with HIV/AIDS and also by ...

  16. FedWeb Greatest Hits: Presenting the New Test Collection for Federated Web Search

    NARCIS (Netherlands)

    Demeester, Thomas; Trieschnigg, Rudolf Berend; Zhou, Ke; Nguyen, Dong-Phuong; Hiemstra, Djoerd

    This paper presents 'FedWeb Greatest Hits', a large new test collection for research in web information retrieval. As a combination and extension of the datasets used in the TREC Federated Web Search Track, this collection opens up new research possibilities on federated web search challenges, as

  17. Domain similarity based orthology detection.

    Science.gov (United States)

    Bitard-Feildel, Tristan; Kemena, Carsten; Greenwood, Jenny M; Bornberg-Bauer, Erich

    2015-05-13

    Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assert whether two proteins are orthologous or not. Accordingly, when the number of sequences for comparison increases, the number of comparisons to compute grows in a quadratic order. A current challenge of bioinformatic research, especially when taking into account the increasing number of sequenced organisms available, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins. We present two new protein similarity measures, a cosine and a maximal weight matching score based on domain content similarity, and new software, named porthoDom. The qualities of the cosine and the maximal weight matching similarity measures are compared against curated datasets. The measures show that domain content similarities are able to correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, the wrapper developed for proteinortho. porthoDom makes use of domain content similarity measures to group proteins together before searching for orthologs. By using domains instead of amino acid sequences, the reduction of the search space decreases the computational complexity of an all-against-all sequence comparison. We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, allows a drastic simplification of search space. porthoDom has the advantage of speeding up orthology detection while maintaining a degree of accuracy similar to proteinortho. The implementation of porthoDom is released using python and C++ languages and is available under the GNU GPL licence 3 at http://www.bornberglab.org/pages/porthoda .

  18. Greatest Happiness Principle in a Complex System: Maximisation versus Driving Force

    Directory of Open Access Journals (Sweden)

    Katalin Martinás

    2012-06-01

    Full Text Available From philosophical point of view, micro-founded economic theories depart from the principle of the pursuit of the greatest happiness. From mathematical point of view, micro-founded economic theories depart from the utility maximisation program. Though economists are aware of the serious limitations of the equilibrium analysis, they remain in that framework. We show that the maximisation principle, which implies the equilibrium hypothesis, is responsible for this impasse. We formalise the pursuit of the greatest happiness principle by the help of the driving force postulate: the volumes of activities depend on the expected wealth increase. In that case we can get rid of the equilibrium hypothesis and have new insights into economic theory. For example, in what extent standard economic results depend on the equilibrium hypothesis?

  19. Social Media - DoD’s Greatest Information Sharing Tool or Weakest Security Link?

    Science.gov (United States)

    2010-04-15

    or position of the Department of the Army, Department of Defense, or the U.S. Government. SOCIAL MEDIA – DOD’S GREATEST INFORMATION SHARING TOOL...appropriateness and effectiveness of these policies in securing the information network. 15. SUBJECT TERMS Social media , information...TYPE Civilian Research Paper 3. DATES COVERED (From - To) August 2009-April 2010 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER Social Media

  20. The conditions for attaining the greatest degree of system stability with strict generator excitation control

    Energy Technology Data Exchange (ETDEWEB)

    Gruzdev, I.A.; Ekimova, M.M.; Truspekova, G.A.

    1982-01-01

    Expressions are derived for an idealized model of a complex electric power system; these expressions define the greatest level of stability of an electric power system and the optimum combination of stabilization factors with automatic excitation control in a single power system. The possibility of increasing the level of stability of an electric power system with simultaneous strict automatic excitation control of the synychronous generators in several power systems is analyzed.

  1. Asymptotics for the greatest zeros of solutions of a particular O.D.E.

    Directory of Open Access Journals (Sweden)

    Silvia Noschese

    1994-05-01

    Full Text Available This paper deals with the Liouville-Stekeloff method for approximating solutions of homogeneous linear ODE and a general result due to Tricomi which provides estimates for the zeros of functions by means of the knowledge of an asymptotic representation. From the classical tools we deduce information about the asymptotics of the greatest zeros of a class of solutions of a particular ODE, including the classical Hermite polynomials.

  2. Molecular characterisation and similarity relationships among iranian basil (Ocimum basilicum L. accessions using inter simple sequence repeat markers Caracterização molecular de acessos de Ocimum basilicum L. por meio de marcadores ISSR

    Directory of Open Access Journals (Sweden)

    Mohammad Aghaei

    2012-06-01

    Full Text Available The study of genetic relationships is a prerequisite for plant breeding activities as well as for conservation of genetic resources. In the present study, genetic diversity among 50 Iranian basil (Ocimum basilicum L. accessions was determined using inter simple sequence repeat (ISSR markers. Thirty-eight alleles were generated at 12 ISSR loci. The number of alleles per locus ranged from 1 to 5 with an average of 3.17. The maximum number of alleles was observed at the A7, 818, 825 and 849 loci, and their size ranged from 300 to 2500 bp. A similarity matrix based on Jaccard's coefficient for all 50 basil accessions gave values from 1.00-0.60. The maximum similarity (1.00 was observed between the "Urmia" and "Shahr-e-Rey II" accessions as well as between the "Urmia" and "Qazvin II" accessions. The lowest similarity (0.60 was observed between the "Tuyserkan I" and "Gom II" accessions. The unweighted pair- group method using arithmetique average UPGMA clustering algorithm classified the studied accessions into three distinct groups. All of the basil accessions, with the exception of "Babol III", "Ahvaz II", "Yazd II" and "Ardebil I", were placed in groups I and II. Leaf colour was a specific characteristic that influenced the clustering of Iranian basil accessions. Because of this relationship, the results of the principal coordinate analysis (PCoA approximately corresponded to those obtained through cluster analysis. Our results revealed that the geographical distribution of genotypes could not be used as a basis for crossing parents to obtain high heterosis, and therefore, it must be carried out by genetic studies.O estudo das relações genéticas é um pré-requisito para atividades em reprodução de plantas assim como para conservação de recursos genéticos. Neste trabalho a diversidade genética entre 50 acessos de Manejericão Iraniano (Ocimum basilicum L. foram determinadas usando marcadores de Seqüência Simples Repetida Interna (ISSR

  3. Northeast and Midwest regional species and habitats at greatest risk and most vulnerable to climate impacts

    Science.gov (United States)

    Staudinger, Michelle D.; Hilberg, Laura; Janowiak, Maria; Swanton, C.O.

    2016-01-01

    The objectives of this Chapter are to describe climate change vulnerability, it’s components, the range of assessment methods being implemented regionally, and examples of training resources and tools. Climate Change Vulnerability Assessments (CCVAs) have already been conducted for numerous Regional Species of Greatest Conservation Need and their dependent 5 habitats across the Northeast and Midwest. This chapter provides a synthesis of different assessment frameworks, information on the locations (e.g., States) where vulnerability assessments were conducted, lists of individual species and habitats with their respective vulnerability rankings, and a comparison of how vulnerability rankings were determined among studies.

  4. When the most potent combination of antibiotics selects for the greatest bacterial load: the smile-frown transition.

    Directory of Open Access Journals (Sweden)

    Rafael Pena-Miller

    Full Text Available Conventional wisdom holds that the best way to treat infection with antibiotics is to 'hit early and hit hard'. A favoured strategy is to deploy two antibiotics that produce a stronger effect in combination than if either drug were used alone. But are such synergistic combinations necessarily optimal? We combine mathematical modelling, evolution experiments, whole genome sequencing and genetic manipulation of a resistance mechanism to demonstrate that deploying synergistic antibiotics can, in practice, be the worst strategy if bacterial clearance is not achieved after the first treatment phase. As treatment proceeds, it is only to be expected that the strength of antibiotic synergy will diminish as the frequency of drug-resistant bacteria increases. Indeed, antibiotic efficacy decays exponentially in our five-day evolution experiments. However, as the theory of competitive release predicts, drug-resistant bacteria replicate fastest when their drug-susceptible competitors are eliminated by overly-aggressive treatment. Here, synergy exerts such strong selection for resistance that an antagonism consistently emerges by day 1 and the initially most aggressive treatment produces the greatest bacterial load, a fortiori greater than if just one drug were given. Whole genome sequencing reveals that such rapid evolution is the result of the amplification of a genomic region containing four drug-resistance mechanisms, including the acrAB efflux operon. When this operon is deleted in genetically manipulated mutants and the evolution experiment repeated, antagonism fails to emerge in five days and antibiotic synergy is maintained for longer. We therefore conclude that unless super-inhibitory doses are achieved and maintained until the pathogen is successfully cleared, synergistic antibiotics can have the opposite effect to that intended by helping to increase pathogen load where, and when, the drugs are found at sub-inhibitory concentrations.

  5. When the most potent combination of antibiotics selects for the greatest bacterial load: the smile-frown transition.

    Science.gov (United States)

    Pena-Miller, Rafael; Laehnemann, David; Jansen, Gunther; Fuentes-Hernandez, Ayari; Rosenstiel, Philip; Schulenburg, Hinrich; Beardmore, Robert

    2013-01-01

    Conventional wisdom holds that the best way to treat infection with antibiotics is to 'hit early and hit hard'. A favoured strategy is to deploy two antibiotics that produce a stronger effect in combination than if either drug were used alone. But are such synergistic combinations necessarily optimal? We combine mathematical modelling, evolution experiments, whole genome sequencing and genetic manipulation of a resistance mechanism to demonstrate that deploying synergistic antibiotics can, in practice, be the worst strategy if bacterial clearance is not achieved after the first treatment phase. As treatment proceeds, it is only to be expected that the strength of antibiotic synergy will diminish as the frequency of drug-resistant bacteria increases. Indeed, antibiotic efficacy decays exponentially in our five-day evolution experiments. However, as the theory of competitive release predicts, drug-resistant bacteria replicate fastest when their drug-susceptible competitors are eliminated by overly-aggressive treatment. Here, synergy exerts such strong selection for resistance that an antagonism consistently emerges by day 1 and the initially most aggressive treatment produces the greatest bacterial load, a fortiori greater than if just one drug were given. Whole genome sequencing reveals that such rapid evolution is the result of the amplification of a genomic region containing four drug-resistance mechanisms, including the acrAB efflux operon. When this operon is deleted in genetically manipulated mutants and the evolution experiment repeated, antagonism fails to emerge in five days and antibiotic synergy is maintained for longer. We therefore conclude that unless super-inhibitory doses are achieved and maintained until the pathogen is successfully cleared, synergistic antibiotics can have the opposite effect to that intended by helping to increase pathogen load where, and when, the drugs are found at sub-inhibitory concentrations.

  6. Coping and acceptance: the greatest challenge for veterans with intestinal stomas.

    Science.gov (United States)

    Krouse, Robert S; Grant, Marcia; Rawl, Susan M; Mohler, M Jane; Baldwin, Carol M; Coons, Stephen Joel; McCorkle, Ruth; Schmidt, C Max; Ko, Clifford Y

    2009-03-01

    Intestinal stomas (ostomies) create challenges for veterans. The goal of this qualitative analysis was to understand better patients' perspectives regarding their greatest challenge. Ostomates at three Veterans Affairs locations were surveyed using the modified City of Hope Quality of Life-Ostomy questionnaire that contained an open-ended request for respondents to describe their greatest challenge. The response rate was 51% (239 of 467); 68% (163 of 239) completed the open-ended item. Content analysis was performed by an experienced qualitative research team. Coping and acceptance were the most commonly addressed themes. The most frequently expressed issues and advice were related to a need for positive thinking and insight regarding adjustment over time. Coping strategies included the use of humor, recognition of positive changes resulting from the stoma, and normalization of life with an ostomy. Coping and acceptance are common themes described by veterans with an intestinal stoma. Health-care providers can assist veterans by utilizing ostomate self-management strategies, experience, and advice.

  7. The greatest challenges reported by long-term colorectal cancer survivors with stomas.

    Science.gov (United States)

    McMullen, Carmit K; Hornbrook, Mark C; Grant, Marcia; Baldwin, Carol M; Wendel, Christopher S; Mohler, M Jane; Altschuler, Andrea; Ramirez, Michelle; Krouse, Robert S

    2008-04-01

    This paper presents a qualitative analysis of the greatest challenges reported by long-term colorectal cancer survivors with ostomies. Surveys that included an open-ended question about challenges of living with an ostomy were administered at three Kaiser Permanente regions: Northern California, Northwest, and Hawaii. The study was coordinated at the Southern Arizona Veterans Affairs Health Care System in Tucson. The City of Hope Quality of Life Model for Ostomy Patients provided a framework for the study's design, measures, data collection, and data analysis. The study's findings may be generalized broadly to community settings across the United States. Results replicate those of previous research among veterans, California members of the United Ostomy Association, Koreans with ostomies, and colorectal cancer survivors with ostomies residing in the United Kingdom. The greatest challenges reported by 178 colorectal cancer survivors with ostomies confirmed the Institute of Medicine's findings that survivorship is a distinct, chronic phase of cancer care and that cancer's effects are broad and pervasive. The challenges reported by study participants should inform the design, testing and integration of targeted education, early interventions, and ongoing support services for colorectal cancer patients with ostomies.

  8. Multimodal sequence learning.

    Science.gov (United States)

    Kemény, Ferenc; Meier, Beat

    2016-02-01

    While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Reducing mortality from childhood pneumonia: The leading priority is also the greatest opportunity

    Directory of Open Access Journals (Sweden)

    Igor Rudan

    2013-06-01

    Full Text Available Pneumonia and diarrhoea have been the leading causes of global child mortality for many decades. The work of Child Health Epidemiology Reference Group (CHERG has been pivotal in raising awareness that the UN's Millennium Development Goal 4 cannot be achieved without increased focus on preventing and treating the two diseases in low– and middle–income countries. Global Action Plan for Pneumonia (GAPP and Diarrhoea Global Action Plan (DGAP groups recently concluded that addressing childhood pneumonia and diarrhoea is not only the leading priority but also the greatest opportunity in global health today: scaling up of existing highly cost–effective interventions could prevent 95% of diarrhoea deaths and 67% of pneumonia deaths in children younger than 5 years by the year 2025. The cost of such effort was estimated at about US$ 6.7 billion.

  10. Current Global Pricing For Human Papillomavirus Vaccines Brings The Greatest Economic Benefits To Rich Countries.

    Science.gov (United States)

    Herlihy, Niamh; Hutubessy, Raymond; Jit, Mark

    2016-02-01

    Vaccinating females against human papillomavirus (HPV) prior to the debut of sexual activity is an effective way to prevent cervical cancer, yet vaccine uptake in low- and middle-income countries has been hindered by high vaccine prices. We created an economic model to estimate the distribution of the economic surplus-the sum of all health and economic benefits of a vaccine, minus the costs of development, production, and distribution-among different country income groups and manufacturers for a cohort of twelve-year-old females in 2012. We found that manufacturers may have received economic returns worth five times their original investment in HPV vaccine development. High-income countries gained the greatest economic surplus of any income category, realizing over five times more economic value per vaccinated female than low-income countries did. Subsidizing vaccine prices in low- and middle-income countries could both reduce financial barriers to vaccine adoption and still allow high-income countries to retain their economic surpluses and manufacturers to retain their profits. Project HOPE—The People-to-People Health Foundation, Inc.

  11. MreB filaments align along greatest principal membrane curvature to orient cell wall synthesis

    Science.gov (United States)

    Szwedziak, Piotr; Wong, Felix; Schaefer, Kaitlin; Izoré, Thierry; Renner, Lars D; Holmes, Matthew J; Sun, Yingjie; Bisson-Filho, Alexandre W; Walker, Suzanne; Amir, Ariel; Löwe, Jan

    2018-01-01

    MreB is essential for rod shape in many bacteria. Membrane-associated MreB filaments move around the rod circumference, helping to insert cell wall in the radial direction to reinforce rod shape. To understand how oriented MreB motion arises, we altered the shape of Bacillus subtilis. MreB motion is isotropic in round cells, and orientation is restored when rod shape is externally imposed. Stationary filaments orient within protoplasts, and purified MreB tubulates liposomes in vitro, orienting within tubes. Together, this demonstrates MreB orients along the greatest principal membrane curvature, a conclusion supported with biophysical modeling. We observed that spherical cells regenerate into rods in a local, self-reinforcing manner: rapidly propagating rods emerge from small bulges, exhibiting oriented MreB motion. We propose that the coupling of MreB filament alignment to shape-reinforcing peptidoglycan synthesis creates a locally-acting, self-organizing mechanism allowing the rapid establishment and stable maintenance of emergent rod shape. PMID:29469806

  12. Covering women's greatest health fear: breast cancer information in consumer magazines.

    Science.gov (United States)

    Walsh-Childers, Kim; Edwards, Heather; Grobmyer, Stephen

    2011-04-01

    Women identify consumer magazines as a key source of information on many health topics, including breast cancer, which continues to rank as women's greatest personal health fear. This study examined the comprehensiveness and accuracy of breast cancer information provided in 555 articles published in 17 consumer magazines from 2002 through 2007. Accuracy of information was determined for 33 key breast cancer facts identified by an expert panel as important information for women to know. The results show that only 7 of 33 key facts were mentioned in at least 5% of the articles. These facts all dealt with breast cancer risk factors, screening, and detection; none of the key facts related to treatment or outcomes appeared in at least 5% of the articles. Other topics (not key facts) mentioned centered around controllable risk factors, support for breast cancer patients, and chemotherapy treatment. The majority of mentions of key facts were coded as fully accurate, although as much as 44% of mentions of some topics (the link between hormone replacement therapy and breast cancer) were coded as inaccurate or only partially accurate. The magazines were most likely to emphasize family history of breast cancer or genetic characteristics as risk factors for breast cancers; family history was twice as likely to be discussed as increasing age, which is in fact the most important risk factor for breast cancer other than being female. Magazine coverage may contribute to women's inaccurate perceptions of their breast cancer risk.

  13. New Similarity Functions

    DEFF Research Database (Denmark)

    Yazdani, Hossein; Ortiz-Arroyo, Daniel; Kwasnicka, Halina

    2016-01-01

    spaces, in addition to their similarity in the vector space. Prioritized Weighted Feature Distance (PWFD) works similarly as WFD, but provides the ability to give priorities to desirable features. The accuracy of the proposed functions are compared with other similarity functions on several data sets....... Our results show that the proposed functions work better than other methods proposed in the literature....

  14. Phoneme Similarity and Confusability

    Science.gov (United States)

    Bailey, T.M.; Hahn, U.

    2005-01-01

    Similarity between component speech sounds influences language processing in numerous ways. Explanation and detailed prediction of linguistic performance consequently requires an understanding of these basic similarities. The research reported in this paper contrasts two broad classes of approach to the issue of phoneme similarity-theoretically…

  15. Molecular similarity measures.

    Science.gov (United States)

    Maggiora, Gerald M; Shanmugasundaram, Veerabahu

    2011-01-01

    Molecular similarity is a pervasive concept in chemistry. It is essential to many aspects of chemical reasoning and analysis and is perhaps the fundamental assumption underlying medicinal chemistry. Dissimilarity, the complement of similarity, also plays a major role in a growing number of applications of molecular diversity in combinatorial chemistry, high-throughput screening, and related fields. How molecular information is represented, called the representation problem, is important to the type of molecular similarity analysis (MSA) that can be carried out in any given situation. In this work, four types of mathematical structure are used to represent molecular information: sets, graphs, vectors, and functions. Molecular similarity is a pairwise relationship that induces structure into sets of molecules, giving rise to the concept of chemical space. Although all three concepts - molecular similarity, molecular representation, and chemical space - are treated in this chapter, the emphasis is on molecular similarity measures. Similarity measures, also called similarity coefficients or indices, are functions that map pairs of compatible molecular representations that are of the same mathematical form into real numbers usually, but not always, lying on the unit interval. This chapter presents a somewhat pedagogical discussion of many types of molecular similarity measures, their strengths and limitations, and their relationship to one another. An expanded account of the material on chemical spaces presented in the first edition of this book is also provided. It includes a discussion of the topography of activity landscapes and the role that activity cliffs in these landscapes play in structure-activity studies.

  16. Pythoscape: A framework for generation of large protein similarity networks

    OpenAIRE

    Babbitt, Patricia; Barber, AE; Babbitt, PC

    2012-01-01

    Pythoscape is a framework implemented in Python for processing large protein similarity networks for visualization in other software packages. Protein similarity networks are graphical representations of sequence, structural and other similarities among pr

  17. Similarity Measure of Graphs

    Directory of Open Access Journals (Sweden)

    Amine Labriji

    2017-07-01

    Full Text Available The topic of identifying the similarity of graphs was considered as highly recommended research field in the Web semantic, artificial intelligence, the shape recognition and information research. One of the fundamental problems of graph databases is finding similar graphs to a graph query. Existing approaches dealing with this problem are usually based on the nodes and arcs of the two graphs, regardless of parental semantic links. For instance, a common connection is not identified as being part of the similarity of two graphs in cases like two graphs without common concepts, the measure of similarity based on the union of two graphs, or the one based on the notion of maximum common sub-graph (SCM, or the distance of edition of graphs. This leads to an inadequate situation in the context of information research. To overcome this problem, we suggest a new measure of similarity between graphs, based on the similarity measure of Wu and Palmer. We have shown that this new measure satisfies the properties of a measure of similarities and we applied this new measure on examples. The results show that our measure provides a run time with a gain of time compared to existing approaches. In addition, we compared the relevance of the similarity values obtained, it appears that this new graphs measure is advantageous and  offers a contribution to solving the problem mentioned above.

  18. Processes of Similarity Judgment

    Science.gov (United States)

    Larkey, Levi B.; Markman, Arthur B.

    2005-01-01

    Similarity underlies fundamental cognitive capabilities such as memory, categorization, decision making, problem solving, and reasoning. Although recent approaches to similarity appreciate the structure of mental representations, they differ in the processes posited to operate over these representations. We present an experiment that…

  19. The scientists A history of science told through the lives of its greatest inventors

    CERN Document Server

    Gribbin, John

    2004-01-01

    By focusing on the scientists themselves, Gribbin has written an anecdotal narrative enlivened with stories of personal drama, success and failure. A bestselling science writer with an international reputation, Gribbin is among the few authors who could even attempt a work of this magnitude. Praised as “a sequence of witty, information-packed tales” and “a terrifi c read” by The Times upon its recent British publication, The Scientists breathes new life into such venerable icons as Galileo, Isaac Newton, Albert Einstein and Linus Pauling, as well as lesser lights whose stories have been undeservedly neglected. Filled with pioneers, visionaries, eccentrics and madmen, this is the history of science as it has never been told before.

  20. The semantic similarity ensemble

    Directory of Open Access Journals (Sweden)

    Andrea Ballatore

    2013-12-01

    Full Text Available Computational measures of semantic similarity between geographic terms provide valuable support across geographic information retrieval, data mining, and information integration. To date, a wide variety of approaches to geo-semantic similarity have been devised. A judgment of similarity is not intrinsically right or wrong, but obtains a certain degree of cognitive plausibility, depending on how closely it mimics human behavior. Thus selecting the most appropriate measure for a specific task is a significant challenge. To address this issue, we make an analogy between computational similarity measures and soliciting domain expert opinions, which incorporate a subjective set of beliefs, perceptions, hypotheses, and epistemic biases. Following this analogy, we define the semantic similarity ensemble (SSE as a composition of different similarity measures, acting as a panel of experts having to reach a decision on the semantic similarity of a set of geographic terms. The approach is evaluated in comparison to human judgments, and results indicate that an SSE performs better than the average of its parts. Although the best member tends to outperform the ensemble, all ensembles outperform the average performance of each ensemble's member. Hence, in contexts where the best measure is unknown, the ensemble provides a more cognitively plausible approach.

  1. Gender similarities and differences.

    Science.gov (United States)

    Hyde, Janet Shibley

    2014-01-01

    Whether men and women are fundamentally different or similar has been debated for more than a century. This review summarizes major theories designed to explain gender differences: evolutionary theories, cognitive social learning theory, sociocultural theory, and expectancy-value theory. The gender similarities hypothesis raises the possibility of theorizing gender similarities. Statistical methods for the analysis of gender differences and similarities are reviewed, including effect sizes, meta-analysis, taxometric analysis, and equivalence testing. Then, relying mainly on evidence from meta-analyses, gender differences are reviewed in cognitive performance (e.g., math performance), personality and social behaviors (e.g., temperament, emotions, aggression, and leadership), and psychological well-being. The evidence on gender differences in variance is summarized. The final sections explore applications of intersectionality and directions for future research.

  2. Symmetry and the Monster: One of the Greatest Quests of Mathematics

    Energy Technology Data Exchange (ETDEWEB)

    Szabo, R J [Colin Maclaurin Building, Heriot-Watt University, Edinburgh EH14 4AS (United Kingdom)

    2007-04-13

    The book Symmetry and the Monster: One of the Greatest Quests of Mathematics describes historical events leading up to the discovery of the Monster sporadic group, the largest simple sporadic group. It also expounds the significance and deep relationships between this group and other areas of mathematics and theoretical physics. It begins, in the prologue, with a nice overview of some of the mathematical drama surrounding the discovery of the Monster and its subsequent relationship to number theory (the so-called Moonshine conjectures). From a historical perspective, the book traces back to the roots of group theory, Galois theory, and steadily runs through time through the many famous mathematicians who contributed to group theory, including Lie, Killing and Cartan. Throughout, the author has provided a very nice and deep insight into the sociological and scientific problems at the time, and gives the reader a very prominent inside view of the real people behind the mathematics. The book should be an enjoyable read to anyone with an interest in the history of mathematics. For the non-mathematician the book makes a good, and mostly successful, attempt at being non-technical. Technical mathematical jargon is replaced with more heuristic, intuitive terminology, making the mathematical descriptions in the book fairly easy going. A glossary/hspace{l_brace}0.25pc{r_brace} of/hspace{l_brace}0.25pc{r_brace} terminology for noindent the more scientifically inclined is included in various footnotes throughout the book and in a comprehensive listing at the end of the book. Some more technical material is also included in the form of appendices at the end of the book. Some aspects of physics are also explained in a simple, intuitive way. The author further attempts at various places to give the non-specialist a glimpse into what mathematical proof is all about, and explains the difficulties and technicalities involved in this very nicely (for instance, he mentions the various

  3. Symmetry and the Monster: One of the Greatest Quests of Mathematics

    International Nuclear Information System (INIS)

    Szabo, R J

    2007-01-01

    The book Symmetry and the Monster: One of the Greatest Quests of Mathematics describes historical events leading up to the discovery of the Monster sporadic group, the largest simple sporadic group. It also expounds the significance and deep relationships between this group and other areas of mathematics and theoretical physics. It begins, in the prologue, with a nice overview of some of the mathematical drama surrounding the discovery of the Monster and its subsequent relationship to number theory (the so-called Moonshine conjectures). From a historical perspective, the book traces back to the roots of group theory, Galois theory, and steadily runs through time through the many famous mathematicians who contributed to group theory, including Lie, Killing and Cartan. Throughout, the author has provided a very nice and deep insight into the sociological and scientific problems at the time, and gives the reader a very prominent inside view of the real people behind the mathematics. The book should be an enjoyable read to anyone with an interest in the history of mathematics. For the non-mathematician the book makes a good, and mostly successful, attempt at being non-technical. Technical mathematical jargon is replaced with more heuristic, intuitive terminology, making the mathematical descriptions in the book fairly easy going. A glossary/hspace{0.25pc} of/hspace{0.25pc} terminology for noindent the more scientifically inclined is included in various footnotes throughout the book and in a comprehensive listing at the end of the book. Some more technical material is also included in the form of appendices at the end of the book. Some aspects of physics are also explained in a simple, intuitive way. The author further attempts at various places to give the non-specialist a glimpse into what mathematical proof is all about, and explains the difficulties and technicalities involved in this very nicely (for instance, he mentions the various 100+ page articles that

  4. BOOK REVIEW: Symmetry and the Monster: One of the Greatest Quests of Mathematics

    Science.gov (United States)

    Szabo, R. J.

    2007-04-01

    The book Symmetry and the Monster: One of the Greatest Quests of Mathematics describes historical events leading up to the discovery of the Monster sporadic group, the largest simple sporadic group. It also expounds the significance and deep relationships between this group and other areas of mathematics and theoretical physics. It begins, in the prologue, with a nice overview of some of the mathematical drama surrounding the discovery of the Monster and its subsequent relationship to number theory (the so-called Moonshine conjectures). From a historical perspective, the book traces back to the roots of group theory, Galois theory, and steadily runs through time through the many famous mathematicians who contributed to group theory, including Lie, Killing and Cartan. Throughout, the author has provided a very nice and deep insight into the sociological and scientific problems at the time, and gives the reader a very prominent inside view of the real people behind the mathematics. The book should be an enjoyable read to anyone with an interest in the history of mathematics. For the non-mathematician the book makes a good, and mostly successful, attempt at being non-technical. Technical mathematical jargon is replaced with more heuristic, intuitive terminology, making the mathematical descriptions in the book fairly easy going. A glossary\\hspace{0.25pc} of\\hspace{0.25pc} terminology for noindent the more scientifically inclined is included in various footnotes throughout the book and in a comprehensive listing at the end of the book. Some more technical material is also included in the form of appendices at the end of the book. Some aspects of physics are also explained in a simple, intuitive way. The author further attempts at various places to give the non-specialist a glimpse into what mathematical proof is all about, and explains the difficulties and technicalities involved in this very nicely (for instance, he mentions the various 100+ page articles that

  5. Similarity or difference?

    DEFF Research Database (Denmark)

    Villadsen, Anders Ryom

    2013-01-01

    While the organizational structures and strategies of public organizations have attracted substantial research attention among public management scholars, little research has explored how these organizational core dimensions are interconnected and influenced by pressures for similarity....... In this paper I address this topic by exploring the relation between expenditure strategy isomorphism and structure isomorphism in Danish municipalities. Different literatures suggest that organizations exist in concurrent pressures for being similar to and different from other organizations in their field......-shaped relation exists between expenditure strategy isomorphism and structure isomorphism in a longitudinal quantitative study of Danish municipalities....

  6. The greatest step in vertebrate history: a paleobiological review of the fish-tetrapod transition.

    Science.gov (United States)

    Long, John A; Gordon, Malcolm S

    2004-01-01

    Recent discoveries of previously unknown fossil forms have dramatically transformed understanding of many aspects of the fish-tetrapod transition. Newer paleobiological approaches have also contributed to changed views of which animals were involved and when, where, and how the transition occurred. This review summarizes major advances made and reevaluates alternative interpretations of important parts of the evidence. We begin with general issues and concepts, including limitations of the Paleozoic fossil record. We summarize important features of paleoclimates, paleoenvironments, paleobiogeography, and taphonomy. We then review the history of Devonian tetrapods and their closest stem group ancestors within the sarcopterygian fishes. It is now widely accepted that the first tetrapods arose from advanced tetrapodomorph stock (the elpistostegalids) in the Late Devonian, probably in Euramerica. However, truly terrestrial forms did not emerge until much later, in geographically far-flung regions, in the Lower Carboniferous. The complete transition occurred over about 25 million years; definitive emergences onto land took place during the most recent 5 million years. The sequence of character acquisition during the transition can be seen as a five-step process involving: (1) higher osteichthyan (tetrapodomorph) diversification in the Middle Devonian (beginning about 380 million years ago [mya]), (2) the emergence of "prototetrapods" (e.g., Elginerpeton) in the Frasnian stage (about 372 mya), (3) the appearance of aquatic tetrapods (e.g., Acanthostega) sometime in the early to mid-Famennian (about 360 mya), (4) the appearance of "eutetrapods" (e.g., Tulerpeton) at the very end of the Devonian period (about 358 mya), and (5) the first truly terrestrial tetrapods (e.g., Pederpes) in the Lower Carboniferous (about 340 mya). We discuss each of these steps with respect to inferred functional utility of acquired character sets. Dissociated heterochrony is seen as the most

  7. Comparing Harmonic Similarity Measures

    NARCIS (Netherlands)

    de Haas, W.B.; Robine, M.; Hanna, P.; Veltkamp, R.C.; Wiering, F.

    2010-01-01

    We present an overview of the most recent developments in polyphonic music retrieval and an experiment in which we compare two harmonic similarity measures. In contrast to earlier work, in this paper we specifically focus on the symbolic chord description as the primary musical representation and

  8. Performance trade-offs and ageing in the 'world's greatest athletes'.

    Science.gov (United States)

    Careau, Vincent; Wilson, Robbie S

    2017-08-16

    The mechanistic foundations of performance trade-offs are clear: because body size and shape constrains movement, and muscles vary in strength and fibre type, certain physical traits should act in opposition with others (e.g. sprint versus endurance). Yet performance trade-offs are rarely detected, and traits are often positively correlated. A potential resolution to this conundrum is that within -individual performance trade-offs can be masked by among -individual variation in 'quality'. Although there is a current debate on how to unambiguously define and account for quality, no previous studies have partitioned trait correlations at the within- and among-individual levels. Here, we evaluate performance trade-offs among and within 1369 elite athletes that performed in a total of 6418 combined-events competitions (decathlon and heptathlon). Controlling for age, experience and wind conditions, we detected strong trade-offs between groups of functionally similar events (throwing versus jumping versus running) occurring at the among-individual level. We further modelled individual (co)variation in age-related plasticity of performance and found previously unseen trade-offs in throwing versus running performance that manifest through ageing. Our results verify that human performance is limited by fundamental genetic, environmental and ageing constraints that preclude the simultaneous improvement of performance in multiple dimensions. Identifying these constraints is fundamental to understanding performance trade-offs and predicting the ageing of motor function. © 2017 The Author(s).

  9. Similar or different?

    DEFF Research Database (Denmark)

    Cornér, Solveig; Pyhältö, Kirsi; Peltonen, Jouni

    2018-01-01

    Previous research has identified researcher community and supervisory support as key determinants of the doctoral journey contributing to students’ persistence and robustness. However, we still know little about cross-cultural variation in the researcher community and supervisory support experien...... counter partners, whereas the Finnish students perceived lower levels of instrumental support than the Danish students. The findings imply that seemingly similar contexts hold valid differences in experienced social support and educational strategies at the PhD level....... experienced by PhD students within the same discipline. This study explores the support experiences of 381 PhD students within the humanities and social sciences from three research-intensive universities in Denmark (n=145) and Finland (n=236). The mixed methods design was utilized. The data were collected...... counter partners. The results also indicated that the only form of support in which the students expressed more matched support than mismatched support was informational support. Further investigation showed that the Danish students reported a high level of mismatch in emotional support than their Finnish...

  10. Protein structural similarity search by Ramachandran codes

    Directory of Open Access Journals (Sweden)

    Chang Chih-Hung

    2007-08-01

    Full Text Available Abstract Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary to access structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation. SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE and works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are supposed applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era.

  11. Repdigits in k-Lucas sequences

    Indian Academy of Sciences (India)

    57(2) 2000 243-254) proved that 11 is the largest number with only one distinct digit (the so-called repdigit) in the sequence ( L n ( 2 ) ) n . In this paper, we address a similar problem in the family of -Lucas sequences. We also show that the -Lucas sequences have similar properties to those of -Fibonacci sequences ...

  12. Domain similarity based orthology detection

    OpenAIRE

    Bitard-Feildel, Tristan; Kemena, Carsten; Greenwood, Jenny M; Bornberg-Bauer, Erich

    2015-01-01

    Background Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assert whether two proteins are orthologous or not. Accordingly, when the number of sequences for comparison increases, the number of comparisons to compute grows in a quadratic order. A current challenge of bioinformatic research, especially when taking into account the increasing number of sequenced organisms available, is to make this ever-growing number of comparisons computationa...

  13. GROUPING WEB ACCESS SEQUENCES uSING SEQUENCE ALIGNMENT METHOD

    OpenAIRE

    BHUPENDRA S CHORDIA; KRISHNAKANT P ADHIYA

    2011-01-01

    In web usage mining grouping of web access sequences can be used to determine the behavior or intent of a set of users. Grouping websessions is how to measure the similarity between web sessions. There are many shortcomings in traditional measurement methods. The taskof grouping web sessions based on similarity and consists of maximizing the intra-group similarity while minimizing the inter-groupsimilarity is done using sequence alignment method. This paper introduces a new method to group we...

  14. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remain unresolved. Computational assembly is crucial in large genome projects as well for the evolving high-throughput technologies and...... in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide a detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html....

  15. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr......The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...

  16. Genome sequencing for obstetricians & gynaecologists | Kent ...

    African Journals Online (AJOL)

    The medical profession has been waiting for a decade to be invigorated by the sequencing of the human genome, arguably the greatest scientific project ever. The technology has been spectacular but the results of the project have yielded more unexpected results than definitive answers – many about the very nature of our ...

  17. The development of a concise questionnaire designed to measure perceived outcomes on the issues of greatest importance to patients.

    Science.gov (United States)

    Busby, M; Burke, F J T; Matthews, R; Cyrta, J; Mullins, A

    2012-04-01

    To develop a concise patient feedback audit instrument designed to inform practice development on those issues of greatest importance to patients. A literature review was used to establish the issues which were of greatest importance to patients. Ten core questions were then designed with the help of an experienced survey and polling organisation. A challenging grading of patient responses was utilised in an attempt to differentiate perceived performance within a practice on the different aspects and between practices. A feasibility study was conducted using the interactive voice response mode with seven volunteer practices in 2009. The instrument was then used in the later part of 2010 by 61 practices mostly in paper-based format. Practices received feedback which is primarily based on a bar chart plotting their percentage of top grades received against a national reference sample (NRS) compiled from the results of other participating practices. A statistical analysis was conducted to establish the level at which an individual practice result becomes statistically significant against the NRS. The 61 participating practices each received an average of 121 responses (total 7,381 responses). Seventy-four percent of responses across all ten questions received the top grade, 'ideal'. Statistical analysis indicated that at the level of 121 responses, a score of around 4-9% difference to the National Reference Sample, depending on the specific question, was statistically significant. In keeping with international experience with dental patient feedback surveys this audit suggests high levels of patient satisfaction with their dental service. Nevertheless, by focusing results on the proportion of highest grades received, this instrument is capable of indicating when perceived performance falls significantly below the average. It can therefore inform practice development.

  18. A space-efficient algorithm for local similarities.

    Science.gov (United States)

    Huang, X Q; Hardison, R C; Miller, W

    1990-10-01

    Existing dynamic-programming algorithms for identifying similar regions of two sequences require time and space proportional to the product of the sequence lengths. Often this space requirement is more limiting than the time requirement. We describe a dynamic-programming local-similarity algorithm that needs only space proportional to the sum of the sequence lengths. The method can also find repeats within a single long sequence. To illustrate the algorithm's potential, we discuss comparison of a 73,360 nucleotide sequence containing the human beta-like globin gene cluster and a corresponding 44,594 nucleotide sequence for rabbit, a problem well beyond the capabilities of other dynamic-programming software.

  19. Pythoscape: a framework for generation of large protein similarity networks.

    Science.gov (United States)

    Barber, Alan E; Babbitt, Patricia C

    2012-11-01

    Pythoscape is a framework implemented in Python for processing large protein similarity networks for visualization in other software packages. Protein similarity networks are graphical representations of sequence, structural and other similarities among proteins for which pairwise all-by-all similarity connections have been calculated. Mapping of biological and other information to network nodes or edges enables hypothesis creation about sequence-structure-function relationships across sets of related proteins. Pythoscape provides several options to calculate pairwise similarities for input sequences or structures, applies filters to network edges and defines sets of similar nodes and their associated data as single nodes (termed representative nodes) for compression of network information and output data or formatted files for visualization.

  20. Nocturia is the Lower Urinary Tract Symptom With Greatest Impact on Quality of Life of Men From a Community Setting

    Directory of Open Access Journals (Sweden)

    Eduardo de Paula Miranda

    2014-06-01

    Full Text Available PurposeLower urinary tract symptoms are numerous, but the specific impact of each of these symptoms on the quality of life (QoL has not been evaluated in community-dwelling men. An assessment of these symptoms and their effects on QoL was the focus of this study.MethodsWe performed a cross-sectional study with 373 men aged >50 years from a community setting. Patients completed the International Prostate Symptom Score questionnaire, which includes questions on each of the specific urinary symptoms and a question addressing health-related QoL that are graded from 0 to 5. We used the Pearson correlation test to assess the impact of each symptom on QoL.ResultsNocturia (58.9% was the most prevalent urinary symptom. The mean score was 0.9±1.4 for incomplete emptying, 1.0±1.5 for frequency, 0.9±1.3 for intermittency, 0.8±1.3 for urgency, 1.0±1.5 for weak stream, 0.5±1.0 for straining, and 2.0±1.6 for nocturia. Nocturia and frequency were the only symptoms associated with poorer QoL, with nocturia showing a stronger association.ConclusionsNocturia affects 50% of community dwelling men aged >50 years, and is the lower urinary tract symptom with the greatest negative impact on QoL.

  1. When the Most Potent Combination of Antibiotics Selects for the Greatest Bacterial Load: The Smile-Frown Transition

    OpenAIRE

    Pena-Miller, Rafael; Laehnemann, David; Jansen, Gunther; Fuentes-Hernandez, Ayari; Rosenstiel, Philip; Schulenburg, Hinrich; Beardmore, Robert

    2013-01-01

    Conventional wisdom holds that the best way to treat infection with antibiotics is to 'hit early and hit hard'. A favoured strategy is to deploy two antibiotics that produce a stronger effect in combination than if either drug were used alone. But are such synergistic combinations necessarily optimal? We combine mathematical modelling, evolution experiments, whole genome sequencing and genetic manipulation of a resistance mechanism to demonstrate that deploying synergistic antibiotics can, in...

  2. A COMPARISON OF SEMANTIC SIMILARITY MODELS IN EVALUATING CONCEPT SIMILARITY

    Directory of Open Access Journals (Sweden)

    Q. X. Xu

    2012-08-01

    Full Text Available The semantic similarities are important in concept definition, recognition, categorization, interpretation, and integration. Many semantic similarity models have been established to evaluate semantic similarities of objects or/and concepts. To find out the suitability and performance of different models in evaluating concept similarities, we make a comparison of four main types of models in this paper: the geometric model, the feature model, the network model, and the transformational model. Fundamental principles and main characteristics of these models are introduced and compared firstly. Land use and land cover concepts of NLCD92 are employed as examples in the case study. The results demonstrate that correlations between these models are very high for a possible reason that all these models are designed to simulate the similarity judgement of human mind.

  3. Renewing the Respect for Similarity

    Directory of Open Access Journals (Sweden)

    Shimon eEdelman

    2012-07-01

    Full Text Available In psychology, the concept of similarity has traditionally evoked a mixture of respect, stemmingfrom its ubiquity and intuitive appeal, and concern, due to its dependence on the framing of the problemat hand and on its context. We argue for a renewed focus on similarity as an explanatory concept, bysurveying established results and new developments in the theory and methods of similarity-preservingassociative lookup and dimensionality reduction — critical components of many cognitive functions, aswell as of intelligent data management in computer vision. We focus in particular on the growing familyof algorithms that support associative memory by performing hashing that respects local similarity, andon the uses of similarity in representing structured objects and scenes. Insofar as these similarity-basedideas and methods are useful in cognitive modeling and in AI applications, they should be included inthe core conceptual toolkit of computational neuroscience.

  4. Method and apparatus for biological sequence comparison

    Science.gov (United States)

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  5. Fish species of greatest conservation need in wadeable Iowa streams: current status and effectiveness of Aquatic Gap Program distribution models

    Science.gov (United States)

    Sindt, Anthony R.; Pierce, Clay; Quist, Michael C.

    2012-01-01

    Effective conservation of fish species of greatest conservation need (SGCN) requires an understanding of species–habitat relationships and distributional trends. Thus, modeling the distribution of fish species across large spatial scales may be a valuable tool for conservation planning. Our goals were to evaluate the status of 10 fish SGCN in wadeable Iowa streams and to test the effectiveness of Iowa Aquatic Gap Analysis Project (IAGAP) species distribution models. We sampled fish assemblages from 86 wadeable stream segments in the Mississippi River drainage of Iowa during 2009 and 2010 to provide contemporary, independent fish species presence–absence data. The frequencies of occurrence in stream segments where species were historically documented varied from 0.0% for redfin shiner Lythrurus umbratilis to 100.0% for American brook lampreyLampetra appendix, with a mean of 53.0%, suggesting that the status of Iowa fish SGCN is highly variable. Cohen's kappa values and other model performance measures were calculated by comparing field-collected presence–absence data with IAGAP model–predicted presences and absences for 12 fish SGCN. Kappa values varied from 0.00 to 0.50, with a mean of 0.15. The models only predicted the occurrences of banded darterEtheostoma zonale, southern redbelly dace Phoxinus erythrogaster, and longnose daceRhinichthys cataractae more accurately than would be expected by chance. Overall, the accuracy of the twelve models was low, with a mean correct classification rate of 58.3%. Poor model performance probably reflects the difficulties associated with modeling the distribution of rare species and the inability of the large-scale habitat variables used in IAGAP models to explain the variation in fish species occurrences. Our results highlight the importance of quantifying the confidence in species distribution model predictions with an independent data set and the need for long-term monitoring to better understand the

  6. Pain is the Greatest Preoperative Concern for Patients and Parents Before Posterior Spinal Fusion for Adolescent Idiopathic Scoliosis.

    Science.gov (United States)

    Chan, Priscella; Skaggs, David L; Sanders, Austin E; Villamor, Gabriela A; Choi, Paul D; Tolo, Vernon T; Andras, Lindsay M

    2017-11-01

    Prospective cross-sectional study. To evaluate patients' and parents' concerns so they can be addressed with appropriate preoperative counseling. Despite much research on outcomes for posterior spinal fusion (PSF) in adolescent idiopathic scoliosis (AIS), little is available about preoperative fears or concerns. Patients with AIS undergoing PSF, their parents, and surgeons were prospectively enrolled and asked to complete a survey on their fears and concerns about surgery at their preoperative appointment. Forty-eight patients and parents completed surveys. Four attending pediatric spine surgeons participated and submitted 48 responses. Mean age of patients was 14.2 years. On a scale of 0 to 10, mean level of concern reported by parents (6.9) was higher than that reported by patients (4.6). Surgeons rated the procedure's complexity on a scale of 0 to 10 and reported a mean of 5.2. Neither patients' nor parents' level of concern correlated with the surgeons' assessment of the procedure's complexity level (R = 0.19 and 0.12, P = 0.20 and P = 0.42, respectively). Top three concerns for patients were pain (25%), ability to return to activities (21%), and neurologic injury (17%). Top three concerns for parents were pain (35%), neurologic injury (21%), and amount of correction (17%). Top three concerns for surgeons were postoperative shoulder balance (44%), neurologic injury (27%), and lowest instrumented vertebrae selection (27%). Patients reported the same concerns 23% of the time as parents, and 17% of the time as surgeons. Parents and surgeons reported the same concerns 21% of the time. Pain was the greatest concern for both patients and parents but was rarely listed as a concern by surgeons. Parent and patient level of concern did not correlate to the surgeon's assessment of the procedure's complexity. Neurologic injury was a top concern for all groups, but otherwise there was little overlap between physician, patient, and parent concerns. 3.

  7. Self-similar cosmological models

    Energy Technology Data Exchange (ETDEWEB)

    Chao, W Z [Cambridge Univ. (UK). Dept. of Applied Mathematics and Theoretical Physics

    1981-07-01

    The kinematics and dynamics of self-similar cosmological models are discussed. The degrees of freedom of the solutions of Einstein's equations for different types of models are listed. The relation between kinematic quantities and the classifications of the self-similarity group is examined. All dust local rotational symmetry models have been found.

  8. Self-similar factor approximants

    International Nuclear Information System (INIS)

    Gluzman, S.; Yukalov, V.I.; Sornette, D.

    2003-01-01

    The problem of reconstructing functions from their asymptotic expansions in powers of a small variable is addressed by deriving an improved type of approximants. The derivation is based on the self-similar approximation theory, which presents the passage from one approximant to another as the motion realized by a dynamical system with the property of group self-similarity. The derived approximants, because of their form, are called self-similar factor approximants. These complement the obtained earlier self-similar exponential approximants and self-similar root approximants. The specific feature of self-similar factor approximants is that their control functions, providing convergence of the computational algorithm, are completely defined from the accuracy-through-order conditions. These approximants contain the Pade approximants as a particular case, and in some limit they can be reduced to the self-similar exponential approximants previously introduced by two of us. It is proved that the self-similar factor approximants are able to reproduce exactly a wide class of functions, which include a variety of nonalgebraic functions. For other functions, not pertaining to this exactly reproducible class, the factor approximants provide very accurate approximations, whose accuracy surpasses significantly that of the most accurate Pade approximants. This is illustrated by a number of examples showing the generality and accuracy of the factor approximants even when conventional techniques meet serious difficulties

  9. Dynamic similarity in erosional processes

    Science.gov (United States)

    Scheidegger, A.E.

    1963-01-01

    A study is made of the dynamic similarity conditions obtaining in a variety of erosional processes. The pertinent equations for each type of process are written in dimensionless form; the similarity conditions can then easily be deduced. The processes treated are: raindrop action, slope evolution and river erosion. ?? 1963 Istituto Geofisico Italiano.

  10. Personalized recommendation with corrected similarity

    International Nuclear Information System (INIS)

    Zhu, Xuzhen; Tian, Hui; Cai, Shimin

    2014-01-01

    Personalized recommendation has attracted a surge of interdisciplinary research. Especially, similarity-based methods in applications of real recommendation systems have achieved great success. However, the computations of similarities are overestimated or underestimated, in particular because of the defective strategy of unidirectional similarity estimation. In this paper, we solve this drawback by leveraging mutual correction of forward and backward similarity estimations, and propose a new personalized recommendation index, i.e., corrected similarity based inference (CSI). Through extensive experiments on four benchmark datasets, the results show a greater improvement of CSI in comparison with these mainstream baselines. And a detailed analysis is presented to unveil and understand the origin of such difference between CSI and mainstream indices. (paper)

  11. Towards Personalized Medicine: Leveraging Patient Similarity and Drug Similarity Analytics

    Science.gov (United States)

    Zhang, Ping; Wang, Fei; Hu, Jianying; Sorrentino, Robert

    2014-01-01

    The rapid adoption of electronic health records (EHR) provides a comprehensive source for exploratory and predictive analytic to support clinical decision-making. In this paper, we investigate how to utilize EHR to tailor treatments to individual patients based on their likelihood to respond to a therapy. We construct a heterogeneous graph which includes two domains (patients and drugs) and encodes three relationships (patient similarity, drug similarity, and patient-drug prior associations). We describe a novel approach for performing a label propagation procedure to spread the label information representing the effectiveness of different drugs for different patients over this heterogeneous graph. The proposed method has been applied on a real-world EHR dataset to help identify personalized treatments for hypercholesterolemia. The experimental results demonstrate the effectiveness of the approach and suggest that the combination of appropriate patient similarity and drug similarity analytics could lead to actionable insights for personalized medicine. Particularly, by leveraging drug similarity in combination with patient similarity, our method could perform well even on new or rarely used drugs for which there are few records of known past performance. PMID:25717413

  12. George W. Bush's Post-September 11 Rhetoric of Covenant Renewal: Upholding the Faith of the Greatest Generation

    Science.gov (United States)

    Bostdorff, Denise M.

    2003-01-01

    The appeal of Bush's post-September 11 discourse lies in its similarities with the Puritan rhetoric of covenant renewal by which ministers brought second- and third-generation Puritans into the church. Through this epideictic discourse, Bush implored younger Americans to uphold the national covenant of their "elders," the World War II generation,…

  13. Maximising the effect of combination HIV prevention through prioritisation of the people and places in greatest need: a modelling study.

    Science.gov (United States)

    Anderson, Sarah-Jane; Cherutich, Peter; Kilonzo, Nduku; Cremin, Ide; Fecht, Daniela; Kimanga, Davies; Harper, Malayah; Masha, Ruth Laibon; Ngongo, Prince Bahati; Maina, William; Dybul, Mark; Hallett, Timothy B

    2014-07-19

    Epidemiological data show substantial variation in the risk of HIV infection between communities within African countries. We hypothesised that focusing appropriate interventions on geographies and key populations at high risk of HIV infection could improve the effect of investments in the HIV response. With use of Kenya as a case study, we developed a mathematical model that described the spatiotemporal evolution of the HIV epidemic and that incorporated the demographic, behavioural, and programmatic differences across subnational units. Modelled interventions (male circumcision, behaviour change communication, early antiretoviral therapy, and pre-exposure prophylaxis) could be provided to different population groups according to their risk behaviours or their location. For a given national budget, we compared the effect of a uniform intervention strategy, in which the same complement of interventions is provided across the country, with a focused strategy that tailors the set of interventions and amount of resources allocated to the local epidemiological conditions. A uniformly distributed combination of HIV prevention interventions could reduce the total number of new HIV infections by 40% during a 15-year period. With no additional spending, this effect could be increased by 14% during the 15 years-almost 100,000 extra infections, and result in 33% fewer new HIV infections occurring every year by the end of the period if the focused approach is used to tailor resource allocation to reflect patterns in local epidemiology. The cumulative difference in new infections during the 15-year projection period depends on total budget and costs of interventions, and could be as great as 150,000 (a cumulative difference as great as 22%) under different assumptions about the unit costs of intervention. The focused approach achieves greater effect than the uniform approach despite exactly the same investment. Through prioritisation of the people and locations at greatest

  14. Posts, pics, or polls? Which post type generates the greatest engagement in a Facebook physical activity intervention?

    Science.gov (United States)

    Edney, Sarah; Looyestyn, Jemma; Ryan, Jillian; Kernot, Jocelyn; Maher, Carol

    2018-04-05

    Social networking websites have attracted considerable attention as a delivery platform for physical activity interventions. Current evidence highlights a need to enhance user engagement with these interventions to actualize their potential. The purpose of this study was to determine which post type generates the most engagement from participants and whether engagement was related to change in physical activity in an intervention delivered via Facebook. Subgroup analysis of the intervention condition of a randomized controlled trial was conducted. The group moderator posted a new message to the private Facebook group each day of the program. The Facebook posts (n = 118) were categorized into the following types: moderator-initiated running program, multimedia, motivational, opinion polls, or discussion question and participant-initiated experience shares, or questions. Four metrics were used to measure volume of engagement with each post type, "likes," "comments," "poll votes," and "photo uploads." One-way ANOVA was used to determine whether engagement differed by post type and an independent samples t-test to determine differences in engagement between moderator and participant-initiated posts. Pearson correlation was used to examine associations between total engagement and change in physical activity. Engagement varied by post type. Polls elicited the greatest engagement (p ≤ .01). The most common form of engagement was "likes," and engagement was higher for moderator-initiated rather than participant-initiated posts (mean = 8.0 [SD 6.8] vs. 5.3 [SD 3.2]; p ≤ .01). Total engagement with the Facebook group was not directly associated with change in physical activity (r = -.13, p = .47). However, engagement was associated with compliance with the running program (r = .37, p = .04) and there was a nonsignificant positive association between compliance and change in physical activity (r = .32, p = .08). Posts requiring a simple response generated the most

  15. Large margin classification with indefinite similarities

    KAUST Repository

    Alabdulmohsin, Ibrahim

    2016-01-07

    Classification with indefinite similarities has attracted attention in the machine learning community. This is partly due to the fact that many similarity functions that arise in practice are not symmetric positive semidefinite, i.e. the Mercer condition is not satisfied, or the Mercer condition is difficult to verify. Examples of such indefinite similarities in machine learning applications are ample including, for instance, the BLAST similarity score between protein sequences, human-judged similarities between concepts and words, and the tangent distance or the shape matching distance in computer vision. Nevertheless, previous works on classification with indefinite similarities are not fully satisfactory. They have either introduced sources of inconsistency in handling past and future examples using kernel approximation, settled for local-minimum solutions using non-convex optimization, or produced non-sparse solutions by learning in Krein spaces. Despite the large volume of research devoted to this subject lately, we demonstrate in this paper how an old idea, namely the 1-norm support vector machine (SVM) proposed more than 15 years ago, has several advantages over more recent work. In particular, the 1-norm SVM method is conceptually simpler, which makes it easier to implement and maintain. It is competitive, if not superior to, all other methods in terms of predictive accuracy. Moreover, it produces solutions that are often sparser than more recent methods by several orders of magnitude. In addition, we provide various theoretical justifications by relating 1-norm SVM to well-established learning algorithms such as neural networks, SVM, and nearest neighbor classifiers. Finally, we conduct a thorough experimental evaluation, which reveals that the evidence in favor of 1-norm SVM is statistically significant.

  16. Clustering and visualizing similarity networks of membrane proteins.

    Science.gov (United States)

    Hu, Geng-Ming; Mai, Te-Lun; Chen, Chi-Ming

    2015-08-01

    We proposed a fast and unsupervised clustering method, minimum span clustering (MSC), for analyzing the sequence-structure-function relationship of biological networks, and demonstrated its validity in clustering the sequence/structure similarity networks (SSN) of 682 membrane protein (MP) chains. The MSC clustering of MPs based on their sequence information was found to be consistent with their tertiary structures and functions. For the largest seven clusters predicted by MSC, the consistency in chain function within the same cluster is found to be 100%. From analyzing the edge distribution of SSN for MPs, we found a characteristic threshold distance for the boundary between clusters, over which SSN of MPs could be properly clustered by an unsupervised sparsification of the network distance matrix. The clustering results of MPs from both MSC and the unsupervised sparsification methods are consistent with each other, and have high intracluster similarity and low intercluster similarity in sequence, structure, and function. Our study showed a strong sequence-structure-function relationship of MPs. We discussed evidence of convergent evolution of MPs and suggested applications in finding structural similarities and predicting biological functions of MP chains based on their sequence information. © 2015 Wiley Periodicals, Inc.

  17. Self-similar analysis of the spherical implosion process

    International Nuclear Information System (INIS)

    Ishiguro, Yukio; Katsuragi, Satoru.

    1976-07-01

    The implosion processes caused by laser-heating ablation has been studied by self-similarity analysis. Attention is paid to the possibility of existence of the self-similar solution which reproduces the implosion process of high compression. Details of the self-similar analysis are reproduced and conclusions are drawn quantitatively on the gas compression by a single shock. The compression process by a sequence of shocks is discussed in self-similarity. The gas motion followed by a homogeneous isentropic compression is represented by a self-similar motion. (auth.)

  18. Similarity measures for face recognition

    CERN Document Server

    Vezzetti, Enrico

    2015-01-01

    Face recognition has several applications, including security, such as (authentication and identification of device users and criminal suspects), and in medicine (corrective surgery and diagnosis). Facial recognition programs rely on algorithms that can compare and compute the similarity between two sets of images. This eBook explains some of the similarity measures used in facial recognition systems in a single volume. Readers will learn about various measures including Minkowski distances, Mahalanobis distances, Hansdorff distances, cosine-based distances, among other methods. The book also summarizes errors that may occur in face recognition methods. Computer scientists "facing face" and looking to select and test different methods of computing similarities will benefit from this book. The book is also useful tool for students undertaking computer vision courses.

  19. Revisiting Inter-Genre Similarity

    DEFF Research Database (Denmark)

    Sturm, Bob L.; Gouyon, Fabien

    2013-01-01

    We revisit the idea of ``inter-genre similarity'' (IGS) for machine learning in general, and music genre recognition in particular. We show analytically that the probability of error for IGS is higher than naive Bayes classification with zero-one loss (NB). We show empirically that IGS does...... not perform well, even for data that satisfies all its assumptions....

  20. Fast business process similarity search

    NARCIS (Netherlands)

    Yan, Z.; Dijkman, R.M.; Grefen, P.W.P.J.

    2012-01-01

    Nowadays, it is common for organizations to maintain collections of hundreds or even thousands of business processes. Techniques exist to search through such a collection, for business process models that are similar to a given query model. However, those techniques compare the query model to each

  1. Glove boxes and similar containments

    International Nuclear Information System (INIS)

    Anon.

    1975-01-01

    According to the present invention a glove box or similar containment is provided with an exhaust system including a vortex amplifier venting into the system, the vortex amplifier also having its main inlet in fluid flow connection with the containment and a control inlet in fluid flow connection with the atmosphere outside the containment. (U.S.)

  2. Similarity of eigenstates in generalized labyrinth tilings

    International Nuclear Information System (INIS)

    Thiem, Stefanie; Schreiber, Michael

    2010-01-01

    The eigenstates of d-dimensional quasicrystalline models with a separable Hamiltonian are studied within the tight-binding model. The approach is based on mathematical sequences, constructed by an inflation rule P = {w → s,s → sws b-1 } describing the weak/strong couplings of atoms in a quasiperiodic chain. Higher-dimensional quasiperiodic tilings are constructed as a direct product of these chains and their eigenstates can be directly calculated by multiplying the energies E or wave functions ψ of the chain, respectively. Applying this construction rule, the grid in d dimensions splits into 2 d-1 different tilings, for which we investigated the characteristics of the wave functions. For the standard two-dimensional labyrinth tiling constructed from the octonacci sequence (b = 2) the lattice breaks up into two identical lattices, which consequently yield the same eigenstates. While this is not the case for b ≠ 2, our numerical results show that the wave functions of the different grids become increasingly similar for large system sizes. This can be explained by the fact that the structure of the 2 d-1 grids mainly differs at the boundaries and thus for large systems the eigenstates approach each other. This property allows us to analytically derive properties of the higher-dimensional generalized labyrinth tilings from the one-dimensional results. In particular participation numbers and corresponding scaling exponents have been determined.

  3. An Alfven eigenmode similarity experiment

    International Nuclear Information System (INIS)

    Heidbrink, W W; Fredrickson, E; Gorelenkov, N N; Hyatt, A W; Kramer, G; Luo, Y

    2003-01-01

    The major radius dependence of Alfven mode stability is studied by creating plasmas with similar minor radius, shape, magnetic field (0.5 T), density (n e ≅3x10 19 m -3 ), electron temperature (1.0 keV) and beam ion population (near-tangential 80 keV deuterium injection) on both NSTX and DIII-D. The major radius of NSTX is half the major radius of DIII-D. The super-Alfvenic beam ions that drive the modes have overlapping values of v f /v A in the two devices. Observed beam-driven instabilities include toroidicity-induced Alfven eigenmodes (TAE). The stability threshold for the TAE is similar in the two devices. As expected theoretically, the most unstable toroidal mode number n is larger in DIII-D

  4. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  5. Compressional Alfven Eigenmode Similarity Study

    Science.gov (United States)

    Heidbrink, W. W.; Fredrickson, E. D.; Gorelenkov, N. N.; Rhodes, T. L.

    2004-11-01

    NSTX and DIII-D are nearly ideal for Alfven eigenmode (AE) similarity experiments, having similar neutral beams, fast-ion to Alfven speed v_f/v_A, fast-ion pressure, and shape of the plasma, but with a factor of 2 difference in the major radius. Toroidicity-induced AE with ˜100 kHz frequencies were compared in an earlier study [1]; this paper focuses on higher frequency AE with f ˜ 1 MHz. Compressional AE (CAE) on NSTX have a polarization, dependence on the fast-ion distribution function, frequency scaling, and low-frequency limit that are qualitatively consistent with CAE theory [2]. Global AE (GAE) are also observed. On DIII-D, coherent modes in this frequency range are observed during low-field (0.6 T) similarity experiments. Experiments will compare the CAE stability limits on DIII-D with the NSTX stability limits, with the aim of determining if CAE will be excited by alphas in a reactor. Predicted differences in the frequency splitting Δ f between excited modes will also be used. \\vspace0.25em [1] W.W. Heidbrink, et al., Plasmas Phys. Control. Fusion 45, 983 (2003). [2] E.D. Fredrickson, et al., Princeton Plasma Physics Laboratory Report PPPL-3955 (2004).

  6. Phylogenetic studies of transmission dynamics in generalized HIV epidemics: An essential tool where the burden is greatest?

    Science.gov (United States)

    Dennis, Ann M.; Herbeck, Joshua T.; Brown, Andrew Leigh; Kellam, Paul; de Oliveira, Tulio; Pillay, Deenan; Fraser, Christophe; Cohen, Myron S.

    2014-01-01

    Efficient and effective HIV prevention measures for generalized epidemics in sub-Saharan Africa have not yet been validated at the population-level. Design and impact evaluation of such measures requires fine-scale understanding of local HIV transmission dynamics. The novel tools of HIV phylogenetics and molecular epidemiology may elucidate these transmission dynamics. Such methods have been incorporated into studies of concentrated HIV epidemics to identify proximate and determinant traits associated with ongoing transmission. However, applying similar phylogenetic analyses to generalized epidemics, including the design and evaluation of prevention trials, presents additional challenges. Here we review the scope of these methods and present examples of their use in concentrated epidemics in the context of prevention. Next, we describe the current uses for phylogenetics in generalized epidemics, and discuss their promise for elucidating transmission patterns and informing prevention trials. Finally, we review logistic and technical challenges inherent to large-scale molecular epidemiological studies of generalized epidemics, and suggest potential solutions. PMID:24977473

  7. Similarity analysis between quantum images

    Science.gov (United States)

    Zhou, Ri-Gui; Liu, XingAo; Zhu, Changming; Wei, Lai; Zhang, Xiafen; Ian, Hou

    2018-06-01

    Similarity analyses between quantum images are so essential in quantum image processing that it provides fundamental research for the other fields, such as quantum image matching, quantum pattern recognition. In this paper, a quantum scheme based on a novel quantum image representation and quantum amplitude amplification algorithm is proposed. At the end of the paper, three examples and simulation experiments show that the measurement result must be 0 when two images are same, and the measurement result has high probability of being 1 when two images are different.

  8. Similarity flows in relativistic hydrodynamics

    International Nuclear Information System (INIS)

    Blaizot, J.P.; Ollitrault, J.Y.

    1986-01-01

    In ultra-relativistic heavy ion collisions, one expects in particular to observe a deconfinement transition leading to a formation of quark gluon plasma. In the framework of the hydrodynamic model, experimental signatures of such a plasma may be looked for as observable consequences of a first order transition on the evolution of the system. In most of the possible scenario, the phase transition is accompanied with discontinuities in the hydrodynamic flow, such as shock waves. The method presented in this paper has been developed to treat without too much numerical effort such discontinuous flow. It relies heavily on the use of similarity solutions of the hydrodynamic equations

  9. Universal sequence map (USM of arbitrary discrete sequences

    Directory of Open Access Journals (Sweden)

    Almeida Jonas S

    2002-02-01

    Full Text Available Abstract Background For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist on being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results We have successfully identified such an iterative function for bijective mappingψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM, is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR. The latter enables the representation of 4 unit type sequences (like DNA as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool:http://bioinformatics.musc.edu/~jonas/usm/. Conclusions USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.

  10. Self-similar gravitational clustering

    International Nuclear Information System (INIS)

    Efstathiou, G.; Fall, S.M.; Hogan, C.

    1979-01-01

    The evolution of gravitational clustering is considered and several new scaling relations are derived for the multiplicity function. These include generalizations of the Press-Schechter theory to different densities and cosmological parameters. The theory is then tested against multiplicity function and correlation function estimates for a series of 1000-body experiments. The results are consistent with the theory and show some dependence on initial conditions and cosmological density parameter. The statistical significance of the results, however, is fairly low because of several small number effects in the experiments. There is no evidence for a non-linear bootstrap effect or a dependence of the multiplicity function on the internal dynamics of condensed groups. Empirical estimates of the multiplicity function by Gott and Turner have a feature near the characteristic luminosity predicted by the theory. The scaling relations allow the inference from estimates of the galaxy luminosity function that galaxies must have suffered considerable dissipation if they originally formed from a self-similar hierarchy. A method is also developed for relating the multiplicity function to similar measures of clustering, such as those of Bhavsar, for the distribution of galaxies on the sky. These are shown to depend on the luminosity function in a complicated way. (author)

  11. Static multiplicities in heterogeneous azeotropic distillation sequences

    DEFF Research Database (Denmark)

    Esbjerg, Klavs; Andersen, Torben Ravn; Jørgensen, Sten Bay

    1998-01-01

    In this paper the results of a bifurcation analysis on heterogeneous azeotropic distillation sequences are given. Two sequences suitable for ethanol dehydration are compared: The 'direct' and the 'indirect' sequence. It is shown, that the two sequences, despite their similarities, exhibit very...... different static behavior. The method of Petlyuk and Avet'yan (1971), Bekiaris et al. (1993), which assumes infinite reflux and infinite number of stages, is extended to and applied on heterogeneous azeotropic distillation sequences. The predictions are substantiated through simulations. The static sequence...

  12. Self-similar pattern formation and continuous mechanics of self-similar systems

    Directory of Open Access Journals (Sweden)

    A. V. Dyskin

    2007-01-01

    Full Text Available In many cases, the critical state of systems that reached the threshold is characterised by self-similar pattern formation. We produce an example of pattern formation of this kind – formation of self-similar distribution of interacting fractures. Their formation starts with the crack growth due to the action of stress fluctuations. It is shown that even when the fluctuations have zero average the cracks generated by them could grow far beyond the scale of stress fluctuations. Further development of the fracture system is controlled by crack interaction leading to the emergence of self-similar crack distributions. As a result, the medium with fractures becomes discontinuous at any scale. We develop a continuum fractal mechanics to model its physical behaviour. We introduce a continuous sequence of continua of increasing scales covering this range of scales. The continuum of each scale is specified by the representative averaging volume elements of the corresponding size. These elements determine the resolution of the continuum. Each continuum hides the cracks of scales smaller than the volume element size while larger fractures are modelled explicitly. Using the developed formalism we investigate the stability of self-similar crack distributions with respect to crack growth and show that while the self-similar distribution of isotropically oriented cracks is stable, the distribution of parallel cracks is not. For the isotropically oriented cracks scaling of permeability is determined. For permeable materials (rocks with self-similar crack distributions permeability scales as cube of crack radius. This property could be used for detecting this specific mechanism of formation of self-similar crack distributions.

  13. Seniority bosons from similarity transformations

    International Nuclear Information System (INIS)

    Geyer, H.B.

    1986-01-01

    The requirement of associating in the boson space seniority with twice the number of non-s bosons defines a similarity transformation which re-expresses the Dyson pair boson images in terms of seniority bosons. In particular the fermion S-pair creation operator is mapped onto an operator which, unlike the pair boson image, does not change the number of non-s bosons. The original results of Otsuka, Arima and Iachello are recovered by this procedure while at the same time they are generalized to include g-bosons or even bosons with J>4 as well as any higher order boson terms. Furthermore the seniority boson images are valid for an arbitrary number of d- or g-bosons - a result which is not readily obtainable within the framework of the usual Marumori- or OAI-method

  14. Short sequence motifs, overrepresented in mammalian conservednon-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov,Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNAsequences of multicellular eukaryotes is under selective constraint. Inparticular, ~;5 percent of the human genome consists of conservednon-coding sequences (CNSs). CNSs differ from other genomic sequences intheir nucleotide composition and must play important functional roles,which mostly remain obscure.Results: We investigated relative abundancesof short sequence motifs in all human CNSs present in the human/mousewhole-genome alignments vs. three background sets of sequences: (i)weakly conserved or unconserved non-coding sequences (non-CNSs); (ii)near-promoter sequences (located between nucleotides -500 and -1500,relative to a start of transcription); and (iii) random sequences withthe same nucleotide composition as that of CNSs. When compared tonon-CNSs and near-promoter sequences, CNSs possess an excess of AT-richmotifs, often containing runs of identical nucleotides. In contrast, whencompared to random sequences, CNSs contain an excess of GC-rich motifswhich, however, lack CpG dinucleotides. Thus, abundance of short sequencemotifs in human CNSs, taken as a whole, is mostly determined by theiroverall compositional properties and not by overrepresentation of anyspecific short motifs. These properties are: (i) high AT-content of CNSs,(ii) a tendency, probably due to context-dependent mutation, of A's andT's to clump, (iii) presence of short GC-rich regions, and (iv) avoidanceof CpG contexts, due to their hypermutability. Only a small number ofshort motifs, overrepresented in all human CNSs are similar to bindingsites of transcription factors from the FOX family.Conclusion: Human CNSsas a whole appear to be too broad a class of sequences to possess strongfootprints of any short sequence-specific functions. Such footprintsshould be studied at the level of functional subclasses of CNSs, such asthose which flank genes with a particular pattern of expression. Overallproperties of CNSs are affected by

  15. Alaska, Gulf spills share similarities

    International Nuclear Information System (INIS)

    Usher, D.

    1991-01-01

    The accidental Exxon Valdez oil spill in Alaska and the deliberate dumping of crude oil into the Persian Gulf as a tactic of war contain both glaring differences and surprising similarities. Public reaction and public response was much greater to the Exxon Valdez spill in pristine Prince William Sound than to the war-related tragedy in the Persian Gulf. More than 12,000 workers helped in the Alaskan cleanup; only 350 have been involved in Kuwait. But in both instances, environmental damages appear to be less than anticipated. Natures highly effective self-cleansing action is primarily responsible for minimizing the damages. One positive action growing out of the two incidents is increased international cooperation and participation in oil-spill clean-up efforts. In 1990, in the aftermath of the Exxon Valdez spill, 94 nations signed an international accord on cooperation in future spills. The spills can be historic environmental landmarks leading to creation of more sophisticated response systems worldwide

  16. Defining a similarity threshold for a functional proteinsequence pattern: The signal peptide cleavage site

    DEFF Research Database (Denmark)

    Nielsen, Henrik; Engelbrecht, Jacob; von Heijne, Gunnar

    1996-01-01

    When preparing data sets of amino acid or nucleotide sequences it is necessary to exclude redundant or homologous sequences in order to avoid overestimating the predictive performance of an algorithm. For some time methods for doing this have been available in the area of protein structure...... prediction. We have developed a similar procedure based on pair-wise alignments for sequences with functional sites. We show how a correlation coefficient between sequence similarity and functional homology can be used to compare the efficiency of different similarity measures and choose a nonarbitrary...

  17. The last Fermat theorem. The story of the riddle that has defied the greatest minds in the world during 358 years

    International Nuclear Information System (INIS)

    Singh, S.

    1998-01-01

    Pierre de Fermat, one of the greatest French mathematician of the seventeenth century, noticed in the margin of his exercise book 'X n + Y n Z n impossible if n upper than 2, i have found a wonderful solution but i am short of place to develop it here'. Only in 1993 a young British man, Andrew Wiles, professor at Princeton, after seven years of work settled this riddle. That is that story that is told here. (N.C.)

  18. The Greatest Show on Earth

    Indian Academy of Sciences (India)

    Darwin and Alfred Russel Wallace: life on earth had evolved ... over long epochs, the pace of change was infinitesimal. ... Thanks to the work of the Japanese theoreti- cian Motoo ... pleasure-minus-expenditure balance is posi- tive. This way of ...

  19. Freedom's Greatest Threat, The Metaterrorist

    National Research Council Canada - National Science Library

    Jones, Gary

    1997-01-01

    The end of the Cold War ushered on to the world scene a new hybrid of terrorist. This new breed of criminal is called the metaterrorist, because his art of instilling terror goes beyond anything we have ever seen in the past...

  20. Climate change: Wilderness's greatest challenge

    Science.gov (United States)

    Nathan L. Stephenson; Connie Millar

    2014-01-01

    Anthropogenic climatic change can no longer be considered an abstract possibility. It is here, its effects are already evident, and changes are expected to accelerate in coming decades, profoundly altering wilderness ecosystems. At the most fundamental level, wilderness stewards will increasingly be confronted with a trade-off between untrammeled wilderness character...

  1. Optimal neighborhood indexing for protein similarity search.

    Science.gov (United States)

    Peterlongo, Pierre; Noé, Laurent; Lavenier, Dominique; Nguyen, Van Hoa; Kucherov, Gregory; Giraud, Mathieu

    2008-12-16

    Similarity inference, one of the main bioinformatics tasks, has to face an exponential growth of the biological data. A classical approach used to cope with this data flow involves heuristics with large seed indexes. In order to speed up this technique, the index can be enhanced by storing additional information to limit the number of random memory accesses. However, this improvement leads to a larger index that may become a bottleneck. In the case of protein similarity search, we propose to decrease the index size by reducing the amino acid alphabet. The paper presents two main contributions. First, we show that an optimal neighborhood indexing combining an alphabet reduction and a longer neighborhood leads to a reduction of 35% of memory involved into the process, without sacrificing the quality of results nor the computational time. Second, our approach led us to develop a new kind of substitution score matrices and their associated e-value parameters. In contrast to usual matrices, these matrices are rectangular since they compare amino acid groups from different alphabets. We describe the method used for computing those matrices and we provide some typical examples that can be used in such comparisons. Supplementary data can be found on the website http://bioinfo.lifl.fr/reblosum. We propose a practical index size reduction of the neighborhood data, that does not negatively affect the performance of large-scale search in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can have applications in any study involving an alphabet reduction.

  2. Optimal neighborhood indexing for protein similarity search

    Directory of Open Access Journals (Sweden)

    Nguyen Van

    2008-12-01

    Full Text Available Abstract Background Similarity inference, one of the main bioinformatics tasks, has to face an exponential growth of the biological data. A classical approach used to cope with this data flow involves heuristics with large seed indexes. In order to speed up this technique, the index can be enhanced by storing additional information to limit the number of random memory accesses. However, this improvement leads to a larger index that may become a bottleneck. In the case of protein similarity search, we propose to decrease the index size by reducing the amino acid alphabet. Results The paper presents two main contributions. First, we show that an optimal neighborhood indexing combining an alphabet reduction and a longer neighborhood leads to a reduction of 35% of memory involved into the process, without sacrificing the quality of results nor the computational time. Second, our approach led us to develop a new kind of substitution score matrices and their associated e-value parameters. In contrast to usual matrices, these matrices are rectangular since they compare amino acid groups from different alphabets. We describe the method used for computing those matrices and we provide some typical examples that can be used in such comparisons. Supplementary data can be found on the website http://bioinfo.lifl.fr/reblosum. Conclusion We propose a practical index size reduction of the neighborhood data, that does not negatively affect the performance of large-scale search in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can have applications in any study involving an alphabet reduction.

  3. Contrasting HIV phylogenetic relationships and V3 loop protein similarities

    Energy Technology Data Exchange (ETDEWEB)

    Korber, B. (Los Alamos National Lab., NM (United States) Santa Fe Inst., NM (United States)); Myers, G. (Los Alamos National Lab., NM (United States))

    1992-01-01

    At least five distinct sequence subtypes of HIV-I can be identified from the major centers of the AMS pandemic. While it is too early to tell whether these subtypes are serologically or phenotypically similar or distinct in terms of properties such as pathogenicity and transmissibility, we can begin to investigate their potential for phenotypic divergence at the protein sequence level. Phylogenetic analysis of HIV DNA sequences is being widely used to examine lineages of different viral strains as they evolve and spread throughout the globe. We have identified five distinct HIV-1 subtypes (designated A-E), or clades, based on phylogenetic clustering patterns generated from genetic information from both the gag and envelope (env) genes from a spectrum of international isolates. Our initial observations concerning both HIV-1 and HIV-2 sequences indicate that conserved patterns in protein chemistry may indeed exist across distant lineages. Such patterns in V3 loop amino acid chemistry may be indicative of stable lineages or convergence within this highly variable, though functionally and immunologically critical, region. We think that there may be parallels between the apparently stable HIV-2 V3 lineage and the previously mentioned HIV-1 V3 loops which are very similar at the protein level despite being distant by cladistic analysis, and which do not possess the distinctive positively charged residues. Highly conserved V3 loop protein sequences are also encountered in SIVAGMs and CIVs (chimpanzee viral strains), which do not appear to be pathogenic in their wild-caught natural hosts.

  4. Contrasting HIV phylogenetic relationships and V3 loop protein similarities

    Energy Technology Data Exchange (ETDEWEB)

    Korber, B. [Los Alamos National Lab., NM (United States)]|[Santa Fe Inst., NM (United States); Myers, G. [Los Alamos National Lab., NM (United States)

    1992-12-31

    At least five distinct sequence subtypes of HIV-I can be identified from the major centers of the AMS pandemic. While it is too early to tell whether these subtypes are serologically or phenotypically similar or distinct in terms of properties such as pathogenicity and transmissibility, we can begin to investigate their potential for phenotypic divergence at the protein sequence level. Phylogenetic analysis of HIV DNA sequences is being widely used to examine lineages of different viral strains as they evolve and spread throughout the globe. We have identified five distinct HIV-1 subtypes (designated A-E), or clades, based on phylogenetic clustering patterns generated from genetic information from both the gag and envelope (env) genes from a spectrum of international isolates. Our initial observations concerning both HIV-1 and HIV-2 sequences indicate that conserved patterns in protein chemistry may indeed exist across distant lineages. Such patterns in V3 loop amino acid chemistry may be indicative of stable lineages or convergence within this highly variable, though functionally and immunologically critical, region. We think that there may be parallels between the apparently stable HIV-2 V3 lineage and the previously mentioned HIV-1 V3 loops which are very similar at the protein level despite being distant by cladistic analysis, and which do not possess the distinctive positively charged residues. Highly conserved V3 loop protein sequences are also encountered in SIVAGMs and CIVs (chimpanzee viral strains), which do not appear to be pathogenic in their wild-caught natural hosts.

  5. Polynomial sequences generated by infinite Hessenberg matrices

    Directory of Open Access Journals (Sweden)

    Verde-Star Luis

    2017-01-01

    Full Text Available We show that an infinite lower Hessenberg matrix generates polynomial sequences that correspond to the rows of infinite lower triangular invertible matrices. Orthogonal polynomial sequences are obtained when the Hessenberg matrix is tridiagonal. We study properties of the polynomial sequences and their corresponding matrices which are related to recurrence relations, companion matrices, matrix similarity, construction algorithms, and generating functions. When the Hessenberg matrix is also Toeplitz the polynomial sequences turn out to be of interpolatory type and we obtain additional results. For example, we show that every nonderogative finite square matrix is similar to a unique Toeplitz-Hessenberg matrix.

  6. Memory and learning with rapid audiovisual sequences

    Science.gov (United States)

    Keller, Arielle S.; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  7. Memory and learning with rapid audiovisual sequences.

    Science.gov (United States)

    Keller, Arielle S; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed.

  8. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...... diseases in Europe. As part of the EURL proficiency test for fish diseases it is required to sequence any RANA virus isolates found in any of the samples. It is also highly recommended to sequence the ISA virus to determine whether it be HPRΔ or HPR0. Furthermore, it is recommended that any VHSV and IHNV...... isolates be genotyped. As part of the evaluation of the proficiency results it was decided this year to look into the quality and similarity of the sequence results for selected viruses. Ampoule III in the proficiency test 2013 contained an EHNV isolate. The EURL received 43 sequences from 41 laboratories...

  9. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease, one of which is leukemia disease or better known as blood cancer. The cancer cell can be detected by taking DNA in laboratory test. This study focused on local alignment of leukemia and non leukemia data resulting from NCBI in the form of DNA sequences by using Smith-Waterman algorithm. SmithWaterman algorithm was invented by TF Smith and MS Waterman in 1981. These algorithms try to find as much as possible similarity of a pair of sequences, by giving a negative value to the unequal base pair (mismatch), and positive values on the same base pair (match). So that will obtain the maximum positive value as the end of the alignment, and the minimum value as the initial alignment. This study will use sequences of leukemia and 3 sequences of non leukemia.

  10. The nucleotide sequence of 5S ribosomal RNA from Micrococcus lysodeikticus.

    Science.gov (United States)

    Hori, H; Osawa, S; Murao, K; Ishikura, H

    1980-01-01

    The nucleotide sequence of ribosomal 5S RNA from Micrococcus lysodeikticus is pGUUACGGCGGCUAUAGCGUGGGGGAAACGCCCGGCCGUAUAUCGAACCCGGAAGCUAAGCCCCAUAGCGCCGAUGGUUACUGUAACCGGGAGGUUGUGGGAGAGUAGGUCGCCGCCGUGAOH. When compared to other 5S RNAs, the sequence homology is greatest with Thermus aquaticus, and these two 5S RNAs reveal several features intermediate between those of typical gram-positive bacteria and gram-negative bacteria. PMID:6780979

  11. Rapid Diagnostics of Onboard Sequences

    Science.gov (United States)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the Restful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database, while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command

  12. The RNA world, automatic sequences and oncogenetics

    Energy Technology Data Exchange (ETDEWEB)

    Tahir Shah, K

    1993-04-01

    We construct a model of the RNA world in terms of naturally evolving nucleotide sequences assuming only Crick-Watson base pairing and self-cleaving/splicing capability. These sequences have the following properties. (1) They are recognizable by an automation (or automata). That is, to each k-sequence, there exist a k-automation which accepts, recognizes or generates the k-sequence. These are known as automatic sequences. Fibonacci and Morse-Thue sequences are the most natural outcome of pre-biotic chemical conditions. (2) Infinite (resp. large) sequences are self-similar (resp. nearly self-similar) under certain rewrite rules and consequently give rise to fractal (resp.fractal-like) structures. Computationally, such sequences can also be generated by their corresponding deterministic parallel re-write system, known as a DOL system. The self-similar sequences are fixed points of their respective rewrite rules. Some of these automatic sequences have the capability that they can read or ``accept`` other sequences while others can detect errors and trigger error-correcting mechanisms. They can be enlarged and have block and/or palindrome structure. Linear recurring sequences such as Fibonacci sequence are simply Feed-back Shift Registers, a well know model of information processing machines. We show that a mutation of any rewrite rule can cause a combinatorial explosion of error and relates this to oncogenetical behavior. On the other hand, a mutation of sequences that are not rewrite rules, leads to normal evolutionary change. Known experimental results support our hypothesis. (author). Refs.

  13. The RNA world, automatic sequences and oncogenetics

    International Nuclear Information System (INIS)

    Tahir Shah, K.

    1993-04-01

    We construct a model of the RNA world in terms of naturally evolving nucleotide sequences assuming only Crick-Watson base pairing and self-cleaving/splicing capability. These sequences have the following properties. 1) They are recognizable by an automation (or automata). That is, to each k-sequence, there exist a k-automation which accepts, recognizes or generates the k-sequence. These are known as automatic sequences. Fibonacci and Morse-Thue sequences are the most natural outcome of pre-biotic chemical conditions. 2) Infinite (resp. large) sequences are self-similar (resp. nearly self-similar) under certain rewrite rules and consequently give rise to fractal (resp.fractal-like) structures. Computationally, such sequences can also be generated by their corresponding deterministic parallel re-write system, known as a DOL system. The self-similar sequences are fixed points of their respective rewrite rules. Some of these automatic sequences have the capability that they can read or 'accept' other sequences while others can detect errors and trigger error-correcting mechanisms. They can be enlarged and have block and/or palindrome structure. Linear recurring sequences such as Fibonacci sequence are simply Feed-back Shift Registers, a well know model of information processing machines. We show that a mutation of any rewrite rule can cause a combinatorial explosion of error and relates this to oncogenetical behavior. On the other hand, a mutation of sequences that are not rewrite rules, leads to normal evolutionary change. Known experimental results support our hypothesis. (author). Refs

  14. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.

  15. The Need For ``Pleasure in Finding Things Out:'' The Use of History and Our Greatest Scientists for Human Survival and Scientific Integrity

    Science.gov (United States)

    Borchardt, Joshua

    2011-03-01

    Why Homo sapiens search for interesting things and the methods of which we do so. The use of philosophical, theoretical, and demonstrated processes for exploration of the natural, and not so natural world are presented based on the ideas and wishes of some of History's greatest scientists, with concentration on Richard P. Feynman's lens on scientific discovery and pursuit, for which the abstract gets its title. This talk is presented towards the layman as well as the physicist, and gives insight to the nature of discovery and what it means to have pleasure in finding things out for the betterment of all mankind.

  16. Development of similarity theory for control systems

    Science.gov (United States)

    Myshlyaev, L. P.; Evtushenko, V. F.; Ivushkin, K. A.; Makarov, G. V.

    2018-05-01

    The area of effective application of the traditional similarity theory and the need necessity of its development for systems are discussed. The main statements underlying the similarity theory of control systems are given. The conditions for the similarity of control systems and the need for similarity control control are formulated. Methods and algorithms for estimating and similarity control of control systems and the results of research of control systems based on their similarity are presented. The similarity control of systems includes the current evaluation of the degree of similarity of control systems and the development of actions controlling similarity, and the corresponding targeted change in the state of any element of control systems.

  17. Sequence Read Archive (SRA)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome...

  18. Sleep Disturbances in Adults With Arthritis: Prevalence, Mediators, and Subgroups at Greatest Risk. Data From the 2007 National Health Interview Survey

    Science.gov (United States)

    LOUIE, GRANT H.; TEKTONIDOU, MARIA G.; CABAN-MARTINEZ, ALBERTO J.; WARD, MICHAEL M.

    2012-01-01

    Objective To examine the prevalence of sleep disturbances in adults with arthritis in a nationally representative sample, mediators of sleep difficulties, and subgroups of individuals with arthritis at greatest risk. Methods Using data on US adults ages ≥18 years participating in the 2007 National Health Interview Survey, we computed the prevalence of 3 measures of sleep disturbance (insomnia, excessive daytime sleepiness, and sleep duration arthritis. We used logistic regression analysis to examine if the association of arthritis and sleep disturbances was independent of sociodemographic characteristics and comorbidities, and to identify potential mediators. We used classification trees to identify subgroups at higher risk. Results The adjusted prevalence of insomnia was higher among adults with arthritis than those without arthritis (23.1% versus 16.4%; P arthritis were more likely than those without arthritis to report insomnia (unadjusted odds ratio 2.92, 95% confidence interval 2.68 –3.17), but adjustment for sociodemographic characteristics and comorbidities attenuated this association. Joint pain and limitation due to pain mediated the association between arthritis and insomnia. Among adults with arthritis, those with depression and anxiety were at highest risk for sleep disturbance. Results for excessive daytime sleepiness and sleep duration arthritis, and is mediated by joint pain and limitation due to pain. Among individuals with arthritis, those with depression and anxiety are at greatest risk. PMID:20890980

  19. Will Climate Change, Genetic and Demographic Variation or Rat Predation Pose the Greatest Risk for Persistence of an Altitudinally Distributed Island Endemic?

    Directory of Open Access Journals (Sweden)

    Alison Shapcott

    2012-11-01

    Full Text Available Species endemic to mountains on oceanic islands are subject to a number of existing threats (in particular, invasive species along with the impacts of a rapidly changing climate. The Lord Howe Island endemic palm Hedyscepe canterburyana is restricted to two mountains above 300 m altitude. Predation by the introduced Black Rat (Rattus rattus is known to significantly reduce seedling recruitment. We examined the variation in Hedyscepe in terms of genetic variation, morphology, reproductive output and demographic structure, across an altitudinal gradient. We used demographic data to model population persistence under climate change predictions of upward range contraction incorporating long-term climatic records for Lord Howe Island. We also accounted for alternative levels of rat predation into the model to reflect management options for control. We found that Lord Howe Island is getting warmer and drier and quantified the degree of temperature change with altitude (0.9 °C per 100 m. For H. canterburyana, differences in development rates, population structure, reproductive output and population growth rate were identified between altitudes. In contrast, genetic variation was high and did not vary with altitude. There is no evidence of an upward range contraction as was predicted and recruitment was greatest at lower altitudes. Our models predicted slow population decline in the species and that the highest altitude populations are under greatest threat of extinction. Removal of rat predation would significantly enhance future persistence of this species.

  20. A Signal Processing Method to Explore Similarity in Protein Flexibility

    Directory of Open Access Journals (Sweden)

    Simina Vasilache

    2010-01-01

    Full Text Available Understanding mechanisms of protein flexibility is of great importance to structural biology. The ability to detect similarities between proteins and their patterns is vital in discovering new information about unknown protein functions. A Distance Constraint Model (DCM provides a means to generate a variety of flexibility measures based on a given protein structure. Although information about mechanical properties of flexibility is critical for understanding protein function for a given protein, the question of whether certain characteristics are shared across homologous proteins is difficult to assess. For a proper assessment, a quantified measure of similarity is necessary. This paper begins to explore image processing techniques to quantify similarities in signals and images that characterize protein flexibility. The dataset considered here consists of three different families of proteins, with three proteins in each family. The similarities and differences found within flexibility measures across homologous proteins do not align with sequence-based evolutionary methods.

  1. Marriage Matters: Spousal Similarity in Life Satisfaction

    OpenAIRE

    Ulrich Schimmack; Richard Lucas

    2006-01-01

    Examined the concurrent and cross-lagged spousal similarity in life satisfaction over a 21-year period. Analyses were based on married couples (N = 847) in the German Socio-Economic Panel (SOEP). Concurrent spousal similarity was considerably higher than one-year retest similarity, revealing spousal similarity in the variable component of life satisfac-tion. Spousal similarity systematically decreased with length of retest interval, revealing simi-larity in the changing component of life sati...

  2. Similarity of Symbol Frequency Distributions with Heavy Tails

    Directory of Open Access Journals (Sweden)

    Martin Gerlach

    2016-04-01

    Full Text Available Quantifying the similarity between symbolic sequences is a traditional problem in information theory which requires comparing the frequencies of symbols in different sequences. In numerous modern applications, ranging from DNA over music to texts, the distribution of symbol frequencies is characterized by heavy-tailed distributions (e.g., Zipf’s law. The large number of low-frequency symbols in these distributions poses major difficulties to the estimation of the similarity between sequences; e.g., they hinder an accurate finite-size estimation of entropies. Here, we show analytically how the systematic (bias and statistical (fluctuations errors in these estimations depend on the sample size N and on the exponent γ of the heavy-tailed distribution. Our results are valid for the Shannon entropy (α=1, its corresponding similarity measures (e.g., the Jensen-Shanon divergence, and also for measures based on the generalized entropy of order α. For small α’s, including α=1, the errors decay slower than the 1/N decay observed in short-tailed distributions. For α larger than a critical value α^{*}=1+1/γ≤2, the 1/N decay is recovered. We show the practical significance of our results by quantifying the evolution of the English language over the last two centuries using a complete α spectrum of measures. We find that frequent words change more slowly than less frequent words and that α=2 provides the most robust measure to quantify language change.

  3. Is the phonological similarity effect in working memory due to proactive interference?

    Science.gov (United States)

    Baddeley, Alan D; Hitch, Graham J; Quinlan, Philip T

    2018-04-12

    Immediate serial recall of verbal material is highly sensitive to impairment attributable to phonological similarity. Although this has traditionally been interpreted as a within-sequence similarity effect, Engle (2007) proposed an interpretation based on interference from prior sequences, a phenomenon analogous to that found in the Peterson short-term memory (STM) task. We use the method of serial reconstruction to test this in an experiment contrasting the standard paradigm in which successive sequences are drawn from the same set of phonologically similar or dissimilar words and one in which the vowel sound on which similarity is based is switched from trial to trial, a manipulation analogous to that producing release from PI in the Peterson task. A substantial similarity effect occurs under both conditions although there is a small advantage from switching across similar sequences. There is, however, no evidence for the suggestion that the similarity effect will be absent from the very first sequence tested. Our results support the within-sequence similarity rather than a between-list PI interpretation. Reasons for the contrast with the classic Peterson short-term forgetting task are briefly discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  4. The Large Hadron Collider the greatest adventure in town and ten reasons why it matters, as illustrated by the ATLAS experiment

    CERN Document Server

    Millington, Andrew J; MacPherson, Rob; Nordberg, Markus

    2016-01-01

    When the discovery of the Higgs Boson at CERN hit the headlines in 2012, the world was stunned by this achievement of modern science. Less well appreciated, however, were the many ways in which this benefited wider society. The Large Hadron Collider — The Greatest Adventure in Town charts a path through the cultural, economic and medical gains of modern particle physics. It illustrates these messages through the ATLAS experiment at CERN, one of the two big experiments which found the Higgs particle. Moving clear of in-depth physics analysis, it draws on the unparalleled curiosity about particle physics aroused by the Higgs discovery, and relates it to developments familiar in the modern world, including the Internet, its successor "The Grid", and the latest cancer treatments. In this book, advances made from developing the 27 kilometre particle accelerator and its detectors are presented with the benefit of first hand interviews and are extensively illustrated throughout. Interviewees are leading physicis...

  5. On different forms of self similarity

    International Nuclear Information System (INIS)

    Aswathy, R.K.; Mathew, Sunil

    2016-01-01

    Fractal geometry is mainly based on the idea of self-similar forms. To be self-similar, a shape must able to be divided into parts that are smaller copies, which are more or less similar to the whole. There are different forms of self similarity in nature and mathematics. In this paper, some of the topological properties of super self similar sets are discussed. It is proved that in a complete metric space with two or more elements, the set of all non super self similar sets are dense in the set of all non-empty compact sub sets. It is also proved that the product of self similar sets are super self similar in product metric spaces and that the super self similarity is preserved under isometry. A characterization of super self similar sets using contracting sub self similarity is also presented. Some relevant counterexamples are provided. The concepts of exact super and sub self similarity are introduced and a necessary and sufficient condition for a set to be exact super self similar in terms of condensation iterated function systems (Condensation IFS’s) is obtained. A method to generate exact sub self similar sets using condensation IFS’s and the denseness of exact super self similar sets are also discussed.

  6. Microbiota epitope similarity either dampens or enhances the immunogenicity of disease-associated antigenic epitopes.

    Directory of Open Access Journals (Sweden)

    Sebastian Carrasco Pro

    Full Text Available The microbiome influences adaptive immunity and molecular mimicry influences T cell reactivity. Here, we evaluated whether the sequence similarity of various antigens to the microbiota dampens or increases immunogenicity of T cell epitopes. Sets of epitopes and control sequences derived from 38 antigenic categories (infectious pathogens, allergens, autoantigens were retrieved from the Immune Epitope Database (IEDB. Their similarity to microbiome sequences was calculated using the BLOSUM62 matrix. We found that sequence similarity was associated with either dampened (tolerogenic; e.g. most allergens or increased (inflammatory; e.g. Dengue and West Nile viruses likelihood of a peptide being immunogenic as a function of epitope source category. Ten-fold cross-validation and validation using sets of manually curated epitopes and non-epitopes derived from allergens were used to confirm these initial observations. Furthermore, the genus from which the microbiome homologous sequences were derived influenced whether a tolerogenic versus inflammatory modulatory effect was observed, with Fusobacterium most associated with inflammatory influences and Bacteroides most associated with tolerogenic influences. We validated these effects using PBMCs stimulated with various sets of microbiome peptides. "Tolerogenic" microbiome peptides elicited IL-10 production, "inflammatory" peptides elicited mixed IL-10/IFNγ production, while microbiome epitopes homologous to self were completely unreactive for both cytokines. We also tested the sequence similarity of cockroach epitopes to specific microbiome sequences derived from households of cockroach allergic individuals and non-allergic controls. Microbiomes from cockroach allergic households were less likely to contain sequences homologous to previously defined cockroach allergens. These results are compatible with the hypothesis that microbiome sequences may contribute to the tolerization of T cells for allergen

  7. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko; Tanaka, Tsuyoshi; Ohyanagi, Hajime; Hsing, Yue-Ie C.; Itoh, Takeshi

    2018-01-01

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  8. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko

    2018-02-14

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  9. Sequence Factorization with Multiple References.

    Directory of Open Access Journals (Sweden)

    Sebastian Wandelt

    Full Text Available The success of high-throughput sequencing has lead to an increasing number of projects which sequence large populations of a species. Storage and analysis of sequence data is a key challenge in these projects, because of the sheer size of the datasets. Compression is one simple technology to deal with this challenge. Referential factorization and compression schemes, which store only the differences between input sequence and a reference sequence, gained lots of interest in this field. Highly-similar sequences, e.g., Human genomes, can be compressed with a compression ratio of 1,000:1 and more, up to two orders of magnitude better than with standard compression techniques. Recently, it was shown that the compression against multiple references from the same species can boost the compression ratio up to 4,000:1. However, a detailed analysis of using multiple references is lacking, e.g., for main memory consumption and optimality. In this paper, we describe one key technique for the referential compression against multiple references: The factorization of sequences. Based on the notion of an optimal factorization, we propose optimization heuristics and identify parameter settings which greatly influence 1 the size of the factorization, 2 the time for factorization, and 3 the required amount of main memory. We evaluate a total of 30 setups with a varying number of references on data from three different species. Our results show a wide range of factorization sizes (optimal to an overhead of up to 300%, factorization speed (0.01 MB/s to more than 600 MB/s, and main memory usage (few dozen MB to dozens of GB. Based on our evaluation, we identify the best configurations for common use cases. Our evaluation shows that multi-reference factorization is much better than single-reference factorization.

  10. Nonparametric combinatorial sequence models.

    Science.gov (United States)

    Wauthier, Fabian L; Jordan, Michael I; Jojic, Nebojsa

    2011-11-01

    This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This article presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three biological sequence families which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution over sequence representations induced by the prior. By integrating out the posterior, our method compares favorably to leading binding predictors.

  11. Large margin classification with indefinite similarities

    KAUST Repository

    Alabdulmohsin, Ibrahim; Cisse, Moustapha; Gao, Xin; Zhang, Xiangliang

    2016-01-01

    Classification with indefinite similarities has attracted attention in the machine learning community. This is partly due to the fact that many similarity functions that arise in practice are not symmetric positive semidefinite, i.e. the Mercer

  12. Testing Self-Similarity Through Lamperti Transformations

    KAUST Repository

    Lee, Myoungji; Genton, Marc G.; Jun, Mikyoung

    2016-01-01

    extensively, while statistical tests for self-similarity are scarce and limited to processes indexed in one dimension. This paper proposes a statistical hypothesis test procedure for self-similarity of a stochastic process indexed in one dimension and multi

  13. Personality similarity and life satisfaction in couples

    OpenAIRE

    Furler Katrin; Gomez Veronica; Grob Alexander

    2013-01-01

    The present study examined the association between personality similarity and life satisfaction in a large nationally representative sample of 1608 romantic couples. Similarity effects were computed for the Big Five personality traits as well as for personality profiles with global and differentiated indices of similarity. Results showed substantial actor and partner effects indicating that both partners' personality traits were related to both partners' life satisfaction. Personality similar...

  14. Retinoid-binding proteins: similar protein architectures bind similar ligands via completely different ways.

    Directory of Open Access Journals (Sweden)

    Yu-Ru Zhang

    Full Text Available BACKGROUND: Retinoids are a class of compounds that are chemically related to vitamin A, which is an essential nutrient that plays a key role in vision, cell growth and differentiation. In vivo, retinoids must bind with specific proteins to perform their necessary functions. Plasma retinol-binding protein (RBP and epididymal retinoic acid binding protein (ERABP carry retinoids in bodily fluids, while cellular retinol-binding proteins (CRBPs and cellular retinoic acid-binding proteins (CRABPs carry retinoids within cells. Interestingly, although all of these transport proteins possess similar structures, the modes of binding for the different retinoid ligands with their carrier proteins are different. METHODOLOGY/PRINCIPAL FINDINGS: In this work, we analyzed the various retinoid transport mechanisms using structure and sequence comparisons, binding site analyses and molecular dynamics simulations. Our results show that in the same family of proteins and subcellular location, the orientation of a retinoid molecule within a binding protein is same, whereas when different families of proteins are considered, the orientation of the bound retinoid is completely different. In addition, none of the amino acid residues involved in ligand binding is conserved between the transport proteins. However, for each specific binding protein, the amino acids involved in the ligand binding are conserved. The results of this study allow us to propose a possible transport model for retinoids. CONCLUSIONS/SIGNIFICANCE: Our results reveal the differences in the binding modes between the different retinoid-binding proteins.

  15. DNA SEQUENCE SIMILARITY REQUIREMENTS FOR INTERSPECIFIC RECOMBINATION IN BACILLUS. (R825348)

    Science.gov (United States)

    The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...

  16. Similar network activated by young and elderly adults during the acquisition of a motor sequence.

    NARCIS (Netherlands)

    Daselaar, S.M.; Veltman, D.J.; Rombouts, S.A.R.B.; Raaijmakers, J.G.W.; Jonker, C.

    2003-01-01

    Age-related impairments in episodic memory have been related to a deficiency in semantic processing, based on the finding that elderly adults typically benefit less than young adults from deep, semantic as opposed to shallow, nonsemantic processing of study items. In the present study, we tested the

  17. Scaling Relations of Local Magnitude versus Moment Magnitude for Sequences of Similar Earthquakes in Switzerland

    KAUST Repository

    Bethmann, F.; Deichmann, N.; Mai, Paul Martin

    2011-01-01

    Theoretical considerations and empirical regressions show that, in the magnitude range between 3 and 5, local magnitude, ML, and moment magnitude, Mw, scale 1:1. Previous studies suggest that for smaller magnitudes this 1:1 scaling breaks down

  18. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  19. An approach to large scale identification of non-obvious structural similarities between proteins

    Science.gov (United States)

    Cherkasov, Artem; Jones, Steven JM

    2004-01-01

    Background A new sequence independent bioinformatics approach allowing genome-wide search for proteins with similar three dimensional structures has been developed. By utilizing the numerical output of the sequence threading it establishes putative non-obvious structural similarities between proteins. When applied to the testing set of proteins with known three dimensional structures the developed approach was able to recognize structurally similar proteins with high accuracy. Results The method has been developed to identify pathogenic proteins with low sequence identity and high structural similarity to host analogues. Such protein structure relationships would be hypothesized to arise through convergent evolution or through ancient horizontal gene transfer events, now undetectable using current sequence alignment techniques. The pathogen proteins, which could mimic or interfere with host activities, would represent candidate virulence factors. The developed approach utilizes the numerical outputs from the sequence-structure threading. It identifies the potential structural similarity between a pair of proteins by correlating the threading scores of the corresponding two primary sequences against the library of the standard folds. This approach allowed up to 64% sensitivity and 99.9% specificity in distinguishing protein pairs with high structural similarity. Conclusion Preliminary results obtained by comparison of the genomes of Homo sapiens and several strains of Chlamydia trachomatis have demonstrated the potential usefulness of the method in the identification of bacterial proteins with known or potential roles in virulence. PMID:15147578

  20. An approach to large scale identification of non-obvious structural similarities between proteins

    Directory of Open Access Journals (Sweden)

    Cherkasov Artem

    2004-05-01

    Full Text Available Abstract Background A new sequence independent bioinformatics approach allowing genome-wide search for proteins with similar three dimensional structures has been developed. By utilizing the numerical output of the sequence threading it establishes putative non-obvious structural similarities between proteins. When applied to the testing set of proteins with known three dimensional structures the developed approach was able to recognize structurally similar proteins with high accuracy. Results The method has been developed to identify pathogenic proteins with low sequence identity and high structural similarity to host analogues. Such protein structure relationships would be hypothesized to arise through convergent evolution or through ancient horizontal gene transfer events, now undetectable using current sequence alignment techniques. The pathogen proteins, which could mimic or interfere with host activities, would represent candidate virulence factors. The developed approach utilizes the numerical outputs from the sequence-structure threading. It identifies the potential structural similarity between a pair of proteins by correlating the threading scores of the corresponding two primary sequences against the library of the standard folds. This approach allowed up to 64% sensitivity and 99.9% specificity in distinguishing protein pairs with high structural similarity. Conclusion Preliminary results obtained by comparison of the genomes of Homo sapiens and several strains of Chlamydia trachomatis have demonstrated the potential usefulness of the method in the identification of bacterial proteins with known or potential roles in virulence.

  1. Wasting and stunting - similarities and differences

    DEFF Research Database (Denmark)

    Briend, André; Khara, Tanya; Dolan, Carmel

    2015-01-01

    that to decrease malnutrition-related mortality, interventions should aim at preventing both wasting and stunting, which often share common causes. Also, this suggests that treatment interventions should focus on children who are both wasted and stunted and therefore have the greatest deficits in muscle mass...... to target young wasted and stunted children efficiently in situations where these two conditions are present. Wasting is also associated with decreased fat mass. A decreased fat mass is frequent but inconsistent in stunting. Fat secretes multiple hormones, including leptin, which may have a stimulating...... mass. Leptin may also have an effect on bone growth. This may explain why wasted children with low fat stores have reduced linear growth when their weight-for-height remains low. It may also explain the frequent association of stunting with previous episodes of wasting. Stunting, however, can occur...

  2. The Implementation of APIQ Creative Mathematics Game Method in the Subject Matter of Greatest Common Factor and Least Common Multiple in Elementary School

    Science.gov (United States)

    Rahman, Abdul; Saleh Ahmar, Ansari; Arifin, A. Nurani M.; Upu, Hamzah; Mulbar, Usman; Alimuddin; Arsyad, Nurdin; Ruslan; Rusli; Djadir; Sutamrin; Hamda; Minggi, Ilham; Awi; Zaki, Ahmad; Ahmad, Asdar; Ihsan, Hisyam

    2018-01-01

    One of causal factors for uninterested feeling of the students in learning mathematics is a monotonous learning method, like in traditional learning method. One of the ways for motivating students to learn mathematics is by implementing APIQ (Aritmetika Plus Intelegensi Quantum) creative mathematics game method. The purposes of this research are (1) to describe students’ responses toward the implementation of APIQ creative mathematics game method on the subject matter of Greatest Common Factor (GCF) and Least Common Multiple (LCM) and (2) to find out whether by implementing this method, the student’s learning completeness will improve or not. Based on the results of this research, it is shown that the responses of the students toward the implementation of APIQ creative mathematics game method in the subject matters of GCF and LCM were good. It is seen in the percentage of the responses were between 76-100%. (2) The implementation of APIQ creative mathematics game method on the subject matters of GCF and LCM improved the students’ learning.

  3. Long sequence correlation coprocessor

    Science.gov (United States)

    Gage, Douglas W.

    1994-09-01

    A long sequence correlation coprocessor (LSCC) accelerates the bitwise correlation of arbitrarily long digital sequences by calculating in parallel the correlation score for 16, for example, adjacent bit alignments between two binary sequences. The LSCC integrated circuit is incorporated into a computer system with memory storage buffers and a separate general purpose computer processor which serves as its controller. Each of the LSCC's set of sequential counters simultaneously tallies a separate correlation coefficient. During each LSCC clock cycle, computer enable logic associated with each counter compares one bit of a first sequence with one bit of a second sequence to increment the counter if the bits are the same. A shift register assures that the same bit of the first sequence is simultaneously compared to different bits of the second sequence to simultaneously calculate the correlation coefficient by the different counters to represent different alignments of the two sequences.

  4. Roles of repetitive sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological function, but also because they provide information on genome organization, evolution and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences will have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences, notably minisatellite sequences and telomere sequences as examples of tandem repeats, without and with respectively known function, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  5. Anomaly Detection in Sequences

    Data.gov (United States)

    National Aeronautics and Space Administration — We present a set of novel algorithms which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that...

  6. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  7. sequenceMiner algorithm

    Data.gov (United States)

    National Aeronautics and Space Administration — Detecting and describing anomalies in large repositories of discrete symbol sequences. sequenceMiner has been open-sourced! Download the file below to try it out....

  8. Testing Self-Similarity Through Lamperti Transformations

    KAUST Repository

    Lee, Myoungji

    2016-07-14

    Self-similar processes have been widely used in modeling real-world phenomena occurring in environmetrics, network traffic, image processing, and stock pricing, to name but a few. The estimation of the degree of self-similarity has been studied extensively, while statistical tests for self-similarity are scarce and limited to processes indexed in one dimension. This paper proposes a statistical hypothesis test procedure for self-similarity of a stochastic process indexed in one dimension and multi-self-similarity for a random field indexed in higher dimensions. If self-similarity is not rejected, our test provides a set of estimated self-similarity indexes. The key is to test stationarity of the inverse Lamperti transformations of the process. The inverse Lamperti transformation of a self-similar process is a strongly stationary process, revealing a theoretical connection between the two processes. To demonstrate the capability of our test, we test self-similarity of fractional Brownian motions and sheets, their time deformations and mixtures with Gaussian white noise, and the generalized Cauchy family. We also apply the self-similarity test to real data: annual minimum water levels of the Nile River, network traffic records, and surface heights of food wrappings. © 2016, International Biometric Society.

  9. Similarity increases altruistic punishment in humans.

    Science.gov (United States)

    Mussweiler, Thomas; Ockenfels, Axel

    2013-11-26

    Humans are attracted to similar others. As a consequence, social networks are homogeneous in sociodemographic, intrapersonal, and other characteristics--a principle called homophily. Despite abundant evidence showing the importance of interpersonal similarity and homophily for human relationships, their behavioral correlates and cognitive foundations are poorly understood. Here, we show that perceived similarity substantially increases altruistic punishment, a key mechanism underlying human cooperation. We induced (dis)similarity perception by manipulating basic cognitive mechanisms in an economic cooperation game that included a punishment phase. We found that similarity-focused participants were more willing to punish others' uncooperative behavior. This influence of similarity is not explained by group identity, which has the opposite effect on altruistic punishment. Our findings demonstrate that pure similarity promotes reciprocity in ways known to encourage cooperation. At the same time, the increased willingness to punish norm violations among similarity-focused participants provides a rationale for why similar people are more likely to build stable social relationships. Finally, our findings show that altruistic punishment is differentially involved in encouraging cooperation under pure similarity vs. in-group conditions.

  10. Comparative genomic sequence analysis of strawberry and other rosids reveals significant microsynteny

    Directory of Open Access Journals (Sweden)

    Abbott Albert

    2010-06-01

    Full Text Available Abstract Background Fragaria belongs to the Rosaceae, an economically important family that includes a number of important fruit producing genera such as Malus and Prunus. Using genomic sequences from 50 Fragaria fosmids, we have examined the microsynteny between Fragaria and other plant models. Results In more than half of the strawberry fosmids, we found syntenic regions that are conserved in Populus, Vitis, Medicago and/or Arabidopsis with Populus containing the greatest number of syntenic regions with Fragaria. The longest syntenic region was between LG VIII of the poplar genome and the strawberry fosmid 72E18, where seven out of twelve predicted genes were collinear. We also observed an unexpectedly high level of conserved synteny between Fragaria (rosid I and Vitis (basal rosid. One of the strawberry fosmids, 34E24, contained a cluster of R gene analogs (RGAs with NBS and LRR domains. We detected clusters of RGAs with high sequence similarity to those in 34E24 in all the genomes compared. In the phylogenetic tree we have generated, all the NBS-LRR genes grouped together with Arabidopsis CNL-A type NBS-LRR genes. The Fragaria RGA grouped together with those of Vitis and Populus in the phylogenetic tree. Conclusions Our analysis shows considerable microsynteny between Fragaria and other plant genomes such as Populus, Medicago, Vitis, and Arabidopsis to a lesser degree. We also detected a cluster of NBS-LRR type genes that are conserved in all the genomes compared.

  11. Notions of similarity for systems biology models.

    Science.gov (United States)

    Henkel, Ron; Hoehndorf, Robert; Kacprowski, Tim; Knüpfer, Christian; Liebermeister, Wolfram; Waltemath, Dagmar

    2018-01-01

    Systems biology models are rapidly increasing in complexity, size and numbers. When building large models, researchers rely on software tools for the retrieval, comparison, combination and merging of models, as well as for version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of 'similarity' may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here we survey existing methods for the comparison of models, introduce quantitative measures for model similarity, and discuss potential applications of combined similarity measures. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on a combination of different model aspects. The six aspects that we define as potentially relevant for similarity are underlying encoding, references to biological entities, quantitative behaviour, qualitative behaviour, mathematical equations and parameters and network structure. We argue that future similarity measures will benefit from combining these model aspects in flexible, problem-specific ways to mimic users' intuition about model similarity, and to support complex model searches in databases. © The Author 2016. Published by Oxford University Press.

  12. Similar speaker recognition using nonlinear analysis

    International Nuclear Information System (INIS)

    Seo, J.P.; Kim, M.S.; Baek, I.C.; Kwon, Y.H.; Lee, K.S.; Chang, S.W.; Yang, S.I.

    2004-01-01

    Speech features of the conventional speaker identification system, are usually obtained by linear methods in spectral space. However, these methods have the drawback that speakers with similar voices cannot be distinguished, because the characteristics of their voices are also similar in spectral space. To overcome the difficulty in linear methods, we propose to use the correlation exponent in the nonlinear space as a new feature vector for speaker identification among persons with similar voices. We show that our proposed method surprisingly reduces the error rate of speaker identification system to speakers with similar voices

  13. Factors within the family environment such as parents' dietary habits and fruit and vegetable availability have the greatest influence on fruit and vegetable consumption by Polish children.

    Science.gov (United States)

    Wolnicka, Katarzyna; Taraszewska, Anna Małgorzata; Jaczewska-Schuetz, Joanna; Jarosz, Mirosław

    2015-10-01

    To identify determinants of fruit and vegetable (F&V) consumption among school-aged children. A survey study was conducted in October 2010. The questionnaire contained questions concerning social and demographic data, lifestyle and dietary habits, particularly the frequency of F&V consumption, availability of F&V and knowledge about recommended amounts of F&V intake. Polish primary schools. Children (n 1255) aged 9 years from randomly selected primary schools and their parents. The children's consumption of fruit and of vegetables was influenced by the fruit consumption and vegetable consumption of their parents (r=0·333 and r=0·273, respectively; P=0·001), parents encouraging their children to eat F&V (r=0·259 and r=0·271, respectively; P=0·001), giving children F&V to take to school (r=0·338 and r=0·321, respectively; P=0·001) and the availability of F&V at home (r=0·200 and r=0·296, respectively; P=0·001). Parental education influenced only the frequency of fruit consumption (r=0·074; P=0·01). A correlation between parents' knowledge of the recommended intakes and the frequency of vegetable and fruit consumption by children was noticed (r=0·258 and r=0·192, respectively, P=0·001). Factors within the family environment such as parents' dietary habits and F&V availability had the greatest influence on the F&V consumption by children. Educational activities aimed at parents are crucial to increase the consumption of F&V among children.

  14. Entropic fluctuations in DNA sequences

    Science.gov (United States)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that despite this reduction of information, LSE allows to extract meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.

  15. On self-similar Tolman models

    International Nuclear Information System (INIS)

    Maharaj, S.D.

    1988-01-01

    The self-similar spherically symmetric solutions of the Einstein field equation for the case of dust are identified. These form a subclass of the Tolman models. These self-similar models contain the solution recently presented by Chi [J. Math. Phys. 28, 1539 (1987)], thereby refuting the claim of having found a new solution to the Einstein field equations

  16. Mining Diagnostic Assessment Data for Concept Similarity

    Science.gov (United States)

    Madhyastha, Tara; Hunt, Earl

    2009-01-01

    This paper introduces a method for mining multiple-choice assessment data for similarity of the concepts represented by the multiple choice responses. The resulting similarity matrix can be used to visualize the distance between concepts in a lower-dimensional space. This gives an instructor a visualization of the relative difficulty of concepts…

  17. Similarity indices I: what do they measure

    International Nuclear Information System (INIS)

    Johnston, J.W.

    1976-11-01

    A method for estimating the effects of environmental effusions on ecosystems is described. The characteristics of 25 similarity indices used in studies of ecological communities were investigated. The type of data structure, to which these indices are frequently applied, was described as consisting of vectors of measurements on attributes (species) observed in a set of samples. A general similarity index was characterized as the result of a two-step process defined on a pair of vectors. In the first step an attribute similarity score is obtained for each attribute by comparing the attribute values observed in the pair of vectors. The result is a vector of attribute similarity scores. These are combined in the second step to arrive at the similarity index. The operation in the first step was characterized as a function, g, defined on pairs of attribute values. The second operation was characterized as a function, F, defined on the vector of attribute similarity scores from the first step. Usually, F was a simple sum or weighted sum of the attribute similarity scores. It is concluded that similarity indices should not be used as the test statistic to discriminate between two ecological communities

  18. Measuring transferring similarity via local information

    Science.gov (United States)

    Yin, Likang; Deng, Yong

    2018-05-01

    Recommender systems have developed along with the web science, and how to measure the similarity between users is crucial for processing collaborative filtering recommendation. Many efficient models have been proposed (i.g., the Pearson coefficient) to measure the direct correlation. However, the direct correlation measures are greatly affected by the sparsity of dataset. In other words, the direct correlation measures would present an inauthentic similarity if two users have a very few commonly selected objects. Transferring similarity overcomes this drawback by considering their common neighbors (i.e., the intermediates). Yet, the transferring similarity also has its drawback since it can only provide the interval of similarity. To break the limitations, we propose the Belief Transferring Similarity (BTS) model. The contributions of BTS model are: (1) BTS model addresses the issue of the sparsity of dataset by considering the high-order similarity. (2) BTS model transforms uncertain interval to a certain state based on fuzzy systems theory. (3) BTS model is able to combine the transferring similarity of different intermediates using information fusion method. Finally, we compare BTS models with nine different link prediction methods in nine different networks, and we also illustrate the convergence property and efficiency of the BTS model.

  19. On distributional assumptions and whitened cosine similarities

    DEFF Research Database (Denmark)

    Loog, Marco

    2008-01-01

    Recently, an interpretation of the whitened cosine similarity measure as a Bayes decision rule was proposed (C. Liu, "The Bayes Decision Rule Induced Similarity Measures,'' IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1086-1090, June 2007. This communication makes th...

  20. Self-Similar Traffic In Wireless Networks

    OpenAIRE

    Jerjomins, R.; Petersons, E.

    2005-01-01

    Many studies have shown that traffic in Ethernet and other wired networks is self-similar. This paper reveals that wireless network traffic is also self-similar and long-range dependant by analyzing big amount of data captured from the wireless router.

  1. Similarity Structure of Wave-Collapse

    DEFF Research Database (Denmark)

    Rypdal, Kristoffer; Juul Rasmussen, Jens; Thomsen, Kenneth

    1985-01-01

    Similarity transformations of the cubic Schrödinger equation (CSE) are investigated. The transformations are used to remove the explicit time variation in the CSE and reduce it to differential equations in the spatial variables only. Two different methods for similarity reduction are employed and...

  2. Similarity indices I: what do they measure.

    Energy Technology Data Exchange (ETDEWEB)

    Johnston, J.W.

    1976-11-01

    A method for estimating the effects of environmental effusions on ecosystems is described. The characteristics of 25 similarity indices used in studies of ecological communities were investigated. The type of data structure, to which these indices are frequently applied, was described as consisting of vectors of measurements on attributes (species) observed in a set of samples. A general similarity index was characterized as the result of a two-step process defined on a pair of vectors. In the first step an attribute similarity score is obtained for each attribute by comparing the attribute values observed in the pair of vectors. The result is a vector of attribute similarity scores. These are combined in the second step to arrive at the similarity index. The operation in the first step was characterized as a function, g, defined on pairs of attribute values. The second operation was characterized as a function, F, defined on the vector of attribute similarity scores from the first step. Usually, F was a simple sum or weighted sum of the attribute similarity scores. It is concluded that similarity indices should not be used as the test statistic to discriminate between two ecological communities.

  3. Information filtering based on transferring similarity.

    Science.gov (United States)

    Sun, Duo; Zhou, Tao; Liu, Jian-Guo; Liu, Run-Ran; Jia, Chun-Xiao; Wang, Bing-Hong

    2009-07-01

    In this Brief Report, we propose an index of user similarity, namely, the transferring similarity, which involves all high-order similarities between users. Accordingly, we design a modified collaborative filtering algorithm, which provides remarkably higher accurate predictions than the standard collaborative filtering. More interestingly, we find that the algorithmic performance will approach its optimal value when the parameter, contained in the definition of transferring similarity, gets close to its critical value, before which the series expansion of transferring similarity is convergent and after which it is divergent. Our study is complementary to the one reported in [E. A. Leicht, P. Holme, and M. E. J. Newman, Phys. Rev. E 73, 026120 (2006)], and is relevant to the missing link prediction problem.

  4. Self-similar continued root approximants

    International Nuclear Information System (INIS)

    Gluzman, S.; Yukalov, V.I.

    2012-01-01

    A novel method of summing asymptotic series is advanced. Such series repeatedly arise when employing perturbation theory in powers of a small parameter for complicated problems of condensed matter physics, statistical physics, and various applied problems. The method is based on the self-similar approximation theory involving self-similar root approximants. The constructed self-similar continued roots extrapolate asymptotic series to finite values of the expansion parameter. The self-similar continued roots contain, as a particular case, continued fractions and Padé approximants. A theorem on the convergence of the self-similar continued roots is proved. The method is illustrated by several examples from condensed-matter physics.

  5. Correlation between social proximity and mobility similarity.

    Science.gov (United States)

    Fan, Chao; Liu, Yiding; Huang, Junming; Rong, Zhihai; Zhou, Tao

    2017-09-20

    Human behaviors exhibit ubiquitous correlations in many aspects, such as individual and collective levels, temporal and spatial dimensions, content, social and geographical layers. With rich Internet data of online behaviors becoming available, it attracts academic interests to explore human mobility similarity from the perspective of social network proximity. Existent analysis shows a strong correlation between online social proximity and offline mobility similarity, namely, mobile records between friends are significantly more similar than between strangers, and those between friends with common neighbors are even more similar. We argue the importance of the number and diversity of common friends, with a counter intuitive finding that the number of common friends has no positive impact on mobility similarity while the diversity plays a key role, disagreeing with previous studies. Our analysis provides a novel view for better understanding the coupling between human online and offline behaviors, and will help model and predict human behaviors based on social proximity.

  6. Scalar Similarity for Relaxed Eddy Accumulation Methods

    Science.gov (United States)

    Ruppert, Johannes; Thomas, Christoph; Foken, Thomas

    2006-07-01

    The relaxed eddy accumulation (REA) method allows the measurement of trace gas fluxes when no fast sensors are available for eddy covariance measurements. The flux parameterisation used in REA is based on the assumption of scalar similarity, i.e., similarity of the turbulent exchange of two scalar quantities. In this study changes in scalar similarity between carbon dioxide, sonic temperature and water vapour were assessed using scalar correlation coefficients and spectral analysis. The influence on REA measurements was assessed by simulation. The evaluation is based on observations over grassland, irrigated cotton plantation and spruce forest. Scalar similarity between carbon dioxide, sonic temperature and water vapour showed a distinct diurnal pattern and change within the day. Poor scalar similarity was found to be linked to dissimilarities in the energy contained in the low frequency part of the turbulent spectra ( definition.

  7. Surf similarity and solitary wave runup

    DEFF Research Database (Denmark)

    Fuhrman, David R.; Madsen, Per A.

    2008-01-01

    The notion of surf similarity in the runup of solitary waves is revisited. We show that the surf similarity parameter for solitary waves may be effectively reduced to the beach slope divided by the offshore wave height to depth ratio. This clarifies its physical interpretation relative to a previ...... functional dependence on their respective surf similarity parameters. Important equivalencies in the runup of sinusoidal and solitary waves are thus revealed.......The notion of surf similarity in the runup of solitary waves is revisited. We show that the surf similarity parameter for solitary waves may be effectively reduced to the beach slope divided by the offshore wave height to depth ratio. This clarifies its physical interpretation relative...... to a previous parameterization, which was not given in an explicit form. Good coherency with experimental (breaking) runup data is preserved with this simpler parameter. A recasting of analytical (nonbreaking) runup expressions for sinusoidal and solitary waves additionally shows that they contain identical...

  8. Similarity in Bilateral Isolated Internal Orbital Fractures.

    Science.gov (United States)

    Chen, Hung-Chang; Cox, Jacob T; Sanyal, Abanti; Mahoney, Nicholas R

    2018-04-13

    In evaluating patients sustaining bilateral isolated internal orbital fractures, the authors have observed both similar fracture locations and also similar expansion of orbital volumes. In this study, we aim to investigate if there is a propensity for the 2 orbits to fracture in symmetrically similar patterns when sustaining similar trauma. A retrospective chart review was performed studying all cases at our institution of bilateral isolated internal orbital fractures involving the medial wall and/or the floor at the time of presentation. The similarity of the bilateral fracture locations was evaluated using the Fisher's exact test. The bilateral expanded orbital volumes were analyzed using the Wilcoxon signed-rank test to assess for orbital volume similarity. Twenty-four patients with bilateral internal orbital fractures were analyzed for fracture location similarity. Seventeen patients (70.8%) had 100% concordance in the orbital subregion fractured, and the association between the right and the left orbital fracture subregion locations was statistically significant (P < 0.0001). Fifteen patients were analyzed for orbital volume similarity. The average orbital cavity volume was 31.2 ± 3.8 cm on the right and 32.0 ± 3.7 cm on the left. There was a statistically significant difference between right and left orbital cavity volumes (P = 0.0026). The data from this study suggest that an individual who suffers isolated bilateral internal orbital fractures has a statistically significant similarity in the location of their orbital fractures. However, there does not appear to be statistically significant similarity in the expansion of the orbital volumes in these patients.

  9. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    OpenAIRE

    Arabi E. keshk

    2014-01-01

    The merging of biology and computer science has created a new field called computational biology that explore the capacities of computers to gain knowledge from biological data, bioinformatics. Computational biology is rooted in life sciences as well as computers, information sciences, and technologies. The main problem in computational biology is sequence alignment that is a way of arranging the sequences of DNA, RNA or protein to identify the region of similarity and relationship between se...

  10. Similarity as an organising principle in short-term memory.

    Science.gov (United States)

    LeCompte, D C; Watkins, M J

    1993-03-01

    The role of stimulus similarity as an organising principle in short-term memory was explored in a series of seven experiments. Each experiment involved the presentation of a short sequence of items that were drawn from two distinct physical classes and arranged such that item class changed after every second item. Following presentation, one item was re-presented as a probe for the 'target' item that had directly followed it in the sequence. Memory for the sequence was considered organised by class if probability of recall was higher when the probe and target were from the same class than when they were from different classes. Such organisation was found when one class was auditory and the other was visual (spoken vs. written words, and sounds vs. pictures). It was also found when both classes were auditory (words spoken in a male voice vs. words spoken in a female voice) and when both classes were visual (digits shown in one location vs. digits shown in another). It is concluded that short-term memory can be organised on the basis of sensory modality and on the basis of certain features within both the auditory and visual modalities.

  11. Measure of Node Similarity in Multilayer Networks.

    Directory of Open Access Journals (Sweden)

    Anders Mollgaard

    Full Text Available The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analysing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data on face-to-face contacts. We find that even strongly connected individuals are not more similar with respect to basic personality traits than randomly chosen pairs of individuals. In contrast, several socio-demographics variables have a significant degree of similarity. We further observe that similarity might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a variable such as gender, our measure reveals a transition from similarity between nodes connected with links of relatively low weight to dis-similarity for the nodes connected by the strongest links. We finally analyze the overlap between layers in the network for different levels of acquaintanceships.

  12. Notions of similarity for computational biology models

    KAUST Repository

    Waltemath, Dagmar

    2016-03-21

    Computational models used in biology are rapidly increasing in complexity, size, and numbers. To build such large models, researchers need to rely on software tools for model retrieval, model combination, and version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of similarity may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here, we introduce a general notion of quantitative model similarities, survey the use of existing model comparison methods in model building and management, and discuss potential applications of model comparison. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on different model aspects. Potentially relevant aspects of a model comprise its references to biological entities, network structure, mathematical equations and parameters, and dynamic behaviour. Future similarity measures could combine these model aspects in flexible, problem-specific ways in order to mimic users\\' intuition about model similarity, and to support complex model searches in databases.

  13. Trajectory similarity join in spatial networks

    KAUST Repository

    Shang, Shuo

    2017-09-07

    The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider the case of trajectory similarity join (TS-Join), where the objects are trajectories of vehicles moving in road networks. Thus, given two sets of trajectories and a threshold θ, the TS-Join returns all pairs of trajectories from the two sets with similarity above θ. This join targets applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide a purposeful definition of similarity. To enable efficient TS-Join processing on large sets of trajectories, we develop search space pruning techniques and take into account the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer algorithm. For each trajectory, the algorithm first finds similar trajectories. Then it merges the results to achieve a final result. The algorithm exploits an upper bound on the spatiotemporal similarity and a heuristic scheduling strategy for search space pruning. The algorithm\\'s per-trajectory searches are independent of each other and can be performed in parallel, and the merging has constant cost. An empirical study with real data offers insight in the performance of the algorithm and demonstrates that is capable of outperforming a well-designed baseline algorithm by an order of magnitude.

  14. The baryonic self similarity of dark matter

    International Nuclear Information System (INIS)

    Alard, C.

    2014-01-01

    The cosmological simulations indicates that dark matter halos have specific self-similar properties. However, the halo similarity is affected by the baryonic feedback. By using momentum-driven winds as a model to represent the baryon feedback, an equilibrium condition is derived which directly implies the emergence of a new type of similarity. The new self-similar solution has constant acceleration at a reference radius for both dark matter and baryons. This model receives strong support from the observations of galaxies. The new self-similar properties imply that the total acceleration at larger distances is scale-free, the transition between the dark matter and baryons dominated regime occurs at a constant acceleration, and the maximum amplitude of the velocity curve at larger distances is proportional to M 1/4 . These results demonstrate that this self-similar model is consistent with the basics of modified Newtonian dynamics (MOND) phenomenology. In agreement with the observations, the coincidence between the self-similar model and MOND breaks at the scale of clusters of galaxies. Some numerical experiments show that the behavior of the density near the origin is closely approximated by a Einasto profile.

  15. Notions of similarity for computational biology models

    KAUST Repository

    Waltemath, Dagmar; Henkel, Ron; Hoehndorf, Robert; Kacprowski, Tim; Knuepfer, Christian; Liebermeister, Wolfram

    2016-01-01

    Computational models used in biology are rapidly increasing in complexity, size, and numbers. To build such large models, researchers need to rely on software tools for model retrieval, model combination, and version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of similarity may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here, we introduce a general notion of quantitative model similarities, survey the use of existing model comparison methods in model building and management, and discuss potential applications of model comparison. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on different model aspects. Potentially relevant aspects of a model comprise its references to biological entities, network structure, mathematical equations and parameters, and dynamic behaviour. Future similarity measures could combine these model aspects in flexible, problem-specific ways in order to mimic users' intuition about model similarity, and to support complex model searches in databases.

  16. Immunoinformatics and Similarity Analysis of House Dust Mite Tropomyosin

    Directory of Open Access Journals (Sweden)

    Mohammad Mehdi Ranjbar

    2015-10-01

    Full Text Available Background: Dermatophagoides farinae and Dermatophagoides pteronyssinus are house dust mites (HDM that they cause severe asthma and allergic symptoms. Tropomyosin protein plays an important role in mentioned immune and allergic reactions to HDMs. Here, tropomyosin protein from Dermatophagoides spp. was comprehensively screened in silico for its allergenicity, antigenicity and similarity/conservation.Materials and Methods: The amino acid sequences of D. farinae tropomyosin, D. pteronyssinus and other mites were retrieved. We included alignments and evaluated conserved/ variable regions along sequences, constructed their phylogenetic tree and estimated overall mean distances. Then, followed by with prediction of linear B-cell epitope based on different approaches, and besides in-silico evaluation of IgE epitopes allergenicity (by SVMc, IgE epitope, ARPs BLAST, MAST and hybrid method. Finally, comparative analysis of results by different approaches was made.Results: Alignment results revealed near complete identity between D. farina and D. pteronyssinus members, and also there was close similarity among Dermatophagoides spp. Most of the variations among mites' tropomyosin were approximately located at amino acids 23 to 80, 108 to 120, 142 to 153 and 220 to 230. Topology of tree showed close relationships among mites in tropomyosin protein sequence, although their sequences in D. farina, D. pteronyssinus and Psoroptes ovis are more similar to each other and clustered. Dermanyssus gallinae (AC: Q2WBI0 has less relationship to other mites, being located in a separate branch. Hydrophilicity and flexibility plots revealed that many parts of this protein have potential to be hydrophilic and flexible. Surface accessibility represented 7 different epitopes. Beta-turns in this protein are with high probability in the middle part and its two terminals. Kolaskar and Tongaonkar method analysis represented 11 immunogenic epitopes between amino acids 7-16. From

  17. Query-dependent banding (QDB for faster RNA similarity searches.

    Directory of Open Access Journals (Sweden)

    Eric P Nawrocki

    2007-03-01

    Full Text Available When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB, which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from LN(2.4 to LN(1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization.

  18. A self-similar hierarchy of the Korean stock market

    Science.gov (United States)

    Lim, Gyuchang; Min, Seungsik; Yoo, Kun-Woo

    2013-01-01

    A scaling analysis is performed on market values of stocks listed on Korean stock exchanges such as the KOSPI and the KOSDAQ. Different from previous studies on price fluctuations, market capitalizations are dealt with in this work. First, we show that the sum of the two stock exchanges shows a clear rank-size distribution, i.e., the Zipf's law, just as each separate one does. Second, by abstracting Zipf's law as a γ-sequence, we define a self-similar hierarchy consisting of many levels, with the numbers of firms at each level forming a geometric sequence. We also use two exponential functions to describe the hierarchy and derive a scaling law from them. Lastly, we propose a self-similar hierarchical process and perform an empirical analysis on our data set. Based on our findings, we argue that all money invested in the stock market is distributed in a hierarchical way and that a slight difference exists between the two exchanges.

  19. A Similarity Search Using Molecular Topological Graphs

    Directory of Open Access Journals (Sweden)

    Yoshifumi Fukunishi

    2009-01-01

    Full Text Available A molecular similarity measure has been developed using molecular topological graphs and atomic partial charges. Two kinds of topological graphs were used. One is the ordinary adjacency matrix and the other is a matrix which represents the minimum path length between two atoms of the molecule. The ordinary adjacency matrix is suitable to compare the local structures of molecules such as functional groups, and the other matrix is suitable to compare the global structures of molecules. The combination of these two matrices gave a similarity measure. This method was applied to in silico drug screening, and the results showed that it was effective as a similarity measure.

  20. Similarity-based pattern analysis and recognition

    CERN Document Server

    Pelillo, Marcello

    2013-01-01

    This accessible text/reference presents a coherent overview of the emerging field of non-Euclidean similarity learning. The book presents a broad range of perspectives on similarity-based pattern analysis and recognition methods, from purely theoretical challenges to practical, real-world applications. The coverage includes both supervised and unsupervised learning paradigms, as well as generative and discriminative models. Topics and features: explores the origination and causes of non-Euclidean (dis)similarity measures, and how they influence the performance of traditional classification alg

  1. HYPOTHESIS TESTING WITH THE SIMILARITY INDEX

    Science.gov (United States)

    Mulltilocus DNA fingerprinting methods have been used extensively to address genetic issues in wildlife populations. Hypotheses concerning population subdivision and differing levels of diversity can be addressed through the use of the similarity index (S), a band-sharing coeffic...

  2. On self-similarity of crack layer

    Science.gov (United States)

    Botsis, J.; Kunin, B.

    1987-01-01

    The crack layer (CL) theory of Chudnovsky (1986), based on principles of thermodynamics of irreversible processes, employs a crucial hypothesis of self-similarity. The self-similarity hypothesis states that the value of the damage density at a point x of the active zone at a time t coincides with that at the corresponding point in the initial (t = 0) configuration of the active zone, the correspondence being given by a time-dependent affine transformation of the space variables. In this paper, the implications of the self-similarity hypothesis for qusi-static CL propagation is investigated using polystyrene as a model material and examining the evolution of damage distribution along the trailing edge which is approximated by a straight segment perpendicular to the crack path. The results support the self-similarity hypothesis adopted by the CL theory.

  3. Bilateral Trade Flows and Income Distribution Similarity

    Science.gov (United States)

    2016-01-01

    Current models of bilateral trade neglect the effects of income distribution. This paper addresses the issue by accounting for non-homothetic consumer preferences and hence investigating the role of income distribution in the context of the gravity model of trade. A theoretically justified gravity model is estimated for disaggregated trade data (Dollar volume is used as dependent variable) using a sample of 104 exporters and 108 importers for 1980–2003 to achieve two main goals. We define and calculate new measures of income distribution similarity and empirically confirm that greater similarity of income distribution between countries implies more trade. Using distribution-based measures as a proxy for demand similarities in gravity models, we find consistent and robust support for the hypothesis that countries with more similar income-distributions trade more with each other. The hypothesis is also confirmed at disaggregated level for differentiated product categories. PMID:27137462

  4. Discovering Music Structure via Similarity Fusion

    DEFF Research Database (Denmark)

    for representing music structure is studied in a simplified scenario consisting of 4412 songs and two similarity measures among them. The results suggest that the PLSA model is a useful framework to combine different sources of information, and provides a reasonable space for song representation.......Automatic methods for music navigation and music recommendation exploit the structure in the music to carry out a meaningful exploration of the “song space”. To get a satisfactory performance from such systems, one should incorporate as much information about songs similarity as possible; however...... semantics”, in such a way that all observed similarities can be satisfactorily explained using the latent semantics. Therefore, one can think of these semantics as the real structure in music, in the sense that they can explain the observed similarities among songs. The suitability of the PLSA model...

  5. Abundance estimation of spectrally similar minerals

    CSIR Research Space (South Africa)

    Debba, Pravesh

    2009-07-01

    Full Text Available This paper evaluates a spectral unmixing method for estimating the partial abundance of spectrally similar minerals in complex mixtures. The method requires formulation of a linear function of individual spectra of individual minerals. The first...

  6. Lagrangian-similarity diffusion-deposition model

    International Nuclear Information System (INIS)

    Horst, T.W.

    1979-01-01

    A Lagrangian-similarity diffusion model has been incorporated into the surface-depletion deposition model. This model predicts vertical concentration profiles far downwind of the source that agree with those of a one-dimensional gradient-transfer model

  7. Discovering Music Structure via Similarity Fusion

    DEFF Research Database (Denmark)

    Arenas-García, Jerónimo; Parrado-Hernandez, Emilio; Meng, Anders

    Automatic methods for music navigation and music recommendation exploit the structure in the music to carry out a meaningful exploration of the “song space”. To get a satisfactory performance from such systems, one should incorporate as much information about songs similarity as possible; however...... semantics”, in such a way that all observed similarities can be satisfactorily explained using the latent semantics. Therefore, one can think of these semantics as the real structure in music, in the sense that they can explain the observed similarities among songs. The suitability of the PLSA model...... for representing music structure is studied in a simplified scenario consisting of 4412 songs and two similarity measures among them. The results suggest that the PLSA model is a useful framework to combine different sources of information, and provides a reasonable space for song representation....

  8. Outsourced similarity search on metric data assets

    KAUST Repository

    Yiu, Man Lung

    2012-02-01

    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low-initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.

  9. Similarity search processing. Paralelization and indexing technologies.

    Directory of Open Access Journals (Sweden)

    Eder Dos Santos

    2015-08-01

    The next Scientific-Technical Report addresses the similarity search and the implementation of metric structures on parallel environments. It also presents the state of the art related to similarity search on metric structures and parallelism technologies. Comparative analysis are also proposed, seeking to identify the behavior of a set of metric spaces and metric structures over processing platforms multicore-based and GPU-based.

  10. Parallel trajectory similarity joins in spatial networks

    KAUST Repository

    Shang, Shuo

    2018-04-04

    The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider two cases of trajectory similarity joins (TS-Joins), including a threshold-based join (Tb-TS-Join) and a top-k TS-Join (k-TS-Join), where the objects are trajectories of vehicles moving in road networks. Given two sets of trajectories and a threshold θ, the Tb-TS-Join returns all pairs of trajectories from the two sets with similarity above θ. In contrast, the k-TS-Join does not take a threshold as a parameter, and it returns the top-k most similar trajectory pairs from the two sets. The TS-Joins target diverse applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide purposeful definitions of similarity. To enable efficient processing of the TS-Joins on large sets of trajectories, we develop search space pruning techniques and enable use of the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer search framework that lays the foundation for the algorithms for the Tb-TS-Join and the k-TS-Join that rely on different pruning techniques to achieve efficiency. For each trajectory, the algorithms first find similar trajectories. Then they merge the results to obtain the final result. The algorithms for the two joins exploit different upper and lower bounds on the spatiotemporal trajectory similarity and different heuristic scheduling strategies for search space pruning. Their per-trajectory searches are independent of each other and can be performed in parallel, and the mergings have constant cost. An empirical study with real data offers insight in the performance of the algorithms and demonstrates that they are capable of outperforming well-designed baseline algorithms by an order of magnitude.

  11. Parallel trajectory similarity joins in spatial networks

    KAUST Repository

    Shang, Shuo; Chen, Lisi; Wei, Zhewei; Jensen, Christian S.; Zheng, Kai; Kalnis, Panos

    2018-01-01

    The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider two cases of trajectory similarity joins (TS-Joins), including a threshold-based join (Tb-TS-Join) and a top-k TS-Join (k-TS-Join), where the objects are trajectories of vehicles moving in road networks. Given two sets of trajectories and a threshold θ, the Tb-TS-Join returns all pairs of trajectories from the two sets with similarity above θ. In contrast, the k-TS-Join does not take a threshold as a parameter, and it returns the top-k most similar trajectory pairs from the two sets. The TS-Joins target diverse applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide purposeful definitions of similarity. To enable efficient processing of the TS-Joins on large sets of trajectories, we develop search space pruning techniques and enable use of the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer search framework that lays the foundation for the algorithms for the Tb-TS-Join and the k-TS-Join that rely on different pruning techniques to achieve efficiency. For each trajectory, the algorithms first find similar trajectories. Then they merge the results to obtain the final result. The algorithms for the two joins exploit different upper and lower bounds on the spatiotemporal trajectory similarity and different heuristic scheduling strategies for search space pruning. Their per-trajectory searches are independent of each other and can be performed in parallel, and the mergings have constant cost. An empirical study with real data offers insight in the performance of the algorithms and demonstrates that they are capable of outperforming well-designed baseline algorithms by an order of magnitude.

  12. Are calanco landforms similar to river basins?

    Science.gov (United States)

    Caraballo-Arias, N A; Ferro, V

    2017-12-15

    In the past badlands have been often considered as ideal field laboratories for studying landscape evolution because of their geometrical similarity to larger fluvial systems. For a given hydrological process, no scientific proof exists that badlands can be considered a model of river basin prototypes. In this paper the measurements carried out on 45 Sicilian calanchi, a type of badlands that appears as a small-scale hydrographic unit, are used to establish their morphological similarity with river systems whose data are available in the literature. At first the geomorphological similarity is studied by identifying the dimensionless groups, which can assume the same value or a scaled one in a fixed ratio, representing drainage basin shape, stream network and relief properties. Then, for each property, the dimensionless groups are calculated for the investigated calanchi and the river basins and their corresponding scale ratio is evaluated. The applicability of Hack's, Horton's and Melton's laws for establishing similarity criteria is also tested. The developed analysis allows to conclude that a quantitative morphological similarity between calanco landforms and river basins can be established using commonly applied dimensionless groups. In particular, the analysis showed that i) calanchi and river basins have a geometrically similar shape respect to the parameters Rf and Re with a scale factor close to 1, ii) calanchi and river basins are similar respect to the bifurcation and length ratios (λ=1), iii) for the investigated calanchi the Melton number assumes values less than that (0.694) corresponding to the river case and a scale ratio ranging from 0.52 and 0.78 can be used, iv) calanchi and river basins have similar mean relief ratio values (λ=1.13) and v) calanchi present active geomorphic processes and therefore fall in a more juvenile stage with respect to river basins. Copyright © 2017 Elsevier B.V. All rights reserved.

  13. Cloning, sequencing, and sequence analysis of two novel plasmids from the thermophilic anaerobic bacterium Anaerocellum thermophilum

    DEFF Research Database (Denmark)

    Clausen, Anders; Mikkelsen, Marie Just; Schrøder, I.

    2004-01-01

    The nucleotide sequence of two novel plasmids isolated from the extreme thermophilic anaerobic bacterium Anaerocellum thermophilum DSM6725 (A. thermophilum), growing optimally at 70degreesC, has been determined. pBAS2 was found to be a 3653 bp plasmid with a GC content of 43%, and the sequence re...... with highest similarity to DNA repair protein from Campylobacter jejuni (25% aa). Orf34 showed similarity to sigma factors with highest similarity (28% aa) to the sporulation specific Sigma factor, Sigma 28(K) from Bacillus thuringiensis....

  14. Analysis of petunia hybrida in response to salt stress using high throughput RNA sequencing

    Science.gov (United States)

    Salt and drought are among the greatest challenges to crop and native plants in meeting their yield and reproductive potentials. DNA sequencing-enabled transcriptome profiling provides a means of assessing what genes are responding to salt or drought stress so as to better understand the molecular ...

  15. Identifying mechanistic similarities in drug responses

    KAUST Repository

    Zhao, C.

    2012-05-15

    Motivation: In early drug development, it would be beneficial to be able to identify those dynamic patterns of gene response that indicate that drugs targeting a particular gene will be likely or not to elicit the desired response. One approach would be to quantitate the degree of similarity between the responses that cells show when exposed to drugs, so that consistencies in the regulation of cellular response processes that produce success or failure can be more readily identified.Results: We track drug response using fluorescent proteins as transcription activity reporters. Our basic assumption is that drugs inducing very similar alteration in transcriptional regulation will produce similar temporal trajectories on many of the reporter proteins and hence be identified as having similarities in their mechanisms of action (MOA). The main body of this work is devoted to characterizing similarity in temporal trajectories/signals. To do so, we must first identify the key points that determine mechanistic similarity between two drug responses. Directly comparing points on the two signals is unrealistic, as it cannot handle delays and speed variations on the time axis. Hence, to capture the similarities between reporter responses, we develop an alignment algorithm that is robust to noise, time delays and is able to find all the contiguous parts of signals centered about a core alignment (reflecting a core mechanism in drug response). Applying the proposed algorithm to a range of real drug experiments shows that the result agrees well with the prior drug MOA knowledge. © The Author 2012. Published by Oxford University Press. All rights reserved.

  16. Semantic similarity between ontologies at different scales

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Qingpeng; Haglin, David J.

    2016-04-01

    In the past decade, existing and new knowledge and datasets has been encoded in different ontologies for semantic web and biomedical research. The size of ontologies is often very large in terms of number of concepts and relationships, which makes the analysis of ontologies and the represented knowledge graph computational and time consuming. As the ontologies of various semantic web and biomedical applications usually show explicit hierarchical structures, it is interesting to explore the trade-offs between ontological scales and preservation/precision of results when we analyze ontologies. This paper presents the first effort of examining the capability of this idea via studying the relationship between scaling biomedical ontologies at different levels and the semantic similarity values. We evaluate the semantic similarity between three Gene Ontology slims (Plant, Yeast, and Candida, among which the latter two belong to the same kingdom—Fungi) using four popular measures commonly applied to biomedical ontologies (Resnik, Lin, Jiang-Conrath, and SimRel). The results of this study demonstrate that with proper selection of scaling levels and similarity measures, we can significantly reduce the size of ontologies without losing substantial detail. In particular, the performance of Jiang-Conrath and Lin are more reliable and stable than that of the other two in this experiment, as proven by (a) consistently showing that Yeast and Candida are more similar (as compared to Plant) at different scales, and (b) small deviations of the similarity values after excluding a majority of nodes from several lower scales. This study provides a deeper understanding of the application of semantic similarity to biomedical ontologies, and shed light on how to choose appropriate semantic similarity measures for biomedical engineering.

  17. Similarity of satellite DNA properties in the order Rodentia

    Energy Technology Data Exchange (ETDEWEB)

    Mazrimas, J A; Hatch, F T

    1977-09-01

    We have characterized satellite DNAs from 9 species of kangaroo rat (Dipodomys) and have shown that the HS-..cap alpha.. and HS-..beta.. satellites, where present, are nearly identical in all species as to melting transition midpoint (Tm), and density in neutral CsCl, alkaline CsCl, and Cs/sub 2/SO/sub 4/-Ag/sup +/ gradients. However, the MS satellites exist in two internally similar classes. The satellite DNAs from three other rodents were characterized (densities listed are in neutral CsCl). The pocket gopher, Thomomys bottae, contains Th-..cap alpha.. (1.713 g/ml) and Th-..beta.. (1.703 g/ml). The guinea pig (Cavia porcellus) contains Ca-..cap alpha.., Ca-..beta.., and Ca-..gamma.. at densities of 1.706 g/ml, 1.704 g/ml, and 1.704 g/ml, respectively. The antelope ground squirrel (Ammospermophilus harrisi) contains Am-..cap alpha.., 1.708 g/ml, Am-..beta.., 1.717 g/ml, and Am-..gamma.., 1.707 g/ml. The physical and chemical properties of the alpha-satellites from the above four rodents representing four different families in two suborders of Rodentia were compared. They show nearly identical Tm, nucleoside composition of single strands, and single strand densities in alkaline CsCl. Similar comparisons on the second or third satellite DNAs from these rodents also indicate a close relationship to each other. Thus the high degree of similarity of satellite sequences found in such a diverse group of rodents suggests a cellular function that is subject to natural selection, and implies that these sequences have been conserved over a considerable span of evolutionary time since the divergence of these rodents about 50 million years ago.

  18. Similarity of satellite DNA properties in the order Rodentia

    Energy Technology Data Exchange (ETDEWEB)

    Mazrimas, J A; Hatch, F T

    1977-09-01

    Satellite DNAs from 9 species of kangaroo rat (Dipodomys) have been characterized and have shown that the HS-..cap alpha.. and HS-..beta.. satellites, where present, are nearly identical in all species as to melting transition midpoint (Tm), and density in neutral CsCl, alkaline CsCl, and Cs/sub 2/SO/sub 4/-Ag/sup +/ gradients. However, the MS satellites exist in two internally similar classes. The satellite DNAs from three other rodents were characterized (densities listed are in neutral CsCl). The pocket gopher, Thomomys bottae, contains Th-..cap alpha.. (1.713 g/ml) and Th..beta.. (1.703 g/ml). The guinea pig (Cavia porcellus) contains Ca-..cap alpha.., Ca-..beta.. and Ca-..gamma.. at densities of 1.706 g/ml, 1.704 g/ml and 1.704 g/ml, respectively. The antelope ground squirrel (Ammospermophilus harrisi) contains Am-..cap alpha.., 1.708 g/ml, Am-..beta.., 1.717 g/ml, and Am-..gamma.., 1.707 g/ml. The physical and chemical properties of the alpha-satellites from the above four rodents representing four different families in two suborders of Rodentia were compared. They show nearly identical Tm, nucleoside composition of single strands, and single strand densities in alkaline CsCl. Similar comparisons on the second or third satellite DNAs from these rodents also indicate a close relationship to each other. Thus the high degree of similarity of satellite sequences found in such a diverse group of rodents suggests a cellular function that is subject to natural selection, and implies that these sequences have been conserved over a considerable span of evolutionary time since the divergence of these rodents about 50 million years ago.

  19. Protecting genomic sequence anonymity with generalization lattices.

    Science.gov (United States)

    Malin, B A

    2005-01-01

    Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, are obscured, removed, or encrypted. While demographic features can directly compromise an individual's identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k -anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k-1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequences privacy. The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific datasharing scenarios.

  20. Sequences for Student Investigation

    Science.gov (United States)

    Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette

    2004-01-01

    We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…

  1. Measure of Node Similarity in Multilayer Networks

    DEFF Research Database (Denmark)

    Møllgaard, Anders; Zettler, Ingo; Dammeyer, Jesper

    2016-01-01

    The weight of links in a network is often related to the similarity of thenodes. Here, we introduce a simple tunable measure for analysing the similarityof nodes across different link weights. In particular, we use the measure toanalyze homophily in a group of 659 freshman students at a large...... university.Our analysis is based on data obtained using smartphones equipped with customdata collection software, complemented by questionnaire-based data. The networkof social contacts is represented as a weighted multilayer network constructedfrom different channels of telecommunication as well as data...... might bepresent in one layer of the multilayer network and simultaneously be absent inthe other layers. For a variable such as gender, our measure reveals atransition from similarity between nodes connected with links of relatively lowweight to dis-similarity for the nodes connected by the strongest...

  2. A Novel Hybrid Similarity Calculation Model

    Directory of Open Access Journals (Sweden)

    Xiaoping Fan

    2017-01-01

    Full Text Available This paper addresses the problems of similarity calculation in the traditional recommendation algorithms of nearest neighbor collaborative filtering, especially the failure in describing dynamic user preference. Proceeding from the perspective of solving the problem of user interest drift, a new hybrid similarity calculation model is proposed in this paper. This model consists of two parts, on the one hand the model uses the function fitting to describe users’ rating behaviors and their rating preferences, and on the other hand it employs the Random Forest algorithm to take user attribute features into account. Furthermore, the paper combines the two parts to build a new hybrid similarity calculation model for user recommendation. Experimental results show that, for data sets of different size, the model’s prediction precision is higher than the traditional recommendation algorithms.

  3. Universal self-similarity of propagating populations.

    Science.gov (United States)

    Eliazar, Iddo; Klafter, Joseph

    2010-07-01

    This paper explores the universal self-similarity of propagating populations. The following general propagation model is considered: particles are randomly emitted from the origin of a d-dimensional Euclidean space and propagate randomly and independently of each other in space; all particles share a statistically common--yet arbitrary--motion pattern; each particle has its own random propagation parameters--emission epoch, motion frequency, and motion amplitude. The universally self-similar statistics of the particles' displacements and first passage times (FPTs) are analyzed: statistics which are invariant with respect to the details of the displacement and FPT measurements and with respect to the particles' underlying motion pattern. Analysis concludes that the universally self-similar statistics are governed by Poisson processes with power-law intensities and by the Fréchet and Weibull extreme-value laws.

  4. Universal self-similarity of propagating populations

    Science.gov (United States)

    Eliazar, Iddo; Klafter, Joseph

    2010-07-01

    This paper explores the universal self-similarity of propagating populations. The following general propagation model is considered: particles are randomly emitted from the origin of a d -dimensional Euclidean space and propagate randomly and independently of each other in space; all particles share a statistically common—yet arbitrary—motion pattern; each particle has its own random propagation parameters—emission epoch, motion frequency, and motion amplitude. The universally self-similar statistics of the particles’ displacements and first passage times (FPTs) are analyzed: statistics which are invariant with respect to the details of the displacement and FPT measurements and with respect to the particles’ underlying motion pattern. Analysis concludes that the universally self-similar statistics are governed by Poisson processes with power-law intensities and by the Fréchet and Weibull extreme-value laws.

  5. Trajectory similarity join in spatial networks

    KAUST Repository

    Shang, Shuo; Chen, Lisi; Wei, Zhewei; Jensen, Christian S.; Zheng, Kai; Kalnis, Panos

    2017-01-01

    With these applications in mind, we provide a purposeful definition of similarity. To enable efficient TS-Join processing on large sets of trajectories, we develop search space pruning techniques and take into account the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer algorithm. For each trajectory, the algorithm first finds similar trajectories. Then it merges the results to achieve a final result. The algorithm exploits an upper bound on the spatiotemporal similarity and a heuristic scheduling strategy for search space pruning. The algorithm's per-trajectory searches are independent of each other and can be performed in parallel, and the merging has constant cost. An empirical study with real data offers insight in the performance of the algorithm and demonstrates that is capable of outperforming a well-designed baseline algorithm by an order of magnitude.

  6. Phonological similarity in working memory span tasks.

    Science.gov (United States)

    Chow, Michael; Macnamara, Brooke N; Conway, Andrew R A

    2016-08-01

    In a series of four experiments, we explored what conditions are sufficient to produce a phonological similarity facilitation effect in working memory span tasks. By using the same set of memoranda, but differing the secondary-task requirements across experiments, we showed that a phonological similarity facilitation effect is dependent upon the semantic relationship between the memoranda and the secondary-task stimuli, and is robust to changes in the representation, ordering, and pool size of the secondary-task stimuli. These findings are consistent with interference accounts of memory (Brown, Neath, & Chater, Psychological Review, 114, 539-576, 2007; Oberauer, Lewandowsky, Farrell, Jarrold, & Greaves, Psychonomic Bulletin & Review, 19, 779-819, 2012), whereby rhyming stimuli provide a form of categorical similarity that allows distractors to be excluded from retrieval at recall.

  7. Unveiling Music Structure Via PLSA Similarity Fusion

    DEFF Research Database (Denmark)

    Arenas-García, Jerónimo; Meng, Anders; Petersen, Kaare Brandt

    2007-01-01

    Nowadays there is an increasing interest in developing methods for building music recommendation systems. In order to get a satisfactory performance from such a system, one needs to incorporate as much information about songs similarity as possible; however, how to do so is not obvious. In this p......Nowadays there is an increasing interest in developing methods for building music recommendation systems. In order to get a satisfactory performance from such a system, one needs to incorporate as much information about songs similarity as possible; however, how to do so is not obvious...... observed similarities can be satisfactorily explained using the latent semantics. Additionally, this approach significantly simplifies the song retrieval phase, leading to a more practical system implementation. The suitability of the PLSA model for representing music structure is studied in a simplified...

  8. Sequence History Update Tool

    Science.gov (United States)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable time and effort savings. With the use of The Sequence History Update Tool what previously took minutes is now done in less than 30 seconds, and now provides a more accurate archival record of the sequence commanding for MRO.

  9. Similarity joins in relational database systems

    CERN Document Server

    Augsten, Nikolaus

    2013-01-01

    State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects equality comparisons are often not meaningful and must be replaced by similarity comparisons. This book describes the concepts and techniques to incorporate similarity into database systems. We start out by discussing the properties of strings and trees, and identify the edit distance as the de facto standard for comparing complex objects. Since the edit distance is computationally expensive, token-based distances have been introduced to speed up edit distance comput

  10. Outsourced Similarity Search on Metric Data Assets

    DEFF Research Database (Denmark)

    Yiu, Man Lung; Assent, Ira; Jensen, Christian S.

    2012-01-01

    . Outsourcing offers the data owner scalability and a low initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying......This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example...

  11. Measure of Node Similarity in Multilayer Networks

    DEFF Research Database (Denmark)

    Møllgaard, Anders; Zettler, Ingo; Dammeyer, Jesper

    2016-01-01

    university.Our analysis is based on data obtained using smartphones equipped with customdata collection software, complemented by questionnaire-based data. The networkof social contacts is represented as a weighted multilayer network constructedfrom different channels of telecommunication as well as data...... might bepresent in one layer of the multilayer network and simultaneously be absent inthe other layers. For a variable such as gender, our measure reveals atransition from similarity between nodes connected with links of relatively lowweight to dis-similarity for the nodes connected by the strongest...

  12. Cultural similarity and adjustment of expatriate academics

    DEFF Research Database (Denmark)

    Selmer, Jan; Lauring, Jakob

    2009-01-01

    The findings of a number of recent empirical studies of business expatriates, using different samples and methodologies, seem to support the counter-intuitive proposition that cultural similarity may be as difficult to adjust to as cultural dissimilarity. However, it is not obvious...... and non-EU countries. Results showed that although the perceived cultural similarity between host and home country for the two groups of investigated respondents was different, there was neither any difference in their adjustment nor in the time it took for them to become proficient. Implications...

  13. Nuclear markers reveal that inter-lake cichlids' similar morphologies do not reflect similar genealogy.

    Science.gov (United States)

    Kassam, Daud; Seki, Shingo; Horic, Michio; Yamaoka, Kosaku

    2006-08-01

    The apparent inter-lake morphological similarity among East African Great Lakes' cichlid species/genera has left evolutionary biologists asking whether such similarity is due to sharing of common ancestor or mere convergent evolution. In order to answer such question, we first used Geometric Morphometrics, GM, to quantify morphological similarity and then subsequently used Amplified Fragment Length Polymorphism, AFLP, to determine if similar morphologies imply shared ancestry or convergent evolution. GM revealed that not all presumed morphological similar pairs were indeed similar, and the dendrogram generated from AFLP data indicated distinct clusters corresponding to each lake and not inter-lake morphological similar pairs. Such results imply that the morphological similarity is due to convergent evolution and not shared ancestry. The congruency of GM and AFLP generated dendrograms imply that GM is capable of picking up phylogenetic signal, and thus GM can be potential tool in phylogenetic systematics.

  14. Self-similar slip distributions on irregular shaped faults

    Science.gov (United States)

    Herrero, A.; Murphy, S.

    2018-06-01

    We propose a strategy to place a self-similar slip distribution on a complex fault surface that is represented by an unstructured mesh. This is possible by applying a strategy based on the composite source model where a hierarchical set of asperities, each with its own slip function which is dependent on the distance from the asperity centre. Central to this technique is the efficient, accurate computation of distance between two points on the fault surface. This is known as the geodetic distance problem. We propose a method to compute the distance across complex non-planar surfaces based on a corollary of the Huygens' principle. The difference between this method compared to others sample-based algorithms which precede it is the use of a curved front at a local level to calculate the distance. This technique produces a highly accurate computation of the distance as the curvature of the front is linked to the distance from the source. Our local scheme is based on a sequence of two trilaterations, producing a robust algorithm which is highly precise. We test the strategy on a planar surface in order to assess its ability to keep the self-similarity properties of a slip distribution. We also present a synthetic self-similar slip distribution on a real slab topography for a M8.5 event. This method for computing distance may be extended to the estimation of first arrival times in both complex 3D surfaces or 3D volumes.

  15. Clustering biomolecular complexes by residue contacts similarity

    NARCIS (Netherlands)

    Garcia Lopes Maia Rodrigues, João; Trellet, Mikaël; Schmitz, Christophe; Kastritis, Panagiotis; Karaca, Ezgi; Melquiond, Adrien S J; Bonvin, Alexandre M J J; Garcia Lopes Maia Rodrigues, João

    Inaccuracies in computational molecular modeling methods are often counterweighed by brute-force generation of a plethora of putative solutions. These are then typically sieved via structural clustering based on similarity measures such as the root mean square deviation (RMSD) of atomic positions.

  16. Similarity principles for equipment qualification by experience

    International Nuclear Information System (INIS)

    Kana, D.D.; Pomerening, D.J.

    1988-07-01

    A methodology is developed for seismic qualification of nuclear plant equipment by applying similarity principles to existing experience data. Experience data are available from previous qualifications by analysis or testing, or from actual earthquake events. Similarity principles are defined in terms of excitation, equipment physical characteristics, and equipment response. Physical similarity is further defined in terms of a critical transfer function for response at a location on a primary structure, whose response can be assumed directly related to ultimate fragility of the item under elevated levels of excitation. Procedures are developed for combining experience data into composite specifications for qualification of equipment that can be shown to be physically similar to the reference equipment. Other procedures are developed for extending qualifications beyond the original specifications under certain conditions. Some examples for application of the procedures and verification of them are given for certain cases that can be approximated by a two degree of freedom simple primary/secondary system. Other examples are based on use of actual test data available from previous qualifications. Relationships of the developments with other previously-published methods are discussed. The developments are intended to elaborate on the rather broad revised guidelines developed by the IEEE 344 Standards Committee for equipment qualification in new nuclear plants. However, the results also contribute to filling a gap that exists between the IEEE 344 methodology and that previously developed by the Seismic Qualification Utilities Group. The relationship of the results to safety margin methodology is also discussed. (author)

  17. 7 CFR 51.1997 - Similar type.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Similar type. 51.1997 Section 51.1997 Agriculture Regulations of the Department of Agriculture AGRICULTURAL MARKETING SERVICE (Standards, Inspections, Marketing Practices), DEPARTMENT OF AGRICULTURE REGULATIONS AND STANDARDS UNDER THE AGRICULTURAL MARKETING ACT OF 1946...

  18. Efficient Similarity Retrieval in Music Databases

    DEFF Research Database (Denmark)

    Ruxanda, Maria Magdalena; Jensen, Christian Søndergaard

    2006-01-01

    Audio music is increasingly becoming available in digital form, and the digital music collections of individuals continue to grow. Addressing the need for effective means of retrieving music from such collections, this paper proposes new techniques for content-based similarity search. Each music...

  19. Similarity search of business process models

    NARCIS (Netherlands)

    Dumas, M.; García-Bañuelos, L.; Dijkman, R.M.

    2009-01-01

    Similarity search is a general class of problems in which a given object, called a query object, is compared against a collection of objects in order to retrieve those that most closely resemble the query object. This paper reviews recent work on an instance of this class of problems, where the

  20. Evaluating gender similarities and differences using metasynthesis.

    Science.gov (United States)

    Zell, Ethan; Krizan, Zlatan; Teeter, Sabrina R

    2015-01-01

    Despite the common lay assumption that males and females are profoundly different, Hyde (2005) used data from 46 meta-analyses to demonstrate that males and females are highly similar. Nonetheless, the gender similarities hypothesis has remained controversial. Since Hyde's provocative report, there has been an explosion of meta-analytic interest in psychological gender differences. We utilized this enormous collection of 106 meta-analyses and 386 individual meta-analytic effects to reevaluate the gender similarities hypothesis. Furthermore, we employed a novel data-analytic approach called metasynthesis (Zell & Krizan, 2014) to estimate the average difference between males and females and to explore moderators of gender differences. The average, absolute difference between males and females across domains was relatively small (d = 0.21, SD = 0.14), with the majority of effects being either small (46%) or very small (39%). Magnitude of differences fluctuated somewhat as a function of the psychological domain (e.g., cognitive variables, social and personality variables, well-being), but remained largely constant across age, culture, and generations. These findings provide compelling support for the gender similarities hypothesis, but also underscore conditions under which gender differences are most pronounced. PsycINFO Database Record (c) 2015 APA, all rights reserved.

  1. Cross-kingdom similarities in microbiome functions

    NARCIS (Netherlands)

    Mendes, R.; Raaijmakers, J.M.

    2015-01-01

    Recent advances in medical research have revealed how humans rely on their microbiome for diverse traits and functions. Similarly, microbiomes of other higher organisms play key roles in disease, health, growth and development of their host. Exploring microbiome functions across kingdoms holds

  2. Measuring structural similarity in large online networks.

    Science.gov (United States)

    Shi, Yongren; Macy, Michael

    2016-09-01

    Structural similarity based on bipartite graphs can be used to detect meaningful communities, but the networks have been tiny compared to massive online networks. Scalability is important in applications involving tens of millions of individuals with highly skewed degree distributions. Simulation analysis holding underlying similarity constant shows that two widely used measures - Jaccard index and cosine similarity - are biased by the distribution of out-degree in web-scale networks. However, an alternative measure, the Standardized Co-incident Ratio (SCR), is unbiased. We apply SCR to members of Congress, musical artists, and professional sports teams to show how massive co-following on Twitter can be used to map meaningful affiliations among cultural entities, even in the absence of direct connections to one another. Our results show how structural similarity can be used to map cultural alignments and demonstrate the potential usefulness of social media data in the study of culture, politics, and organizations across the social and behavioral sciences. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Phonological Similarity in American Sign Language.

    Science.gov (United States)

    Hildebrandt, Ursula; Corina, David

    2002-01-01

    Investigates deaf and hearing subjects' ratings of American Sign Language (ASL) signs to assess whether linguistic experience shapes judgments of sign similarity. Findings are consistent with linguistic theories that posit movement and location as core structural elements of syllable structure in ASL. (Author/VWL)

  4. Structural similarity and category-specificity

    DEFF Research Database (Denmark)

    Gerlach, Christian; Law, Ian; Paulson, Olaf B

    2004-01-01

    It has been suggested that category-specific recognition disorders for natural objects may reflect that natural objects are more structurally (visually) similar than artefacts and therefore more difficult to recognize following brain damage. On this account one might expect a positive relationshi...

  5. Music Retrieval based on Melodic Similarity

    NARCIS (Netherlands)

    Typke, R.

    2007-01-01

    This thesis introduces a method for measuring melodic similarity for notated music such as MIDI files. This music search algorithm views music as sets of notes that are represented as weighted points in the two-dimensional space of time and pitch. Two point sets can be compared by calculating how

  6. Measurement of Similarity in Academic Contexts

    Directory of Open Access Journals (Sweden)

    Omid Mahian

    2017-06-01

    Full Text Available We propose some reflections, comments and suggestions about the measurement of similar and matched content in scientific papers and documents, and the need to develop appropriate tools and standards for an ethically fair and equitable treatment of authors.

  7. Appropriate Similarity Measures for Author Cocitation Analysis

    NARCIS (Netherlands)

    N.J.P. van Eck (Nees Jan); L. Waltman (Ludo)

    2007-01-01

    textabstractWe provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of

  8. Similarity of Experience and Empathy in Preschoolers.

    Science.gov (United States)

    Barnett, Mark A.

    The present study examined the role of similarity of experience in young children's affective reactions to others. Some preschoolers played one of two games (Puzzle Board or Buckets) and were informed that they had either failed or succeeded; others merely observed the games being played and were given no evaluative feedback. Subsequently, each…

  9. Cultural Similarities and Differences on Idiom Translation

    Institute of Scientific and Technical Information of China (English)

    黄频频; 陈于全

    2010-01-01

    Both English and Chinese are abound with idioms. Idioms are an important part of the hnguage and culture of a society. English and Chinese idioms carved with cultural characteristics account for a great part in the tramlation. This paper studies the translation of idioms concerning their cultural similarities, cultural differences and transhtion principles.

  10. Learning by similarity in coordination problems

    Czech Academy of Sciences Publication Activity Database

    Steiner, Jakub; Stewart, C.

    -, č. 324 (2007), s. 1-40 ISSN 1211-3298 R&D Projects: GA MŠk LC542 Institutional research plan: CEZ:AV0Z70850503 Keywords : similarity * learning * case-based reasoning Subject RIV: AH - Economics http://www.cerge-ei.cz/pdf/wp/Wp324.pdf

  11. Outsourced similarity search on metric data assets

    KAUST Repository

    Yiu, Man Lung; Assent, Ira; Jensen, Christian Sø ndergaard; Kalnis, Panos

    2012-01-01

    for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low-initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise

  12. Protein-protein interaction network-based detection of functionally similar proteins within species.

    Science.gov (United States)

    Song, Baoxing; Wang, Fen; Guo, Yang; Sang, Qing; Liu, Min; Li, Dengyun; Fang, Wei; Zhang, Deli

    2012-07-01

    Although functionally similar proteins across species have been widely studied, functionally similar proteins within species showing low sequence similarity have not been examined in detail. Identification of these proteins is of significant importance for understanding biological functions, evolution of protein families, progression of co-evolution, and convergent evolution and others which cannot be obtained by detection of functionally similar proteins across species. Here, we explored a method of detecting functionally similar proteins within species based on graph theory. After denoting protein-protein interaction networks using graphs, we split the graphs into subgraphs using the 1-hop method. Proteins with functional similarities in a species were detected using a method of modified shortest path to compare these subgraphs and to find the eligible optimal results. Using seven protein-protein interaction networks and this method, some functionally similar proteins with low sequence similarity that cannot detected by sequence alignment were identified. By analyzing the results, we found that, sometimes, it is difficult to separate homologous from convergent evolution. Evaluation of the performance of our method by gene ontology term overlap showed that the precision of our method was excellent. Copyright © 2012 Wiley Periodicals, Inc.

  13. Extending the Similarity-Attraction Effect : The effects of When-Similarity in mediated communication

    NARCIS (Netherlands)

    Kaptein, M.C.; Castaneda, D.; Fernandez, N.; Nass, C.

    2014-01-01

    The feeling of connectedness experienced in computer-mediated relationships can be explained by the similarity-attraction effect (SAE). Though SAE is well established in psychology, the effects of some types of similarity have not yet been explored. In 2 studies, we demonstrate similarity-attraction

  14. AlignMe—a membrane protein sequence alignment web server

    Science.gov (United States)

    Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

    2014-01-01

    We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425

  15. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/ content/sequence/NEWALIGN/align.html As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  16. Mapping sequences by parts

    Directory of Open Access Journals (Sweden)

    Guziolowski Carito

    2007-09-01

    Full Text Available Abstract Background: We present the N-map method, a pairwise and asymmetrical approach which allows us to compare sequences by taking into account evolutionary events that produce shuffled, reversed or repeated elements. Basically, the optimal N-map of a sequence s over a sequence t is the best way of partitioning the first sequence into N parts and placing them, possibly complementary reversed, over the second sequence in order to maximize the sum of their gapless alignment scores. Results: We introduce an algorithm computing an optimal N-map with time complexity O (|s| × |t| × N using O (|s| × |t| × N memory space. Among all the numbers of parts taken in a reasonable range, we select the value N for which the optimal N-map has the most significant score. To evaluate this significance, we study the empirical distributions of the scores of optimal N-maps and show that they can be approximated by normal distributions with a reasonable accuracy. We test the functionality of the approach over random sequences on which we apply artificial evolutionary events. Practical Application: The method is illustrated with four case studies of pairs of sequences involving non-standard evolutionary events.

  17. Comparison of cDNA-derived protein sequences of the human fibronectin and vitronectin receptor α-subunits and platelet glycoprotein IIb

    International Nuclear Information System (INIS)

    Fitzgerald, L.A.; Poncz, M.; Steiner, B.; Rall, S.C. Jr.; Bennett, J.S.; Phillips, D.R.

    1987-01-01

    The fibronectin receptor (FnR), the vitronectin receptor (VnR), and the platelet membrane glycoprotein (GP) IIb-IIIa complex are members of a family of cell adhesion receptors, which consist of noncovalently associated α- and β-subunits. The present study was designed to compare the cDNA-derived protein sequences of the α-subunits of human FnR, VnR, and platelet GP IIb. cDNA clones for the α-subunit of the FnR (FnR/sub α/) were obtained from a human umbilical vein endothelial (HUVE) cell library by using an oligonucleotide probe designed from a peptide sequence of platelet GP IIb. cDNA clones for platelet GP IIb were isolated from a cDNA expression library of human erythroleukemia cells by using antibodies. cDNA clones of the VnR α-subunit (VnR/sub α/) were obtained from the HUVE cell library by using an oligonucleotide probe from the partial cDNA sequence for the VnR/sub α/. Translation of these sequences showed that the FNR/sub α/, the VnR/sub α/, and GP IIb are composed of disulfide-linked large (858-871 amino acids) and small (137-158 amino acids) chains that are posttranslationally processed from a single mRNA. A single hydrophobic segment located near the carboxyl terminus of each small chain appears to be a transmembrane domain. The large chains appear to be entirely extracellular, and each contains four repeated putative Ca 2+ -binding domains of about 30 amino acids that have sequence similarities to other Ca 2+ -binding proteins. The identity among the protein sequences of the three receptor α-subunits ranges from 36.1% to 44.5%, with the Ca 2+ -binding domains having the greatest homology. These proteins apparently evolved by a process of gene duplication

  18. SIGNIFICANCE OF TARGETED EXOME SEQUENCING AND METHODS OF DATA ANALYSIS IN THE DIAGNOSIS OF GENETIC DISORDERS LEADING TO THE DEVELOPMENT OF EPILEPTIC ENCEPHALOPATHY

    Directory of Open Access Journals (Sweden)

    Tatyana Victorovna Kozhanova

    2017-08-01

    Full Text Available Epilepsy is the most common serious neurological disorder, and there is a genetic basis in almost 50% of people with epilepsy. The diagnosis of genetic epilepsies makes to estimate reasons of seizures in the patient. Last decade has shown tremendous growth in gene sequencing technologies, which have made genetic tests available. The aim is to show significance of targeted exome sequencing and methods of data analysis in the diagnosis of hereditary syndromes leading to the development of epileptic encephalopathy. We examined 27 patients with с early EE (resistant to antiepileptic drugs, psychomotor and speech development delay in the psycho-neurological department. Targeted exome sequencing was performed for patients without a previously identified molecular diagnosis using 454 Sequencing GS Junior sequencer (Roche and IlluminaNextSeq 500 platform. As a result of the analysis, specific epilepsy genetic variants were diagnosed in 27 patients. The greatest number of cases was due to mutations in the SCN1A gene (7/27. The structure of mutations for other genes (mutations with a minor allele frequency of less than 0,5% are presented: ALDH7A1 (n=1, CACNA1C (n=1, CDKL5 (n=1, CNTNAP2 (n=2, DLGAP2 (n=2, DOCK7 (n=2, GRIN2B (n=2, HCN1 (n=1, NRXN1 (n=3, PCDH19 (n=1, RNASEH2B (n=2, SLC2A1 (n=1, UBE3A (n=1. The use of the exome sequencing in the genetic practice allows to significantly improve the effectiveness of medical genetic counseling, as it made possible to diagnose certain variants of genetically heterogeneous groups of diseases with similar of clinical manifestations.

  19. The Colliding Beams Sequencer

    International Nuclear Information System (INIS)

    Johnson, D.E.; Johnson, R.P.

    1989-01-01

    The Colliding Beam Sequencer (CBS) is a computer program used to operate the pbar-p Collider by synchronizing the applications programs and simulating the activities of the accelerator operators during filling and storage. The Sequencer acts as a meta-program, running otherwise stand alone applications programs, to do the set-up, beam transfers, acceleration, low beta turn on, and diagnostics for the transfers and storage. The Sequencer and its operational performance will be described along with its special features which include a periodic scheduler and command logger. 14 refs., 3 figs

  20. Phylogenetic Trees From Sequences

    Science.gov (United States)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data.We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.

  1. Recalling visual serial order for verbal sequences

    NARCIS (Netherlands)

    Logie, R.H.; Saito, S.; Morita, A.; Varma, S.; Norris, D.

    2016-01-01

    We report three experiments in which participants performed written serial recall of visually presented verbal sequences with items varying in visual similarity. In Experiments 1 and 2 native speakers of Japanese recalled visually presented Japanese Kanji characters. In Experiment 3, native speakers

  2. Draft Genome Sequence of Lactobacillus rhamnosus 2166.

    OpenAIRE

    Karlyshev, Andrey V.; Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2014-01-01

    In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains.

  3. Popularity versus similarity in growing networks

    Science.gov (United States)

    Krioukov, Dmitri; Papadopoulos, Fragkiskos; Kitsak, Maksim; Serrano, Mariangeles; Boguna, Marian

    2012-02-01

    Preferential attachment is a powerful mechanism explaining the emergence of scaling in growing networks. If new connections are established preferentially to more popular nodes in a network, then the network is scale-free. Here we show that not only popularity but also similarity is a strong force shaping the network structure and dynamics. We develop a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The framework admits a geometric interpretation, in which preferential attachment emerges from local optimization processes. As opposed to preferential attachment, the optimization framework accurately describes large-scale evolution of technological (Internet), social (web of trust), and biological (E.coli metabolic) networks, predicting the probability of new links in them with a remarkable precision. The developed framework can thus be used for predicting new links in evolving networks, and provides a different perspective on preferential attachment as an emergent phenomenon.

  4. Similarity, trust in institutions, affect, and populism

    DEFF Research Database (Denmark)

    Scholderer, Joachim; Finucane, Melissa L.

    -based evaluations are fundamental to human information processing, they can contribute significantly to other judgments (such as the risk, cost-effectiveness, trustworthiness) of the same stimulus object. Although deliberation and analysis are certainly important in some decision-making circumstances, reliance...... on affect is a quicker, easier, and a more efficient way of navigating in a complex and uncertain world. Hence, many theorists give affect a direct and primary role in motivating behavior. Taken together, the results provide uncannily strong support for the value-similarity hypothesis, strengthening...... types of information about gene technology. The materials were attributed to different institutions. The results indicated that participants' trust in an institution was a function of the similarity between the position advocated in the materials and participants' own attitudes towards gene technology...

  5. Contingency and similarity in response selection.

    Science.gov (United States)

    Prinz, Wolfgang

    2018-05-09

    This paper explores issues of task representation in choice reaction time tasks. How is it possible, and what does it take, to represent such a task in a way that enables a performer to do the task in line with the prescriptions entailed in the instructions? First, a framework for task representation is outlined which combines the implementation of task sets and their use for performance with different kinds of representational operations (pertaining to feature compounds for event codes and code assemblies for task sets, respectively). Then, in a second step, the framework is itself embedded in the bigger picture of the classical debate on the roles of contingency and similarity for the formation of associations. The final conclusion is that both principles are needed and that the operation of similarity at the level of task sets requires and presupposes the operation of contingency at the level of event codes. Copyright © 2018 The Author. Published by Elsevier Inc. All rights reserved.

  6. Similarity and Modeling in Science and Engineering

    CERN Document Server

    Kuneš, Josef

    2012-01-01

    The present text sets itself in relief to other titles on the subject in that it addresses the means and methodologies versus a narrow specific-task oriented approach. Concepts and their developments which evolved to meet the changing needs of applications are addressed. This approach provides the reader with a general tool-box to apply to their specific needs. Two important tools are presented: dimensional analysis and the similarity analysis methods. The fundamental point of view, enabling one to sort all models, is that of information flux between a model and an original expressed by the similarity and abstraction. Each chapter includes original examples and ap-plications. In this respect, the models can be divided into several groups. The following models are dealt with separately by chapter; mathematical and physical models, physical analogues, deterministic, stochastic, and cybernetic computer models. The mathematical models are divided into asymptotic and phenomenological models. The phenomenological m...

  7. Similarity solutions for phase-change problems

    Science.gov (United States)

    Canright, D.; Davis, S. H.

    1989-01-01

    A modification of Ivantsov's (1947) similarity solutions is proposed which can describe phase-change processes which are limited by diffusion. The method has application to systems that have n-components and possess cross-diffusion and Soret and Dufour effects, along with convection driven by density discontinuities at the two-phase interface. Local thermal equilibrium is assumed at the interface. It is shown that analytic solutions are possible when the material properties are constant.

  8. Stochastic self-similar and fractal universe

    International Nuclear Information System (INIS)

    Iovane, G.; Laserra, E.; Tortoriello, F.S.

    2004-01-01

    The structures formation of the Universe appears as if it were a classically self-similar random process at all astrophysical scales. An agreement is demonstrated for the present hypotheses of segregation with a size of astrophysical structures by using a comparison between quantum quantities and astrophysical ones. We present the observed segregated Universe as the result of a fundamental self-similar law, which generalizes the Compton wavelength relation. It appears that the Universe has a memory of its quantum origin as suggested by R. Penrose with respect to quasi-crystal. A more accurate analysis shows that the present theory can be extended from the astrophysical to the nuclear scale by using generalized (stochastically) self-similar random process. This transition is connected to the relevant presence of the electromagnetic and nuclear interactions inside the matter. In this sense, the presented rule is correct from a subatomic scale to an astrophysical one. We discuss the near full agreement at organic cell scale and human scale too. Consequently the Universe, with its structures at all scales (atomic nucleus, organic cell, human, planet, solar system, galaxy, clusters of galaxy, super clusters of galaxy), could have a fundamental quantum reason. In conclusion, we analyze the spatial dimensions of the objects in the Universe as well as space-time dimensions. The result is that it seems we live in an El Naschie's E-infinity Cantorian space-time; so we must seriously start considering fractal geometry as the geometry of nature, a type of arena where the laws of physics appear at each scale in a self-similar way as advocated long ago by the Swedish school of astrophysics

  9. Similarity-based Polymorphic Shellcode Detection

    Directory of Open Access Journals (Sweden)

    Denis Yurievich Gamayunov

    2013-02-01

    Full Text Available In the work the method for polymorphic shellcode dedection based on the set of known shellcodes is proposed. The method’s main idea is in sequential applying of deobfuscating transformations to a data analyzed and then recognizing similarity with malware samples. The method has been tested on the sets of shellcodes generated using Metasploit Framework v.4.1.0 and PELock Obfuscator and shows 87 % precision with zero false positives rate.

  10. Quasi-Similarity Model of Synthetic Jets

    Czech Academy of Sciences Publication Activity Database

    Tesař, Václav; Kordík, Jozef

    2009-01-01

    Roč. 149, č. 2 (2009), s. 255-265 ISSN 0924-4247 R&D Projects: GA AV ČR IAA200760705; GA ČR GA101/07/1499 Institutional research plan: CEZ:AV0Z20760514 Keywords : jets * synthetic jets * similarity solution Subject RIV: BK - Fluid Dynamics Impact factor: 1.674, year: 2009 http://www.sciencedirect.com

  11. Multidimensional Scaling Visualization using Parametric Similarity Indices

    OpenAIRE

    Machado, J. A. Tenreiro; Lopes, António M.; Galhano, A.M.

    2015-01-01

    In this paper, we apply multidimensional scaling (MDS) and parametric similarity indices (PSI) in the analysis of complex systems (CS). Each CS is viewed as a dynamical system, exhibiting an output time-series to be interpreted as a manifestation of its behavior. We start by adopting a sliding window to sample the original data into several consecutive time periods. Second, we define a given PSI for tracking pieces of data. We then compare the windows for different values of the parameter, an...

  12. The fluid similarity of the boiling crisis

    International Nuclear Information System (INIS)

    Katsaounis, A.

    1986-01-01

    Most of the measurements related to the boiling crisis have, until now, been undertaken for a wide parameter variation in the water, and were mainly related to the water-cooled reactor. This article investigates, whether or how the measuring results can be transferred to other fluids. Derived dimensionless similarity figures and those taken from literature are verified by measurements from complex geometries in water and freon 12. (orig.) [de

  13. The fluid similarity of the boiling crisis

    International Nuclear Information System (INIS)

    Katsaounis, A.

    1987-01-01

    Most of the measurements related to the boiling crisis have, until now, been undertaken for a wide parameter variation in the water, and were mainly related to the water-cooled reactor. This article investigates, whether or how the measuring results can be transferred to other fluids. Derived dimensionless similarity figures and those taken from literature are verified by measurements from complex geometries in water and freon 12. (orig./GL) [de

  14. Semantic Similarity between Web Documents Using Ontology

    Science.gov (United States)

    Chahal, Poonam; Singh Tomer, Manjeet; Kumar, Suresh

    2018-06-01

    The World Wide Web is the source of information available in the structure of interlinked web pages. However, the procedure of extracting significant information with the assistance of search engine is incredibly critical. This is for the reason that web information is written mainly by using natural language, and further available to individual human. Several efforts have been made in semantic similarity computation between documents using words, concepts and concepts relationship but still the outcome available are not as per the user requirements. This paper proposes a novel technique for computation of semantic similarity between documents that not only takes concepts available in documents but also relationships that are available between the concepts. In our approach documents are being processed by making ontology of the documents using base ontology and a dictionary containing concepts records. Each such record is made up of the probable words which represents a given concept. Finally, document ontology's are compared to find their semantic similarity by taking the relationships among concepts. Relevant concepts and relations between the concepts have been explored by capturing author and user intention. The proposed semantic analysis technique provides improved results as compared to the existing techniques.

  15. Semantic Similarity between Web Documents Using Ontology

    Science.gov (United States)

    Chahal, Poonam; Singh Tomer, Manjeet; Kumar, Suresh

    2018-03-01

    The World Wide Web is the source of information available in the structure of interlinked web pages. However, the procedure of extracting significant information with the assistance of search engine is incredibly critical. This is for the reason that web information is written mainly by using natural language, and further available to individual human. Several efforts have been made in semantic similarity computation between documents using words, concepts and concepts relationship but still the outcome available are not as per the user requirements. This paper proposes a novel technique for computation of semantic similarity between documents that not only takes concepts available in documents but also relationships that are available between the concepts. In our approach documents are being processed by making ontology of the documents using base ontology and a dictionary containing concepts records. Each such record is made up of the probable words which represents a given concept. Finally, document ontology's are compared to find their semantic similarity by taking the relationships among concepts. Relevant concepts and relations between the concepts have been explored by capturing author and user intention. The proposed semantic analysis technique provides improved results as compared to the existing techniques.

  16. Exploration of noncoding sequences in metagenomes.

    Directory of Open Access Journals (Sweden)

    Fabián Tobar-Tosse

    Full Text Available Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences using the following sequence patterns: (G+C content, Codon Usage (Cd, Trinucleotide Usage (Tn, and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlate with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.

  17. Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

    Science.gov (United States)

    Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

    2014-02-01

    Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To meet this end, an automatic sequence subgrouping method, named 'Subgrouping Automata' was constructed. Firstly, tree analysis module analyzes the structure of tree and calculates the all possible subgroups in each node. Sequence similarity analysis module calculates average sequence similarity for all subgroups in each node. Representative sequence generation module finds a representative sequence using profile analysis and self-scoring for each subgroup. For all nodes, average sequence similarities are calculated and 'Subgrouping Automata' searches a node showing statistically maximum sequence similarity increase using Student's t-value. A node showing the maximum t-value, which gives the most significant differences in average sequence similarity between two adjacent nodes, is determined as an optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.

  18. Sequencing of BAC pools by different next generation sequencing platforms and strategies

    Directory of Open Access Journals (Sweden)

    Scholz Uwe

    2011-10-01

    Full Text Available Abstract Background Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. Results Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably less misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Conclusion Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.

  19. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  20. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  1. Formatt: Correcting protein multiple structural alignments by incorporating sequence alignment

    Directory of Open Access Journals (Sweden)

    Daniels Noah M

    2012-10-01

    Full Text Available Abstract Background The quality of multiple protein structure alignments are usually computed and assessed based on geometric functions of the coordinates of the backbone atoms from the protein chains. These purely geometric methods do not utilize directly protein sequence similarity, and in fact, determining the proper way to incorporate sequence similarity measures into the construction and assessment of protein multiple structure alignments has proved surprisingly difficult. Results We present Formatt, a multiple structure alignment based on the Matt purely geometric multiple structure alignment program, that also takes into account sequence similarity when constructing alignments. We show that Formatt outperforms Matt and other popular structure alignment programs on the popular HOMSTRAD benchmark. For the SABMark twilight zone benchmark set that captures more remote homology, Formatt and Matt outperform other programs; depending on choice of embedded sequence aligner, Formatt produces either better sequence and structural alignments with a smaller core size than Matt, or similarly sized alignments with better sequence similarity, for a small cost in average RMSD. Conclusions Considering sequence information as well as purely geometric information seems to improve quality of multiple structure alignments, though defining what constitutes the best alignment when sequence and structural measures would suggest different alignments remains a difficult open question.

  2. Dynamic Sequence Assignment.

    Science.gov (United States)

    1983-12-01

    D-136 548 DYNAMIIC SEQUENCE ASSIGNMENT(U) ADVANCED INFORMATION AND 1/2 DECISION SYSTEMS MOUNTAIN YIELW CA C A 0 REILLY ET AL. UNCLSSIIED DEC 83 AI/DS...I ADVANCED INFORMATION & DECISION SYSTEMS Mountain View. CA 94040 84 u ,53 V,..’. Unclassified _____ SCURITY CLASSIFICATION OF THIS PAGE REPORT...reviews some important heuristic algorithms developed for fas- ter solution of the sequence assignment problem. 3.1. DINAMIC MOGRAMUNIG FORMULATION FOR

  3. HIV Sequence Compendium 2010

    Energy Technology Data Exchange (ETDEWEB)

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  4. General LTE Sequence

    OpenAIRE

    Billal, Masum

    2015-01-01

    In this paper,we have characterized sequences which maintain the same property described in Lifting the Exponent Lemma. Lifting the Exponent Lemma is a very powerful tool in olympiad number theory and recently it has become very popular. We generalize it to all sequences that maintain a property like it i.e. if p^{\\alpha}||a_k and p^\\b{eta}||n, then p^{{\\alpha}+\\b{eta}}||a_{nk}.

  5. Pairwise Sequence Alignment Library

    Energy Technology Data Exchange (ETDEWEB)

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: Reference implementations of all known vectorized sequence alignment approaches. Implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms. Implementations across all modern CPU instruction sets including AVX2 and KNC. Language interfaces for C/C++ and Python.

  6. Constraint Satisfaction Inference : Non-probabilistic Global Inference for Sequence Labelling

    NARCIS (Netherlands)

    Canisius, S.V.M.; van den Bosch, A.; Daelemans, W.; Basili, R.; Moschitti, A.

    2006-01-01

    We present a new method for performing sequence labelling based on the idea of using a machine-learning classifier to generate several possible output sequences, and then applying an inference procedure to select the best sequence among those. Most sequence labelling methods following a similar

  7. Emergent self-similarity of cluster coagulation

    Science.gov (United States)

    Pushkin, Dmtiri O.

    A wide variety of nonequilibrium processes, such as coagulation of colloidal particles, aggregation of bacteria into colonies, coalescence of rain drops, bond formation between polymerization sites, and formation of planetesimals, fall under the rubric of cluster coagulation. We predict emergence of self-similar behavior in such systems when they are 'forced' by an external source of the smallest particles. The corresponding self-similar coagulation spectra prove to be power laws. Starting from the classical Smoluchowski coagulation equation, we identify the conditions required for emergence of self-similarity and show that the power-law exponent value for a particular coagulation mechanism depends on the homogeneity index of the corresponding coagulation kernel only. Next, we consider the current wave of mergers of large American banks as an 'unorthodox' application of coagulation theory. We predict that the bank size distribution has propensity to become a power law, and verify our prediction in a statistical study of the available economical data. We conclude this chapter by discussing economically significant phenomenon of capital condensation and predicting emergence of power-law distributions in other economical and social data. Finally, we turn to apparent semblance between cluster coagulation and turbulence and conclude that it is not accidental: both of these processes are instances of nonlinear cascades. This class of processes also includes river network formation models, certain force-chain models in granular mechanics, fragmentation due to collisional cascades, percolation, and growing random networks. We characterize a particular cascade by three indicies and show that the resulting power-law spectrum exponent depends on the indicies values only. The ensuing algebraic formula is remarkable for its simplicity.

  8. Spherically symmetric self-similar universe

    Energy Technology Data Exchange (ETDEWEB)

    Dyer, C C [Toronto Univ., Ontario (Canada)

    1979-10-01

    A spherically symmetric self-similar dust-filled universe is considered as a simple model of a hierarchical universe. Observable differences between the model in parabolic expansion and the corresponding homogeneous Einstein-de Sitter model are considered in detail. It is found that an observer at the centre of the distribution has a maximum observable redshift and can in principle see arbitrarily large blueshifts. It is found to yield an observed density-distance law different from that suggested by the observations of de Vaucouleurs. The use of these solutions as central objects for Swiss-cheese vacuoles is discussed.

  9. Image magnification based on similarity analogy

    International Nuclear Information System (INIS)

    Chen Zuoping; Ye Zhenglin; Wang Shuxun; Peng Guohua

    2009-01-01

    Aiming at the high time complexity of the decoding phase in the traditional image enlargement methods based on fractal coding, a novel image magnification algorithm is proposed in this paper, which has the advantage of iteration-free decoding, by using the similarity analogy between an image and its zoom-out and zoom-in. A new pixel selection technique is also presented to further improve the performance of the proposed method. Furthermore, by combining some existing fractal zooming techniques, an efficient image magnification algorithm is obtained, which can provides the image quality as good as the state of the art while greatly decrease the time complexity of the decoding phase.

  10. Modeling Timbre Similarity of Short Music Clips.

    Science.gov (United States)

    Siedenburg, Kai; Müllensiefen, Daniel

    2017-01-01

    There is evidence from a number of recent studies that most listeners are able to extract information related to song identity, emotion, or genre from music excerpts with durations in the range of tenths of seconds. Because of these very short durations, timbre as a multifaceted auditory attribute appears as a plausible candidate for the type of features that listeners make use of when processing short music excerpts. However, the importance of timbre in listening tasks that involve short excerpts has not yet been demonstrated empirically. Hence, the goal of this study was to develop a method that allows to explore to what degree similarity judgments of short music clips can be modeled with low-level acoustic features related to timbre. We utilized the similarity data from two large samples of participants: Sample I was obtained via an online survey, used 16 clips of 400 ms length, and contained responses of 137,339 participants. Sample II was collected in a lab environment, used 16 clips of 800 ms length, and contained responses from 648 participants. Our model used two sets of audio features which included commonly used timbre descriptors and the well-known Mel-frequency cepstral coefficients as well as their temporal derivates. In order to predict pairwise similarities, the resulting distances between clips in terms of their audio features were used as predictor variables with partial least-squares regression. We found that a sparse selection of three to seven features from both descriptor sets-mainly encoding the coarse shape of the spectrum as well as spectrotemporal variability-best predicted similarities across the two sets of sounds. Notably, the inclusion of non-acoustic predictors of musical genre and record release date allowed much better generalization performance and explained up to 50% of shared variance ( R 2 ) between observations and model predictions. Overall, the results of this study empirically demonstrate that both acoustic features related

  11. Similar on the Inside (pre-grinding)

    Science.gov (United States)

    2004-01-01

    This approximate true-color image taken by the panoramic camera on the Mars Exploration Rover Opportunity show the rock called 'Pilbara' located in the small crater dubbed 'Fram.' The rock appears to be dotted with the same 'blueberries,' or spherules, found at 'Eagle Crater.' Spirit drilled into this rock with its rock abrasion tool. After analyzing the hole with the rover's scientific instruments, scientists concluded that Pilbara has a similar chemical make-up, and thus watery past, to rocks studied at Eagle Crater. This image was taken with the panoramic camera's 480-, 530- and 600-nanometer filters.

  12. Similar on the Inside (post-grinding)

    Science.gov (United States)

    2004-01-01

    This approximate true-color image taken by the panoramic camera on the Mars Exploration Rover Opportunity show the hole drilled into the rock called 'Pilbara,' which is located in the small crater dubbed 'Fram.' Spirit drilled into this rock with its rock abrasion tool. The rock appears to be dotted with the same 'blueberries,' or spherules, found at 'Eagle Crater.' After analyzing the hole with the rover's scientific instruments, scientists concluded that Pilbara has a similar chemical make-up, and thus watery past, to rocks studied at Eagle Crater. This image was taken with the panoramic camera's 480-, 530- and 600-nanometer filters.

  13. Self-similar magnetohydrodynamic boundary layers

    Energy Technology Data Exchange (ETDEWEB)

    Nunez, Manuel; Lastra, Alberto, E-mail: mnjmhd@am.uva.e [Departamento de Analisis Matematico, Universidad de Valladolid, 47005 Valladolid (Spain)

    2010-10-15

    The boundary layer created by parallel flow in a magnetized fluid of high conductivity is considered in this paper. Under appropriate boundary conditions, self-similar solutions analogous to the ones studied by Blasius for the hydrodynamic problem may be found. It is proved that for these to be stable, the size of the Alfven velocity at the outer flow must be smaller than the flow velocity, a fact that has a ready physical explanation. The process by which the transverse velocity and the thickness of the layer grow with the size of the Alfven velocity is detailed.

  14. Self-similar magnetohydrodynamic boundary layers

    International Nuclear Information System (INIS)

    Nunez, Manuel; Lastra, Alberto

    2010-01-01

    The boundary layer created by parallel flow in a magnetized fluid of high conductivity is considered in this paper. Under appropriate boundary conditions, self-similar solutions analogous to the ones studied by Blasius for the hydrodynamic problem may be found. It is proved that for these to be stable, the size of the Alfven velocity at the outer flow must be smaller than the flow velocity, a fact that has a ready physical explanation. The process by which the transverse velocity and the thickness of the layer grow with the size of the Alfven velocity is detailed.

  15. [Similarity system theory to evaluate similarity of chromatographic fingerprints of traditional Chinese medicine].

    Science.gov (United States)

    Liu, Yongsuo; Meng, Qinghua; Jiang, Shumin; Hu, Yuzhu

    2005-03-01

    The similarity evaluation of the fingerprints is one of the most important problems in the quality control of the traditional Chinese medicine (TCM). Similarity measures used to evaluate the similarity of the common peaks in the chromatogram of TCM have been discussed. Comparative studies were carried out among correlation coefficient, cosine of the angle and an improved extent similarity method using simulated data and experimental data. Correlation coefficient and cosine of the angle are not sensitive to the differences of the data set. They are still not sensitive to the differences of the data even after normalization. According to the similarity system theory, an improved extent similarity method was proposed. The improved extent similarity is more sensitive to the differences of the data sets than correlation coefficient and cosine of the angle. And the character of the data sets needs not to be changed compared with log-transformation. The improved extent similarity can be used to evaluate the similarity of the chromatographic fingerprints of TCM.

  16. [Genome similarity of Baikal omul and sig].

    Science.gov (United States)

    Bychenko, O S; Sukhanova, L V; Ukolova, S S; Skvortsov, T A; Potapov, V K; Azhikina, T L; Sverdlov, E D

    2009-01-01

    Two members of the Baikal sig family, a lake sig (Coregonus lavaretus baicalensis Dybovsky) and omul (C. autumnalis migratorius Georgi), are close relatives that diverged from the same ancestor 10-20 thousand years ago. In this work, we studied genomic polymorphism of these two fish species. The method of subtraction hybridization (SH) did not reveal the presence of extended sequences in the sig genome and their absence in the omul genome. All the fragments found by SH corresponded to polymorphous noncoding genome regions varying in mononucleotide substitutions and short deletions. Many of them are mapped close to genes of the immune system and have regions identical to the Tc-1-like transposons abundant among fish, whose transcription activity may affect the expression of adjacent genes. Thus, we showed for the first time that genetic differences between Baikal sig family members are extremely small and cannot be revealed by the SH method. This is another endorsement of the hypothesis on the close relationship between Baikal sig and omul and their evolutionarily recent divergence from a common ancestor.

  17. Adolescentes y maternidad en el cine: «Juno», «Precious» y «The Greatest» Teenagers and Motherhood in the Cinema: «Juno», «Precious» and «The Greatest»

    Directory of Open Access Journals (Sweden)

    Flora Marín Murillo

    2011-03-01

    Full Text Available En la actualidad son muchas las adolescentes en España que tienen embarazos no deseados. La ampliación de la Ley del aborto, así como la aprobación de la venta de la píldora del día después sin receta, han focalizado la atención en las jóvenes menores de 18 años. La maternidad, los embarazos no deseados y las alternativas ante estos son variables a las que las adolescentes se enfrentan en el mundo real, y sobre las cuales los filmes construyen sus propios discursos coincidentes o no con la realidad social. En las pantallas de cine películas como «Juno», «Precious» y «The Greatest» tratan bajo diferentes prismas el tema del embarazo adolescente. Estos textos audiovisuales inciden de manera directa en la reproducción y creación de modelos, actitudes y valores. Su influencia en la juventud es constatable y suponen una referencia junto con la familia y la escuela a la hora de adoptar determinados patrones de comportamiento e interiorizar arquetipos socialmente admitidos. Este trabajo examina estos filmes utilizando las herramientas tanto del lenguaje audiovisual como del análisis textual, atendiendo a una perspectiva de género. A través del análisis se constata qué visiones de la maternidad y el sexo en la adolescencia se construyen y cuáles son las estrategias de producción de sentido utilizadas. Los resultados muestran cómo los modelos y estereotipos tradicionales perviven bajo la apariencia de discursos audiovisuales renovados y alternativos.Today in Spain there are many teenagers who suffer unwanted pregnancies. The extension of the abortion law and the approval of the sale of morning-after pill without a prescription have focused attention on girls under 18. The possibilities of motherhood, an unwanted pregnancy and the alternatives are variables that young women face in the real world, and upon which the discourses of films are constructed, some of which coincide with reality and some of which do not. On the big

  18. Gait Recognition Using Image Self-Similarity

    Directory of Open Access Journals (Sweden)

    Chiraz BenAbdelkader

    2004-04-01

    Full Text Available Gait is one of the few biometrics that can be measured at a distance, and is hence useful for passive surveillance as well as biometric applications. Gait recognition research is still at its infancy, however, and we have yet to solve the fundamental issue of finding gait features which at once have sufficient discrimination power and can be extracted robustly and accurately from low-resolution video. This paper describes a novel gait recognition technique based on the image self-similarity of a walking person. We contend that the similarity plot encodes a projection of gait dynamics. It is also correspondence-free, robust to segmentation noise, and works well with low-resolution video. The method is tested on multiple data sets of varying sizes and degrees of difficulty. Performance is best for fronto-parallel viewpoints, whereby a recognition rate of 98% is achieved for a data set of 6 people, and 70% for a data set of 54 people.

  19. Self-similarity in applied superconductivity

    International Nuclear Information System (INIS)

    Dresner, Lawrence

    1981-09-01

    Self-similarity is a descriptive term applying to a family of curves. It means that the family is invariant to a one-parameter group of affine (stretching) transformations. The property of self-similarity has been exploited in a wide variety of problems in applied superconductivity, namely, (i) transient distribution of the current among the filaments of a superconductor during charge-up, (ii) steady distribution of current among the filaments of a superconductor near the current leads, (iii) transient heat transfer in superfluid helium, (iv) transient diffusion in cylindrical geometry (important in studying the growth rate of the reacted layer in A15 materials), (v) thermal expulsion of helium from quenching cable-in-conduit conductors, (vi) eddy current heating of irregular plates by slow, ramped fields, and (vii) the specific heat of type-II superconductors. Most, but not all, of the applications involve differential equations, both ordinary and partial. The novel methods explained in this report should prove of great value in other fields, just as they already have done in applied superconductivity. (author)

  20. Phonological similarity effect in complex span task.

    Science.gov (United States)

    Camos, Valérie; Mora, Gérôme; Barrouillet, Pierre

    2013-01-01

    The aim of our study was to test the hypothesis that two systems are involved in verbal working memory; one is specifically dedicated to the maintenance of phonological representations through verbal rehearsal while the other would maintain multimodal representations through attentional refreshing. This theoretical framework predicts that phonologically related phenomena such as the phonological similarity effect (PSE) should occur when the domain-specific system is involved in maintenance, but should disappear when concurrent articulation hinders its use. Impeding maintenance in the domain-general system by a concurrent attentional demand should impair recall performance without affecting PSE. In three experiments, we manipulated the concurrent articulation and the attentional demand induced by the processing component of complex span tasks in which participants had to maintain lists of either similar or dissimilar words. Confirming our predictions, PSE affected recall performance in complex span tasks. Although both the attentional demand and the articulatory requirement of the concurrent task impaired recall, only the induction of an articulatory suppression during maintenance made the PSE disappear. These results suggest a duality in the systems devoted to verbal maintenance in the short term, constraining models of working memory.

  1. Popularity versus similarity in growing networks.

    Science.gov (United States)

    Papadopoulos, Fragkiskos; Kitsak, Maksim; Serrano, M Ángeles; Boguñá, Marián; Krioukov, Dmitri

    2012-09-27

    The principle that 'popularity is attractive' underlies preferential attachment, which is a common explanation for the emergence of scaling in growing networks. If new connections are made preferentially to more popular nodes, then the resulting distribution of the number of connections possessed by nodes follows power laws, as observed in many real networks. Preferential attachment has been directly validated for some real networks (including the Internet), and can be a consequence of different underlying processes based on node fitness, ranking, optimization, random walks or duplication. Here we show that popularity is just one dimension of attractiveness; another dimension is similarity. We develop a framework in which new connections optimize certain trade-offs between popularity and similarity, instead of simply preferring popular nodes. The framework has a geometric interpretation in which popularity preference emerges from local optimization. As opposed to preferential attachment, our optimization framework accurately describes the large-scale evolution of technological (the Internet), social (trust relationships between people) and biological (Escherichia coli metabolic) networks, predicting the probability of new links with high precision. The framework that we have developed can thus be used for predicting new links in evolving networks, and provides a different perspective on preferential attachment as an emergent phenomenon.

  2. Predicting the performance of fingerprint similarity searching.

    Science.gov (United States)

    Vogt, Martin; Bajorath, Jürgen

    2011-01-01

    Fingerprints are bit string representations of molecular structure that typically encode structural fragments, topological features, or pharmacophore patterns. Various fingerprint designs are utilized in virtual screening and their search performance essentially depends on three parameters: the nature of the fingerprint, the active compounds serving as reference molecules, and the composition of the screening database. It is of considerable interest and practical relevance to predict the performance of fingerprint similarity searching. A quantitative assessment of the potential that a fingerprint search might successfully retrieve active compounds, if available in the screening database, would substantially help to select the type of fingerprint most suitable for a given search problem. The method presented herein utilizes concepts from information theory to relate the fingerprint feature distributions of reference compounds to screening libraries. If these feature distributions do not sufficiently differ, active database compounds that are similar to reference molecules cannot be retrieved because they disappear in the "background." By quantifying the difference in feature distribution using the Kullback-Leibler divergence and relating the divergence to compound recovery rates obtained for different benchmark classes, fingerprint search performance can be quantitatively predicted.

  3. Sequence-dependent DNA deformability studied using molecular dynamics simulations.

    Science.gov (United States)

    Fujii, Satoshi; Kono, Hidetoshi; Takenaka, Shigeori; Go, Nobuhiro; Sarai, Akinori

    2007-01-01

    Proteins recognize specific DNA sequences not only through direct contact between amino acids and bases, but also indirectly based on the sequence-dependent conformation and deformability of the DNA (indirect readout). We used molecular dynamics simulations to analyze the sequence-dependent DNA conformations of all 136 possible tetrameric sequences sandwiched between CGCG sequences. The deformability of dimeric steps obtained by the simulations is consistent with that by the crystal structures. The simulation results further showed that the conformation and deformability of the tetramers can highly depend on the flanking base pairs. The conformations of xATx tetramers show the most rigidity and are not affected by the flanking base pairs and the xYRx show by contrast the greatest flexibility and change their conformations depending on the base pairs at both ends, suggesting tetramers with the same central dimer can show different deformabilities. These results suggest that analysis of dimeric steps alone may overlook some conformational features of DNA and provide insight into the mechanism of indirect readout during protein-DNA recognition. Moreover, the sequence dependence of DNA conformation and deformability may be used to estimate the contribution of indirect readout to the specificity of protein-DNA recognition as well as nucleosome positioning and large-scale behavior of nucleic acids.

  4. Structural similarities between prokaryotic and eukaryotic 5S ribosomal RNAs

    International Nuclear Information System (INIS)

    Welfle, H.; Boehm, S.; Damaschun, G.; Fabian, H.; Gast, K.; Misselwitz, R.; Mueller, J.J.; Zirwer, D.; Filimonov, V.V.; Venyaminov, S.Yu.; Zalkova, T.N.

    1986-01-01

    5S RNAs from rat liver and E. coli have been studied by diffuse X-ray and dynamic light scattering and by infrared and Raman spectroscopy. Identical structures at a resolution of 1 nm can be deduced from the comparison of the experimental X-ray scattering curves and electron distance distribution functions and from the agreement of the shape parameters. A flat shape model with a compact central region and two protruding arms was derived. Double helical stems are eleven-fold helices with a mean base pair distance of 0.28 nm. The number of base pairs (26 GC, 9 AU for E. coli; 27 GC, 9 AU for rat liver) and the degree of base stacking are the same within the experimental error. A very high regularity in the ribophosphate backbone is indicated for both 5S RNAs. The observed structural similarity and the consensus secondary structure pattern derived from comparative sequence analyses suggest the conclusion that prokaryotic and eukaryotic 5S RNAs are in general very similar with respect to their fundamental structural features. (author)

  5. GIS: a comprehensive source for protein structure similarities.

    Science.gov (United States)

    Guerler, Aysam; Knapp, Ernst-Walter

    2010-07-01

    A web service for analysis of protein structures that are sequentially or non-sequentially similar was generated. Recently, the non-sequential structure alignment algorithm GANGSTA+ was introduced. GANGSTA+ can detect non-sequential structural analogs for proteins stated to possess novel folds. Since GANGSTA+ ignores the polypeptide chain connectivity of secondary structure elements (i.e. alpha-helices and beta-strands), it is able to detect structural similarities also between proteins whose sequences were reshuffled during evolution. GANGSTA+ was applied in an all-against-all comparison on the ASTRAL40 database (SCOP version 1.75), which consists of >10,000 protein domains yielding about 55 x 10(6) possible protein structure alignments. Here, we provide the resulting protein structure alignments as a public web-based service, named GANGSTA+ Internet Services (GIS). We also allow to browse the ASTRAL40 database of protein structures with GANGSTA+ relative to an externally given protein structure using different constraints to select specific results. GIS allows us to analyze protein structure families according to the SCOP classification scheme. Additionally, users can upload their own protein structures for pairwise protein structure comparison, alignment against all protein structures of the ASTRAL40 database (SCOP version 1.75) or symmetry analysis. GIS is publicly available at http://agknapp.chemie.fu-berlin.de/gplus.

  6. On fuzzy semantic similarity measure for DNA coding.

    Science.gov (United States)

    Ahmad, Muneer; Jung, Low Tang; Bhuiyan, Md Al-Amin

    2016-02-01

    A coding measure scheme numerically translates the DNA sequence to a time domain signal for protein coding regions identification. A number of coding measure schemes based on numerology, geometry, fixed mapping, statistical characteristics and chemical attributes of nucleotides have been proposed in recent decades. Such coding measure schemes lack the biologically meaningful aspects of nucleotide data and hence do not significantly discriminate coding regions from non-coding regions. This paper presents a novel fuzzy semantic similarity measure (FSSM) coding scheme centering on FSSM codons׳ clustering and genetic code context of nucleotides. Certain natural characteristics of nucleotides i.e. appearance as a unique combination of triplets, preserving special structure and occurrence, and ability to own and share density distributions in codons have been exploited in FSSM. The nucleotides׳ fuzzy behaviors, semantic similarities and defuzzification based on the center of gravity of nucleotides revealed a strong correlation between nucleotides in codons. The proposed FSSM coding scheme attains a significant enhancement in coding regions identification i.e. 36-133% as compared to other existing coding measure schemes tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

    2012-01-01

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.

  8. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.

  9. Vere-Jones' self-similar branching model

    International Nuclear Information System (INIS)

    Saichev, A.; Sornette, D.

    2005-01-01

    Motivated by its potential application to earthquake statistics as well as for its intrinsic interest in the theory of branching processes, we study the exactly self-similar branching process introduced recently by Vere-Jones. This model extends the ETAS class of conditional self-excited branching point-processes of triggered seismicity by removing the problematic need for a minimum (as well as maximum) earthquake size. To make the theory convergent without the need for the usual ultraviolet and infrared cutoffs, the distribution of magnitudes m ' of daughters of first-generation of a mother of magnitude m has two branches m ' ' >m with exponent β+d, where β and d are two positive parameters. We investigate the condition and nature of the subcritical, critical, and supercritical regime in this and in an extended version interpolating smoothly between several models. We predict that the distribution of magnitudes of events triggered by a mother of magnitude m over all generations has also two branches m ' ' >m with exponent β+h, with h=d√(1-s), where s is the fraction of triggered events. This corresponds to a renormalization of the exponent d into h by the hierarchy of successive generations of triggered events. For a significant part of the parameter space, the distribution of magnitudes over a full catalog summed over an average steady flow of spontaneous sources (immigrants) reproduces the distribution of the spontaneous sources with a single branch and is blind to the exponents β,d of the distribution of triggered events. Since the distribution of earthquake magnitudes is usually obtained with catalogs including many sequences, we conclude that the two branches of the distribution of aftershocks are not directly observable and the model is compatible with real seismic catalogs. In summary, the exactly self-similar Vere-Jones model provides an attractive new approach to model triggered seismicity, which alleviates delicate questions on the role of

  10. Similar or different?: the importance of similarities and differences for support between siblings

    NARCIS (Netherlands)

    Voorpostel, M.; van der Lippe, T.; Dykstra, P.A.; Flap, H.

    2007-01-01

    Using a large-scale Dutch national sample (N = 7,126), the authors examine the importance of similarities and differences in the sibling dyad for the provision of support. Similarities are assumed to enhance attraction and empathy; differences are assumed to be related to different possibilities for

  11. Similar or Different? The Importance of Similarities and Differences for Support Between Siblings

    NARCIS (Netherlands)

    Voorpostel, Marieke; Lippe, Tanja van der; Dykstra, Pearl A.; Flap, Henk

    2007-01-01

    Using a large-scale Dutch national sample (N = 7,126), the authors examine the importance of similarities and differences in the sibling dyad for the provision of support. Similarities are assumed to enhance attraction and empathy; differences are assumed to be related to different possibilities for

  12. Multilocus sequence typing of Xylella fastidiosa isolated from olive affected by “olive quick decline syndrome” in Italy

    Directory of Open Access Journals (Sweden)

    Toufic ELBEAINO

    2015-01-01

    Full Text Available The recent finding of Xylella fastidiosa (Xf in olive trees in southern Italy, the scanty molecular information on this bacterium and its association with the olive quick decline syndrome (OQDS prompted the necessity to isolate and acquire more genetic data on the type of strain present in that region. For the first time, the bacterium was isolated from infected olive on culture media. Genetic information were obtained through genomic comparison with other subspecies or strains. The sequences of thirteen genes from its genome, comprising seven housekeeping genes (leuA, petC, lacF, cysG, holC, nuoL and gltT usually used in multilocus sequence typing (MLST systems, and six genes involved in different biochemical functions (RNA Pol sigma-70 factor, hypothetical protein HL, 16S rRNA, rfbD, nuoN, and pilU, were analyzed. The sequences of the biochemical function genes were explored  individually to study the genetic structure of this bacterium, while the MLST genes were linked together into one concatameric sequence (4161 bp long to increase the resolution of the phylogenetic analysis when compared with Xf strains previously reported. Sequence analyses of single genes showed that the Xf olive strain is distinct from the four previously defined taxons (Xf subsp. fastidiosa, Xf subsp. multiplex, Xf subsp. sandyi and Xf subsp. pauca with a dissimilarity rate that reached 4%. In particular, Xf from olive shared the greatest identity with the strain “9a5c” (subsp. pauca, but was nevertheless distinct from it. Similarly, the MLST based on concatameric sequences confirmed the genetic variance of Xf from olive by generating a novel sequence type profile (ST53. Phylogenetic tree analyses showed that Xf from olive clustered in one clade close to subspecies pauca (strains “9a5c” and “CVC0018”, but was nevertheless distinct from them. These results indicate molecular divergence of this olive bacterium with all other strains yet reported.

  13. Dynamical similarity of geomagnetic field reversals.

    Science.gov (United States)

    Valet, Jean-Pierre; Fournier, Alexandre; Courtillot, Vincent; Herrero-Bervera, Emilio

    2012-10-04

    No consensus has been reached so far on the properties of the geomagnetic field during reversals or on the main features that might reveal its dynamics. A main characteristic of the reversing field is a large decrease in the axial dipole and the dominant role of non-dipole components. Other features strongly depend on whether they are derived from sedimentary or volcanic records. Only thermal remanent magnetization of lava flows can capture faithful records of a rapidly varying non-dipole field, but, because of episodic volcanic activity, sequences of overlying flows yield incomplete records. Here we show that the ten most detailed volcanic records of reversals can be matched in a very satisfactory way, under the assumption of a common duration, revealing common dynamical characteristics. We infer that the reversal process has remained unchanged, with the same time constants and durations, at least since 180 million years ago. We propose that the reversing field is characterized by three successive phases: a precursory event, a 180° polarity switch and a rebound. The first and third phases reflect the emergence of the non-dipole field with large-amplitude secular variation. They are rarely both recorded at the same site owing to the rapidly changing field geometry and last for less than 2,500 years. The actual transit between the two polarities does not last longer than 1,000 years and might therefore result from mechanisms other than those governing normal secular variation. Such changes are too brief to be accurately recorded by most sediments.

  14. Dynamic programming algorithms for biological sequence comparison.

    Science.gov (United States)

    Pearson, W R; Miller, W

    1992-01-01

    Efficient dynamic programming algorithms are available for a broad class of protein and DNA sequence comparison problems. These algorithms require computer time proportional to the product of the lengths of the two sequences being compared [O(N2)] but require memory space proportional only to the sum of these lengths [O(N)]. Although the requirement for O(N2) time limits use of the algorithms to the largest computers when searching protein and DNA sequence databases, many other applications of these algorithms, such as calculation of distances for evolutionary trees and comparison of a new sequence to a library of sequence profiles, are well within the capabilities of desktop computers. In particular, the results of library searches with rapid searching programs, such as FASTA or BLAST, should be confirmed by performing a rigorous optimal alignment. Whereas rapid methods do not overlook significant sequence similarities, FASTA limits the number of gaps that can be inserted into an alignment, so that a rigorous alignment may extend the alignment substantially in some cases. BLAST does not allow gaps in the local regions that it reports; a calculation that allows gaps is very likely to extend the alignment substantially. Although a Monte Carlo evaluation of the statistical significance of a similarity score with a rigorous algorithm is much slower than the heuristic approach used by the RDF2 program, the dynamic programming approach should take less than 1 hr on a 386-based PC or desktop Unix workstation. For descriptive purposes, we have limited our discussion to methods for calculating similarity scores and distances that use gap penalties of the form g = rk. Nevertheless, programs for the more general case (g = q+rk) are readily available. Versions of these programs that run either on Unix workstations, IBM-PC class computers, or the Macintosh can be obtained from either of the authors.

  15. Sequence to Sequence - Video to Text

    Science.gov (United States)

    2015-12-11

    by centering x and y flow values around 128 and multiplying by a scalar such that flow values fall between 0 and 255. We also calculate the flow magni...as for MPII- MD. 4.2. Evaluation Metrics Quantitative evaluation of the models are performed us- ing the METEOR [7] metric which was originally pro...candidate refer- ence sentences. METEOR compares exact token matches, stemmed tokens, paraphrase matches, as well as semanti- cally similar matches

  16. Similarity problems and completely bounded maps

    CERN Document Server

    Pisier, Gilles

    2001-01-01

    These notes revolve around three similarity problems, appearing in three different contexts, but all dealing with the space B(H) of all bounded operators on a complex Hilbert space H. The first one deals with group representations, the second one with C* -algebras and the third one with the disc algebra. We describe them in detail in the introduction which follows. This volume is devoted to the background necessary to understand these three problems, to the solutions that are known in some special cases and to numerous related concepts, results, counterexamples or extensions which their investigation has generated. While the three problems seem different, it is possible to place them in a common framework using the key concept of "complete boundedness", which we present in detail. Using this notion, the three problems can all be formulated as asking whether "boundedness" implies "complete boundedness" for linear maps satisfying certain additional algebraic identities. Two chapters have been added on the HALMO...

  17. Social values as arguments: similar is convincing

    Science.gov (United States)

    Maio, Gregory R.; Hahn, Ulrike; Frost, John-Mark; Kuppens, Toon; Rehman, Nadia; Kamble, Shanmukh

    2014-01-01

    Politicians, philosophers, and rhetors engage in co-value argumentation: appealing to one value in order to support another value (e.g., “equality leads to freedom”). Across four experiments in the United Kingdom and India, we found that the psychological relatedness of values affects the persuasiveness of the arguments that bind them. Experiment 1 found that participants were more persuaded by arguments citing values that fulfilled similar motives than by arguments citing opposing values. Experiments 2 and 3 replicated this result using a wider variety of values, while finding that the effect is stronger among people higher in need for cognition and that the effect is mediated by the greater plausibility of co-value arguments that link motivationally compatible values. Experiment 4 extended the effect to real-world arguments taken from political propaganda and replicated the mediating effect of argument plausibility. The findings highlight the importance of value relatedness in argument persuasiveness. PMID:25147529

  18. Image Steganalysis with Binary Similarity Measures

    Directory of Open Access Journals (Sweden)

    Kharrazi Mehdi

    2005-01-01

    Full Text Available We present a novel technique for steganalysis of images that have been subjected to embedding by steganographic algorithms. The seventh and eighth bit planes in an image are used for the computation of several binary similarity measures. The basic idea is that the correlation between the bit planes as well as the binary texture characteristics within the bit planes will differ between a stego image and a cover image. These telltale marks are used to construct a classifier that can distinguish between stego and cover images. We also provide experimental results using some of the latest steganographic algorithms. The proposed scheme is found to have complementary performance vis-à-vis Farid's scheme in that they outperform each other in alternate embedding techniques.

  19. A Lithium Vapor Box Divertor Similarity Experiment

    Science.gov (United States)

    Cohen, Robert A.; Emdee, Eric D.; Goldston, Robert J.; Jaworski, Michael A.; Schwartz, Jacob A.

    2017-10-01

    A lithium vapor box divertor offers an alternate means of managing the extreme power density of divertor plasmas by leveraging gaseous lithium to volumetrically extract power. The vapor box divertor is a baffled slot with liquid lithium coated walls held at temperatures which increase toward the divertor floor. The resulting vapor pressure differential drives gaseous lithium from hotter chambers into cooler ones, where the lithium condenses and returns. A similarity experiment was devised to investigate the advantages offered by a vapor box divertor design. We discuss the design, construction, and early findings of the vapor box divertor experiment including vapor can construction, power transfer calculations, joint integrity tests, and thermocouple data logging. Heat redistribution of an incident plasma-based heat flux from a typical linear plasma device is also presented. This work supported by DOE Contract No. DE-AC02-09CH11466 and The Princeton Environmental Institute.

  20. Correct Bayesian and frequentist intervals are similar

    International Nuclear Information System (INIS)

    Atwood, C.L.

    1986-01-01

    This paper argues that Bayesians and frequentists will normally reach numerically similar conclusions, when dealing with vague data or sparse data. It is shown that both statistical methodologies can deal reasonably with vague data. With sparse data, in many important practical cases Bayesian interval estimates and frequentist confidence intervals are approximately equal, although with discrete data the frequentist intervals are somewhat longer. This is not to say that the two methodologies are equally easy to use: The construction of a frequentist confidence interval may require new theoretical development. Bayesians methods typically require numerical integration, perhaps over many variables. Also, Bayesian can easily fall into the trap of over-optimism about their amount of prior knowledge. But in cases where both intervals are found correctly, the two intervals are usually not very different. (orig.)

  1. Soldier motivation – different or similar?

    DEFF Research Database (Denmark)

    Brænder, Morten; Andersen, Lotte Bøgh

    Recent research in military sociology has shown that in addition to their strong peer motivation modern soldiers are oriented toward contributing to society. It has not, however, been tested how soldier motivation differs from the motivation of other citizens in this respect. In this paper......, by means of public service motivation, a concept developed within the public administration literature, we compare soldier and civilian motivation. The contribution of this paper is an analysis of whether and how Danish combat soldiers differs from other Danes in regard to public service motivation? Using...... surveys with similar questions, we find that soldiers are more normatively motivated to contribute to society than other citizens (higher commitment to the public interest), while their affectively based motivation is lower (lower compassion). This points towards a potential problem in regard...

  2. Social Values as Arguments: Similar is Convincing

    Directory of Open Access Journals (Sweden)

    Gregory R Maio

    2014-08-01

    Full Text Available Politicians, philosophers, and rhetors engage in co-value argumentation: appealing to one value in order to support another value (e.g., equality leads to freedom. Across four experiments in the United Kingdom and India, we found that the psychological relatedness of values affects the persuasiveness of the arguments that bind them. Experiment 1 found that participants were more persuaded by arguments citing values that fulfilled similar motives than by arguments citing opposing values. Experiments 2 and 3 replicated this result using a wider variety of values, while finding that the effect is stronger among people higher in need for cognition and that the effect is mediated by the greater plausibility of co-value arguments that link motivationally compatible values. Experiment 4 extended the effect to real-world arguments taken from political propaganda and replicated the mediating effect of argument plausibility. The findings highlight the importance of value relatedness in argument persuasiveness.

  3. Formulation of similarity porous media systems

    International Nuclear Information System (INIS)

    Anderson, R.M.; Ford, W.T.; Ruttan, A.; Strauss, M.J.

    1982-01-01

    The mathematical formulation of the Porous Media System (PMS) describing two-phase, immiscible, compressible fluid flow in linear, homogeneous porous media is reviewed and expanded. It is shown that families of common vertex, coaxial parabolas and families of parallel lines are the only families of curves on which solutions of the PMS may be constant. A coordinate transformation is used to change the partial differential equations of the PMS to a system of ordinary differential equations, referred to as a similarity Porous Media System (SPMS), in which the independent variable denotes movement from curve to curve in a selected family of curves. Properties of solutions of the first boundary value problem are developed for the SPMS

  4. Contextual Factors for Finding Similar Experts

    DEFF Research Database (Denmark)

    Hofmann, Katja; Balog, Krisztian; Bogers, Toine

    2010-01-01

    -seeking models, are rarely taken into account. In this article, we extend content-based expert-finding approaches with contextual factors that have been found to influence human expert finding. We focus on a task of science communicators in a knowledge-intensive environment, the task of finding similar experts......, given an example expert. Our approach combines expertise-seeking and retrieval research. First, we conduct a user study to identify contextual factors that may play a role in the studied task and environment. Then, we design expert retrieval models to capture these factors. We combine these with content......-based retrieval models and evaluate them in a retrieval experiment. Our main finding is that while content-based features are the most important, human participants also take contextual factors into account, such as media experience and organizational structure. We develop two principled ways of modeling...

  5. Complete genome sequence of a novel pestivirus from sheep.

    Science.gov (United States)

    Becher, Paul; Schmeiser, Stefanie; Oguzoglu, Tuba Cigdem; Postel, Alexander

    2012-10-01

    We report here the complete genome sequence of pestivirus strain Aydin/04-TR, which is the prototype of a group of similar viruses currently present in sheep and goats in Turkey. Sequence data from this virus showed that it clusters separately from the established and previously proposed tentative pestivirus species.

  6. Complete Genome Sequence of a Novel Pestivirus from Sheep

    OpenAIRE

    Becher, Paul; Schmeiser, Stefanie; Oguzoglu, Tuba Cigdem; Postel, Alexander

    2012-01-01

    We report here the complete genome sequence of pestivirus strain Aydin/04-TR, which is the prototype of a group of similar viruses currently present in sheep and goats in Turkey. Sequence data from this virus showed that it clusters separately from the established and previously proposed tentative pestivirus species.

  7. Sequence embedding for fast construction of guide trees for multiple sequence alignment

    LENUS (Irish Health Repository)

    Blackshields, Gordon

    2010-05-14

    Abstract Background The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N 2 for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http:\\/\\/www.clustal.org\\/mbed.tgz.

  8. Main sequence mass loss

    International Nuclear Information System (INIS)

    Brunish, W.M.; Guzik, J.A.; Willson, L.A.; Bowen, G.

    1987-01-01

    It has been hypothesized that variable stars may experience mass loss, driven, at least in part, by oscillations. The class of stars we are discussing here are the δ Scuti variables. These are variable stars with masses between about 1.2 and 2.25 M/sub θ/, lying on or very near the main sequence. According to this theory, high rotation rates enhance the rate of mass loss, so main sequence stars born in this mass range would have a range of mass loss rates, depending on their initial rotation velocity and the amplitude of the oscillations. The stars would evolve rapidly down the main sequence until (at about 1.25 M/sub θ/) a surface convection zone began to form. The presence of this convective region would slow the rotation, perhaps allowing magnetic braking to occur, and thus sharply reduce the mass loss rate. 7 refs

  9. A Unified Theoretical Framework for Cognitive Sequencing.

    Science.gov (United States)

    Savalia, Tejas; Shukla, Anuj; Bapi, Raju S

    2016-01-01

    The capacity to sequence information is central to human performance. Sequencing ability forms the foundation stone for higher order cognition related to language and goal-directed planning. Information related to the order of items, their timing, chunking and hierarchical organization are important aspects in sequencing. Past research on sequencing has emphasized two distinct and independent dichotomies: implicit vs. explicit and goal-directed vs. habits. We propose a theoretical framework unifying these two streams. Our proposal relies on brain's ability to implicitly extract statistical regularities from the stream of stimuli and with attentional engagement organizing sequences explicitly and hierarchically. Similarly, sequences that need to be assembled purposively to accomplish a goal require engagement of attentional processes. With repetition, these goal-directed plans become habits with concomitant disengagement of attention. Thus, attention and awareness play a crucial role in the implicit-to-explicit transition as well as in how goal-directed plans become automatic habits. Cortico-subcortical loops basal ganglia-frontal cortex and hippocampus-frontal cortex loops mediate the transition process. We show how the computational principles of model-free and model-based learning paradigms, along with a pivotal role for attention and awareness, offer a unifying framework for these two dichotomies. Based on this framework, we make testable predictions related to the potential influence of response-to-stimulus interval (RSI) on developing awareness in implicit learning tasks.

  10. A Unified Theoretical Framework for Cognitive Sequencing

    Directory of Open Access Journals (Sweden)

    Tejas Savalia

    2016-11-01

    Full Text Available The capacity to sequence information is central to human performance. Sequencing ability forms the foundation stone for higher order cognition related to language and goal-directed planning. Information related to the order of items, their timing, chunking and hierarchical organization are important aspects in sequencing. Past research on sequencing has emphasized two distinct and independent dichotomies: implicit versus explicit and goal-directed versus habits. We propose a theoretical framework unifying these two streams. Our proposal relies on brain's ability to implicitly extract statistical regularities from the stream of stimuli and with attentional engagement organizing sequences explicitly and hierarchically. Similarly, sequences that need to be assembled purposively to accomplish a goal require engagement of attentional processes. With repetition, these goal-directed plans become habits with concomitant disengagement of attention. Thus attention and awareness play a crucial role in the implicit-to-explicit transition as well as in how goal-directed plans become automatic habits. Cortico-subcortical loops ─ basal ganglia-frontal cortex and hippocampus-frontal cortex loops ─ mediate the transition process. We show how the computational principles of model-free and model-based learning paradigms, along with a pivotal role for attention and awareness, offer a unifying framework for these two dichotomies. Based on this framework, we make testable predictions related to the potential influence of response-to-stimulus interval (RSI on developing awareness in implicit learning tasks.

  11. Electricity sequence control

    International Nuclear Information System (INIS)

    Shin, Heung Ryeol

    2010-03-01

    The contents of the book are introduction of control system, like classification and control signal, introduction of electricity power switch, such as push-button and detection switch sensor for induction type and capacitance type machinery for control, solenoid valve, expression of sequence and type of electricity circuit about using diagram, time chart, marking and term, logic circuit like Yes, No, and, or and equivalence logic, basic electricity circuit, electricity sequence control, added condition, special program control about choice and jump of program, motor control, extra circuit on repeat circuit, pause circuit in a conveyer, safety regulations and rule about classification of electricity disaster and protective device for insulation.

  12. Next-generation sequencing

    DEFF Research Database (Denmark)

    Rieneck, Klaus; Bak, Mads; Jønson, Lars

    2013-01-01

    , Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence...... information obtained allows well for statistical analysis of the data. This general approach can be integrated into current laboratory practice and has numerous applications. Besides DNA-based predictions of blood group phenotypes, platelet phenotypes, or sickle cell anemia, and the determination of zygosity...

  13. Improvement of training set structure in fusion data cleaning using Time-Domain Global Similarity method

    International Nuclear Information System (INIS)

    Liu, J.; Lan, T.; Qin, H.

    2017-01-01

    Traditional data cleaning identifies dirty data by classifying original data sequences, which is a class-imbalanced problem since the proportion of incorrect data is much less than the proportion of correct ones for most diagnostic systems in Magnetic Confinement Fusion (MCF) devices. When using machine learning algorithms to classify diagnostic data based on class-imbalanced training set, most classifiers are biased towards the major class and show very poor classification rates on the minor class. By transforming the direct classification problem about original data sequences into a classification problem about the physical similarity between data sequences, the class-balanced effect of Time-Domain Global Similarity (TDGS) method on training set structure is investigated in this paper. Meanwhile, the impact of improved training set structure on data cleaning performance of TDGS method is demonstrated with an application example in EAST POlarimetry-INTerferometry (POINT) system.

  14. Identification of structural similarities between putative transmission proteins of Polymyxa and Spongospora transmitted bymoviruses and furoviruses.

    Science.gov (United States)

    Dessens, J T; Meyer, M

    1996-01-01

    Comparison of amino acid sequence and hydropathy profiles shows conserved, structural similarities between the capsid readthrough protein of potato mop top virus (transmitted by Spongospora subterranea) and furovirus and bymovirus proteins implicated in transmission by Polymyxa spp. This suggests that these proteins have a common ancestry and are involved in a common biological process: virus transmission by plasmodiophorid fungi.

  15. Collaborative Filtering Recommendation on Users' Interest Sequences.

    Directory of Open Access Journals (Sweden)

    Weijie Cheng

    Full Text Available As an important factor for improving recommendations, time information has been introduced to model users' dynamic preferences in many papers. However, the sequence of users' behaviour is rarely studied in recommender systems. Due to the users' unique behavior evolution patterns and personalized interest transitions among items, users' similarity in sequential dimension should be introduced to further distinguish users' preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users' interest sequences (IS that rank users' ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users' longest common sub-IS (LCSIS and the count of users' total common sub-IS (ACSIS. Then, these semantics are utilized to obtain users' IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users' preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.

  16. Collaborative Filtering Recommendation on Users' Interest Sequences.

    Science.gov (United States)

    Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong

    2016-01-01

    As an important factor for improving recommendations, time information has been introduced to model users' dynamic preferences in many papers. However, the sequence of users' behaviour is rarely studied in recommender systems. Due to the users' unique behavior evolution patterns and personalized interest transitions among items, users' similarity in sequential dimension should be introduced to further distinguish users' preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users' interest sequences (IS) that rank users' ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users' longest common sub-IS (LCSIS) and the count of users' total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users' IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users' preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.

  17. Collaborative Filtering Recommendation on Users’ Interest Sequences

    Science.gov (United States)

    Cheng, Weijie; Yin, Guisheng; Dong, Yuxin; Dong, Hongbin; Zhang, Wansong

    2016-01-01

    As an important factor for improving recommendations, time information has been introduced to model users’ dynamic preferences in many papers. However, the sequence of users’ behaviour is rarely studied in recommender systems. Due to the users’ unique behavior evolution patterns and personalized interest transitions among items, users’ similarity in sequential dimension should be introduced to further distinguish users’ preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users’ interest sequences (IS) that rank users’ ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users’ longest common sub-IS (LCSIS) and the count of users’ total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users’ IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users’ preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction. PMID:27195787

  18. BSSF: a fingerprint based ultrafast binding site similarity search and function analysis server

    Directory of Open Access Journals (Sweden)

    Jiang Hualiang

    2010-01-01

    Full Text Available Abstract Background Genome sequencing and post-genomics projects such as structural genomics are extending the frontier of the study of sequence-structure-function relationship of genes and their products. Although many sequence/structure-based methods have been devised with the aim of deciphering this delicate relationship, there still remain large gaps in this fundamental problem, which continuously drives researchers to develop novel methods to extract relevant information from sequences and structures and to infer the functions of newly identified genes by genomics technology. Results Here we present an ultrafast method, named BSSF(Binding Site Similarity & Function, which enables researchers to conduct similarity searches in a comprehensive three-dimensional binding site database extracted from PDB structures. This method utilizes a fingerprint representation of the binding site and a validated statistical Z-score function scheme to judge the similarity between the query and database items, even if their similarities are only constrained in a sub-pocket. This fingerprint based similarity measurement was also validated on a known binding site dataset by comparing with geometric hashing, which is a standard 3D similarity method. The comparison clearly demonstrated the utility of this ultrafast method. After conducting the database searching, the hit list is further analyzed to provide basic statistical information about the occurrences of Gene Ontology terms and Enzyme Commission numbers, which may benefit researchers by helping them to design further experiments to study the query proteins. Conclusions This ultrafast web-based system will not only help researchers interested in drug design and structural genomics to identify similar binding sites, but also assist them by providing further analysis of hit list from database searching.

  19. Similarity queries for temporal toxicogenomic expression profiles.

    Directory of Open Access Journals (Sweden)

    Adam A Smith

    2008-07-01

    Full Text Available We present an approach for answering similarity queries about gene expression time series that is motivated by the task of characterizing the potential toxicity of various chemicals. Our approach involves two key aspects. First, our method employs a novel alignment algorithm based on time warping. Our time warping algorithm has several advantages over previous approaches. It allows the user to impose fairly strong biases on the form that the alignments can take, and it permits a type of local alignment in which the entirety of only one series has to be aligned. Second, our method employs a relaxed spline interpolation to predict expression responses for unmeasured time points, such that the spline does not necessarily exactly fit every observed point. We evaluate our approach using expression time series from the Edge toxicology database. Our experiments show the value of using spline representations for sparse time series. More significantly, they show that our time warping method provides more accurate alignments and classifications than previous standard alignment methods for time series.

  20. Humans and mice express similar olfactory preferences.

    Directory of Open Access Journals (Sweden)

    Nathalie Mandairon

    Full Text Available In humans, the pleasantness of odors is a major contributor to social relationships and food intake. Smells evoke attraction and repulsion responses, reflecting the hedonic value of the odorant. While olfactory preferences are known to be strongly modulated by experience and learning, it has been recently suggested that, in humans, the pleasantness of odors may be partly explained by the physicochemical properties of the odorant molecules themselves. If odor hedonic value is indeed predetermined by odorant structure, then it could be hypothesized that other species will show similar odor preferences to humans. Combining behavioral and psychophysical approaches, we here show that odorants rated as pleasant by humans were also those which, behaviorally, mice investigated longer and human subjects sniffed longer, thereby revealing for the first time a component of olfactory hedonic perception conserved across species. Consistent with this, we further show that odor pleasantness rating in humans and investigation time in mice were both correlated with the physicochemical properties of the molecules, suggesting that olfactory preferences are indeed partly engraved in the physicochemical structure of the odorant. That odor preferences are shared between mammal species and are guided by physicochemical features of odorant stimuli strengthens the view that odor preference is partially predetermined. These findings open up new perspectives for the study of the neural mechanisms of hedonic perception.

  1. Different-but-Similar Judgments by Bumblebees

    Directory of Open Access Journals (Sweden)

    Vicki Xu

    2016-08-01

    Full Text Available This study examines picture perception in an invertebrate. Two questions regarding possible picture-object correspondence are addressed for bumblebees (Bombus impatiens: (1 Do bees perceive the difference between an object and its corresponding picture even when they have not been trained to do so? (2 Do they also perceive the similarity? Twenty bees from each of four colonies underwent discrimination training of stimuli placed in a radial maze. Bees were trained to discriminate between two objects (artificial flowers in one group and between photos of those objects in another. Subsequent testing on unrewarding stimuli revealed, for both groups, a significant discrimination between the object and its photo: discrimination training was not necessary for bees to detect a difference between corresponding objects and pictures. We obtained not only object-to-picture transfer, as in previous research, but also the reverse: picture-to-object transfer. In the absence of the rewarding object, its photo, though never seen before by the bees, was accepted as a substitute. The reverse was also true. Bumblebees treated pictures as “different-but-similar” without having been trained to do so, which is in turn useful in floral categorization.

  2. Block generators for the similarity renormalization group

    Energy Technology Data Exchange (ETDEWEB)

    Huether, Thomas; Roth, Robert [TU Darmstadt (Germany)

    2016-07-01

    The Similarity Renormalization Group (SRG) is a powerful tool to improve convergence behavior of many-body calculations using NN and 3N interactions from chiral effective field theory. The SRG method decouples high and low-energy physics, through a continuous unitary transformation implemented via a flow equation approach. The flow is determined by a generator of choice. This generator governs the decoupling pattern and, thus, the improvement of convergence, but it also induces many-body interactions. Through the design of the generator we can optimize the balance between convergence and induced forces. We explore a new class of block generators that restrict the decoupling to the high-energy sector and leave the diagonalization in the low-energy sector to the many-body method. In this way one expects a suppression of induced forces. We analyze the induced many-body forces and the convergence behavior in light and medium-mass nuclei in No-Core Shell Model and In-Medium SRG calculations.

  3. State and Mafia, Differences and Similarities

    Directory of Open Access Journals (Sweden)

    Alfano Vincenzo

    2015-02-01

    Full Text Available The purpose of this article is to investigate about the differences and, if any, the similarities among the modern State and the mafia criminal organizations. In particular, starting from their definitions, I will try to find the differences between State and mafia, to then focus on the operational aspects of the functioning of these two organizations, with specific reference to the effect/impact that both these human constructs have on citizens’ existences, and especially on citizen’s economic lives. All this in order to understand whether it is possible to identify an objective difference – beside morals – between taxation by the modern State and extortion by criminal organizations. With this of course I do not want to argue that the mafia is in any way justifiable or absolvable, nor that it is better than the State. However, I want to investigate whether there is a real, logical reason why the State should be considered by the citizens more desirable than the criminal organizations oppressing Southern Italy, from a strictly logical point of view and not from the point of view of ethics and morality.

  4. Genetic and 'cultural' similarity in wild chimpanzees.

    Science.gov (United States)

    Langergraber, Kevin E; Boesch, Christophe; Inoue, Eiji; Inoue-Murayama, Miho; Mitani, John C; Nishida, Toshisada; Pusey, Anne; Reynolds, Vernon; Schubert, Grit; Wrangham, Richard W; Wroblewski, Emily; Vigilant, Linda

    2011-02-07

    The question of whether animals possess 'cultures' or 'traditions' continues to generate widespread theoretical and empirical interest. Studies of wild chimpanzees have featured prominently in this discussion, as the dominant approach used to identify culture in wild animals was first applied to them. This procedure, the 'method of exclusion,' begins by documenting behavioural differences between groups and then infers the existence of culture by eliminating ecological explanations for their occurrence. The validity of this approach has been questioned because genetic differences between groups have not explicitly been ruled out as a factor contributing to between-group differences in behaviour. Here we investigate this issue directly by analysing genetic and behavioural data from nine groups of wild chimpanzees. We find that the overall levels of genetic and behavioural dissimilarity between groups are highly and statistically significantly correlated. Additional analyses show that only a very small number of behaviours vary between genetically similar groups, and that there is no obvious pattern as to which classes of behaviours (e.g. tool-use versus communicative) have a distribution that matches patterns of between-group genetic dissimilarity. These results indicate that genetic dissimilarity cannot be eliminated as playing a major role in generating group differences in chimpanzee behaviour.

  5. Multidimensional Scaling Visualization Using Parametric Similarity Indices

    Directory of Open Access Journals (Sweden)

    J. A. Tenreiro Machado

    2015-03-01

    Full Text Available In this paper, we apply multidimensional scaling (MDS and parametric similarity indices (PSI in the analysis of complex systems (CS. Each CS is viewed as a dynamical system, exhibiting an output time-series to be interpreted as a manifestation of its behavior. We start by adopting a sliding window to sample the original data into several consecutive time periods. Second, we define a given PSI for tracking pieces of data. We then compare the windows for different values of the parameter, and we generate the corresponding MDS maps of ‘points’. Third, we use Procrustes analysis to linearly transform the MDS charts for maximum superposition and to build a globalMDS map of “shapes”. This final plot captures the time evolution of the phenomena and is sensitive to the PSI adopted. The generalized correlation, theMinkowski distance and four entropy-based indices are tested. The proposed approach is applied to the Dow Jones Industrial Average stock market index and the Europe Brent Spot Price FOB time-series.

  6. Exploring similarities among many species distributions

    Science.gov (United States)

    Simmerman, Scott; Wang, Jingyuan; Osborne, James; Shook, Kimberly; Huang, Jian; Godsoe, William; Simons, Theodore R.

    2012-01-01

    Collecting species presence data and then building models to predict species distribution has been long practiced in the field of ecology for the purpose of improving our understanding of species relationships with each other and with the environment. Due to limitations of computing power as well as limited means of using modeling software on HPC facilities, past species distribution studies have been unable to fully explore diverse data sets. We build a system that can, for the first time to our knowledge, leverage HPC to support effective exploration of species similarities in distribution as well as their dependencies on common environmental conditions. Our system can also compute and reveal uncertainties in the modeling results enabling domain experts to make informed judgments about the data. Our work was motivated by and centered around data collection efforts within the Great Smoky Mountains National Park that date back to the 1940s. Our findings present new research opportunities in ecology and produce actionable field-work items for biodiversity management personnel to include in their planning of daily management activities.

  7. Similarities and differences in vapor explosion criteria

    International Nuclear Information System (INIS)

    Cronenberg, A.W.

    1978-01-01

    An overview of recent ideas pertaining to vapor explosion criteria indicates that in general sense, a consensus of opinion is emerging on the conditions applicable to explosive vaporization. Experimental and theoretical work has lead a number of investigators to the formulation of such conditions which are quite similar in many respects, although the quantitative details of the model formulation of such conditions are somewhat different. All model concepts are consistent in that an initial period of stable film boiling, separating molten fuel from coolant, is considered necessary (at least for large-scale interactions and efficient intermixing), with subsequent breakdown of film boiling due to pressure and/or thermal effects, followed by intimate fuel-coolant contact and a rapid vaporization process which is sufficient to cause shock pressurization. Although differences arise as to the conditions for and the energetics associated with film boiling destabilization and the mode and energetics of fragmentation and intermixing. However, the principal area of difference seems to be the question of what constitutes the requisite condition(s) for rapid vapor production to cause shock pressurization

  8. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  9. THE RHIC SEQUENCER

    International Nuclear Information System (INIS)

    VAN ZEIJTS, J.; DOTTAVIO, T.; FRAK, B.; MICHNOFF, R.

    2001-01-01

    The Relativistic Heavy Ion Collider (RHIC) has a high level asynchronous time-line driven by a controlling program called the ''Sequencer''. Most high-level magnet and beam related issues are orchestrated by this system. The system also plays an important task in coordinated data acquisition and saving. We present the program, operator interface, operational impact and experience

  10. Twin anemia polycythemia sequence

    NARCIS (Netherlands)

    Slaghekke, Femke

    2014-01-01

    In this thesis we describe that Twin Anemia Polycythemia Sequence (TAPS) is a form of chronic feto-fetal transfusion in monochorionic (identical) twins based on a small amount of blood transfusion through very small anastomoses. For the antenatal diagnosis of TAPS, Middle Cerebral Artery – Peak

  11. simple sequence repeat (SSR)

    African Journals Online (AJOL)

    In the present study, 78 mapped simple sequence repeat (SSR) markers representing 11 linkage groups of adzuki bean were evaluated for transferability to mungbean and related Vigna spp. 41 markers amplified characteristic bands in at least one Vigna species. The transferability percentage across the genotypes ranged ...

  12. Similarly shaped letters evoke similar colors in grapheme-color synesthesia.

    Science.gov (United States)

    Brang, David; Rouw, Romke; Ramachandran, V S; Coulson, Seana

    2011-04-01

    Grapheme-color synesthesia is a neurological condition in which viewing numbers or letters (graphemes) results in the concurrent sensation of color. While the anatomical substrates underlying this experience are well understood, little research to date has investigated factors influencing the particular colors associated with particular graphemes or how synesthesia occurs developmentally. A recent suggestion of such an interaction has been proposed in the cascaded cross-tuning (CCT) model of synesthesia, which posits that in synesthetes connections between grapheme regions and color area V4 participate in a competitive activation process, with synesthetic colors arising during the component-stage of grapheme processing. This model more directly suggests that graphemes sharing similar component features (lines, curves, etc.) should accordingly activate more similar synesthetic colors. To test this proposal, we created and regressed synesthetic color-similarity matrices for each of 52 synesthetes against a letter-confusability matrix, an unbiased measure of visual similarity among graphemes. Results of synesthetes' grapheme-color correspondences indeed revealed that more similarly shaped graphemes corresponded with more similar synesthetic colors, with stronger effects observed in individuals with more intense synesthetic experiences (projector synesthetes). These results support the CCT model of synesthesia, implicate early perceptual mechanisms as driving factors in the elicitation of synesthetic hues, and further highlight the relationship between conceptual and perceptual factors in this phenomenon. Copyright © 2011 Elsevier Ltd. All rights reserved.

  13. Asteroid clusters similar to asteroid pairs

    Science.gov (United States)

    Pravec, P.; Fatka, P.; Vokrouhlický, D.; Scheeres, D. J.; Kušnirák, P.; Hornoch, K.; Galád, A.; Vraštil, J.; Pray, D. P.; Krugly, Yu. N.; Gaftonyuk, N. M.; Inasaridze, R. Ya.; Ayvazian, V. R.; Kvaratskhelia, O. I.; Zhuzhunadze, V. T.; Husárik, M.; Cooney, W. R.; Gross, J.; Terrell, D.; Világi, J.; Kornoš, L.; Gajdoš, Š.; Burkhonov, O.; Ehgamberdiev, Sh. A.; Donchev, Z.; Borisov, G.; Bonev, T.; Rumyantsev, V. V.; Molotov, I. E.

    2018-04-01

    We studied the membership, size ratio and rotational properties of 13 asteroid clusters consisting of between 3 and 19 known members that are on similar heliocentric orbits. By backward integrations of their orbits, we confirmed their cluster membership and estimated times elapsed since separation of the secondaries (the smaller cluster members) from the primary (i.e., cluster age) that are between 105 and a few 106 years. We ran photometric observations for all the cluster primaries and a sample of secondaries and we derived their accurate absolute magnitudes and rotation periods. We found that 11 of the 13 clusters follow the same trend of primary rotation period vs mass ratio as asteroid pairs that was revealed by Pravec et al. (2010). We generalized the model of the post-fission system for asteroid pairs by Pravec et al. (2010) to a system of N components formed by rotational fission and we found excellent agreement between the data for the 11 asteroid clusters and the prediction from the theory of their formation by rotational fission. The two exceptions are the high-mass ratio (q > 0.7) clusters of (18777) Hobson and (22280) Mandragora for which a different formation mechanism is needed. Two candidate mechanisms for formation of more than one secondary by rotational fission were published: the secondary fission process proposed by Jacobson and Scheeres (2011) and a cratering collision event onto a nearly critically rotating primary proposed by Vokrouhlický et al. (2017). It will have to be revealed from future studies which of the clusters were formed by one or the other process. To that point, we found certain further interesting properties and features of the asteroid clusters that place constraints on the theories of their formation, among them the most intriguing being the possibility of a cascade disruption for some of the clusters.

  14. Expanding the boundaries of local similarity analysis.

    Science.gov (United States)

    Durno, W Evan; Hanson, Niels W; Konwar, Kishori M; Hallam, Steven J

    2013-01-01

    Pairwise comparison of time series data for both local and time-lagged relationships is a computationally challenging problem relevant to many fields of inquiry. The Local Similarity Analysis (LSA) statistic identifies the existence of local and lagged relationships, but determining significance through a p-value has been algorithmically cumbersome due to an intensive permutation test, shuffling rows and columns and repeatedly calculating the statistic. Furthermore, this p-value is calculated with the assumption of normality -- a statistical luxury dissociated from most real world datasets. To improve the performance of LSA on big datasets, an asymptotic upper bound on the p-value calculation was derived without the assumption of normality. This change in the bound calculation markedly improved computational speed from O(pm²n) to O(m²n), where p is the number of permutations in a permutation test, m is the number of time series, and n is the length of each time series. The bounding process is implemented as a computationally efficient software package, FASTLSA, written in C and optimized for threading on multi-core computers, improving its practical computation time. We computationally compare our approach to previous implementations of LSA, demonstrate broad applicability by analyzing time series data from public health, microbial ecology, and social media, and visualize resulting networks using the Cytoscape software. The FASTLSA software package expands the boundaries of LSA allowing analysis on datasets with millions of co-varying time series. Mapping metadata onto force-directed graphs derived from FASTLSA allows investigators to view correlated cliques and explore previously unrecognized network relationships. The software is freely available for download at: http://www.cmde.science.ubc.ca/hallam/fastLSA/.

  15. UNSOLVED AND LATENT CRIME: DIFFERENCES AND SIMILARITIES

    Directory of Open Access Journals (Sweden)

    Mikhail Kleymenov

    2017-01-01

    Full Text Available УДК 343Purpose of the article is to study the specific legal and informational nature of the unsolved crime in comparison with the phenomenon of delinquency, special study and analysis to improve the efficiency of law enforcement.Methods of research are abstract-logical, systematic, statistical, study of documents. The main results of research. Unsolved crime has specific legal, statistical and informational na-ture as the crime phenomenon, which is expressed in cumulative statistical population of unsolved crimes. An array of unsolved crimes is the sum of the number of acts, things of which is suspended and not terminated. The fault of the perpetrator in these cases is not proven, they are not considered by the court, it is not a conviction. Unsolved crime must be registered. Latent crime has a different informational nature. The main symptom of latent crimes is the uncertainty for the subjects of law enforcement, which delegated functions of identification, registration and accounting. Latent crime is not recorded. At the same time, there is a "border" area between the latent and unsolved crimes, which includes covered from the account of the crime. In modern Russia the majority of crimes covered from accounting by passing the decision about refusal in excitation of criminal case. Unsolved crime on their criminogenic consequences represents a significant danger to the public is higher compared to latent crime.It is conducted in the article a special analysis of the differences and similarities in the unsolved latent crime for the first time in criminological literature.The analysis proves the need for radical changes in the current Russian assessment of the state of crime and law enforcement to solve crimes. The article argues that an unsolved crime is a separate and, in contrast to latent crime, poorly understood phenomenon. However unsolved latent crime and have common features and areas of interaction.

  16. Targeted sequencing of plant genomes

    Science.gov (United States)

    Mark D. Huynh

    2014-01-01

    Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, wholegenome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...

  17. Almost convergence of triple sequences

    OpenAIRE

    Ayhan Esi; M.Necdet Catalbas

    2013-01-01

    In this paper we introduce and study the concepts of almost convergence and almost Cauchy for triple sequences. Weshow that the set of almost convergent triple sequences of 0's and 1's is of the first category and also almost everytriple sequence of 0's and 1's is not almost convergent.Keywords: almost convergence, P-convergent, triple sequence.

  18. A few Smarandache Integer Sequences

    OpenAIRE

    Ibstedt, Henry

    2010-01-01

    This paper deals with the analysis of a few Smarandache Integer Sequences which first appeared in Properties or the Numbers, F. Smarandache, University or Craiova Archives, 1975. The first four sequences are recurrence generated sequences while the last three are concatenation sequences.

  19. Advanced Models and Algorithms for Self-Similar IP Network Traffic Simulation and Performance Analysis

    Science.gov (United States)

    Radev, Dimitar; Lokshina, Izabella

    2010-11-01

    The paper examines self-similar (or fractal) properties of real communication network traffic data over a wide range of time scales. These self-similar properties are very different from the properties of traditional models based on Poisson and Markov-modulated Poisson processes. Advanced fractal models of sequentional generators and fixed-length sequence generators, and efficient algorithms that are used to simulate self-similar behavior of IP network traffic data are developed and applied. Numerical examples are provided; and simulation results are obtained and analyzed.

  20. Allele Re-sequencing Technologies

    DEFF Research Database (Denmark)

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    2013-01-01

    The development of next-generation sequencing technologies has made sequencing an affordable approach for detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing still remains too high to be feasible for many plant species with large...... alternative to whole genome re-sequencing to identify causative genetic variations in plants. One challenge, however, will be efficient bioinformatics strategies for data handling and analysis from the increasing amount of sequence information....

  1. Amino acid sequence analysis of the annexin super-gene family of proteins.

    Science.gov (United States)

    Barton, G J; Newman, R H; Freemont, P S; Crumpton, M J

    1991-06-15

    The annexins are a widespread family of calcium-dependent membrane-binding proteins. No common function has been identified for the family and, until recently, no crystallographic data existed for an annexin. In this paper we draw together 22 available annexin sequences consisting of 88 similar repeat units, and apply the techniques of multiple sequence alignment, pattern matching, secondary structure prediction and conservation analysis to the characterisation of the molecules. The analysis clearly shows that the repeats cluster into four distinct families and that greatest variation occurs within the repeat 3 units. Multiple alignment of the 88 repeats shows amino acids with conserved physicochemical properties at 22 positions, with only Gly at position 23 being absolutely conserved in all repeats. Secondary structure prediction techniques identify five conserved helices in each repeat unit and patterns of conserved hydrophobic amino acids are consistent with one face of a helix packing against the protein core in predicted helices a, c, d, e. Helix b is generally hydrophobic in all repeats, but contains a striking pattern of repeat-specific residue conservation at position 31, with Arg in repeats 4 and Glu in repeats 2, but unconserved amino acids in repeats 1 and 3. This suggests repeats 2 and 4 may interact via a buried saltbridge. The loop between predicted helices a and b of repeat 3 shows features distinct from the equivalent loop in repeats 1, 2 and 4, suggesting an important structural and/or functional role for this region. No compelling evidence emerges from this study for uteroglobin and the annexins sharing similar tertiary structures, or for uteroglobin representing a derivative of a primordial one-repeat structure that underwent duplication to give the present day annexins. The analyses performed in this paper are re-evaluated in the Appendix, in the light of the recently published X-ray structure for human annexin V. The structure confirms most of

  2. Amplification and chromosomal dispersion of human endogenous retroviral sequences

    International Nuclear Information System (INIS)

    Steele, P.E.; Martin, M.A.; Rabson, A.B.; Bryan, T.; O'Brien, S.J.

    1986-01-01

    Endogenous retroviral sequences have undergone amplification events involving both viral and flanking cellular sequences. The authors cloned members of an amplified family of full-length endogenous retroviral sequences. Genomic blotting, employing a flanking cellular DNA probe derived from a member of this family, revealed a similar array of reactive bands in both humans and chimpanzees, indicating that an amplification event involving retroviral and associated cellular DNA sequences occurred before the evolutionary separation of these two primates. Southern analyses of restricted somatic cell hybrid DNA preparations suggested that endogenous retroviral segments are widely dispersed in the human genome and that amplification and dispersion events may be linked

  3. Heuristics for multiobjective multiple sequence alignment.

    Science.gov (United States)

    Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B

    2016-07-15

    Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameters setting, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may provide an undesirable bias in further steps of the analysis and give too simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives to the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. Experimental results show

  4. Heuristics for batching and sequencing in batch processing machines

    Directory of Open Access Journals (Sweden)

    Chuda Basnet

    2016-12-01

    Full Text Available In this paper, we discuss the “batch processing” problem, where there are multiple jobs to be processed in flow shops. These jobs can however be formed into batches and the number of jobs in a batch is limited by the capacity of the processing machines to accommodate the jobs. The processing time required by a batch in a machine is determined by the greatest processing time of the jobs included in the batch. Thus, the batch processing problem is a mix of batching and sequencing – the jobs need to be grouped into distinct batches, the batches then need to be sequenced through the flow shop. We apply certain newly developed heuristics to the problem and present computational results. The contributions of this paper are deriving a lower bound, and the heuristics developed and tested in this paper.

  5. Nucleotide Sequences and Comparison of Two Large Conjugative Plasmids from Different Campylobacter species

    National Research Council Canada - National Science Library

    Batchelor, Roger A; Pearson, Bruce M; Friis, Lorna M; Guerry, Patricia; Wells, Jerry M

    2004-01-01

    .... Both plasmids are mosaic in structure, having homologues of genes found in a variety of different commensal and pathogenic bacteria, but nevertheless, showed striking similarities in DNA sequence...

  6. Scalable gastroscopic video summarization via similar-inhibition dictionary selection.

    Science.gov (United States)

    Wang, Shuai; Cong, Yang; Cao, Jun; Yang, Yunsheng; Tang, Yandong; Zhao, Huaici; Yu, Haibin

    2016-01-01

    This paper aims at developing an automated gastroscopic video summarization algorithm to assist clinicians to more effectively go through the abnormal contents of the video. To select the most representative frames from the original video sequence, we formulate the problem of gastroscopic video summarization as a dictionary selection issue. Different from the traditional dictionary selection methods, which take into account only the number and reconstruction ability of selected key frames, our model introduces the similar-inhibition constraint to reinforce the diversity of selected key frames. We calculate the attention cost by merging both gaze and content change into a prior cue to help select the frames with more high-level semantic information. Moreover, we adopt an image quality evaluation process to eliminate the interference of the poor quality images and a segmentation process to reduce the computational complexity. For experiments, we build a new gastroscopic video dataset captured from 30 volunteers with more than 400k images and compare our method with the state-of-the-arts using the content consistency, index consistency and content-index consistency with the ground truth. Compared with all competitors, our method obtains the best results in 23 of 30 videos evaluated based on content consistency, 24 of 30 videos evaluated based on index consistency and all videos evaluated based on content-index consistency. For gastroscopic video summarization, we propose an automated annotation method via similar-inhibition dictionary selection. Our model can achieve better performance compared with other state-of-the-art models and supplies more suitable key frames for diagnosis. The developed algorithm can be automatically adapted to various real applications, such as the training of young clinicians, computer-aided diagnosis or medical report generation. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Similar Symmetries: The Role of Wallpaper Groups in Perceptual Texture Similarity

    Directory of Open Access Journals (Sweden)

    Fraser Halley

    2011-05-01

    Full Text Available Periodic patterns and symmetries are striking visual properties that have been used decoratively around the world throughout human history. Periodic patterns can be mathematically classified into one of 17 different Wallpaper groups, and while computational models have been developed which can extract an image's symmetry group, very little work has been done on how humans perceive these patterns. This study presents the results from a grouping experiment using stimuli from the different wallpaper groups. We find that while different images from the same wallpaper group are perceived as similar to one another, not all groups have the same degree of self-similarity. The similarity relationships between wallpaper groups appear to be dominated by rotations.

  8. Perceptions of randomness in binary sequences: Normative, heuristic, or both?

    Science.gov (United States)

    Reimers, Stian; Donkin, Chris; Le Pelley, Mike E

    2018-03-01

    When people consider a series of random binary events, such as tossing an unbiased coin and recording the sequence of heads (H) and tails (T), they tend to erroneously rate sequences with less internal structure or order (such as HTTHT) as more probable than sequences containing more structure or order (such as HHHHH). This is traditionally explained as a local representativeness effect: Participants assume that the properties of long sequences of random outcomes-such as an equal proportion of heads and tails, and little internal structure-should also apply to short sequences. However, recent theoretical work has noted that the probability of a particular sequence of say, heads and tails of length n, occurring within a larger (>n) sequence of coin flips actually differs by sequence, so P(HHHHH) rational norms based on limited experience. We test these accounts. Participants in Experiment 1 rated the likelihood of occurrence for all possible strings of 4, 5, and 6 observations in a sequence of coin flips. Judgments were better explained by representativeness in alternation rate, relative proportion of heads and tails, and sequence complexity, than by objective probabilities. Experiments 2 and 3 gave similar results using incentivized binary choice procedures. Overall the evidence suggests that participants are not sensitive to variation in objective probabilities of a sub-sequence occurring; they appear to use heuristics based on several distinct forms of representativeness. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Spain's greatest and most recent mine disaster.

    Science.gov (United States)

    Guerrero, Flor Ma; Lozano, Macarena; Rueda-Cantuche, José M

    2008-03-01

    On 25 April 1998, the mineral waste retaining wall at the Swedish-owned pyrite mine at Aznalcóllar (Seville, Spain) burst, causing the most harmful environmental and socio-economic disaster in the history of the River Guadiamar basin. The damage was so great that the regional government decided in May 1998 to finance a comprehensive, multidisciplinary research initiative with the objective of eradicating or at least minimising all of the negative social, economic and environmental impacts. This paper utilises a Strengths, Weaknesses, Opportunities and Threats (SWOT) analysis to identify eight strategic measures aimed at providing policymakers with key guidelines on implementing a sustainable development model, in a broad sense. Empirical evidence, though, reveals that, to date, major efforts to tackle the negative impacts have centred on environmental concerns and that the socio-economic consequences have not been completely mitigated.

  10. Greatest barrier is retaining young scientists

    Science.gov (United States)

    Chandler, Mark; Hopper, John

    The National Science Foundation's top priorities as listed by director Neal Lane in Eos (November 9) are to strengthen NSF and its support of scientific research and education, to better articulate to the public why it is so important that support of science and engineering be strengthened, and to continue to lower barriers that discourage young people from choosing careers in science.While we firmly support the first two priorities, we are concerned about the underlying assumptions and implications of the third. Barriers discouraging women and minorities from considering careers in math and science do exist within our educational system, and there is now abundant statistical evidence showing these groups are under-represented in most fields of science. However, as stated in the Eos article, solving these problems and leveling the playing field is not the primary goal of the NSF policy.

  11. A Challenge with the Greatest Reward

    Science.gov (United States)

    Essary, Jessica

    2007-01-01

    For many teachers, the first time that they have a student who has little or no background with English, the thought of constantly engaging them in relevant activities throughout the entire day can be intimidating. Based on the author's experience teaching English language learners, she can assure teachers that it is a challenge worth facing. In…

  12. The Making of History's Greatest Star Map

    CERN Document Server

    Perryman, Michael

    2010-01-01

    From prehistoric times, mankind has looked up at the night sky, and puzzled at the changing positions of the stars. How far away they are is a question that has confounded scientists for centuries. Over the last few hundred years, many scientific careers – and considerable resources – have been devoted to measuring their positions and motions with ever increasing accuracy. And in the last two decades of the 20th century, the European Space Agency developed and launched the Hipparcos satellite, around which this account revolves, to carry out these exacting measurements from space. What has prompted these remarkable developments? Why have governments been persuaded to fund them? What are scientists learning from astronomy's equivalent of the Human Genome Project? This book traces the subject's history, explains why such enormous efforts are considered worthwhile, and interweaves these with a first-hand insight into the Hipparcos project, and how big science is conducted at an international level. The invol...

  13. Similarity and self-similarity in high energy density physics: application to laboratory astrophysics

    International Nuclear Information System (INIS)

    Falize, E.

    2008-10-01

    The spectacular recent development of powerful facilities allows the astrophysical community to explore, in laboratory, astrophysical phenomena where radiation and matter are strongly coupled. The titles of the nine chapters of the thesis are: from high energy density physics to laboratory astrophysics; Lie groups, invariance and self-similarity; scaling laws and similarity properties in High-Energy-Density physics; the Burgan-Feix-Munier transformation; dynamics of polytropic gases; stationary radiating shocks and the POLAR project; structure, dynamics and stability of optically thin fluids; from young star jets to laboratory jets; modelling and experiences for laboratory jets

  14. Multilocus Sequence Typing

    OpenAIRE

    Belén, Ana; Pavón, Ibarz; Maiden, Martin C.J.

    2009-01-01

    Multilocus sequence typing (MLST) was first proposed in 1998 as a typing approach that enables the unambiguous characterization of bacterial isolates in a standardized, reproducible, and portable manner using the human pathogen Neisseria meningitidis as the exemplar organism. Since then, the approach has been applied to a large and growing number of organisms by public health laboratories and research institutions. MLST data, shared by investigators over the world via the Internet, have been ...

  15. Achalasia Carcinoma Sequence

    OpenAIRE

    Makmun, Dadang

    2001-01-01

    We report a case of carcinoma of the esophagus in a 58 years old woman with achalasia, who has been diagnosed since 30 years ago, which initiated by surgical treatment (myotomy) and the symptoms recurred since 3 years ago. According to the progress of the disease, Malignancy was strongly suspected due to prolonged stasis and mucosal irritation caused by achalasia (achalasia carcinoma sequence). Because of these contributing factors for the development of serious complications such as Malignan...

  16. Analysis of Neuronal Sequences Using Pairwise Biases

    Science.gov (United States)

    2015-08-27

    semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike ). Evidence for the participation of the hippocampus in the formation of...hippocampal formation in an attempt to be cured of severe epileptic seizures. Although the surgery was successful in regards to reducing the frequency and...very different from each other in many ways including duration and number of spikes. Still, these sequences share a similar trend in the general order

  17. Integration of Phenotypic Metadata and Protein Similarity in Archaea Using a Spectral Bipartitioning Approach

    Energy Technology Data Exchange (ETDEWEB)

    Hooper, Sean D.; Anderson, Iain J; Pati, Amrita; Dalevi, Daniel; Mavromatis, Konstantinos; Kyrpides, Nikos C

    2009-01-01

    In order to simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of those sequences. However, it quickly becomes clear that the sequence flexibility allowed a given protein varies significantly among different protein families. The degree to which sequences are conserved not only differs for each protein family, but also is affected by the phylogenetic divergence of the source organisms. Clustering techniques that use similarity thresholds for protein families do not always allow for these variations and thus cannot be confidently used for applications such as automated annotation and phylogenetic profiling. In this work, we applied a spectral bipartitioning technique to all proteins from 53 archaeal genomes. Comparisons between different taxonomic levels allowed us to study the effects of phylogenetic distances on cluster structure. Likewise, by associating functional annotations and phenotypic metadata with each protein, we could compare our protein similarity clusters with both protein function and associated phenotype. Our clusters can be analyzed graphically and interactively online.

  18. Sequencing BPS spectra

    Energy Technology Data Exchange (ETDEWEB)

    Gukov, Sergei [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Max-Planck-Institut für Mathematik,Vivatsgasse 7, D-53111 Bonn (Germany); Nawata, Satoshi [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Centre for Quantum Geometry of Moduli Spaces, University of Aarhus,Nordre Ringgade 1, DK-8000 (Denmark); Saberi, Ingmar [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Stošić, Marko [CAMGSD, Departamento de Matemática, Instituto Superior Técnico,Av. Rovisco Pais, 1049-001 Lisbon (Portugal); Mathematical Institute SANU,Knez Mihajlova 36, 11000 Belgrade (Serbia); Sułkowski, Piotr [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Faculty of Physics, University of Warsaw,ul. Pasteura 5, 02-093 Warsaw (Poland)

    2016-03-02

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel “sliding” property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  19. Sequencing BPS spectra

    International Nuclear Information System (INIS)

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; Stošić, Marko; Sułkowski, Piotr

    2016-01-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel “sliding” property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  20. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applica­ tions including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Anal­ ysis and Machine Intelligence was published in November 1980. The IEEE Com­ puter magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  1. Sequence Classification: 890345 [

    Lifescience Database Archive (English)

    Full Text Available n involved in transcription; interacts with RNA polymerase II subunits Rpb2p, Rpb3, and Rpb11p; has similarity to human RPAP1; Rba50p || http://www.ncbi.nlm.nih.gov/protein/6320735 ...

  2. Winnowing sequences from a database search.

    Science.gov (United States)

    Berman, P; Zhang, Z; Wolf, Y I; Koonin, E V; Miller, W

    2000-01-01

    In database searches for sequence similarity, matches to a distinct sequence region (e.g., protein domain) are frequently obscured by numerous matches to another region of the same sequence. In order to cope with this problem, algorithms are developed to discard redundant matches. One model for this problem begins with a list of intervals, each with an associated score; each interval gives the range of positions in the query sequence that align to a database sequence, and the score is that of the alignment. If interval I is contained in interval J, and I's score is less than J's, then I is said to be dominated by J. The problem is then to identify each interval that is dominated by at least K other intervals, where K is a given level of "tolerable redundancy." An algorithm is developed to solve the problem in O(N log N) time and O(N*) space, where N is the number of intervals and N* is a precisely defined value that never exceeds N and is frequently much smaller. This criterion for discarding database hits has been implemented in the Blast program, as illustrated herein with examples. Several variations and extensions of this approach are also described.

  3. Improved cosine similarity measures of simplified neutrosophic setsfor medical diagnoses

    OpenAIRE

    Jun Ye

    2014-01-01

    In pattern recognition and medical diagnosis, similarity measure is an important mathematicaltool. To overcome some disadvantages of existing cosine similarity measures of simplified neutrosophicsets (SNSs) in vector space, this paper proposed improved cosine similarity measures of SNSs based oncosine function, including single valued neutrosophic cosine similarity measures and interval neutro-sophic cosine similarity measures. Then, weighted cosine similarity measures of SNSs were introduced...

  4. Protein sequence comparison and protein evolution

    Energy Technology Data Exchange (ETDEWEB)

    Pearson, W.R. [Univ. of Virginia, Charlottesville, VA (United States). Dept. of Biochemistry

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. This tutorial examines how the information conserved during the evolution of a protein molecule can be used to infer reliably homology, and thus a shared proteinfold and possibly a shared active site or function. The authors start by reviewing a geological/evolutionary time scale. Next they look at the evolution of several protein families. During the tutorial, these families will be used to demonstrate that homologous protein ancestry can be inferred with confidence. They also examine different modes of protein evolution and consider some hypotheses that have been presented to explain the very earliest events in protein evolution. The next part of the tutorial will examine the technical aspects of protein sequence comparison. Both optimal and heuristic algorithms and their associated parameters that are used to characterize protein sequence similarities are discussed. Perhaps more importantly, they survey the statistics of local similarity scores, and how these statistics can both be used to improve the selectivity of a search and to evaluate the significance of a match. They them examine distantly related members of three protein families, the serine proteases, the glutathione transferases, and the G-protein-coupled receptors (GCRs). Finally, the discuss how sequence similarity can be used to examine internal repeated or mosaic structures in proteins.

  5. Analytical and functional similarity of Amgen biosimilar ABP 215 to bevacizumab.

    Science.gov (United States)

    Seo, Neungseon; Polozova, Alla; Zhang, Mingxuan; Yates, Zachary; Cao, Shawn; Li, Huimin; Kuhns, Scott; Maher, Gwendolyn; McBride, Helen J; Liu, Jennifer

    ABP 215 is a biosimilar product to bevacizumab. Bevacizumab acts by binding to vascular endothelial growth factor A, inhibiting endothelial cell proliferation and new blood vessel formation, thereby leading to tumor vasculature normalization. The ABP 215 analytical similarity assessment was designed to assess the structural and functional similarity of ABP 215 and bevacizumab sourced from both the United States (US) and the European Union (EU). Similarity assessment was also made between the US- and EU-sourced bevacizumab to assess the similarity between the two products. The physicochemical properties and structural similarity of ABP 215 and bevacizumab were characterized using sensitive state-of-the-art analytical techniques capable of detecting small differences in product attributes. ABP 215 has the same amino acid sequence and exhibits similar post-translational modification profiles compared to bevacizumab. The functional similarity assessment employed orthogonal assays designed to interrogate all expected biological activities, including those known to affect the mechanisms of action for ABP 215 and bevacizumab. More than 20 batches of bevacizumab (US) and bevacizumab (EU), and 13 batches of ABP 215 representing unique drug substance lots were assessed for similarity. The large dataset allows meaningful comparisons and garners confidence in the overall conclusion for the analytical similarity assessment of ABP 215 to both US- and EU-sourced bevacizumab. The structural and purity attributes, and biological properties of ABP 215 are demonstrated to be highly similar to those of bevacizumab.

  6. Hilar cholangiocarcinoma is pathologically similar to pancreatic duct adenocarcinoma: suggestions of similar background and development.

    Science.gov (United States)

    Nakanuma, Yasuni; Sato, Yasunori

    2014-07-01

    Routine experiences suggest that cholangiocarcinomas (CCAs) show different clinicopathological behaviors along the biliary tree, and hilar CCA apparently resembles pancreatic duct adenocarcinoma (PDAC). Herein, the backgrounds for these similarities were reviewed. While all cases of PDAC, hilar CCA, intrahepatic CCA (ICCA) and CCA components of combined hepatocellular-cholangiocarcinoma (cHC-CCA) were adenocarcinomas, micropapillary patterns and columnar carcinoma cells were common in PDAC and hilar CCA, and trabecular components and cuboidal carcinoma cells were common in ICCA and CCA components of cHC-CCA. Anterior gradient protein-2 and S100P were frequently expressed in perihilar CCA and PDAC, while neural cell adhesion molecule and luminal epithelial membrane antigen were common in CCA components of c-HC-CCA. Pdx1 and Hes1 were frequently and markedly expressed aberrantly in PDAC and perihilar CCA, although their expression was rare and mild in CCA components in cHC-CCA and ICCA. Hilar CCA showed a similar postoperative prognosis to PDAC but differed from ICCA and cHC-CCA. Taken together, hilar CCA may differ from ICCA and CCA components of cHC-CCA but have a similar development to PDAC. These similarities may be explained by the unique anatomical, embryological and reactive nature of the pancreatobiliary tract. Further studies of these intractable malignancies are warranted. © 2014 Japanese Society of Hepato-Biliary-Pancreatic Surgery.

  7. When high similarity copycats lose and moderate similarity copycats gain: The impact of comparative evaluation

    NARCIS (Netherlands)

    Van Horen, F.; Pieters, R.

    2012-01-01

    Copycats imitate features of leading brands to free ride on their equity. The prevailing belief is that the more similar copycats are to the leader brand, the more positive their evaluation is, and thus the more they free ride. Three studies demonstrate when the reverse holds true:

  8. When high similarity copycats lose and moderate similarity copycats gain : The impact of comparative evaluation

    NARCIS (Netherlands)

    van Horen, F.; Pieters, R.

    2012-01-01

    Copycats imitate features of leading brands to free ride on their equity. The prevailing belief is that the more similar copycats are to the leader brand, the more positive their evaluation is, and thus the more they free ride. Three studies demonstrate when the reverse holds true:

  9. FASH: A web application for nucleotides sequence search

    Directory of Open Access Journals (Sweden)

    Chew Paul

    2008-05-01

    Full Text Available Abstract FASH (Fourier Alignment Sequence Heuristics is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text-sequence (e.g, the human genome, FASH detects subsequences within the text that are remotely-similar to the query. FASH offers an alternative approach to Blast/Fasta for querying long RNA/DNA sequences. FASH differs from these other approaches in that it does not depend on the existence of contiguous seed-sequences in its initial detection phase. The FASH web server is user friendly and very easy to operate. Availability FASH can be accessed at https://fash.bgu.ac.il:8443/fash/default.jsp (secured website

  10. Foundations of Sequence-to-Sequence Modeling for Time Series

    OpenAIRE

    Kuznetsov, Vitaly; Mariet, Zelda

    2018-01-01

    The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practiti...

  11. Novel expressed sequence tag- simple sequence repeats (EST ...

    African Journals Online (AJOL)

    Using different bioinformatic criteria, the SUCEST database was used to mine for simple sequence repeat (SSR) markers. Among 42,189 clusters, 1,425 expressed sequence tag- simple sequence repeats (EST-SSRs) were identified in silico. Trinucleotide repeats were the most abundant SSRs detected. Of 212 primer pairs ...

  12. Engaging narratives evoke similar neural activity and lead to similar time perception.

    Science.gov (United States)

    Cohen, Samantha S; Henin, Simon; Parra, Lucas C

    2017-07-04

    It is said that we lose track of time - that "time flies" - when we are engrossed in a story. How does engagement with the story cause this distorted perception of time, and what are its neural correlates? People commit both time and attentional resources to an engaging stimulus. For narrative videos, attentional engagement can be represented as the level of similarity between the electroencephalographic responses of different viewers. Here we show that this measure of neural engagement predicted the duration of time that viewers were willing to commit to narrative videos. Contrary to popular wisdom, engagement did not distort the average perception of time duration. Rather, more similar brain responses resulted in a more uniform perception of time across viewers. These findings suggest that by capturing the attention of an audience, narrative videos bring both neural processing and the subjective perception of time into synchrony.

  13. Infinite sequences and series

    CERN Document Server

    Knopp, Konrad

    1956-01-01

    One of the finest expositors in the field of modern mathematics, Dr. Konrad Knopp here concentrates on a topic that is of particular interest to 20th-century mathematicians and students. He develops the theory of infinite sequences and series from its beginnings to a point where the reader will be in a position to investigate more advanced stages on his own. The foundations of the theory are therefore presented with special care, while the developmental aspects are limited by the scope and purpose of the book. All definitions are clearly stated; all theorems are proved with enough detail to ma

  14. A Similarity Analysis of Audio Signal to Develop a Human Activity Recognition Using Similarity Networks

    Directory of Open Access Journals (Sweden)

    Alejandra García-Hernández

    2017-11-01

    Full Text Available Human Activity Recognition (HAR is one of the main subjects of study in the areas of computer vision and machine learning due to the great benefits that can be achieved. Examples of the study areas are: health prevention, security and surveillance, automotive research, and many others. The proposed approaches are carried out using machine learning techniques and present good results. However, it is difficult to observe how the descriptors of human activities are grouped. In order to obtain a better understanding of the the behavior of descriptors, it is important to improve the abilities to recognize the human activities. This paper proposes a novel approach for the HAR based on acoustic data and similarity networks. In this approach, we were able to characterize the sound of the activities and identify those activities looking for similarity in the sound pattern. We evaluated the similarity of the sounds considering mainly two features: the sound location and the materials that were used. As a result, the materials are a good reference classifying the human activities compared with the location.

  15. Phonological similarity and orthographic similarity affect probed serial recall of Chinese characters.

    Science.gov (United States)

    Lin, Yi-Chen; Chen, Hsiang-Yu; Lai, Yvonne C; Wu, Denise H

    2015-04-01

    The previous literature on working memory (WM) has indicated that verbal materials are dominantly retained in phonological representations, whereas other linguistic information (e.g., orthography, semantics) only contributes to verbal WM minimally, if not negligibly. Although accumulating evidence has suggested that multiple linguistic components jointly support verbal WM, the visual/orthographic contribution has rarely been addressed in alphabetic languages, possibly due to the difficulty of dissociating the effects of word forms from the effects of their pronunciations in relatively shallow orthography. In the present study, we examined whether the orthographic representations of Chinese characters support the retention of verbal materials in this language of deep orthography. In Experiments 1a and 2, we independently manipulated the phonological and orthographic similarity of horizontal and vertical characters, respectively, and found that participants' accuracy of probed serial recall was reduced by both similar pronunciations and shared phonetic radicals in the to-be-remembered stimuli. Moreover, Experiment 1b showed that only the effect of phonological, but not that of orthographic, similarity was affected by concurrent articulatory suppression. Taken together, the present results indicate the indispensable contribution of orthographic representations to verbal WM of Chinese characters, and suggest that the linguistic characteristics of a specific language not only determine long-term linguistic-processing mechanisms, but also delineate the organization of verbal WM for that language.

  16. Similar or different? The role of the ventrolateral prefrontal cortex in similarity detection.

    Directory of Open Access Journals (Sweden)

    Béatrice Garcin

    Full Text Available Patients with frontal lobe syndrome can exhibit two types of abnormal behaviour when asked to place a banana and an orange in a single category: some patients categorize them at a concrete level (e.g., "both have peel", while others continue to look for differences between these objects (e.g., "one is yellow, the other is orange". These observations raise the question of whether abstraction and similarity detection are distinct processes involved in abstract categorization, and that depend on separate areas of the prefrontal cortex (PFC. We designed an original experimental paradigm for a functional magnetic resonance imaging (fMRI study involving healthy subjects, confirming the existence of two distinct processes relying on different prefrontal areas, and thus explaining the behavioural dissociation in frontal lesion patients. We showed that: 1 Similarity detection involves the anterior ventrolateral PFC bilaterally with a right-left asymmetry: the right anterior ventrolateral PFC is only engaged in detecting physical similarities; 2 Abstraction per se activates the left dorsolateral PFC.

  17. [-25]A Similarity Analysis of Audio Signal to Develop a Human Activity Recognition Using Similarity Networks.

    Science.gov (United States)

    García-Hernández, Alejandra; Galván-Tejada, Carlos E; Galván-Tejada, Jorge I; Celaya-Padilla, José M; Gamboa-Rosales, Hamurabi; Velasco-Elizondo, Perla; Cárdenas-Vargas, Rogelio

    2017-11-21

    Human Activity Recognition (HAR) is one of the main subjects of study in the areas of computer vision and machine learning due to the great benefits that can be achieved. Examples of the study areas are: health prevention, security and surveillance, automotive research, and many others. The proposed approaches are carried out using machine learning techniques and present good results. However, it is difficult to observe how the descriptors of human activities are grouped. In order to obtain a better understanding of the the behavior of descriptors, it is important to improve the abilities to recognize the human activities. This paper proposes a novel approach for the HAR based on acoustic data and similarity networks. In this approach, we were able to characterize the sound of the activities and identify those activities looking for similarity in the sound pattern. We evaluated the similarity of the sounds considering mainly two features: the sound location and the materials that were used. As a result, the materials are a good reference classifying the human activities compared with the location.

  18. Towards predicting the encoding capability of MR fingerprinting sequences.

    Science.gov (United States)

    Sommer, K; Amthor, T; Doneva, M; Koken, P; Meineke, J; Börnert, P

    2017-09-01

    Sequence optimization and appropriate sequence selection is still an unmet need in magnetic resonance fingerprinting (MRF). The main challenge in MRF sequence design is the lack of an appropriate measure of the sequence's encoding capability. To find such a measure, three different candidates for judging the encoding capability have been investigated: local and global dot-product-based measures judging dictionary entry similarity as well as a Monte Carlo method that evaluates the noise propagation properties of an MRF sequence. Consistency of these measures for different sequence lengths as well as the capability to predict actual sequence performance in both phantom and in vivo measurements was analyzed. While the dot-product-based measures yielded inconsistent results for different sequence lengths, the Monte Carlo method was in a good agreement with phantom experiments. In particular, the Monte Carlo method could accurately predict the performance of different flip angle patterns in actual measurements. The proposed Monte Carlo method provides an appropriate measure of MRF sequence encoding capability and may be used for sequence optimization. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal; Salama, Khaled N.

    2011-01-01

    fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the wellknown sequence alignment algorithms, the 'GAP' (which is heuristic) and the 'Needleman

  20. Next-Generation Sequencing Platforms

    Science.gov (United States)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  1. Bacterial communities in haloalkaliphilic sulfate-reducing bioreactors under different electron donors revealed by 16S rRNA MiSeq sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Jiemin [National Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, P.O. Box 353, Beijing 100190 (China); University of Chinese Academy of Sciences, Beijing 100049 (China); Zhou, Xuemei; Li, Yuguang [101 Institute, Ministry of Civil Affairs, Beijing 100070 (China); Xing, Jianmin, E-mail: jmxing@ipe.ac.cn [National Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, P.O. Box 353, Beijing 100190 (China)

    2015-09-15

    Highlights: • Bacterial communities of haloalkaliphilic bioreactors were investigated. • MiSeq was first used in analysis of communities of haloalkaliphilic bioreactors. • Electron donors had significant effect on bacterial communities. - Abstract: Biological technology used to treat flue gas is useful to replace conventional treatment, but there is sulfide inhibition. However, no sulfide toxicity effect was observed in haloalkaliphilic bioreactors. The performance of the ethanol-fed bioreactor was better than that of lactate-, glucose-, and formate-fed bioreactor, respectively. To support this result strongly, Illumina MiSeq paired-end sequencing of 16S rRNA gene was applied to investigate the bacterial communities. A total of 389,971 effective sequences were obtained and all of them were assigned to 10,220 operational taxonomic units (OTUs) at a 97% similarity. Bacterial communities in the glucose-fed bioreactor showed the greatest richness and evenness. The highest relative abundance of sulfate-reducing bacteria (SRB) was found in the ethanol-fed bioreactor, which can explain why the performance of the ethanol-fed bioreactor was the best. Different types of SRB, sulfur-oxidizing bacteria, and sulfur-reducing bacteria were detected, indicating that sulfur may be cycled among these microorganisms. Because high-throughput 16S rRNA gene paired-end sequencing has improved resolution of bacterial community analysis, many rare microorganisms were detected, such as Halanaerobium, Halothiobacillus, Desulfonatronum, Syntrophobacter, and Fusibacter. 16S rRNA gene sequencing of these bacteria would provide more functional and phylogenetic information about the bacterial communities.

  2. Rapid Polymer Sequencer

    Science.gov (United States)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.

  3. The advantages of SMRT sequencing

    OpenAIRE

    Roberts, Richard J; Carneiro, Mauricio O; Schatz, Michael C

    2013-01-01

    Of the current next-generation sequencing technologies, SMRT sequencing is sometimes overlooked. However, attributes such as long reads, modified base detection and high accuracy make SMRT a useful technology and an ideal approach to the complete sequencing of small genomes.

  4. Putting instruction sequences into effect

    NARCIS (Netherlands)

    Bergstra, J.A.

    2011-01-01

    An attempt is made to define the concept of execution of an instruction sequence. It is found to be a special case of directly putting into effect of an instruction sequence. Directly putting into effect of an instruction sequences comprises interpretation as well as execution. Directly putting into

  5. Region segmentation along image sequence

    International Nuclear Information System (INIS)

    Monchal, L.; Aubry, P.

    1995-01-01

    A method to extract regions in sequence of images is proposed. Regions are not matched from one image to the following one. The result of a region segmentation is used as an initialization to segment the following and image to track the region along the sequence. The image sequence is exploited as a spatio-temporal event. (authors). 12 refs., 8 figs

  6. Fermentation substrate and forage from south Florida cropping sequences

    Energy Technology Data Exchange (ETDEWEB)

    Kalmbacher, R.S.; Martin, F.G.; Mislevy, P.

    1985-01-01

    Zea mays (maize), Sorghum bicolor (sorghum), Ipomoea batatas (Sweet potato), Helianthus tuberosus (Jerusalem artichoke) and Manihot esculenta (cassava) were grown as alcohol biomass crops in various sequences in 1981 and 1982, on a sandy, siliceous, hyperthermic, typic Haplaquod soil. Herbage yield and yield of non-fermentable by-products were measured as potential cattle feed. Grain produced from Z. mays followed by S. bicolor averaged 11.4 Mg/ha and was greater (P less than 0.05) than other graincrop sequences. Highest (P less than 0.05) root yields were from I. batatas (5.1 Mg/ha) in 1981 and M. esculenta (5.3 Mg/ha) in 1982. Total nonstructural carbohydrate was greatest for Z. mays/S. bicolor (6.0 Mg/ha) and Z. mays/I. batatas (6.8 Mg/ha) sequences. Crops of I. batatas and M. esculenta were hindered by high rainfall and poorly drained soil. Cropping sequences including Z. mays and S. bicolor produced more cattle feed, and they can be expected to produce more alcohol biomass with fewer cultural problems, on south-central Florida flatwoods soils. 20 references.

  7. Sequence Classification: 378420 [

    Lifescience Database Archive (English)

    Full Text Available onate 6-phosphate aldolase and galactonate dehydratase || http://www.ncbi.nlm.nih.gov/protein/13476139 ... ...Non-TMB Non-TMH Non-TMB Non-TMB Non-TMB Non-TMB >gi|13476139|ref|NP_107709.1| similar to 2-oxo-3-deoxygalact

  8. Sequence Classification: 889842 [

    Lifescience Database Archive (English)

    Full Text Available region of Rcr1p in conferring Congo Red resistance when overexpressed; Rcr2p || http://www.ncbi.nlm.nih.gov/protein/6320206 ... ...lum membrane protein with similarity to Rcr1p; C-terminal region can functionally replace the corresponding

  9. Sequence Classification: 890890 [

    Lifescience Database Archive (English)

    Full Text Available lar protein of unknown function, positive regulator of exit from mitosis; involved in regulating the release of Cdc14p from the nucle...olus in early anaphase; proposed to play similar role in meiosis; Spo12p || http://www.ncbi.nlm.nih.gov/protein/6321946 ...

  10. Transcriptome sequencing of the Microarray Quality Control (MAQC RNA reference samples using next generation sequencing

    Directory of Open Access Journals (Sweden)

    Thierry-Mieg Danielle

    2009-06-01

    Full Text Available Abstract Background Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC reference RNA samples using Roche's 454 Genome Sequencer FLX. Results We generated more that 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.

  11. Log-balanced combinatorial sequences

    Directory of Open Access Journals (Sweden)

    Tomislav Došlic

    2005-01-01

    Full Text Available We consider log-convex sequences that satisfy an additional constraint imposed on their rate of growth. We call such sequences log-balanced. It is shown that all such sequences satisfy a pair of double inequalities. Sufficient conditions for log-balancedness are given for the case when the sequence satisfies a two- (or more- term linear recurrence. It is shown that many combinatorially interesting sequences belong to this class, and, as a consequence, that the above-mentioned double inequalities are valid for all of them.

  12. Fast Schemes for Computing Similarities between Gaussian HMMs and Their Applications in Texture Image Classification

    Directory of Open Access Journals (Sweden)

    Chen Ling

    2005-01-01

    Full Text Available An appropriate definition and efficient computation of similarity (or distance measures between two stochastic models are of theoretical and practical interest. In this work, a similarity measure, that is, a modified "generalized probability product kernel," of Gaussian hidden Markov models is introduced. Two efficient schemes for computing this similarity measure are presented. The first scheme adopts a forward procedure analogous to the approach commonly used in probability evaluation of observation sequences on HMMs. The second scheme is based on the specially defined similarity transition matrix of two Gaussian hidden Markov models. Two scaling procedures are also proposed to solve the out-of-precision problem in the implementation. The effectiveness of the proposed methods has been evaluated on simulated observations with predefined model parameters, and on natural texture images. Promising experimental results have been observed.

  13. Discrete Self-Similarity in Interfacial Hydrodynamics and the Formation of Iterated Structures.

    Science.gov (United States)

    Dallaston, Michael C; Fontelos, Marco A; Tseluiko, Dmitri; Kalliadasis, Serafim

    2018-01-19

    The formation of iterated structures, such as satellite and subsatellite drops, filaments, and bubbles, is a common feature in interfacial hydrodynamics. Here we undertake a computational and theoretical study of their origin in the case of thin films of viscous fluids that are destabilized by long-range molecular or other forces. We demonstrate that iterated structures appear as a consequence of discrete self-similarity, where certain patterns repeat themselves, subject to rescaling, periodically in a logarithmic time scale. The result is an infinite sequence of ridges and filaments with similarity properties. The character of these discretely self-similar solutions as the result of a Hopf bifurcation from ordinarily self-similar solutions is also described.

  14. Prioritization of candidate disease genes by combining topological similarity and semantic similarity.

    Science.gov (United States)

    Liu, Bin; Jin, Min; Zeng, Pan

    2015-10-01

    The identification of gene-phenotype relationships is very important for the treatment of human diseases. Studies have shown that genes causing the same or similar phenotypes tend to interact with each other in a protein-protein interaction (PPI) network. Thus, many identification methods based on the PPI network model have achieved good results. However, in the PPI network, some interactions between the proteins encoded by candidate gene and the proteins encoded by known disease genes are very weak. Therefore, some studies have combined the PPI network with other genomic information and reported good predictive performances. However, we believe that the results could be further improved. In this paper, we propose a new method that uses the semantic similarity between the candidate gene and known disease genes to set the initial probability vector of a random walk with a restart algorithm in a human PPI network. The effectiveness of our method was demonstrated by leave-one-out cross-validation, and the experimental results indicated that our method outperformed other methods. Additionally, our method can predict new causative genes of multifactor diseases, including Parkinson's disease, breast cancer and obesity. The top predictions were good and consistent with the findings in the literature, which further illustrates the effectiveness of our method. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Searching the protein structure database for ligand-binding site similarities using CPASS v.2

    Directory of Open Access Journals (Sweden)

    Caprez Adam

    2011-01-01

    Full Text Available Abstract Background A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" protein. Our Comparison of Protein Active-Site Structures (CPASS v.2 database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. Findings We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability that also enables job distribution over the Open Science Grid (OSG to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. Conclusions CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~ 38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores

  16. Evidence for Deep Regulatory Similarities in Early Developmental Programs across Highly Diverged Insects

    Science.gov (United States)

    Zhang, Yinan; Samee, Md. Abul Hassan; Halfon, Marc S.; Sinha, Saurabh

    2014-01-01

    Many genes familiar from Drosophila development, such as the so-called gap, pair-rule, and segment polarity genes, play important roles in the development of other insects and in many cases appear to be deployed in a similar fashion, despite the fact that Drosophila-like “long germband” development is highly derived and confined to a subset of insect families. Whether or not these similarities extend to the regulatory level is unknown. Identification of regulatory regions beyond the well-studied Drosophila has been challenging as even within the Diptera (flies, including mosquitoes) regulatory sequences have diverged past the point of recognition by standard alignment methods. Here, we demonstrate that methods we previously developed for computational cis-regulatory module (CRM) discovery in Drosophila can be used effectively in highly diverged (250–350 Myr) insect species including Anopheles gambiae, Tribolium castaneum, Apis mellifera, and Nasonia vitripennis. In Drosophila, we have successfully used small sets of known CRMs as “training data” to guide the search for other CRMs with related function. We show here that although species-specific CRM training data do not exist, training sets from Drosophila can facilitate CRM discovery in diverged insects. We validate in vivo over a dozen new CRMs, roughly doubling the number of known CRMs in the four non-Drosophila species. Given the growing wealth of Drosophila CRM annotation, these results suggest that extensive regulatory sequence annotation will be possible in newly sequenced insects without recourse to costly and labor-intensive genome-scale experiments. We develop a new method, Regulus, which computes a probabilistic score of similarity based on binding site composition (despite the absence of nucleotide-level sequence alignment), and demonstrate similarity between functionally related CRMs from orthologous loci. Our work represents an important step toward being able to trace the evolutionary

  17. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  18. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    Full Text Available For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW. At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  19. New MR pulse sequence

    International Nuclear Information System (INIS)

    Harms, S.E.; Flamig, D.P.; Griffey, R.H.

    1990-01-01

    This paper describes a method for fat suppression for three-dimensional MR imaging. The FATS (fat-suppressed acquisition with echo time shortened) sequence employs a pair of opposing adiabatic half-passage RF pulses tuned on fat resonance. The imaging parameters are as follows: TR, 20 msec; TE, 21.7-3.2 msec; 1,024 x 128 x 128 acquired matrix; imaging time, approximately 11 minutes. A series of 54 examinations were performed. Excellent fat suppression with water excitation is achieved in all cases. The orbital images demonstrate superior resolution of small orbital lesions. The high signal-to-noise ratio (SNR) in cranial studies demonstrates excellent petrous bone and internal auditory canal anatomy

  20. Dynamics based alignment of proteins: an alternative approach to quantify dynamic similarity

    Directory of Open Access Journals (Sweden)

    Lyngsø Rune

    2010-04-01

    Full Text Available Abstract Background The dynamic motions of many proteins are central to their function. It therefore follows that the dynamic requirements of a protein are evolutionary constrained. In order to assess and quantify this, one needs to compare the dynamic motions of different proteins. Comparing the dynamics of distinct proteins may also provide insight into how protein motions are modified by variations in sequence and, consequently, by structure. The optimal way of comparing complex molecular motions is, however, far from trivial. The majority of comparative molecular dynamics studies performed to date relied upon prior sequence or structural alignment to define which residues were equivalent in 3-dimensional space. Results Here we discuss an alternative methodology for comparative molecular dynamics that does not require any prior alignment information. We show it is possible to align proteins based solely on their dynamics and that we can use these dynamics-based alignments to quantify the dynamic similarity of proteins. Our method was tested on 10 representative members of the PDZ domain family. Conclusions As a result of creating pair-wise dynamics-based alignments of PDZ domains, we have found evolutionarily conserved patterns in their backbone dynamics. The dynamic similarity of PDZ domains is highly correlated with their structural similarity as calculated with Dali. However, significant differences in their dynamics can be detected indicating that sequence has a more refined role to play in protein dynamics than just dictating the overall fold. We suggest that the method should be generally applicable.

  1. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    Science.gov (United States)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  2. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Directory of Open Access Journals (Sweden)

    Jason D Thompson

    Full Text Available Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  3. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Science.gov (United States)

    Thompson, Jason D; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

    2012-01-01

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  4. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  5. High frequency RNA recombination in porcine reproductive and respiratory syndrome virus occurs preferentially between parental sequences with high similarity

    DEFF Research Database (Denmark)

    van Vugt, Joke .J.F.A.; Storgaard, Torben; Oleksiewicz, Martin B.

    2001-01-01

    Two types of porcine reproductive and respiratory syndrome virus (PRRSV) exist, a North American type and a European type. The co-existence of both types in some countries, such as Denmark, Slovakia and Canada, creates a risk of inter-type recombination. To evaluate this risk, cell cultures were co......, but no recombination was detected between the European and North American types. Calculation of the maximum theoretical risk of European-American recombination, based on the sensitivity of the RT-PCR system, revealed that RNA recombination between the European and North American types of PRRSV is at least 10000 times...

  6. Whole genome sequence of two Rathayibacter toxicus strains reveals a tunicamycin biosynthetic cluster similar to Streptomyces chartreusis

    Science.gov (United States)

    Rathayibacter toxicus is a forage grass associated Gram-positive bacterium of major concern to food safety and agriculture. The species is listed by USDA-APHIS as a plant pathogen select agent due to the fact that it produces a tunicamycin-like toxin that is lethal to livestock. The complete genomes...

  7. Genome Sequence of the Biocontrol Strain Pseudomonas fluorescens F113

    Science.gov (United States)

    Redondo-Nieto, Miguel; Barret, Matthieu; Morrisey, John P.; Germaine, Kieran; Martínez-Granero, Francisco; Barahona, Emma; Navazo, Ana; Sánchez-Contreras, María; Moynihan, Jennifer A.; Giddens, Stephen R.; Coppoolse, Eric R.; Muriel, Candela; Stiekema, Willem J.; Rainey, Paul B.; Dowling, David; O'Gara, Fergal; Martín, Marta

    2012-01-01

    Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) that has biocontrol activity against fungal plant pathogens and is a model for rhizosphere colonization. Here, we present its complete genome sequence, which shows that besides a core genome very similar to those of other strains sequenced within this species, F113 possesses a wide array of genes encoding specialized functions for thriving in the rhizosphere and interacting with eukaryotic organisms. PMID:22328765

  8. SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2004-10-01

    Full Text Available Abstract Background Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. Results We describe the implementation of SS-Wrapper (Similarity Search Wrapper, a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST that provides a complementary solution for BLAST searches when the database is too large to fit into

  9. SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters.

    Science.gov (United States)

    Wang, Chunlin; Lefkowitz, Elliot J

    2004-10-28

    Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node. Used together

  10. Determination of subjective similarity for pairs of masses and pairs of clustered microcalcifications on mammograms: Comparison of similarity ranking scores and absolute similarity ratings

    International Nuclear Information System (INIS)

    Muramatsu, Chisako; Li Qiang; Schmidt, Robert A.; Shiraishi, Junji; Suzuki, Kenji; Newstead, Gillian M.; Doi, Kunio

    2007-01-01

    The presentation of images that are similar to that of an unknown lesion seen on a mammogram may be helpful for radiologists to correctly diagnose that lesion. For similar images to be useful, they must be quite similar from the radiologists' point of view. We have been trying to quantify the radiologists' impression of similarity for pairs of lesions and to establish a ''gold standard'' for development and evaluation of a computerized scheme for selecting such similar images. However, it is considered difficult to reliably and accurately determine similarity ratings, because they are subjective. In this study, we compared the subjective similarities obtained by two different methods, an absolute rating method and a 2-alternative forced-choice (2AFC) method, to demonstrate that reliable similarity ratings can be determined by the responses of a group of radiologists. The absolute similarity ratings were previously obtained for pairs of masses and pairs of microcalcifications from five and nine radiologists, respectively. In this study, similarity ranking scores for eight pairs of masses and eight pairs of microcalcifications were determined by use of the 2AFC method. In the first session, the eight pairs of masses and eight pairs of microcalcifications were grouped and compared separately for determining the similarity ranking scores. In the second session, another similarity ranking score was determined by use of mixed pairs, i.e., by comparison of the similarity of a mass pair with that of a calcification pair. Four pairs of masses and four pairs of microcalcifications were grouped together to create two sets of eight pairs. The average absolute similarity ratings and the average similarity ranking scores showed very good correlations in the first study (Pearson's correlation coefficients: 0.94 and 0.98 for masses and microcalcifications, respectively). Moreover, in the second study, the correlations between the absolute ratings and the ranking scores were also

  11. Prediction of novel archaeal enzymes from sequence-derived features

    DEFF Research Database (Denmark)

    Jensen, Lars Juhl; Skovgaard, Marie; Brunak, Søren

    2002-01-01

    The completely sequenced archaeal genomes potentially encode, among their many functionally uncharacterized genes, novel enzymes of biotechnological interest. We have developed a prediction method for detection and classification of enzymes from sequence alone (available at http://www.cbs.dtu.dk/......The completely sequenced archaeal genomes potentially encode, among their many functionally uncharacterized genes, novel enzymes of biotechnological interest. We have developed a prediction method for detection and classification of enzymes from sequence alone (available at http......://www.cbs.dtu.dk/services/ArchaeaFun/). The method does not make use of sequence similarity; rather, it relies on predicted protein features like cotranslational and posttranslational modifications, secondary structure, and simple physical/chemical properties....

  12. QUASAR--scoring and ranking of sequence-structure alignments.

    Science.gov (United States)

    Birzele, Fabian; Gewehr, Jan E; Zimmer, Ralf

    2005-12-15

    Sequence-structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence-structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence-structure alignments ranking) provides a unifying framework for scoring sequence-structure alignments that aids finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against 'standard-of-truth' structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.

  13. Neutrosophic Refined Similarity Measure Based on Cosine Function

    Directory of Open Access Journals (Sweden)

    Said Broumi

    2014-12-01

    Full Text Available In this paper, the cosine similarity measure of neutrosophic refined (multi- sets is proposed and its properties are studied. The concept of this cosine similarity measure of neutrosophic refined sets is the extension of improved cosine similarity measure of single valued neutrosophic. Finally, using this cosine similarity measure of neutrosophic refined set, the application of medical diagnosis is presented.

  14. Do birds of a feather flock together? The variable bases for African American, Asian American, and European American adolescents' selection of similar friends.

    Science.gov (United States)

    Hamm, J V

    2000-03-01

    Variability in adolescent-friend similarity is documented in a diverse sample of African American, Asian American, and European American adolescents. Similarity was greatest for substance use, modest for academic orientations, and low for ethnic identity. Compared with Asian American and European American adolescents, African American adolescents chose friends who were less similar with respect to academic orientation or substance use but more similar with respect to ethnic identity. For all three ethnic groups, personal endorsement of the dimension in question and selection of cross-ethnic-group friends heightened similarity. Similarity was a relative rather than an absolute selection criterion: Adolescents did not choose friends with identical orientations. These findings call for a comprehensive theory of friendship selection sensitive to diversity in adolescents' experiences. Implications for peer influence and self-development are discussed.

  15. Characterization of promoter sequence of toll-like receptor genes in Vechur cattle

    Directory of Open Access Journals (Sweden)

    R. Lakshmi

    2016-06-01

    Full Text Available Aim: To analyze the promoter sequence of toll-like receptor (TLR genes in Vechur cattle, an indigenous breed of Kerala with the sequence of Bos taurus and access the differences that could be attributed to innate immune responses against bovine mastitis. Materials and Methods: Blood samples were collected from Jugular vein of Vechur cattle, maintained at Vechur cattle conservation center of Kerala Veterinary and Animal Sciences University, using an acid-citrate-dextrose anticoagulant. The genomic DNA was extracted, and polymerase chain reaction was carried out to amplify the promoter region of TLRs. The amplified product of TLR2, 4, and 9 promoter regions was sequenced by Sanger enzymatic DNA sequencing technique. Results: The sequence of promoter region of TLR2 of Vechur cattle with the B. taurus sequence present in GenBank showed 98% similarity and revealed variants for four sequence motifs. The sequence of the promoter region of TLR4 of Vechur cattle revealed 99% similarity with that of B. taurus sequence but not reveals significant variant in motifregions. However, two heterozygous loci were observed from the chromatogram. Promoter sequence of TLR9 gene also showed 99% similarity to B. taurus sequence and revealed variants for four sequence motifs. Conclusion: The results of this study indicate that significant variation in the promoter of TLR2 and 9 genes in Vechur cattle breed and may potentially link the influence the innate immunity response against mastitis diseases.

  16. Assessing Analytical Similarity of Proposed Amgen Biosimilar ABP 501 to Adalimumab.

    Science.gov (United States)

    Liu, Jennifer; Eris, Tamer; Li, Cynthia; Cao, Shawn; Kuhns, Scott

    2016-08-01

    ABP 501 is being developed as a biosimilar to adalimumab. Comprehensive comparative analytical characterization studies have been conducted and completed. The objective of this study was to assess analytical similarity between ABP 501 and two adalimumab reference products (RPs), licensed by the United States Food and Drug Administration (adalimumab [US]) and authorized by the European Union (adalimumab [EU]), using state-of-the-art analytical methods. Comprehensive analytical characterization incorporating orthogonal analytical techniques was used to compare products. Physicochemical property comparisons comprised the primary structure related to amino acid sequence and post-translational modifications including glycans; higher-order structure; primary biological properties mediated by target and receptor binding; product-related substances and impurities; host-cell impurities; general properties of the finished drug product, including strength and formulation; subvisible and submicron particles and aggregates; and forced thermal degradation. ABP 501 had the same amino acid sequence and similar post-translational modification profiles compared with adalimumab RPs. Primary structure, higher-order structure, and biological activities were similar for the three products. Product-related size and charge variants and aggregate and particle levels were also similar. ABP 501 had very low residual host-cell protein and DNA. The finished ABP 501 drug product has the same strength with regard to protein concentration and fill volume as adalimumab RPs. ABP 501 and the RPs had a similar stability profile both in normal storage and thermal stress conditions. Based on the comprehensive analytical similarity assessment, ABP 501 was found to be similar to adalimumab with respect to physicochemical and biological properties.

  17. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing

    Directory of Open Access Journals (Sweden)

    Li Kelvin

    2012-11-01

    Full Text Available Abstract Background In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. Results We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute’s (JCVI high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Conclusions Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus

  18. GuiTope: an application for mapping random-sequence peptides to protein sequences.

    Science.gov (United States)

    Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

    2012-01-03

    Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.

  19. GuiTope: an application for mapping random-sequence peptides to protein sequences

    Directory of Open Access Journals (Sweden)

    Halperin Rebecca F

    2012-01-01

    Full Text Available Abstract Background Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. Results GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. Conclusions GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.

  20. Neural mechanisms of sequence generation in songbirds

    Science.gov (United States)

    Langford, Bruce

    Animal models in research are useful for studying more complex behavior. For example, motor sequence generation of actions requiring good muscle coordination such as writing with a pen, playing an instrument, or speaking, may involve the interaction of many areas in the brain, each a complex system in itself; thus it can be difficult to determine causal relationships between neural behavior and the behavior being studied. Birdsong, however, provides an excellent model behavior for motor sequence learning, memory, and generation. The song consists of learned sequences of notes that are spectrographically stereotyped over multiple renditions of the song, similar to syllables in human speech. The main areas of the songbird brain involve in singing are known, however, the mechanisms by which these systems store and produce song are not well understood. We used a custom built, head-mounted, miniature motorized microdrive to chronically record the neural firing patterns of identified neurons in HVC, a pre-motor cortical nucleus which has been shown to be important in song timing. These were done in Bengalese finch which generate a song made up of stereotyped notes but variable note sequences. We observed song related bursting in neurons projecting to Area X, a homologue to basal ganglia, and tonic firing in HVC interneurons. Interneuron had firing rate patterns that were consistent over multiple renditions of the same note sequence. We also designed and built a light-weight, low-powered wireless programmable neural stimulator using Bluetooth Low Energy Protocol. It was able to generate perturbations in the song when current pulses were administered to RA, which projects to the brainstem nucleus responsible for syringeal muscle control.