WorldWideScience

Sample records for mtrap pairwise sequence

  1. Pairwise Sequence Alignment Library

    Energy Technology Data Exchange (ETDEWEB)

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific computing. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Pairwise Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of the Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets, including AVX2 and KNC; and language interfaces for C/C++ and Python.
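
    The three alignment classes named in this record (NW, SG, SW) differ mainly in how the dynamic-programming boundaries and the final score are handled. The sketch below is a scalar, score-only illustration with a linear gap penalty; it is not parasail's vectorized code, and the function name, the scoring values, and the "free end gaps on both sequences" reading of semi-global alignment are assumptions made for the example.

```python
def align_score(a, b, mode="nw", match=2, mismatch=-1, gap=-2):
    """Score-only alignment DP with a linear gap penalty (toy scoring values).

    mode: "nw" = global, "sg" = semi-global (end gaps free), "sw" = local.
    """
    if mode not in ("nw", "sg", "sw"):
        raise ValueError(f"unknown mode: {mode}")
    free_ends = mode in ("sg", "sw")
    # Row 0: leading gaps cost nothing unless the alignment is global.
    prev = [0 if free_ends else j * gap for j in range(len(b) + 1)]
    best = 0                  # best cell seen so far (used by "sw")
    last_col = [prev[-1]]     # last cell of every row (used by "sg")
    for i in range(1, len(a) + 1):
        cur = [0 if free_ends else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if mode == "sw":
                cell = max(cell, 0)   # a local alignment may restart anywhere
                best = max(best, cell)
            cur.append(cell)
        last_col.append(cur[-1])
        prev = cur
    if mode == "sw":
        return best
    if mode == "sg":
        # Trailing gaps are free: take the best cell on the last row or column.
        return max(max(prev), max(last_col))
    return prev[-1]
```

    Each cell depends on its left, upper, and diagonal neighbours, which is the data dependency that makes this recurrence hard to vectorize directly.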

  2. Pareto optimal pairwise sequence alignment.

    Science.gov (United States)

    DeRonne, Kevin W; Karypis, George

    2013-01-01

    Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
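
    To make the notion of a Pareto frontier concrete: if each candidate alignment is scored under several objective functions, the frontier is the set of candidates that no other candidate beats on every objective. The sketch below is a brute-force quadratic filter over score tuples, not the dynamic-programming algorithm described in the abstract; the function name and the convention that higher scores are better are assumptions.

```python
def pareto_front(points):
    """Return the score tuples that no other tuple dominates (higher is better)."""
    def dominates(p, q):
        # p dominates q if it is at least as good everywhere and better somewhere.
        return (all(x >= y for x, y in zip(p, q))
                and any(x > y for x, y in zip(p, q)))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores of five alignments under two scoring functions.
scores = [(10, 3), (8, 7), (6, 6), (4, 9), (10, 2)]
front = pareto_front(scores)
```

    Here `(6, 6)` is dominated by `(8, 7)` and `(10, 2)` by `(10, 3)`, so three candidates remain on the frontier. Choosing one alignment among them is the step the abstract describes as still challenging.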

  3. Analysis of Neuronal Sequences Using Pairwise Biases

    Science.gov (United States)

    2015-08-27

    semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike). Evidence for the participation of the hippocampus in the formation of...hippocampal formation in an attempt to be cured of severe epileptic seizures. Although the surgery was successful in regards to reducing the frequency and...very different from each other in many ways including duration and number of spikes. Still, these sequences share a similar trend in the general order

  4. Revision of Begomovirus taxonomy based on pairwise sequence comparisons

    KAUST Repository

    Brown, Judith K.

    2015-04-18

    Viruses of the genus Begomovirus (family Geminiviridae) are emergent pathogens of crops throughout the tropical and subtropical regions of the world. By virtue of having a small DNA genome that is easily cloned, and due to the recent innovations in cloning and low-cost sequencing, there has been a dramatic increase in the number of available begomovirus genome sequences. Even so, most of the available sequences have been obtained from cultivated plants and are likely a small and phylogenetically unrepresentative sample of begomovirus diversity, a factor constraining taxonomic decisions such as the establishment of operationally useful species demarcation criteria. In addition, problems in assigning new viruses to established species have highlighted shortcomings in the previously recommended mechanism of species demarcation. Based on the analysis of 3,123 full-length begomovirus genome (or DNA-A component) sequences available in public databases as of December 2012, a set of revised guidelines for the classification and nomenclature of begomoviruses is proposed. The guidelines primarily consider a) genus-level biological characteristics and b) results obtained using a standardized classification tool, Sequence Demarcation Tool, which performs pairwise sequence alignments and identity calculations. These guidelines are consistent with the recently published recommendations for the genera Mastrevirus and Curtovirus of the family Geminiviridae. Genome-wide pairwise identities of 91% and 94% are proposed as the demarcation thresholds for begomoviruses belonging to different species and strains, respectively. Procedures and guidelines are outlined for resolving conflicts that may arise when assigning species and strains to categories wherever the pairwise identity falls on or very near the demarcation threshold value.

  5. Revision of Begomovirus taxonomy based on pairwise sequence comparisons

    KAUST Repository

    Brown, Judith K.; Zerbini, F. Murilo; Navas-Castillo, Jesús; Moriones, Enrique; Ramos-Sobrinho, Roberto; Silva, José C. F.; Fiallo-Olivé, Elvira; Briddon, Rob W.; Hernández-Zepeda, Cecilia; Idris, Ali; Malathi, V. G.; Martin, Darren P.; Rivera-Bustamante, Rafael; Ueda, Shigenori; Varsani, Arvind

    2015-01-01

    Viruses of the genus Begomovirus (family Geminiviridae) are emergent pathogens of crops throughout the tropical and subtropical regions of the world. By virtue of having a small DNA genome that is easily cloned, and due to the recent innovations in cloning and low-cost sequencing, there has been a dramatic increase in the number of available begomovirus genome sequences. Even so, most of the available sequences have been obtained from cultivated plants and are likely a small and phylogenetically unrepresentative sample of begomovirus diversity, a factor constraining taxonomic decisions such as the establishment of operationally useful species demarcation criteria. In addition, problems in assigning new viruses to established species have highlighted shortcomings in the previously recommended mechanism of species demarcation. Based on the analysis of 3,123 full-length begomovirus genome (or DNA-A component) sequences available in public databases as of December 2012, a set of revised guidelines for the classification and nomenclature of begomoviruses is proposed. The guidelines primarily consider a) genus-level biological characteristics and b) results obtained using a standardized classification tool, Sequence Demarcation Tool, which performs pairwise sequence alignments and identity calculations. These guidelines are consistent with the recently published recommendations for the genera Mastrevirus and Curtovirus of the family Geminiviridae. Genome-wide pairwise identities of 91% and 94% are proposed as the demarcation thresholds for begomoviruses belonging to different species and strains, respectively. Procedures and guidelines are outlined for resolving conflicts that may arise when assigning species and strains to categories wherever the pairwise identity falls on or very near the demarcation threshold value.
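
    The two thresholds in this abstract can be read as a simple decision rule on a genome-wide pairwise identity value. The sketch below is one possible reading, with a hypothetical function name; in particular, treating a value exactly on a threshold as belonging to the higher category is an assumption, since the abstract says that values on or very near a threshold need separate conflict-resolution procedures.

```python
def classify_pair(identity_pct, species_cutoff=91.0, strain_cutoff=94.0):
    """Classify two begomovirus isolates from their pairwise identity (percent).

    Assumption: a value equal to a cutoff falls into the higher category.
    """
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct < species_cutoff:
        return "different species"
    if identity_pct < strain_cutoff:
        return "same species, different strain"
    return "same species, same strain"
```

    For example, `classify_pair(92.5)` falls between the two cutoffs and is reported as the same species but a different strain.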

  6. Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Lyngsø, Rune B.; Stormo, Gary D.

    2005-01-01

    detect two genes with low sequence similarity, where the genes are part of a larger genomic region. Results: Here we present such an approach for pairwise local alignment which is based on FOLDALIGN and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include...... the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy....... The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. Availability...

  7. SDT: a virus classification tool based on pairwise sequence alignment and identity calculation.

    Directory of Open Access Journals (Sweden)

    Brejnev Muhizi Muhire

    Full Text Available The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).
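
    The point about gap handling can be illustrated with a small calculation. The sketch below computes percent identity from two already-aligned sequences under two conventions: skipping columns that contain a gap, or counting them as mismatches. The function name and the two option labels are assumptions made for illustration; the abstract does not say which convention SDT itself applies.

```python
def pairwise_identity(aln_a, aln_b, gaps="ignore"):
    """Percent identity of two aligned sequences ('-' marks a gap).

    gaps="ignore":   columns containing a gap are left out of the calculation.
    gaps="mismatch": a gap opposite a residue counts as a mismatch.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if gaps not in ("ignore", "mismatch"):
        raise ValueError(f"unknown gap convention: {gaps}")
    matches = compared = 0
    for x, y in zip(aln_a, aln_b):
        if x == "-" and y == "-":
            continue              # column carries no information for this pair
        if x == "-" or y == "-":
            if gaps == "mismatch":
                compared += 1     # counted in the denominator, never a match
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0
```

    For the aligned pair `"ACGTAC--GT"` and `"ACGTACTTGT"`, the first convention gives 100% and the second 80%, a difference large enough to move a pair across a demarcation threshold.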

  8. Image ranking in video sequences using pairwise image comparisons and temporal smoothing

    CSIR Research Space (South Africa)

    Burke, Michael

    2016-12-01

    Full Text Available The ability to predict the importance of an image is highly desirable in computer vision. This work introduces an image ranking scheme suitable for use in video or image sequences. Pairwise image comparisons are used to determine image ‘interest...

  9. GapMis: a tool for pairwise sequence alignment with a single gap.

    Science.gov (United States)

    Flouri, Tomás; Frousios, Kimon; Iliopoulos, Costas S; Park, Kunsoo; Pissis, Solon P; Tischler, German

    2013-08-01

    Pairwise sequence alignment has received a new motivation due to the advent of recent patents in next-generation sequencing technologies, particularly so for the application of re-sequencing, that is, the assembly of a genome directed by a reference sequence. After the fast alignment between a factor of the reference sequence and a high-quality fragment of a short read by a short-read alignment programme, an important problem is to find the alignment between a relatively short succeeding factor of the reference sequence and the remaining low-quality part of the read allowing a number of mismatches and the insertion of a single gap in the alignment. We present GapMis, a tool for pairwise sequence alignment with a single gap. It is based on a simple algorithm, which computes a different version of the traditional dynamic programming matrix. The presented experimental results demonstrate that GapMis is more suitable and efficient than most popular tools for this task.

  10. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.

    Science.gov (United States)

    Daily, Jeff

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available at 1.2 to at best 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach; however, the opal library is faster for single-threaded applications. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
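
    The GCUPS figure quoted in this record is a throughput measure: the number of dynamic-programming cells filled per second, in billions. A database search with a query of length m against n database residues fills m × n cells. The helper below is a minimal sketch of that arithmetic; the function name and the example numbers are hypothetical and are not taken from the paper's benchmarks.

```python
def gcups(query_len, db_residues, seconds):
    """Billions of DP cell updates per second for one database search."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return query_len * db_residues / seconds / 1e9


# Hypothetical run: a 375-residue query against 200 million database residues
# in 1.0 s fills 7.5e10 cells.
rate = gcups(375, 200_000_000, 1.0)
```

    With these made-up inputs the rate comes out at 75 GCUPS.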

  11. Memory-efficient dynamic programming backtrace and pairwise local sequence alignment.

    Science.gov (United States)

    Newberg, Lee A

    2008-08-15

    A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward-backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache) existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis. Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10,000. Sample C++-code for optimal backtrace is available in the Supplementary Materials. Supplementary data is available at Bioinformatics online.

  12. Improving pairwise comparison of protein sequences with domain co-occurrence

    Science.gov (United States)

    Gascuel, Olivier

    2018-01-01

    Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. A solution to this problem is then to integrate additional contextual information to the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed a 14% increase in the number of significant BLAST hits and a 25% increase in the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties shows that they are close to those of standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence (PMID: 29293498)

  13. DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

    Directory of Open Access Journals (Sweden)

    Kaufmann Michael

    2004-09-01

    Full Text Available Abstract. Background: Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Results: Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions: By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
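
    Because the alignments of different sequence pairs are independent, the all-against-all step maps naturally onto a worker pool. The sketch below shows only that distribution pattern and is not DIALIGN's implementation: `pair_score` is a placeholder that counts identical positions, the function names are hypothetical, and a thread pool stands in for the separate processors used in the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pair_score(a, b):
    """Placeholder for a real pairwise alignment: count identical positions."""
    return sum(x == y for x, y in zip(a, b))


def all_pairs_parallel(seqs, workers=4):
    """Score every unordered pair of sequences using a pool of workers."""
    pairs = list(combinations(range(len(seqs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task touches only its own pair, so no coordination is needed
        # and the result is the same as a serial loop.
        scores = pool.map(lambda ij: pair_score(seqs[ij[0]], seqs[ij[1]]), pairs)
        return dict(zip(pairs, scores))


result = all_pairs_parallel(["ACGT", "ACGA", "TCGA"])
```

    For n sequences there are n(n-1)/2 independent tasks, which is why this step parallelizes without changing the output alignments.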

  14. Beyond Solar-B: MTRAP, the Magnetic TRAnsition Region Probe

    Science.gov (United States)

    Davis, J. M.; Moore, R. L.; Hathaway, D. H.; Science Definition Committee; High-Resolution Solar Magnetography Beyond Solar-B Team

    2003-05-01

    The next generation of solar missions will reveal and measure fine-scale solar magnetic fields and their effects in the solar atmosphere at heights, small scales, sensitivities, and fields of view well beyond the reach of Solar-B. The necessity for, and potential of, such observations for understanding solar magnetic fields, their generation in and below the photosphere, and their control of the solar atmosphere and heliosphere, were the focus of a science definition workshop, "High-Resolution Solar Magnetography from Space: Beyond Solar-B," held in Huntsville Alabama in April 2001. Forty internationally prominent scientists active in solar research involving fine-scale solar magnetism participated in this Workshop and reached consensus that the key science objective to be pursued beyond Solar-B is a physical understanding of the fine-scale magnetic structure and activity in the magnetic transition region, defined as the region between the photosphere and corona where neither the plasma nor the magnetic field strongly dominates the other. The observational objective requires high cadence (x 16K pixels) with high QE at 150 nm, and extendable spacecraft structures. The Science Organizing Committee of the Beyond Solar-B Workshop recommends that: 1. Science and Technology Definition Teams should be established in FY04 to finalize the science requirements and to define technology development efforts needed to ensure the practicality of MTRAP's observational goals. 2. The necessary technology development funding should be included in Code S budgets for FY06 and beyond to prepare MTRAP for a new start no later than the nominal end of the Solar-B mission, around 2010.

  15. High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP

    Directory of Open Access Journals (Sweden)

    Khaled Benkrid

    2012-01-01

    Full Text Available This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on performance per watt criterion and perform better than all other platforms on performance per dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both performance per watt and performance per dollar criteria. In general, in order to outperform other technologies on performance per dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs.

  16. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

    Science.gov (United States)

    Bastien, Olivier; Maréchal, Eric

    2008-08-07

    Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value²) following the TULIP theorem. Simulations of the Z-value distribution are known to fit a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) whose components are the amino acids that can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the
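
    The Z-value described in this record can be estimated by comparing the observed alignment score with scores obtained after shuffling one of the sequences. The sketch below is a minimal illustration under several assumptions: a toy match/mismatch local scoring scheme instead of a substitution matrix, shuffling of the second sequence only, the population standard deviation, and hypothetical function names. The last function applies the 1/Z² upper bound on the P-value quoted in the abstract.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score with a linear gap penalty (toy scoring values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cell = max(0,
                       prev[j - 1] + (match if ca == cb else mismatch),
                       prev[j] + gap,
                       cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def z_value(a, b, n_shuffles=200, seed=0):
    """Monte-Carlo Z-value: (observed - mean) / std over shuffles of b."""
    rng = random.Random(seed)
    observed = local_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        null_scores.append(local_score(a, "".join(letters)))
    sigma = statistics.pstdev(null_scores)
    if sigma == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - statistics.mean(null_scores)) / sigma


def p_value_upper_bound(z):
    """Upper bound 1/Z^2 on the P-value, capped at 1 (uninformative if z <= 1)."""
    return min(1.0, 1.0 / (z * z)) if z > 0 else 1.0
```

    A Z-value of 2, for instance, bounds the P-value by 0.25; the bound only becomes tight for large Z.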

  17. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores

    Directory of Open Access Journals (Sweden)

    Maréchal Eric

    2008-08-01

    Full Text Available Abstract. Background: Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value²) following the TULIP theorem. Simulations of the Z-value distribution are known to fit a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. Results: We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) whose components are the amino acids that can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a

  18. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity

    Directory of Open Access Journals (Sweden)

    Xin Yi Ng

    2015-01-01

    Full Text Available This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs), which are important to the immune system. Recently, researchers have become interested in designing alternative drugs based on AMPs because a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMP design process, as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMP prediction is needed to resolve this problem. In this study, an integrated algorithm is newly introduced to predict AMPs by integrating sequence alignment and a support vector machine (SVM)-LZ complexity pairwise algorithm. It was observed that, when all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in the jackknife test and 87.59% in the independent test, while the sensitivities obtained for the jackknife test and the independent test are 88.74% and 78.70%, respectively, when only the sequences that have less than 70% similarity are used. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity.

  19. SubVis: an interactive R package for exploring the effects of multiple substitution matrices on pairwise sequence alignment

    Directory of Open Access Journals (Sweden)

    Scott Barlowe

    2017-06-01

    Full Text Available Understanding how proteins mutate is critical to solving a host of biological problems. Mutations occur when an amino acid is substituted for another in a protein sequence. The set of likelihoods for amino acid substitutions is stored in a matrix and input to alignment algorithms. The quality of the resulting alignment is used to assess the similarity of two or more sequences and can vary according to assumptions modeled by the substitution matrix. Substitution strategies with minor parameter variations are often grouped together in families. For example, the BLOSUM and PAM matrix families are commonly used because they provide a standard, predefined way of modeling substitutions. However, researchers often do not know if a given matrix family or any individual matrix within a family is the most suitable. Furthermore, predefined matrix families may inaccurately reflect a particular hypothesis that a researcher wishes to model or otherwise result in unsatisfactory alignments. In these cases, the ability to compare the effects of one or more custom matrices may be needed. This laborious process is often performed manually because the ability to simultaneously load multiple matrices and then compare their effects on alignments is not readily available in current software tools. This paper presents SubVis, an interactive R package for loading and applying multiple substitution matrices to pairwise alignments. Users can simultaneously explore alignments resulting from multiple predefined and custom substitution matrices. SubVis utilizes several of the alignment functions found in R, a common language among protein scientists. Functions are tied together with the Shiny platform which allows the modification of input parameters. Information regarding alignment quality and individual amino acid substitutions is displayed with the JavaScript language which provides interactive visualizations for revealing both high-level and low-level alignment

  20. Metabolic network prediction through pairwise rational kernels.

    Science.gov (United States)

    Roche-Lima, Abiel; Domaratzki, Michael; Fristensky, Brian

    2014-09-26

    Metabolic networks are represented by the set of metabolic pathways. Metabolic pathways are a series of biochemical reactions, in which the product (output) from one reaction serves as the substrate (input) to another reaction. Many pathways remain incompletely characterized. One of the major challenges of computational biology is to obtain better models of metabolic pathways. Existing models are dependent on the annotation of the genes. This leads to error accumulation when the pathways are predicted from incorrectly annotated genes. Pairwise classification methods are supervised learning methods used to classify new pairs of entities. Some of these classification methods, e.g., Pairwise Support Vector Machines (SVMs), use pairwise kernels. Pairwise kernels describe similarity measures between two pairs of entities. Using pairwise kernels to handle sequence data requires long processing times and large storage. Rational kernels are kernels based on weighted finite-state transducers that represent similarity measures between sequences or automata. They have been effectively used in problems that handle large amounts of sequence information such as protein essentiality, natural language processing and machine translation. We create a new family of pairwise kernels using weighted finite-state transducers (called Pairwise Rational Kernel (PRK)) to predict metabolic pathways from a variety of biological data. PRKs take advantage of the simpler representations and faster algorithms of transducers. Because raw sequence data can be used, the predictor model avoids the errors introduced by incorrect gene annotations. We then developed several experiments with PRKs and Pairwise SVM to validate our methods using the metabolic network of Saccharomyces cerevisiae. As a result, when PRKs are used, our method executes faster in comparison with other pairwise kernels. Also, when we use PRKs combined with other simple kernels that include evolutionary information, the accuracy
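
    A pairwise kernel measures the similarity between two pairs of entities by combining a base kernel on the individual entities. The sketch below shows one common construction, the symmetric tensor-product pairwise kernel, on top of a simple k-mer counting kernel. Both choices are assumptions made for illustration: the abstract does not state which pairwise construction the authors use, and their base kernels are built from weighted finite-state transducers rather than from explicit k-mer counts.

```python
from collections import Counter


def kmer_kernel(s, t, k=3):
    """Base kernel: dot product of the k-mer count vectors of two sequences."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(n * ct[kmer] for kmer, n in cs.items())


def pairwise_kernel(pair1, pair2, base=kmer_kernel):
    """Tensor-product pairwise kernel between the pairs (a, b) and (c, d).

    K((a, b), (c, d)) = k(a, c) * k(b, d) + k(a, d) * k(b, c),
    which does not depend on the order of the entities within each pair.
    """
    (a, b), (c, d) = pair1, pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)
```

    Every evaluation of the pairwise kernel needs four base-kernel evaluations, and a training set of pairs grows quadratically with the number of entities, which is consistent with the cost concern raised in the abstract.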

  1. Pairwise Choice Markov Chains

    OpenAIRE

    Ragain, Stephen; Ugander, Johan

    2016-01-01

    As datasets capturing human choices grow in richness and scale---particularly in online domains---there is an increasing need for choice models that escape traditional choice-theoretic axioms such as regularity, stochastic transitivity, and Luce's choice axiom. In this work we introduce the Pairwise Choice Markov Chain (PCMC) model of discrete choice, an inferentially tractable model that does not assume any of the above axioms while still satisfying the foundational axiom of uniform expansio...

  2. SVM-dependent pairwise HMM: an application to protein pairwise alignments.

    Science.gov (United States)

    Orlando, Gabriele; Raimondi, Daniele; Khan, Taushif; Lenaerts, Tom; Vranken, Wim F

    2017-12-15

    Methods able to provide reliable protein alignments are crucial for many bioinformatics applications. In recent years many different algorithms have been developed, and various kinds of information, from sequence conservation to secondary structure, have been used to improve alignment performance. This is especially relevant for proteins with highly divergent sequences. However, recent work suggests that different features may have different importance in diverse protein classes, and it would be an advantage to have more customizable approaches capable of dealing with different alignment definitions. Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets the user decide the optimal features to align their protein class of interest. It outperforms current state-of-the-art methods on two well-known benchmark datasets when aligning highly divergent sequences. A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo. Contact: wim.vranken@vub.be. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.

  3. Pairwise harmonics for shape analysis

    KAUST Repository

    Zheng, Youyi

    2013-07-01

    This paper introduces a simple yet effective shape analysis mechanism for geometry processing. Unlike traditional shape analysis techniques which compute descriptors per surface point up to certain neighborhoods, we introduce a shape analysis framework in which the descriptors are based on pairs of surface points. Such a pairwise analysis approach leads to a new class of shape descriptors that are more global, discriminative, and can effectively capture the variations in the underlying geometry. Specifically, we introduce new shape descriptors based on the isocurves of harmonic functions whose global maximum and minimum occur at the point pair. We show that these shape descriptors can infer shape structures and consistently lead to simpler and more efficient algorithms than the state-of-the-art methods for three applications: intrinsic reflectional symmetry axis computation, matching shape extremities, and simultaneous surface segmentation and skeletonization. © 2012 IEEE.

  4. Theory of pairwise lesion interaction

    International Nuclear Information System (INIS)

    Harder, Dietrich; Virsik-Peuckert, Patricia; Bartels, Ernst

    1992-01-01

    A comparison between repair time constants measured at both the molecular and cellular levels has shown that the DNA double strand break is the molecular change of key importance in the causation of cellular effects such as chromosome aberrations and cell inactivation. Cell fusion experiments provided evidence that a pairwise interaction between two double strand breaks - or, more exactly, between the two "repair sites" arising from them in the course of enzymatic repair - is needed to produce the faulty chromatin crosslink which leads to cytogenetic and cytolethal effects. These modern experiments have confirmed the classical assumption of pairwise lesion interaction (PLI) on which the models of Lea and Neary were based. It seems worthwhile to continue and complete the mathematical treatment of their proposed mechanism in order to show in quantitative terms that the well-known fractionation, protraction and linear energy transfer (LET) irradiation effects are consequences of, or can at least be partly attributed to, PLI. Arithmetic treatment of PLI - a second order reaction - also has the advantage of providing a prerequisite for further investigations into the stages of development of misrepair products such as chromatin crosslinks. It has been possible to formulate a completely arithmetic theory of PLI by consistently applying three biophysically permitted approximations: pure first order lesion repair kinetics, dose-independent repair time constants and low yield of the ionization/lesion conversion. The mathematical approach will be summarized here, including several formulae not elaborated at the time of previous publications. We will also study an application which sheds light on the chain of events involved in PLI. (author)

  5. Doctoral Program Selection Using Pairwise Comparisons.

    Science.gov (United States)

    Tadisina, Suresh K.; Bhasin, Vijay

    1989-01-01

    The application of a pairwise comparison methodology (Saaty's Analytic Hierarchy Process) to the doctoral program selection process is illustrated. A hierarchy for structuring and facilitating the doctoral program selection decision is described. (Author/MLW)

  6. Statistical physics of pairwise probability models

    Directory of Open Access Journals (Sweden)

    Yasser Roudi

    2009-11-01

    Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data: knowledge of the means and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pairwise models for studying neural data has been the focus of many studies in recent years. In this paper, we describe how tools from statistical physics can be employed for studying and using pairwise models. We build on our previous work on the subject and study the relation between different methods for fitting these models and evaluating their quality. In particular, using data from simulated cortical networks we study how the quality of various approximate methods for inferring the parameters in a pairwise model depends on the time bin chosen for binning the data. We also study the effect of the size of the time bin on the model quality itself, again using simulated data. We show that using finer time bins increases the quality of the pairwise model. We offer new ways of deriving the expressions reported in our previous work for assessing the quality of pairwise models.
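
    For orientation, a pairwise model over binary variables is commonly written in the Ising (maximum-entropy) form below. This is the standard textbook form, offered as an assumption about notation rather than a quotation from the paper; the symbols s_i, h_i and J_ij are illustrative.

```latex
P(\mathbf{s}) = \frac{1}{Z} \exp\Big( \sum_{i} h_i s_i + \sum_{i<j} J_{ij} s_i s_j \Big),
\qquad
Z = \sum_{\mathbf{s}} \exp\Big( \sum_{i} h_i s_i + \sum_{i<j} J_{ij} s_i s_j \Big)
```

    In this form the fields h_i and couplings J_ij are chosen so that the model reproduces the measured means and pairwise correlations of the s_i, which is why those statistics alone suffice to fit it.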

  7. Preference Learning and Ranking by Pairwise Comparison

    Science.gov (United States)

    Fürnkranz, Johannes; Hüllermeier, Eyke

    This chapter provides an overview of recent work on preference learning and ranking via pairwise classification. The learning by pairwise comparison (LPC) paradigm is the natural machine learning counterpart to the relational approach to preference modeling and decision making. From a machine learning point of view, LPC is especially appealing as it decomposes a possibly complex prediction problem into a certain number of learning problems of the simplest type, namely binary classification. We explain how to approach different preference learning problems, such as label and instance ranking, within the framework of LPC. We primarily focus on methodological aspects, but also address theoretical questions as well as algorithmic and complexity issues.

  8. Statistical physics of pairwise probability models

    DEFF Research Database (Denmark)

    Roudi, Yasser; Aurell, Erik; Hertz, John

    2009-01-01

    Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data: knowledge of the means and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pairwise models for studying neural data has been the focus of many studies in recent years. In this paper, we describe how tools from statistical physics can be employed for studying...

  9. Unjamming in models with analytic pairwise potentials

    NARCIS (Netherlands)

    Kooij, S.; Lerner, E.

    Canonical models for studying the unjamming scenario in systems of soft repulsive particles assume pairwise potentials with a sharp cutoff in the interaction range. The sharp cutoff renders the potential nonanalytic but makes it possible to describe many properties of the solid in terms of the

  10. A decomposition of pairwise continuity via ideals

    Directory of Open Access Journals (Sweden)

    Mahes Wari

    2016-02-01

    In this paper, we introduce and study the notions of (i, j)-regular-ℐ-closed sets, (i, j)-Aℐ-sets, (i, j)-ℐ-locally closed sets, p-Aℐ-continuous functions and p-ℐ-LC-continuous functions in ideal bitopological spaces and investigate some of their properties. Also, a new decomposition of pairwise continuity is obtained using these sets.

  11. PAIRWISE BLENDING OF HIGH LEVEL WASTE

    International Nuclear Information System (INIS)

    CERTA, P.J.

    2006-01-01

    The primary objective of this study is to demonstrate a mission scenario that uses pairwise and incidental blending of high level waste (HLW) to reduce the total mass of HLW glass. Secondary objectives include understanding how recent refinements to the tank waste inventory and solubility assumptions affect the mass of HLW glass and how logistical constraints may affect the efficacy of HLW blending

  12. Pairwise conjoint analysis of activity engagement choice

    NARCIS (Netherlands)

    Wang, Donggen; Oppewal, H.; Timmermans, H.J.P.

    2000-01-01

    Information overload is a well-known problem of conjoint choice models when respondents have to evaluate a large number of attributes and/or attribute levels. In this paper we develop an alternative conjoint modelling approach, called pairwise conjoint analysis. It differs from conventional conjoint

  13. Supplier Evaluation Process by Pairwise Comparisons

    Directory of Open Access Journals (Sweden)

    Arkadiusz Kawa

    2015-01-01

    We propose to assess suppliers by using consistency-driven pairwise comparisons for tangible and intangible criteria. The tangible criteria are simpler to compare (e.g., the price of a service is lower than that of another service with identical characteristics). Intangible criteria are more difficult to assess. The proposed model combines assessments of both types of criteria. The main contribution of this paper is the presentation of an extension framework for the selection of suppliers in a procurement process. The final weights are computed from relative pairwise comparisons. For the needs of the paper, surveys were conducted among Polish managers dealing with cooperation with suppliers in their enterprises. The Polish practice and restricted bidding are discussed, too.

  14. Selecting numerical scales for pairwise comparisons

    International Nuclear Information System (INIS)

    Elliott, Michael A.

    2010-01-01

    It is often desirable in decision analysis problems to elicit from an individual the rankings of a population of attributes according to the individual's preference and to understand the degree to which each attribute is preferred to the others. A common method for obtaining this information involves the use of pairwise comparisons, which allows an analyst to convert subjective expressions of preference between two attributes into numerical values indicating preferences across the entire population of attributes. Key to the use of pairwise comparisons is the underlying numerical scale that is used to convert subjective linguistic expressions of preference into numerical values. This scale represents the psychological manner in which individuals perceive increments of preference among abstract attributes and it has important implications about the distribution and consistency of an individual's preferences. Three popular scale types are examined: the traditional integer scales, balanced scales and power scales. Results of a study of 64 individuals responding to a hypothetical decision problem show that none of these scales can accurately capture the preferences of all individuals. A study of three individuals working on an actual engineering decision problem involving the design of a decay heat removal system for a nuclear fission reactor shows that the choice of scale can affect the preferred decision. It is concluded that applications of pairwise comparisons would benefit from permitting participants to choose the scale that best models their own particular way of thinking about the relative preference of attributes.
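
    As an illustration of how pairwise comparisons are turned into numerical preference weights, the following minimal sketch uses the row geometric-mean method on a reciprocal comparison matrix. The method, the function name priority_weights and the example matrix are assumptions chosen for illustration; they are not taken from the study, which compares scale types rather than prescribing a weighting procedure.

```python
import math


def priority_weights(matrix):
    """Return normalised weights from a reciprocal pairwise comparison matrix.

    matrix[i][j] states how strongly attribute i is preferred to attribute j
    (so matrix[j][i] is expected to be 1 / matrix[i][j]).
    Uses the row geometric-mean method (an assumption, one of several options).
    """
    n = len(matrix)
    # Geometric mean of each row summarises how attribute i fares against all others.
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    # Normalise so the weights sum to 1.
    return [g / total for g in geo_means]


# Hypothetical, perfectly consistent judgements on an integer scale:
# A is preferred 2x over B and 6x over C; B is preferred 3x over C.
example = [
    [1.0, 2.0, 6.0],
    [1.0 / 2.0, 1.0, 3.0],
    [1.0 / 6.0, 1.0 / 3.0, 1.0],
]
weights = priority_weights(example)
```

    For a perfectly consistent matrix such as this one, the weights reproduce the underlying ratios exactly; with inconsistent judgements, or a different numerical scale, the resulting weights and hence the preferred decision can change.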

  15. Extension of Pairwise Broadcast Clock Synchronization for Multicluster Sensor Networks

    Directory of Open Access Journals (Sweden)

    Bruce W. Suter

    2008-01-01

    Time synchronization is crucial for wireless sensor networks (WSNs) in performing a number of fundamental operations such as data coordination, power management, security, and localization. The Pairwise Broadcast Synchronization (PBS) protocol was recently proposed to minimize the number of timing messages required for global network synchronization, which enables the design of highly energy-efficient WSNs. However, PBS requires all nodes in the network to lie within the communication ranges of two leader nodes, a condition which might not hold in some applications. This paper proposes an extension of PBS to the more general class of sensor networks. Based on the hierarchical structure of the network, an energy-efficient pair selection algorithm is proposed to select the best pairwise synchronization sequence to reduce the overall energy consumption. It is shown that in a multicluster networking environment, PBS requires far fewer timing messages than other well-known synchronization protocols and incurs no loss in synchronization accuracy. Moreover, the proposed scheme presents significant energy savings for densely deployed WSNs.

  16. Unjamming in models with analytic pairwise potentials

    Science.gov (United States)

    Kooij, Stefan; Lerner, Edan

    2017-06-01

    Canonical models for studying the unjamming scenario in systems of soft repulsive particles assume pairwise potentials with a sharp cutoff in the interaction range. The sharp cutoff renders the potential nonanalytic but makes it possible to describe many properties of the solid in terms of the coordination number z , which has an unambiguous definition in these cases. Pairwise potentials without a sharp cutoff in the interaction range have not been studied in this context, but should in fact be considered to understand the relevance of the unjamming phenomenology in systems where such a cutoff is not present. In this work we explore two systems with such interactions: an inverse power law and an exponentially decaying pairwise potential, with the control parameters being the exponent (of the inverse power law) for the former and the number density for the latter. Both systems are shown to exhibit the characteristic features of the unjamming transition, among which are the vanishing of the shear-to-bulk modulus ratio and the emergence of an excess of low-frequency vibrational modes. We establish a relation between the pressure-to-bulk modulus ratio and the distance to unjamming in each of our model systems. This allows us to predict the dependence of other key observables on the distance to unjamming. Our results provide the means for a quantitative estimation of the proximity of generic glass-forming models to the unjamming transition in the absence of a clear-cut definition of the coordination number and highlight the general irrelevance of nonaffine contributions to the bulk modulus.

  17. Pairwise Trajectory Management (PTM): Concept Overview

    Science.gov (United States)

    Jones, Kenneth M.; Graff, Thomas J.; Chartrand, Ryan C.; Carreno, Victor; Kibler, Jennifer L.

    2017-01-01

    Pairwise Trajectory Management (PTM) is an Interval Management (IM) concept that utilizes airborne and ground-based capabilities to enable the implementation of airborne pairwise spacing capabilities in oceanic regions. The goal of PTM is to use airborne surveillance and tools to manage an "at or greater than" inter-aircraft spacing. Due to the precision of Automatic Dependent Surveillance-Broadcast (ADS-B) information and the use of airborne spacing guidance, the PTM minimum spacing distance will be less than distances a controller can support with current automation systems that support oceanic operations. Ground tools assist the controller in evaluating the traffic picture and determining appropriate PTM clearances to be issued. Avionics systems provide guidance information that allows the flight crew to conform to the PTM clearance issued by the controller. The combination of a reduced minimum distance and airborne spacing management will increase the capacity and efficiency of aircraft operations at a given altitude or volume of airspace. This paper provides an overview of the proposed application, description of a few key scenarios, high level discussion of expected air and ground equipment and procedure changes, overview of a potential flight crew human-machine interface that would support PTM operations and some initial PTM benefits results.

  18. Fast and accurate estimation of the covariance between pairwise maximum likelihood distances

    Directory of Open Access Journals (Sweden)

    Manuel Gil

    2014-09-01

    Pairwise evolutionary distances are a model-based summary statistic for a set of molecular sequences. They represent the leaf-to-leaf path lengths of the underlying phylogenetic tree. Estimates of pairwise distances with overlapping paths covary because of shared mutation events. It is desirable to take this covariance structure into account to increase precision in any process that compares or combines distances. This paper introduces a fast estimator for the covariance of two pairwise maximum likelihood distances, estimated under general Markov models. The estimator is based on a conjecture (going back to Nei & Jin, 1989) which links the covariance to path lengths. It is proven here under a simple symmetric substitution model. A simulation shows that the estimator outperforms previously published ones in terms of the mean squared error.

  19. Fast and accurate estimation of the covariance between pairwise maximum likelihood distances.

    Science.gov (United States)

    Gil, Manuel

    2014-01-01

    Pairwise evolutionary distances are a model-based summary statistic for a set of molecular sequences. They represent the leaf-to-leaf path lengths of the underlying phylogenetic tree. Estimates of pairwise distances with overlapping paths covary because of shared mutation events. It is desirable to take this covariance structure into account to increase precision in any process that compares or combines distances. This paper introduces a fast estimator for the covariance of two pairwise maximum likelihood distances, estimated under general Markov models. The estimator is based on a conjecture (going back to Nei & Jin, 1989) which links the covariance to path lengths. It is proven here under a simple symmetric substitution model. A simulation shows that the estimator outperforms previously published ones in terms of the mean squared error.

  20. Revisiting the classification of curtoviruses based on genome-wide pairwise identity

    KAUST Repository

    Varsani, Arvind

    2014-01-25

    Members of the genus Curtovirus (family Geminiviridae) are important pathogens of many wild and cultivated plant species. Until recently, relatively few full curtovirus genomes have been characterised. However, with the 19 full genome sequences now available in public databases, we revisit the proposed curtovirus species and strain classification criteria. Using pairwise identities coupled with phylogenetic evidence, revised species and strain demarcation guidelines have been instituted. Specifically, we have established 77% genome-wide pairwise identity as a species demarcation threshold and 94% genome-wide pairwise identity as a strain demarcation threshold. Hence, whereas curtovirus sequences with >77% genome-wide pairwise identity would be classified as belonging to the same species, those sharing >94% identity would be classified as belonging to the same strain. We provide step-by-step guidelines to facilitate the classification of newly discovered curtovirus full genome sequences and a set of defined criteria for naming new species and strains. The revision yields three curtovirus species: Beet curly top virus (BCTV), Spinach severe curly top virus (SpSCTV) and Horseradish curly top virus (HrCTV). © 2014 Springer-Verlag Wien.

  1. Revisiting the classification of curtoviruses based on genome-wide pairwise identity

    KAUST Repository

    Varsani, Arvind; Martin, Darren Patrick; Navas-Castillo, Jesús; Moriones, Enrique; Hernández-Zepeda, Cecilia; Idris, Ali; Murilo Zerbini, F.; Brown, Judith K.

    2014-01-01

    Members of the genus Curtovirus (family Geminiviridae) are important pathogens of many wild and cultivated plant species. Until recently, relatively few full curtovirus genomes have been characterised. However, with the 19 full genome sequences now available in public databases, we revisit the proposed curtovirus species and strain classification criteria. Using pairwise identities coupled with phylogenetic evidence, revised species and strain demarcation guidelines have been instituted. Specifically, we have established 77% genome-wide pairwise identity as a species demarcation threshold and 94% genome-wide pairwise identity as a strain demarcation threshold. Hence, whereas curtovirus sequences with >77% genome-wide pairwise identity would be classified as belonging to the same species, those sharing >94% identity would be classified as belonging to the same strain. We provide step-by-step guidelines to facilitate the classification of newly discovered curtovirus full genome sequences and a set of defined criteria for naming new species and strains. The revision yields three curtovirus species: Beet curly top virus (BCTV), Spinach severe curly top virus (SpSCTV) and Horseradish curly top virus (HrCTV). © 2014 Springer-Verlag Wien.
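
    The two demarcation thresholds stated above can be applied mechanically once a genome-wide pairwise identity has been computed. The sketch below is only an illustration of that decision rule: the function name classify_pair and the returned labels are hypothetical, and the step-by-step guidelines in the paper (including how identity is calculated and how phylogenetic evidence is weighed) are not reproduced here.

```python
SPECIES_THRESHOLD = 77.0  # % genome-wide pairwise identity, as stated in the abstract
STRAIN_THRESHOLD = 94.0   # % genome-wide pairwise identity, as stated in the abstract


def classify_pair(identity_percent):
    """Classify two curtovirus genomes from their pairwise identity (in percent).

    Hypothetical helper: applies only the two numeric thresholds and ignores
    the phylogenetic evidence that the guidelines also take into account.
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent > STRAIN_THRESHOLD:
        return "same species, same strain"
    if identity_percent > SPECIES_THRESHOLD:
        return "same species, different strain"
    return "different species"
```

    For example, a pair sharing 85% identity falls between the two thresholds and would be labelled as the same species but a different strain.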

  2. Nonparametric predictive pairwise comparison with competing risks

    International Nuclear Information System (INIS)

    Coolen-Maturi, Tahani

    2014-01-01

    In reliability, failure data often correspond to competing risks, where several failure modes can cause a unit to fail. This paper presents nonparametric predictive inference (NPI) for pairwise comparison with competing risks data, assuming that the failure modes are independent. These failure modes could be the same or different between the two groups, and these can be both observed and unobserved failure modes. NPI is a statistical approach based on few assumptions, with inferences strongly based on data and with uncertainty quantified via lower and upper probabilities. The focus is on the lower and upper probabilities for the event that the lifetime of a future unit from one group, say Y, is greater than the lifetime of a future unit from the second group, say X. The paper also shows how the two groups can be compared based on particular failure mode(s), and the comparison of the two groups when some of the competing risks are combined is discussed.

  3. Locating one pairwise interaction: Three recursive constructions

    Directory of Open Access Journals (Sweden)

    Charles J. Colbourn

    2016-09-01

    In a complex component-based system, choices (levels) for components (factors) may interact to cause faults in the system behaviour. When faults may be caused by interactions among few factors at specific levels, covering arrays provide a combinatorial test suite for discovering the presence of faults. While well studied, covering arrays do not enable one to determine the specific levels of factors causing the faults; locating arrays ensure that the results from test suite execution suffice to determine the precise levels and factors causing faults, when the number of such causes is small. Constructions for locating arrays are at present limited to heuristic computational methods and quite specific direct constructions. In this paper three recursive constructions are developed for locating arrays to locate one pairwise interaction causing a fault.

  4. The FOLDALIGN web server for pairwise structural RNA alignment and mutual motif search

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Lyngsø, Rune B.; Gorodkin, Jan

    2005-01-01

    FOLDALIGN is a Sankoff-based algorithm for making structural alignments of RNA sequences. Here, we present a web server for making pairwise alignments between two RNA sequences, using the recently updated version of FOLDALIGN. The server can be used to scan two sequences for a common structural RNA motif of limited size, or the entire sequences can be aligned locally or globally. The web server offers a graphical interface, which makes it simple to make alignments and manually browse the results. The web server can be accessed at http://foldalign.kvl.dk...

  5. SFESA: a web server for pairwise alignment refinement by secondary structure shifts.

    Science.gov (United States)

    Tong, Jing; Pei, Jimin; Grishin, Nick V

    2015-09-03

    Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate. We developed the SFESA web server to refine pairwise protein sequence alignments. Compared to the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server will search a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software. SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.

  6. Statistical pairwise interaction model of stock market

    Science.gov (United States)

    Bury, Thomas

    2013-03-01

    Financial markets are a classical example of complex systems, as they are composed of many interacting stocks. As such, we can obtain a surprisingly good description of their structure by making the rough simplification of binary daily returns. Spin glass models have been applied and gave some valuable results, but at the price of restrictive assumptions on the market dynamics, or they are agent-based models with rules designed in order to recover some empirical behaviors. Here we show that the pairwise model is actually a statistically consistent model with the observed first and second moments of the stocks' orientation without making such restrictive assumptions. This is done with an approach based only on empirical data of price returns. Our data analysis of six major indices suggests that the actual interaction structure may be thought of as an Ising model on a complex network with interaction strengths scaling as the inverse of the system size. This has potentially important implications since many properties of such a model are already known and some techniques of spin glass theory can be straightforwardly applied. Typical behaviors, such as multiple equilibria or metastable states, different characteristic time scales, spatial patterns and order-disorder, could find an explanation in this picture.

  7. Predicting community composition from pairwise interactions

    Science.gov (United States)

    Friedman, Jonathan; Higgins, Logan; Gore, Jeff

    The ability to predict the structure of complex, multispecies communities is crucial for understanding the impact of species extinction and invasion on natural communities, as well as for engineering novel, synthetic communities. Communities are often modeled using phenomenological models, such as the classical generalized Lotka-Volterra (gLV) model. While a lot of our intuition comes from such models, their predictive power has rarely been tested experimentally. To directly assess the predictive power of this approach, we constructed synthetic communities comprised of up to 8 soil bacteria. We measured the outcome of competition between all species pairs, and used these measurements to predict the composition of communities composed of more than 2 species. The pairwise competitions resulted in a diverse set of outcomes, including coexistence, exclusion, and bistability, and displayed evidence for both interference and facilitation. Most pair outcomes could be captured by the gLV framework, and the composition of multispecies communities could be predicted for communities composed solely of such pairs. Our results demonstrate the predictive ability and utility of simple phenomenology, which enables accurate predictions in the absence of mechanistic details.
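
    To make the pairwise modeling idea concrete, the sketch below integrates a competitive generalized Lotka-Volterra system with a simple forward-Euler scheme. The specific normalised form of the equations, dx_i/dt = r_i x_i (1 - x_i - sum_j alpha_ij x_j), the function name simulate_glv and all parameter values are assumptions made for illustration; they are not the fitted model or the measurements from the study.

```python
def simulate_glv(alpha, x0, r=None, dt=0.01, steps=20000):
    """Integrate a competitive gLV model with forward Euler.

    alpha[i][j] is the (hypothetical) strength with which species j inhibits
    species i; carrying capacities are normalised to 1 (an assumption).
    Returns the abundances after `steps` Euler steps of size `dt`.
    """
    n = len(x0)
    rates = list(r) if r is not None else [1.0] * n
    x = list(x0)
    for _ in range(steps):
        derivs = []
        for i in range(n):
            # Total inhibition of species i by all other species.
            competition = sum(alpha[i][j] * x[j] for j in range(n) if j != i)
            derivs.append(rates[i] * x[i] * (1.0 - x[i] - competition))
        # Euler update; clamp at zero so abundances never go negative.
        x = [max(xi + dt * d, 0.0) for xi, d in zip(x, derivs)]
    return x


# Weak mutual competition: both species persist (coexistence).
coexist = simulate_glv([[0.0, 0.5], [0.5, 0.0]], [0.1, 0.1])
# Asymmetric competition: species 0 excludes species 1.
exclude = simulate_glv([[0.0, 0.5], [1.5, 0.0]], [0.1, 0.1])
```

    In this parameterisation, coexistence of a pair requires both interaction coefficients to be below 1, exclusion occurs when only one of them exceeds 1, and bistability arises when both do, which mirrors the three qualitative pair outcomes mentioned in the abstract.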

  8. A scalable pairwise class interaction framework for multidimensional classification

    DEFF Research Database (Denmark)

    Arias, Jacinto; Gámez, Jose A.; Nielsen, Thomas Dyhre

    2016-01-01

    We present a general framework for multidimensional classification that captures the pairwise interactions between class variables. The pairwise class interactions are encoded using a collection of base classifiers (Phase 1), for which the class predictions are combined in a Markov random fie...

  9. A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities

    OpenAIRE

    Maréchal Eric; Ortet Philippe; Roy Sylvaine; Bastien Olivier

    2005-01-01

    Background: Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic recon...

  10. BETASCAN: probable beta-amyloids identified by pairwise probabilistic analysis.

    Directory of Open Access Journals (Sweden)

    Allen W Bryan

    2009-03-01

    Amyloids and prion proteins are clinically and biologically important beta-structures, whose supersecondary structures are difficult to determine by standard experimental or computational means. In addition, significant conformational heterogeneity is known or suspected to exist in many amyloid fibrils. Recent work has indicated the utility of pairwise probabilistic statistics in beta-structure prediction. We develop here a new strategy for beta-structure prediction, emphasizing the determination of beta-strands and pairs of beta-strands as fundamental units of beta-structure. Our program, BETASCAN, calculates likelihood scores for potential beta-strands and strand-pairs based on correlations observed in parallel beta-sheets. The program then determines the strands and pairs with the greatest local likelihood for all of the sequence's potential beta-structures. BETASCAN suggests multiple alternate folding patterns and assigns relative a priori probabilities based solely on amino acid sequence, probability tables, and pre-chosen parameters. The algorithm compares favorably with the results of previous algorithms (BETAPRO, PASTA, SALSA, TANGO, and Zyggregator) in beta-structure prediction and amyloid propensity prediction. Accurate prediction is demonstrated for experimentally determined amyloid beta-structures, for a set of known beta-aggregates, and for the parallel beta-strands of beta-helices, amyloid-like globular proteins. BETASCAN is able both to detect beta-strands with higher sensitivity and to detect the edges of beta-strands in a richly beta-like sequence. For two proteins (Abeta and Het-s), there exist multiple sets of experimental data implying contradictory structures; BETASCAN is able to detect each competing structure as a potential structure variant.
The ability to correlate multiple alternate beta-structures to experiment opens the possibility of computational investigation of prion strains and structural heterogeneity of amyloid

  11. Structural profiles of human miRNA families from pairwise clustering

    DEFF Research Database (Denmark)

    Kaczkowski, Bogumil; Þórarinsson, Elfar; Reiche, Kristin

    2009-01-01

    secondary structure already predicted, little is known about the patterns of structural conservation among pre-miRNAs. We address this issue by clustering the human pre-miRNA sequences based on pairwise, sequence and secondary structure alignment using FOLDALIGN, followed by global multiple alignment...... of obtained clusters by WAR. As a result, the common secondary structure was successfully determined for four FOLDALIGN clusters: the RF00027 structural family of the Rfam database and three clusters with previously undescribed consensus structures. Availability: http://genome.ku.dk/resources/mirclust...

  12. Pseudo inputs for pairwise learning with Gaussian processes

    DEFF Research Database (Denmark)

    Nielsen, Jens Brehm; Jensen, Bjørn Sand; Larsen, Jan

    2012-01-01

    We consider learning and prediction of pairwise comparisons between instances. The problem is motivated from a perceptual viewpoint, where pairwise comparisons serve as an effective and extensively used paradigm. A state-of-the-art method for modeling pairwise data in high dimensional domains...... is based on a classical pairwise probit likelihood imposed with a Gaussian process prior. While extremely flexible, this non-parametric method struggles with an inconvenient O(n^3) scaling in terms of the n input instances which limits the method only to smaller problems. To overcome this, we derive...... to other similar approximations that have been applied in standard Gaussian process regression and classification problems such as FI(T)C and PI(T)C....

  13. Pairwise Constraint-Guided Sparse Learning for Feature Selection.

    Science.gov (United States)

    Liu, Mingxia; Zhang, Daoqiang

    2016-01-01

    Feature selection aims to identify the most informative features for a compact and accurate data representation. As typical supervised feature selection methods, Lasso and its variants using L1-norm-based regularization terms have received much attention in recent studies, most of which use class labels as supervised information. Besides class labels, there are other types of supervised information, e.g., pairwise constraints that specify whether a pair of data samples belong to the same class (must-link constraint) or different classes (cannot-link constraint). However, most of existing L1-norm-based sparse learning methods do not take advantage of the pairwise constraints that provide us weak and more general supervised information. For addressing that problem, we propose a pairwise constraint-guided sparse (CGS) learning method for feature selection, where the must-link and the cannot-link constraints are used as discriminative regularization terms that directly concentrate on the local discriminative structure of data. Furthermore, we develop two variants of CGS, including: 1) semi-supervised CGS that utilizes labeled data, pairwise constraints, and unlabeled data and 2) ensemble CGS that uses the ensemble of pairwise constraint sets. We conduct a series of experiments on a number of data sets from University of California-Irvine machine learning repository, a gene expression data set, two real-world neuroimaging-based classification tasks, and two large-scale attribute classification tasks. Experimental results demonstrate the efficacy of our proposed methods, compared with several established feature selection methods.

  14. Pairwise structure alignment specifically tuned for surface pockets and interaction interfaces

    KAUST Repository

    Cui, Xuefeng

    2015-09-09

    To detect and evaluate the similarities between the three-dimensional (3D) structures of two molecules, various kinds of methods have been proposed for the pairwise structure alignment problem [6, 9, 7, 11]. The problem plays important roles when studying the function and the evolution of biological molecules. Recently, pairwise structure alignment methods have been extended and applied on surface pocket structures [10, 3, 5] and interaction interface structures [8, 4]. The results show that, even when there are no global similarities discovered between the global sequences and the global structures, biological molecules or complexes could share similar functions because of well conserved pockets and interfaces. Thus, pairwise pocket and interface structure alignments are promising to unveil such shared functions that cannot be discovered by the well-studied global sequence and global structure alignments. State-of-the-art methods for pairwise pocket and interface structure alignments [4, 5] are direct extensions of the classic pairwise protein structure alignment methods, and thus such methods share a few limitations. First, the goal of the classic protein structure alignment methods is to align single-chain protein structures (i.e., a single fragment of residues connected by peptide bonds). However, we observed that pockets and interfaces tend to consist of tens of extremely short backbone fragments (i.e., three or fewer residues connected by peptide bonds). Thus, existing pocket and interface alignment methods based on the protein structure alignment methods still rely on the existence of long-enough backbone fragments, and the fragmentation issue of pockets and interfaces raises the risk of missing the optimal alignments. Moreover, existing interface structure alignment methods focus on protein-protein interfaces, and require a "blackbox preprocessing" before aligning protein-DNA and protein-RNA interfaces.
Therefore, we introduce the PROtein STucture Alignment

  15. Automatic Camera Calibration Using Multiple Sets of Pairwise Correspondences.

    Science.gov (United States)

    Vasconcelos, Francisco; Barreto, Joao P; Boyer, Edmond

    2018-04-01

    We propose a new method to add an uncalibrated node into a network of calibrated cameras using only pairwise point correspondences. While previous methods perform this task using triple correspondences, these are often difficult to establish when there is limited overlap between different views. In such challenging cases we must rely on pairwise correspondences and our solution becomes more advantageous. Our method includes an 11-point minimal solution for the intrinsic and extrinsic calibration of a camera from pairwise correspondences with other two calibrated cameras, and a new inlier selection framework that extends the traditional RANSAC family of algorithms to sampling across multiple datasets. Our method is validated on different application scenarios where a lack of triple correspondences might occur: addition of a new node to a camera network; calibration and motion estimation of a moving camera inside a camera network; and addition of views with limited overlap to a Structure-from-Motion model.

  16. A predictive model of music preference using pairwise comparisons

    DEFF Research Database (Denmark)

    Jensen, Bjørn Sand; Gallego, Javier Saez; Larsen, Jan

    2012-01-01

    Music recommendation is an important aspect of many streaming services and multi-media systems; however, it is typically based on so-called collaborative filtering methods. In this paper we consider the recommendation task from a personal viewpoint and examine to which degree music preference can...... be elicited and predicted using simple and robust queries such as pairwise comparisons. We propose to model - and in turn predict - the pairwise music preference using a very flexible model based on Gaussian Process priors for which we describe the required inference. We further propose a specific covariance...

  17. Dynamics of pairwise motions in the Cosmic Web

    Science.gov (United States)

    Hellwing, Wojciech A.

    2016-10-01

    We present results of an analysis of the dark matter (DM) pairwise velocity statistics in different Cosmic Web environments. We use the DM velocity and density field from the Millennium 2 simulation together with the NEXUS+ algorithm to segment the simulation volume into voxels uniquely identifying one of the four possible environments: nodes, filaments, walls or cosmic voids. We show that the PDFs of the mean infall velocities v_12, as well as its spatial dependence together with the perpendicular and parallel velocity dispersions, bear a significant signal of the large-scale structure environment in which DM particle pairs are embedded. The pairwise flows are notably colder and have smaller mean magnitude in walls and voids, when compared to the much denser environments of filaments and nodes. We discuss our results, indicating that they are consistent with simple theoretical predictions for pairwise motions as induced by the gravitational instability mechanism. Our results indicate that the Cosmic Web elements are coherent dynamical entities rather than just temporal geometrical associations. In addition, it should be possible to observationally test various Cosmic Web finding algorithms by segmenting available peculiar velocity data and studying the resulting pairwise velocity statistics.

  18. Determinants of sovereign debt yield spreads under EMU: Pairwise approach

    NARCIS (Netherlands)

    Fazlioglu, S.

    2013-01-01

    This study aims at providing an empirical analysis of long-term determinants of sovereign debt yield spreads under European EMU (Economic and Monetary Union) through pairwise approach within panel framework. Panel gravity models are increasingly used in the cross-market correlation literature while

  19. Modeling Expressed Emotions in Music using Pairwise Comparisons

    DEFF Research Database (Denmark)

    Madsen, Jens; Nielsen, Jens Brehm; Jensen, Bjørn Sand

    2012-01-01

    We introduce a two-alternative forced-choice experimental paradigm to quantify expressed emotions in music using the two well-known arousal and valence (AV) dimensions. In order to produce AV scores from the pairwise comparisons and to visualize the locations of excerpts in the AV space, we...

  20. An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast

    Directory of Open Access Journals (Sweden)

    Taneda Akito

    2008-12-01

    Background: Aligning RNA sequences with low sequence identity has been a challenging problem, since such a computation essentially needs an algorithm with high complexity to take structural conservation into account. Although many sophisticated algorithms for the purpose have been proposed to date, further improvement in efficiency is necessary to accelerate large-scale applications, including non-coding RNA (ncRNA) discovery. Results: We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combining it with the originally trained SVM, we applied the new algorithm to novel ncRNA discovery, where we compared the S. cerevisiae genome with six related genomes in a pairwise manner. By focusing our search on the relatively short regions (50 bp to 2,000 bp) sandwiched by conserved sequences, we successfully predicted 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in the pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%). By comparing with the previous predictions, we found that > 92% of the candidates are novel. The estimated rate of false positives in the predicted candidates is 51%. Twenty-five percent of the intergenic candidates have support for expression in the cell, i.e., their genomic positions overlap those of experimentally determined transcripts in the literature. Moreover, by manual inspection of the results, we obtained four multiple alignments with low sequence identity which reveal consensus structures shared by three species/sequences. Conclusion: The present method gives an efficient tool complementary to sequence-alignment-based ncRNA finders.

  1. Solution to urn models of pairwise interaction with application to social, physical, and biological sciences

    Science.gov (United States)

    Pickering, William; Lim, Chjan

    2017-07-01

    We investigate a family of urn models that correspond to one-dimensional random walks with quadratic transition probabilities that have highly diverse applications. Well-known instances of these two-urn models are the Ehrenfest model of molecular diffusion, the voter model of social influence, and the Moran model of population genetics. We also provide a generating function method for diagonalizing the corresponding transition matrix that is valid if and only if the underlying mean density satisfies a linear differential equation, and express the eigenvector components in terms of ordinary hypergeometric functions. The nature of the models leads to a natural extension to interaction between agents in a general network topology. We analyze the dynamics on uncorrelated heterogeneous degree sequence networks and relate the convergence times to the moments of the degree sequences for various pairwise interaction mechanisms.
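
    To make "one-dimensional random walk with quadratic transition probabilities" concrete, here is a minimal sketch for one of the named instances, the voter model. It assumes the common complete-graph convention (a uniformly chosen agent copies a uniformly chosen other agent); the function name is hypothetical and the construction is not taken from the paper.

```python
from typing import List


def voter_transition_matrix(n: int) -> List[List[float]]:
    """Transition matrix of a two-opinion voter model on a complete graph.

    State k = number of agents holding opinion A (0..n). Assumed update rule:
    pick one agent uniformly at random, then let it copy a uniformly chosen
    other agent. This convention is an assumption made for illustration.
    """
    if n < 2:
        raise ValueError("need at least two agents")
    p = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        # Probability that the chosen pair holds different opinions in a given
        # direction: quadratic in k, which makes the walk "quadratic".
        move = k * (n - k) / (n * (n - 1))
        if k < n:
            p[k][k + 1] = move  # a B-agent copies an A-agent
        if k > 0:
            p[k][k - 1] = move  # an A-agent copies a B-agent
        p[k][k] = 1.0 - 2.0 * move
    return p


if __name__ == "__main__":
    for row in voter_transition_matrix(4):
        print(["%.3f" % v for v in row])
```

    States 0 and n are absorbing (consensus), and every interior state moves up or down with equal probability, so the count of A-holders is a martingale under this assumed rule.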

  2. Dynamics of pairwise entanglement between two Tavis-Cummings atoms

    International Nuclear Information System (INIS)

    Guo Jinliang; Song Heshan

    2008-01-01

    We investigate the time evolution of pairwise entanglement between two Tavis-Cummings atoms for various entangled initial states, including pure and mixed states. We find that entanglement sudden death behaves distinctly in the evolution of entanglement for different initial states. Notably, the initial portion of the excited state in the initial state is responsible for the sudden death of entanglement, and the degree of this effect also depends on the initial states.

  3. Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels.

    Science.gov (United States)

    Fu, Yanwei; Hospedales, Timothy M; Xiang, Tao; Xiong, Jiechao; Gong, Shaogang; Wang, Yizhou; Yao, Yuan

    2016-03-01

    The problem of estimating subjective visual properties from image and video has attracted increasing interest. A subjective visual property is useful either on its own (e.g. image and video interestingness) or as an intermediate representation for visual recognition (e.g. a relative attribute). Due to its ambiguous nature, annotating the value of a subjective visual property for learning a prediction model is challenging. To make the annotation more reliable, recent studies employ crowdsourcing tools to collect pairwise comparison labels. However, using crowdsourced data also introduces outliers. Existing methods rely on majority voting to prune the annotation outliers/errors. They thus require a large amount of pairwise labels to be collected. More importantly as a local outlier detection method, majority voting is ineffective in identifying outliers that can cause global ranking inconsistencies. In this paper, we propose a more principled way to identify annotation outliers by formulating the subjective visual property prediction task as a unified robust learning to rank problem, tackling both the outlier detection and learning to rank jointly. This differs from existing methods in that (1) the proposed method integrates local pairwise comparison labels together to minimise a cost that corresponds to global inconsistency of ranking order, and (2) the outlier detection and learning to rank problems are solved jointly. This not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations.

  4. Pairwise contact energy statistical potentials can help to find probability of point mutations.

    Science.gov (United States)

    Saravanan, K M; Suvaithenamudhan, S; Parthasarathy, S; Selvaraj, S

    2017-01-01

    To adopt a particular fold, a protein requires several interactions between its amino acid residues. The energetic contribution of these residue-residue interactions can be approximated by extracting statistical potentials from known high resolution structures. Several methods based on statistical potentials extracted from unrelated proteins are found to make a better prediction of the probability of point mutations. We postulate that the statistical potentials extracted from known structures of similar folds with varying sequence identity can be a powerful tool to examine the probability of point mutation. Keeping this in mind, we have derived pairwise residue and atomic contact energy potentials for the different functional families that adopt the (α/β)8 TIM-barrel fold. We carried out computational point mutations at various conserved residue positions in the yeast triose phosphate isomerase enzyme, for which experimental results are already reported. We have also performed molecular dynamics simulations on a subset of point mutants to make a comparative study. The difference in pairwise residue and atomic contact energy between the wildtype and various point mutations reveals the probability of mutations at a particular position. Interestingly, we found that our computational prediction agrees with the experimental studies of Silverman et al. (Proc Natl Acad Sci 2001;98:3092-3097) and performs better than I-Mutant and the Cologne University Protein Stability Analysis Tool. The present work thus suggests that deriving pairwise contact energy potentials and running molecular dynamics simulations of functionally important folds could help us to predict the probability of point mutations, which may ultimately reduce the time and cost of mutation experiments. Proteins 2016; 85:54-64. © 2016 Wiley Periodicals, Inc.

  5. PairWise Neighbours database: overlaps and spacers among prokaryote genomes

    Directory of Open Access Journals (Sweden)

    Garcia-Vallvé Santiago

    2009-06-01

    Background: Although prokaryotes live in a variety of habitats and possess different metabolic and genomic complexity, they have several genomic architectural features in common. Overlapping genes are a common feature of prokaryote genomes. The overlapping lengths tend to be short because, as the overlaps become longer, they carry more risk of deleterious mutations. The spacers between genes tend to be short too because of the tendency to reduce non-coding DNA among prokaryotes. However, they must be long enough to maintain essential regulatory signals such as the Shine-Dalgarno (SD) sequence, which is responsible for efficient translation. Description: PairWise Neighbours is an interactive and intuitive database used for retrieving information about the spacers and overlapping genes among bacterial and archaeal genomes. It contains 1,956,294 gene pairs from 678 fully sequenced prokaryote genomes and is freely available at the URL http://genomes.urv.cat/pwneigh. This database provides information about the overlaps and their conservation across species. Furthermore, it allows broad analysis of the intergenic regions, providing useful information such as the location and strength of the SD sequence. Conclusion: There are experiments and bioinformatic analyses that rely on correct annotations of the initiation site. Therefore, a database that studies the overlaps and spacers among prokaryotes appears to be desirable. The PairWise Neighbours database permits reliability analysis of the overlapping structures and the study of SD presence and location among adjacent genes, which may help to check the annotation of the initiation sites.

  6. Pairwise Trajectory Management (PTM): Concept Description and Documentation

    Science.gov (United States)

    Jones, Kenneth M.; Graff, Thomas J.; Carreno, Victor; Chartrand, Ryan C.; Kibler, Jennifer L.

    2018-01-01

    Pairwise Trajectory Management (PTM) is an Interval Management (IM) concept that utilizes airborne and ground-based capabilities to enable the implementation of airborne pairwise spacing capabilities in oceanic regions. The goal of PTM is to use airborne surveillance and tools to manage an "at or greater than" inter-aircraft spacing. Due to the accuracy of Automatic Dependent Surveillance-Broadcast (ADS-B) information and the use of airborne spacing guidance, the minimum PTM spacing distance will be less than distances a controller can support with current automation systems that support oceanic operations. Ground tools assist the controller in evaluating the traffic picture and determining appropriate PTM clearances to be issued. Avionics systems provide guidance information that allows the flight crew to conform to the PTM clearance issued by the controller. The combination of a reduced minimum distance and airborne spacing management will increase the capacity and efficiency of aircraft operations at a given altitude or volume of airspace. This document provides an overview of the proposed application, a description of several key scenarios, a high level discussion of expected air and ground equipment and procedure changes, a description of a NASA human-machine interface (HMI) prototype for the flight crew that would support PTM operations, and initial benefits analysis results. Additionally, included as appendices, are the following documents: the PTM Operational Services and Environment Definition (OSED) document and a companion "Future Considerations for the Pairwise Trajectory Management (PTM) Concept: Potential Future Updates for the PTM OSED" paper, a detailed description of the PTM algorithm and PTM Limit Mach rules, initial PTM safety requirements and safety assessment documents, a detailed description of the design, development, and initial evaluations of the proposed flight crew HMI, an overview of the methodology and results of PTM pilot training

  7. ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

    Science.gov (United States)

    Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

    2010-03-01

    Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
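
    For readers unfamiliar with the Smith-Waterman algorithm mentioned in this abstract, the following is a minimal sketch of its local-alignment scoring recurrence. The scoring values (match +2, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration; the abstract does not state the scoring scheme used by the project, and production implementations typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score of a and b (score only, no traceback).

    Uses a linear gap penalty and keeps only two rows of the dynamic
    programming matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at 0 is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

    An all-against-all comparison applies such a routine to every pair of sequences, which is why the cost grows quadratically with the number of sequences and motivated the grid computing effort described above.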

  8. Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Torarinsson, Elfar; Gorodkin, Jan

    2007-01-01

    and backtracked in a normal fashion. Finally, the FOLDALIGN algorithm has also been updated with a better memory implementation and an improved energy model. With these improvements in the algorithm, the FOLDALIGN software package provides the molecular biologist with an efficient and user-friendly tool...... the advantage of providing the constraints dynamically. This has been included in a new implementation of the FOLDALIGN algorithm for pairwise local or global structural alignment of RNA sequences. It is shown that time and memory requirements are dramatically lowered while overall performance is maintained....... Furthermore, a new divide and conquer method is introduced to limit the memory requirement during global alignment and backtrack of local alignment. All branch points in the computed RNA structure are found and used to divide the structure into smaller unbranched segments. Each segment is then realigned...

  9. Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models.

    Directory of Open Access Journals (Sweden)

    Richard R Stein

    2015-07-01

    Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene-gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.

  10. Measuring pair-wise molecular interactions in a complex mixture

    Science.gov (United States)

    Chakraborty, Krishnendu; Varma, Manoj M.; Venkatapathi, Murugesan

    2016-03-01

    Complex biological samples such as serum contain thousands of proteins and other molecules spanning up to 13 orders of magnitude in concentration. Present measurement techniques do not permit the analysis of all pair-wise interactions between the components of such a complex mixture to a given target molecule. In this work we explore the use of nanoparticle tags which encode the identity of the molecule to obtain the statistical distribution of pair-wise interactions using their Localized Surface Plasmon Resonance (LSPR) signals. The nanoparticle tags are chosen such that the binding between two molecules conjugated to the respective nanoparticle tags can be recognized by the coupling of their LSPR signals. A numerical simulation using DDA is performed to investigate this approach using a reduced system consisting of three nanoparticles (a gold ellipsoid with aspect ratio 2.5 and short axis 16 nm, and two silver ellipsoids with aspect ratios 3 and 2 and short axes 8 nm and 10 nm respectively) and the set of all possible dimers formed between them. Incident light was circularly polarized and all possible particle and dimer orientations were considered. We observed that the minimum peak separation between two spectra is 5 nm, while the maximum is 184 nm.

  11. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  12. Calibration of Smartphone-Based Weather Measurements Using Pairwise Gossip

    Directory of Open Access Journals (Sweden)

    Jane Louie Fresco Zamora

    2015-01-01

    Accurate and reliable daily global weather reports are necessary for weather forecasting and climate analysis. However, the availability of these reports continues to decline due to the lack of economic support and policies in maintaining ground weather measurement systems from where these reports are obtained. Thus, to mitigate data scarcity, it is required to utilize weather information from existing sensors and built-in smartphone sensors. However, as smartphone usage often varies according to human activity, it is difficult to obtain accurate measurement data. In this paper, we present a heuristic-based pairwise gossip algorithm that will calibrate smartphone-based pressure sensors with respect to fixed weather stations as our referential ground truth. Based on actual measurements, we have verified that smartphone-based readings are unstable when observed during movement. Using our calibration algorithm on actual smartphone-based pressure readings, the updated values were significantly closer to the ground truth values.

  13. Calibration of Smartphone-Based Weather Measurements Using Pairwise Gossip.

    Science.gov (United States)

    Zamora, Jane Louie Fresco; Kashihara, Shigeru; Yamaguchi, Suguru

    2015-01-01

    Accurate and reliable daily global weather reports are necessary for weather forecasting and climate analysis. However, the availability of these reports continues to decline due to the lack of economic support and policies in maintaining ground weather measurement systems from where these reports are obtained. Thus, to mitigate data scarcity, it is required to utilize weather information from existing sensors and built-in smartphone sensors. However, as smartphone usage often varies according to human activity, it is difficult to obtain accurate measurement data. In this paper, we present a heuristic-based pairwise gossip algorithm that will calibrate smartphone-based pressure sensors with respect to fixed weather stations as our referential ground truth. Based on actual measurements, we have verified that smartphone-based readings are unstable when observed during movement. Using our calibration algorithm on actual smartphone-based pressure readings, the updated values were significantly closer to the ground truth values.
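
    The abstract does not spell out its heuristic, so the sketch below shows only generic randomized pairwise gossip averaging with anchored reference nodes, which is the textbook scheme that the name suggests. The function name, the pressure readings, and the choice of a fully connected network are all hypothetical and should not be read as the authors' algorithm.

```python
import random
from typing import Iterable, List


def pairwise_gossip(values: List[float], fixed: Iterable[int],
                    steps: int, seed: int = 0) -> List[float]:
    """Randomized pairwise gossip with anchored (ground-truth) nodes.

    At each step two distinct nodes are drawn uniformly at random and every
    non-anchored node of the pair moves to the pair's average. Anchored nodes
    (for example, fixed weather stations) never change their value.
    """
    anchors = set(fixed)
    rng = random.Random(seed)  # seeded so that runs are reproducible
    x = list(values)
    for _ in range(steps):
        i, j = rng.sample(range(len(x)), 2)
        mean = 0.5 * (x[i] + x[j])
        if i not in anchors:
            x[i] = mean
        if j not in anchors:
            x[j] = mean
    return x


if __name__ == "__main__":
    # Node 0 plays the role of a fixed station; the rest are phone readings (hPa).
    readings = [1013.0, 1009.5, 1016.2, 1011.0, 1014.8]
    print(pairwise_gossip(readings, fixed={0}, steps=2000))
```

    With a single anchor, every other node is pulled toward the anchor's value over time. A real deployment would restrict exchanges to nearby devices and would need to handle the motion-induced instability that the abstract reports.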

  14. Pairwise comparisons and visual perceptions of equal area polygons.

    Science.gov (United States)

    Adamic, P; Babiy, V; Janicki, R; Kakiashvili, T; Koczkodaj, W W; Tadeusiewicz, R

    2009-02-01

    The number of studies related to visual perception has been plentiful in recent years. Participants rated the areas of five randomly generated shapes of equal area, using a reference unit area that was displayed together with the shapes. Respondents were 179 university students from Canada and Poland. The average error estimated by respondents using the unit square was 25.75%. The error was substantially decreased to 5.51% when the shapes were compared to one another in pairs. This gain of 20.24% for this two-dimensional experiment was substantially better than the 11.78% gain reported in the previous one-dimensional experiments. This is the first statistically sound two-dimensional experiment demonstrating that pairwise comparisons improve accuracy.

  15. AllerHunter: a SVM-pairwise system for assessment of allergenicity and allergic cross-reactivity in proteins.

    Directory of Open Access Journals (Sweden)

    Hon Cheng Muh

    Allergy is a major health problem in industrialized countries. The number of transgenic food crops is growing rapidly, creating the need for allergenicity assessment before they are introduced into the human food chain. While existing bioinformatic methods have achieved good accuracies for highly conserved sequences, the discrimination of allergens and non-allergens from allergen-like non-allergen sequences remains difficult. We describe AllerHunter, a web-based computational system for the assessment of potential allergenicity and allergic cross-reactivity in proteins. It combines an iterative pairwise sequence similarity encoding scheme with SVM as the discriminating engine. The pairwise vectorization framework allows the system to model essential features in allergens that are involved in cross-reactivity, but not limited to distinct sets of physicochemical properties. The system was rigorously trained and tested using 1,356 known allergen and 13,449 putative non-allergen sequences. Extensive testing was performed for validation of the prediction models. The system is effective for distinguishing allergens and non-allergens from allergen-like non-allergen sequences. Testing results showed that AllerHunter, with a sensitivity of 83.4% and specificity of 96.4% (accuracy = 95.3%, area under the receiver operating characteristic curve AROC = 0.928 ± 0.004 and Matthews correlation coefficient MCC = 0.738), performs significantly better than a number of existing methods using an independent dataset of 1443 protein sequences. AllerHunter is available at http://tiger.dbs.nus.edu.sg/AllerHunter.

  16. Convergent cross-mapping and pairwise asymmetric inference.

    Science.gov (United States)

    McCracken, James M; Weigel, Robert S

    2014-12-01

    Convergent cross-mapping (CCM) is a technique for computing specific kinds of correlations between sets of time series. It was introduced by Sugihara et al. [Science 338, 496 (2012)] and is reported to be "a necessary condition for causation" capable of distinguishing causality from standard correlation. We show that the relationships between CCM correlations proposed by Sugihara et al. do not, in general, agree with intuitive concepts of "driving" and as such should not be considered indicative of causality. It is shown that the fact that the CCM algorithm implies causality is a function of system parameters for simple linear and nonlinear systems. For example, in a circuit containing a single resistor and inductor, both voltage and current can be identified as the driver depending on the frequency of the source voltage. It is shown that the CCM algorithm, however, can be modified to identify relationships between pairs of time series that are consistent with intuition for the considered example systems for which CCM causality analysis provided nonintuitive driver identifications. This modification of the CCM algorithm is introduced as "pairwise asymmetric inference" (PAI) and examples of its use are presented.

  17. Identifying the Academic Rising Stars via Pairwise Citation Increment Ranking

    KAUST Repository

    Zhang, Chuxu

    2017-08-02

    Predicting the fast-rising young researchers (the Academic Rising Stars) in the future provides useful guidance to the research community, e.g., offering competitive candidates to universities for young faculty hiring, as they are expected to have successful academic careers. In this work, given a set of young researchers who have recently published their first first-author paper, we solve the problem of how to effectively predict the top k% researchers who achieve the highest citation increment in Δt years. We explore a series of factors that can drive an author to be fast-rising and design a novel pairwise citation increment ranking (PCIR) method that leverages those factors to predict the academic rising stars. Experimental results on the large ArnetMiner dataset with over 1.7 million authors demonstrate the effectiveness of PCIR. Specifically, it outperforms all given benchmark methods, with over 8% average improvement. Further analysis demonstrates that temporal features are the best indicators for rising stars prediction, while venue features are less relevant.

  18. A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities.

    Science.gov (United States)

    Bastien, Olivier; Ortet, Philippe; Roy, Sylvaine; Maréchal, Eric

    2005-03-10

    Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction. We have built up a spatial representation of protein sequences using concepts from particle physics (configuration space) and respecting a frame of constraints deduced from pair-wise alignment score properties in information theory. The obtained configuration space of homologous proteins (CSHP) allows the representation of real and shuffled sequences, and thereupon an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction using Z-scores. Deduced trees, called TULIP trees, are consistent with multiple-alignment based trees. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny. The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionarily consistent and robust trees, the topology of which is based on a spatial representation that is not reordered after addition or removal of sequences. The CSHP and its assigned phylogenetic topology provide a powerful and easily updated representation for massive pair-wise genome comparisons based on Z-score computations.
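
    The Monte Carlo Z-score mentioned above compares the score of a real sequence pair with the distribution of scores obtained after shuffling one of the sequences. The sketch below illustrates that idea only. It assumes a toy gapless identity count (`toy_score`) in place of a real alignment score, and the function names, shuffle count and seed are hypothetical choices rather than the authors' procedure.

```python
import random
import statistics


def toy_score(a, b):
    """Toy similarity: number of identical residues at matching positions (no gaps)."""
    return sum(x == y for x, y in zip(a, b))


def z_score(a, b, score=toy_score, n_shuffles=200, seed=0):
    """Z = (observed score - mean of shuffled scores) / std of shuffled scores."""
    rng = random.Random(seed)
    observed = score(a, b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        letters = list(b)
        rng.shuffle(letters)              # preserves composition, destroys order
        shuffled_scores.append(score(a, "".join(letters)))
    mu = statistics.mean(shuffled_scores)
    sigma = statistics.pstdev(shuffled_scores)
    if sigma == 0:
        return 0.0                        # degenerate case: no spread among shuffles
    return (observed - mu) / sigma


seq = "ACDEFGHIKL" * 3
print(z_score(seq, seq))                  # identical sequences give a large positive Z
```

    Because shuffling keeps the amino acid composition fixed, the Z-score measures how much of the observed score is due to residue order rather than composition.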

  19. A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities

    Directory of Open Access Journals (Sweden)

    Maréchal Eric

    2005-03-01

    Background: Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction. Results: We have built up a spatial representation of protein sequences using concepts from particle physics (configuration space) and respecting a frame of constraints deduced from pair-wise alignment score properties in information theory. The obtained configuration space of homologous proteins (CSHP) allows the representation of real and shuffled sequences, and thereupon an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction using Z-scores. Deduced trees, called TULIP trees, are consistent with multiple-alignment based trees. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny. Conclusion: The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionarily consistent and robust trees, the topology of which is based on a spatial representation that is not reordered after addition or removal of sequences. The CSHP and its assigned phylogenetic topology provide a powerful and easily updated representation for massive pair-wise genome comparisons based on Z-score computations.

  20. The Role of Middlemen in Efficient and Strongly Pairwise Stable Networks

    NARCIS (Netherlands)

    Gilles, R.P.; Chakrabarti, S.; Sarangi, S.; Badasyan, N.

    2004-01-01

    We examine the strong pairwise stability concept in network formation theory under collective network benefits. Strong pairwise stability considers a pair of players to add a link through mutual consent while permitting them to unilaterally delete any subset of links under their control. We examine

  1. Screening synteny blocks in pairwise genome comparisons through integer programming.

    Science.gov (United States)

    Tang, Haibao; Lyons, Eric; Pedersen, Brent; Schnable, James C; Paterson, Andrew H; Freeling, Michael

    2011-04-18

    It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) events. To compare multiple "subgenomes" derived from genome duplications, we need to relax the traditional requirements of "one-to-one" syntenic matchings of genomic regions in order to reflect "one-to-many" or more generally "many-to-many" matchings. However, this relaxation may result in the identification of synteny blocks that are derived from ancient shared WGDs that are not of interest. For many downstream analyses, we need to eliminate weak, low scoring alignments from pairwise genome comparisons. Our goal is to objectively select a subset of synteny blocks whose total scores are maximized while respecting the duplication history of the genomes in comparison. We call this "quota-based" screening of synteny blocks in order to appropriately fill a quota of syntenic relationships within one genome or between two genomes having WGD events. We have formulated the synteny block screening as an optimization problem known as "Binary Integer Programming" (BIP), which is solved using existing linear programming solvers. The computer program QUOTA-ALIGN performs this task by creating a clear objective function that maximizes the compatible set of synteny blocks under given constraints on overlaps and depths (corresponding to the duplication history in respective genomes). Such a procedure is useful for any pairwise synteny alignments, but is most useful in lineages affected by multiple WGDs, such as plant or fish lineages. For example, there should be a 1:2 ploidy relationship between genome A and B if genome B had an independent WGD subsequent to the divergence of the two genomes. We show through simulations and real examples using plant genomes in the rosid superorder that the quota

  2. GraphAlignment: Bayesian pairwise alignment of biological networks

    Directory of Open Access Journals (Sweden)

    Kolář Michal

    2012-11-01

    Background: With increased experimental availability and accuracy of bio-molecular networks, tools for their comparative and evolutionary analysis are needed. A key component for such studies is the alignment of networks. Results: We introduce the Bioconductor package GraphAlignment for pairwise alignment of bio-molecular networks. The alignment incorporates information both from network vertices and network edges and is based on an explicit evolutionary model, allowing inference of all scoring parameters directly from empirical data. We compare the performance of our algorithm to an alternative algorithm, Græmlin 2.0. On simulated data, GraphAlignment outperforms Græmlin 2.0 in several benchmarks except for computational complexity. When there is little or no noise in the data, GraphAlignment is slower than Græmlin 2.0. It is faster than Græmlin 2.0 when processing noisy data containing spurious vertex associations. Its typical case complexity grows approximately as O(N^2.6). On empirical bacterial protein-protein interaction networks (PIN) and gene co-expression networks, GraphAlignment outperforms Græmlin 2.0 with respect to coverage and specificity, albeit by a small margin. On large eukaryotic PIN, Græmlin 2.0 outperforms GraphAlignment. Conclusions: The GraphAlignment algorithm is robust to spurious vertex associations, correctly resolves paralogs, and shows very good performance in identification of homologous vertices defined by high vertex and/or interaction similarity. The simplicity and generality of GraphAlignment edge scoring makes the algorithm an appropriate choice for global alignment of networks.

  3. Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix.

    Directory of Open Access Journals (Sweden)

    Jakob H Havgaard

    2007-10-01

    It has become clear that noncoding RNAs (ncRNAs) play important roles in cells, and emerging studies indicate that there might be a large number of unknown ncRNAs in mammalian genomes. There exist computational methods that can be used to search for ncRNAs by comparing sequences from different genomes. One main problem with these methods is their computational complexity, and heuristics are therefore employed. Two heuristics are currently very popular: pre-folding and pre-aligning. However, these heuristics are not ideal, as pre-aligning is dependent on sequence similarity that may not be present and pre-folding ignores the comparative information. Here, pruning of the dynamical programming matrix is presented as an alternative novel heuristic constraint. All subalignments that do not exceed a length-dependent minimum score are discarded as the matrix is filled out, thus giving the advantage of providing the constraints dynamically. This has been included in a new implementation of the FOLDALIGN algorithm for pairwise local or global structural alignment of RNA sequences. It is shown that time and memory requirements are dramatically lowered while overall performance is maintained. Furthermore, a new divide and conquer method is introduced to limit the memory requirement during global alignment and backtrack of local alignment. All branch points in the computed RNA structure are found and used to divide the structure into smaller unbranched segments. Each segment is then realigned and backtracked in a normal fashion. Finally, the FOLDALIGN algorithm has also been updated with a better memory implementation and an improved energy model. With these improvements in the algorithm, the FOLDALIGN software package provides the molecular biologist with an efficient and user-friendly tool for searching for new ncRNAs. The software package is available for download at http://foldalign.ku.dk.

  4. constNJ: an algorithm to reconstruct sets of phylogenetic trees satisfying pairwise topological constraints.

    Science.gov (United States)

    Matsen, Frederick A

    2010-06-01

    This article introduces constNJ (constrained neighbor-joining), an algorithm for phylogenetic reconstruction of sets of trees with constrained pairwise rooted subtree-prune-regraft (rSPR) distance. We are motivated by the problem of constructing sets of trees that must fit into a recombination, hybridization, or similar network. Rather than first finding a set of trees that are optimal according to a phylogenetic criterion (e.g., likelihood or parsimony) and then attempting to fit them into a network, constNJ estimates the trees while enforcing specified rSPR distance constraints. The primary input for constNJ is a collection of distance matrices derived from sequence blocks which are assumed to have evolved in a tree-like manner, such as blocks of an alignment which do not contain any recombination breakpoints. The other input is a set of rSPR constraint inequalities for any set of pairs of trees. constNJ is consistent and a strict generalization of the neighbor-joining algorithm; it uses the new notion of maximum agreement partitions (MAPs) to assure that the resulting trees satisfy the given rSPR distance constraints.

  5. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA

    OpenAIRE

    Kelly, Brendan J.; Gross, Robert; Bittinger, Kyle; Sherrill-Mix, Scott; Lewis, James D.; Collman, Ronald G.; Bushman, Frederic D.; Li, Hongzhe

    2015-01-01

    Motivation: The variation in community composition between microbiome samples, termed beta diversity, can be measured by pairwise distance based on either presence–absence or quantitative species abundance data. PERMANOVA, a permutation-based extension of multivariate analysis of variance to a matrix of pairwise distances, partitions within-group and between-group distances to permit assessment of the effect of an exposure or intervention (grouping factor) upon the sampled microbiome. Within-...

  6. Pairwise Comparison and Distance Measure of Hesitant Fuzzy Linguistic Term Sets

    Directory of Open Access Journals (Sweden)

    Han-Chen Huang

    2014-01-01

    A hesitant fuzzy linguistic term set (HFLTS), which allows experts to use several possible linguistic terms to assess a qualitative linguistic variable, is very useful for expressing people's hesitancy in practical decision-making problems. Up to now, little research has been done on the comparison and distance measure of HFLTSs. In this paper, we present a comparison method for HFLTSs based on pairwise comparisons of each linguistic term in the two HFLTSs. Then, a distance measure method based on the pairwise comparison matrix of HFLTSs is proposed, and we prove that this distance is equal to the distance of the average values of HFLTSs, which makes the distance measure much simpler. Finally, the pairwise comparison and distance measure methods are utilized to develop two multicriteria decision-making approaches under hesitant fuzzy linguistic environments. The results analysis shows that the methods in this paper are more reasonable.

  7. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA.

    Science.gov (United States)

    Kelly, Brendan J; Gross, Robert; Bittinger, Kyle; Sherrill-Mix, Scott; Lewis, James D; Collman, Ronald G; Bushman, Frederic D; Li, Hongzhe

    2015-08-01

    The variation in community composition between microbiome samples, termed beta diversity, can be measured by pairwise distance based on either presence-absence or quantitative species abundance data. PERMANOVA, a permutation-based extension of multivariate analysis of variance to a matrix of pairwise distances, partitions within-group and between-group distances to permit assessment of the effect of an exposure or intervention (grouping factor) upon the sampled microbiome. Within-group distance and exposure/intervention effect size must be accurately modeled to estimate statistical power for a microbiome study that will be analyzed with pairwise distances and PERMANOVA. We present a framework for PERMANOVA power estimation tailored to marker-gene microbiome studies that will be analyzed by pairwise distances, which includes: (i) a novel method for distance matrix simulation that permits modeling of within-group pairwise distances according to pre-specified population parameters; (ii) a method to incorporate effects of different sizes within the simulated distance matrix; (iii) a simulation-based method for estimating PERMANOVA power from simulated distance matrices; and (iv) an R statistical software package that implements the above. Matrices of pairwise distances can be efficiently simulated to satisfy the triangle inequality and incorporate group-level effects, which are quantified by the adjusted coefficient of determination, omega-squared (ω²). From simulated distance matrices, available PERMANOVA power or necessary sample size can be estimated for a planned microbiome study.
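
    The partition of pairwise distances described above can be sketched in a few lines. The example below is an illustrative pure-Python PERMANOVA for a one-way design, not the R package from the abstract. The function names, the six-sample toy distance matrix and the permutation count are hypothetical, and the omega-squared expression is the usual ANOVA-style estimator, assumed here to correspond to the quantity the authors name.

```python
import itertools
import random


def sums_of_squares(dist, groups):
    """Total and within-group sums of squares from a pairwise distance matrix."""
    n = len(groups)
    pairs = itertools.combinations(range(n), 2)
    ss_total = sum(dist[i][j] ** 2 for i, j in pairs) / n
    ss_within = 0.0
    for g in set(groups):
        members = [i for i in range(n) if groups[i] == g]
        within_pairs = itertools.combinations(members, 2)
        ss_within += sum(dist[i][j] ** 2 for i, j in within_pairs) / len(members)
    return ss_total, ss_within


def pseudo_f_and_omega2(dist, groups):
    """Pseudo-F statistic and omega-squared effect size for a one-way design."""
    n, a = len(groups), len(set(groups))
    ss_total, ss_within = sums_of_squares(dist, groups)
    ss_among = ss_total - ss_within
    ms_within = ss_within / (n - a)
    f = (ss_among / (a - 1)) / ms_within
    omega2 = (ss_among - (a - 1) * ms_within) / (ss_total + ms_within)
    return f, omega2


def permanova_p(dist, groups, n_perm=999, seed=0):
    """Permutation p-value: share of label shuffles with pseudo-F >= observed."""
    rng = random.Random(seed)
    observed, _ = pseudo_f_and_omega2(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        f, _ = pseudo_f_and_omega2(dist, labels)
        hits += f >= observed
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: six samples on a line, two well-separated groups.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
groups = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(p - q) for q in positions] for p in positions]

f_stat, omega2 = pseudo_f_and_omega2(dist, groups)
print(f_stat, omega2, permanova_p(dist, groups))
```

    Repeating the permutation test over many simulated distance matrices and counting how often the p-value falls below the chosen threshold gives a simulation-based power estimate of the kind the abstract describes.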

  8. Perceptron learning of pairwise contact energies for proteins incorporating the amino acid environment

    Science.gov (United States)

    Heo, Muyoung; Kim, Suhkmann; Moon, Eun-Joung; Cheon, Mookyung; Chung, Kwanghoon; Chang, Iksoo

    2005-07-01

    Although a coarse-grained description of proteins is a simple and convenient way to attack the protein folding problem, the construction of a global pairwise energy function which can simultaneously recognize the native folds of many proteins has met with only partial success. We have sought the possibility of a systematic improvement of this pairwise-contact energy function as we extended the parameter space of amino acids, incorporating local environments of amino acids, beyond a 20×20 matrix. We have studied the pairwise contact energy functions of 20×20, 60×60, and 180×180 matrices depending on the extent of parameter space, and compared their effect on the learnability of energy parameters in the context of a gapless threading, bearing in mind that a 20×20 pairwise contact matrix has been shown to be too simple to recognize the native folds of many proteins. In this paper, we show that the construction of a global pairwise energy function was achieved using 1006 training proteins of a homology of less than 30%, which include all representatives of different protein classes. After parametrizing the local environments of the amino acids into nine categories depending on three secondary structures and three kinds of hydrophobicity (desolvation), the 16290 pairwise contact energies (scores) of the amino acids could be determined by perceptron learning and protein threading. These could simultaneously recognize all the native folds of the 1006 training proteins. When these energy parameters were tested on the 382 test proteins of a homology of less than 90%, 370 (96.9%) proteins could recognize their native folds. We set up a simple thermodynamic framework in the conformational space of decoys to calculate the unfolded fraction and the specific heat of real proteins. The different thermodynamic stabilities of E. coli ribonuclease H (RNase H) and its mutants were well described in our calculation, agreeing with the experiment.
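
    The figure of 16290 contact energies follows from counting the independent entries of a symmetric 180×180 matrix, where 180 = 20 amino acids × 9 environment categories. The short check below reproduces that arithmetic and the 96.9% test-set figure. The helper name is hypothetical, and reading the 60×60 case as 20 amino acids × 3 categories is an assumption, since the abstract does not state which three categories it uses.

```python
def n_pair_parameters(n_types):
    """Independent entries of a symmetric n_types x n_types contact matrix."""
    return n_types * (n_types + 1) // 2


amino_acids = 20
for environments in (1, 3, 9):            # 3 is an assumed split for the 60x60 case
    n_types = amino_acids * environments
    print(f"{n_types}x{n_types}: {n_pair_parameters(n_types)} pairwise energies")

# Share of the 382 test proteins whose native fold was recognized.
print(f"{370 / 382:.1%}")
```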

  9. Progressive multiple sequence alignments from triplets

    Directory of Open Access Journals (Sweden)

    Stadler Peter F

    2007-07-01

    Background: The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research: Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by step-wisely replacing triples by pairs instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion: The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments in general exceeds that of clustalw or other multiple alignment tools even though our software does not include heuristics for context dependent (mis)match scores.

  10. Document Level Assessment of Document Retrieval Systems in a Pairwise System Evaluation

    Science.gov (United States)

    Rajagopal, Prabha; Ravana, Sri Devi

    2017-01-01

    Introduction: The use of averaged topic-level scores can result in the loss of valuable data and can cause misinterpretation of the effectiveness of system performance. This study aims to use the scores of each document to evaluate document retrieval systems in a pairwise system evaluation. Method: The chosen evaluation metrics are document-level…

  11. Exact p-values for pairwise comparison of Friedman rank sums, with application to comparing classifiers

    NARCIS (Netherlands)

    Eisinga, R.N.; Heskes, T.M.; Pelzer, B.J.; Grotenhuis, H.F. te

    2017-01-01

    Background: The Friedman rank sum test is a widely-used nonparametric method in computational biology. In addition to examining the overall null hypothesis of no significant difference among any of the rank sums, it is typically of interest to conduct pairwise comparison tests. Current approaches to

  12. A gradient approximation for calculating Debye temperatures from pairwise interatomic potentials

    International Nuclear Information System (INIS)

    Jackson, D.P.

    1975-09-01

    A simple gradient approximation is given for calculating the effective Debye temperature of a cubic crystal from central pairwise interatomic potentials. For examples of the Morse potential applied to cubic metals the results are in generally good agreement with experiment. (author)

  13. A Comparative Study of Pairwise Learning Methods Based on Kernel Ridge Regression.

    Science.gov (United States)

    Stock, Michiel; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem

    2018-06-12

    Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as special cases of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.
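
    For orientation, the closed form that makes Kronecker kernel ridge regression efficient can be written out as follows. This is the standard textbook derivation, given under assumed notation rather than the notation of the paper: Y is the n×m label matrix, K and G are the symmetric kernel matrices of the two object types, A holds the dual parameters, and λ > 0 is the regularization strength.

```latex
% Predictions and regularized squared loss (notation assumed for illustration)
F = K A G, \qquad
\min_{A}\; \lVert Y - K A G \rVert_F^2
  + \lambda \,\operatorname{vec}(A)^{\top} (G \otimes K)\, \operatorname{vec}(A)

% Stationarity gives one linear system in vec(A)
(G \otimes K + \lambda I)\, \operatorname{vec}(A) = \operatorname{vec}(Y)

% With eigendecompositions K = U \Lambda U^{\top} and G = V \Sigma V^{\top}
A = U \left( C \circ \left( U^{\top} Y V \right) \right) V^{\top},
\qquad C_{ij} = \frac{1}{\lambda_i \sigma_j + \lambda}
```

    The last line avoids forming the nm×nm Kronecker product: the cost is dominated by the two eigendecompositions and a few matrix products.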

  14. On the calculation of x-ray scattering signals from pairwise radial distribution functions

    DEFF Research Database (Denmark)

    Dohn, Asmus Ougaard; Biasin, Elisa; Haldrup, Kristoffer

    2015-01-01

    We derive a formulation for evaluating (time-resolved) x-ray scattering signals of solvated chemical systems, based on pairwise radial distribution functions, with the aim of this formulation to accompany molecular dynamics simulations. The derivation is described in detail to eliminate any possi...

  15. Pairwise comparisons of ten porcine tissues identify differential transcriptional regulation at the gene, isoform, promoter and transcription start site level

    International Nuclear Information System (INIS)

    Farajzadeh, Leila; Hornshøj, Henrik; Momeni, Jamal; Thomsen, Bo; Larsen, Knud; Hedegaard, Jakob; Bendixen, Christian; Madsen, Lone Bruhn

    2013-01-01

    Highlights:
    • Transcriptome sequencing yielded 223 million porcine RNA-seq reads and 59,000 transcribed locations.
    • Establishment of unique transcription profiles for ten porcine tissues including four brain tissues.
    • Comparison of transcription profiles at gene, isoform, promoter and transcription start site level.
    • Highlights a high level of regulation of neuro-related genes at the gene, isoform, and TSS levels.
    • Our results emphasize the pig as a valuable animal model with respect to human biological issues.

    Abstract: The transcriptome is the absolute set of transcripts in a tissue or cell at the time of sampling. In this study RNA-Seq is employed to enable the differential analysis of the transcriptome profile for ten porcine tissues in order to evaluate differences between the tissues at the gene and isoform expression level, together with an analysis of variation in transcription start sites, promoter usage, and splicing. In total, 223 million RNA fragments were sequenced, leading to the identification of 59,930 transcribed gene locations and 290,936 transcript variants using Cufflinks with similarity to approximately 13,899 annotated human genes. Pairwise analysis of tissues for differential expression at the gene level showed that the smallest differences were between tissues originating from the porcine brain. Interestingly, the relative level of differential expression at the isoform level generally did not vary between tissue contrasts. Furthermore, analysis of differential promoter usage between tissues revealed a proportionally higher variation between cerebellum (CBE) versus frontal cortex and cerebellum versus hypothalamus (HYP) than in the remaining comparisons. In addition, the comparison of differential transcription start sites showed that the number of these sites is generally increased in comparisons including hypothalamus in contrast to other pairwise assessments. A comprehensive analysis of one of the tissue contrasts, i

  16. Classification between normal and tumor tissues based on the pair-wise gene expression ratio

    International Nuclear Information System (INIS)

    Yap, YeeLeng; Zhang, XueWu; Ling, MT; Wang, XiangHong; Wong, YC; Danchin, Antoine

    2004-01-01

    Precise classification of cancer types is critically important for early cancer diagnosis and treatment. Numerous efforts have been made to use gene expression profiles to improve precision of tumor classification. However, reliable cancer-related signals are generally lacking. Using recent datasets on colon and prostate cancer, a data transformation procedure from single gene expression to pair-wise gene expression ratio is proposed. Making use of the internal consistency of each expression profiling dataset, this transformation improves the signal to noise ratio of the dataset and uncovers new relevant cancer-related signals (features). The efficiency in using the transformed dataset to perform normal/tumor classification was investigated using feature partitioning with informative features (gene annotation) as discriminating axes (single gene expression or pair-wise gene expression ratio). Classification results were compared to the original datasets for up to 10-feature model classifiers. 82 and 262 genes that have high correlation to tissue phenotype were selected from the colon and prostate datasets respectively. Remarkably, data transformation of the highly noisy expression data lowered the coefficient of variation (CV) for the within-class samples and improved the correlation with tissue phenotypes. The transformed dataset exhibited lower CV when compared to that of single gene expression. In the colon cancer set, the minimum CV decreased from 45.3% to 16.5%. In prostate cancer, comparable CV was achieved with and without transformation. This improvement in CV, coupled with the improved correlation between the pair-wise gene expression ratio and tissue phenotypes, yielded higher classification efficiency, especially with the colon dataset, from 87.1% to 93.5%. Over 90% of the top ten discriminating axes in both datasets showed significant improvement after data transformation. The high classification efficiency achieved suggested
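
    One way to see why a pair-wise expression ratio can have a lower coefficient of variation than either gene alone: any sample-specific scaling factor shared by both genes cancels in the ratio. The sketch below uses made-up expression values for two hypothetical genes in three samples of one class. It illustrates the transformation only and is not the authors' pipeline.

```python
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical within-class measurements; each sample carries its own scale factor.
gene_a = [100.0, 200.0, 400.0]
gene_b = [50.0, 100.0, 200.0]

# Pair-wise ratio transformation: one derived feature per gene pair.
ratio = [a / b for a, b in zip(gene_a, gene_b)]

print(f"CV gene A: {cv(gene_a):.3f}")
print(f"CV gene B: {cv(gene_b):.3f}")
print(f"CV ratio A/B: {cv(ratio):.3f}")
```

    In this idealized case the ratio is constant across samples, so its CV is zero; with real data only the shared component of the noise cancels.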

  17. Scalable Bayesian nonparametric measures for exploring pairwise dependence via Dirichlet Process Mixtures.

    Science.gov (United States)

    Filippi, Sarah; Holmes, Chris C; Nieto-Barajas, Luis E

    2016-11-16

    In this article we propose novel Bayesian nonparametric methods using Dirichlet Process Mixture (DPM) models for detecting pairwise dependence between random variables while accounting for uncertainty in the form of the underlying distributions. A key criterion is that the procedures should scale to large data sets. In this regard we find that the formal calculation of the Bayes factor for a dependent-vs.-independent DPM joint probability measure is not feasible computationally. To address this we present Bayesian diagnostic measures for characterising evidence against a "null model" of pairwise independence. In simulation studies, as well as for a real data analysis, we show that our approach provides a useful tool for the exploratory nonparametric Bayesian analysis of large multivariate data sets.

  18. Pairwise correlations via quantum discord and its geometric measure in a four-qubit spin chain

    Directory of Open Access Journals (Sweden)

    Abdel-Baset A. Mohamed

    2013-04-01

    The dynamics of pairwise correlations, including quantum entanglement (QE) and discord (QD) with the geometric measure of quantum discord (GMQD), are shown in the four-qubit Heisenberg XX spin chain. The results show that the effect of the entanglement degree of the initial state on the pairwise correlations is stronger for alternate qubits than it is for nearest-neighbor qubits. This parameter results in sudden death for QE, but it cannot do so for QD and GMQD. With different values for this entanglement parameter of the initial state, QD and GMQD differ and are sensitive to any change in this parameter. It is found that GMQD is more robust than both QD and QE to describe correlations with nonzero values, which offers a valuable resource for quantum computation.

  19. A Relative-Localization Algorithm Using Incomplete Pairwise Distance Measurements for Underwater Applications

    Directory of Open Access Journals (Sweden)

    Kae Y. Foo

    2010-01-01

    The task of localizing underwater assets involves the relative localization of each unit using only pairwise distance measurements, usually obtained from time-of-arrival or time-delay-of-arrival measurements. In the fluctuating underwater environment, a complete set of pair-wise distance measurements can often be difficult to acquire, thus hindering a straightforward closed-form solution in deriving the assets' relative coordinates. An iterative multidimensional scaling approach is presented based upon a weighted-majorization algorithm that tolerates missing or inaccurate distance measurements. Substantial modifications are proposed to optimize the algorithm, while the effects of refractive propagation paths are considered. A parametric study of the algorithm based upon simulation results is shown. An acoustic field-trial was then carried out, presenting field measurements to highlight the practical implementation of this algorithm.

  20. A pragmatic pairwise group-decision method for selection of sites for nuclear power plants

    International Nuclear Information System (INIS)

    Kutbi, I.I.

    1987-01-01

    A pragmatic pairwise group-decision approach is applied to compare two regions in order to select the more suitable one for construction of nuclear power plants in the Kingdom of Saudi Arabia. The selection methodology is based on pairwise comparison by forced choice. The method facilitates rating of the regions or sites using simple calculations. Two regions, one close to Dhahran on the Arabian Gulf and another close to Jeddah on the Red Sea, are evaluated. No specific site in either region is considered at this stage. The comparison is based on a set of selection criteria which include (i) topography, (ii) geology, (iii) seismology, (iv) meteorology, (v) oceanography, (vi) hydrology and (vii) proximity to oil and gas fields. The comparison shows that the Jeddah region is more suitable than the Dhahran region. (orig.)

  1. Geometric measure of pairwise quantum discord for superpositions of multipartite generalized coherent states

    International Nuclear Information System (INIS)

    Daoud, M.; Ahl Laamara, R.

    2012-01-01

    We give the explicit expressions of the pairwise quantum correlations present in superpositions of multipartite coherent states. Special attention is devoted to the evaluation of the geometric quantum discord. The dynamics of quantum correlations under a dephasing channel is analyzed. A comparison of geometric measure of quantum discord with that of concurrence shows that quantum discord in multipartite coherent states is more resilient to dissipative environments than is quantum entanglement. To illustrate our results, we consider some special superpositions of Weyl–Heisenberg, SU(2) and SU(1,1) coherent states which interpolate between Werner and Greenberger–Horne–Zeilinger states. -- Highlights: ► Pairwise quantum correlations multipartite coherent states. ► Explicit expression of geometric quantum discord. ► Entanglement sudden death and quantum discord robustness. ► Generalized coherent states interpolating between Werner and Greenberger–Horne–Zeilinger states

  2. Geometric measure of pairwise quantum discord for superpositions of multipartite generalized coherent states

    Energy Technology Data Exchange (ETDEWEB)

    Daoud, M., E-mail: m_daoud@hotmail.com [Department of Physics, Faculty of Sciences, University Ibnou Zohr, Agadir (Morocco); Ahl Laamara, R., E-mail: ahllaamara@gmail.com [LPHE-Modeling and Simulation, Faculty of Sciences, University Mohammed V, Rabat (Morocco); Centre of Physics and Mathematics, CPM, CNESTEN, Rabat (Morocco)

    2012-07-16

    We give the explicit expressions of the pairwise quantum correlations present in superpositions of multipartite coherent states. Special attention is devoted to the evaluation of the geometric quantum discord. The dynamics of quantum correlations under a dephasing channel is analyzed. A comparison of geometric measure of quantum discord with that of concurrence shows that quantum discord in multipartite coherent states is more resilient to dissipative environments than is quantum entanglement. To illustrate our results, we consider some special superpositions of Weyl–Heisenberg, SU(2) and SU(1,1) coherent states which interpolate between Werner and Greenberger–Horne–Zeilinger states. -- Highlights: ► Pairwise quantum correlations multipartite coherent states. ► Explicit expression of geometric quantum discord. ► Entanglement sudden death and quantum discord robustness. ► Generalized coherent states interpolating between Werner and Greenberger–Horne–Zeilinger states.

  3. PTM Along Track Algorithm to Maintain Spacing During Same Direction Pair-Wise Trajectory Management Operations

    Science.gov (United States)

    Carreno, Victor A.

    2015-01-01

    Pair-wise Trajectory Management (PTM) is a cockpit based delegated responsibility separation standard. When an air traffic service provider gives a PTM clearance to an aircraft and the flight crew accepts the clearance, the flight crew will maintain spacing and separation from a designated aircraft. A PTM along track algorithm will receive state information from the designated aircraft and from the own ship to produce speed guidance for the flight crew to maintain spacing and separation

  4. Criteria for the singularity of a pairwise l1-distance matrix and their generalizations

    International Nuclear Information System (INIS)

    D'yakonov, Alexander G

    2012-01-01

    We study the singularity problem for the pairwise distance matrix of a system of points, as well as generalizations of this problem that are connected with applications to interpolation theory and with an algebraic approach to recognition problems. We obtain necessary and sufficient conditions on a system under which the dimension of the range space of polynomials of bounded degree over the columns of the distance matrix is less than the number of points in the system.

  5. Criteria for the singularity of a pairwise l1-distance matrix and their generalizations

    Energy Technology Data Exchange (ETDEWEB)

    D'yakonov, Alexander G [M. V. Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics, Moscow (Russian Federation)

    2012-06-30

    We study the singularity problem for the pairwise distance matrix of a system of points, as well as generalizations of this problem that are connected with applications to interpolation theory and with an algebraic approach to recognition problems. We obtain necessary and sufficient conditions on a system under which the dimension of the range space of polynomials of bounded degree over the columns of the distance matrix is less than the number of points in the system.

  6. Improving prediction of heterodimeric protein complexes using combination with pairwise kernel.

    Science.gov (United States)

    Ruan, Peiying; Hayashida, Morihiro; Akutsu, Tatsuya; Vert, Jean-Philippe

    2018-02-19

    Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) networks. These methods work well to predict complexes involving at least three proteins, but generally fail at identifying complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is however an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. In this paper, we use three promising kernel functions, Min kernel and two pairwise kernels, which are Metric Learning Pairwise Kernel (MLPK) and Tensor Product Pairwise Kernel (TPPK). We also consider the normalization forms of Min kernel. Then, we combine Min kernel or its normalization form and one of the pairwise kernels by plugging. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. Then, we evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of normalized-Min-kernel and MLPK leads to the best F-measure and improved the performance of our previous work, which had been the best existing method so far. We propose new methods to predict heterodimers, using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on information extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state-of-the-art.

  7. Analysis of Geographic and Pairwise Distances among Chinese Cashmere Goat Populations

    OpenAIRE

    Liu, Jian-Bin; Wang, Fan; Lang, Xia; Zha, Xi; Sun, Xiao-Ping; Yue, Yao-Jing; Feng, Rui-Lin; Yang, Bo-Hui; Guo, Jian

    2013-01-01

    This study investigated the geographic and pairwise distances of nine Chinese local Cashmere goat populations through the analysis of 20 microsatellite DNA markers. Fluorescence PCR was used to identify the markers, which were selected based on their significance as identified by the Food and Agriculture Organization of the United Nations (FAO) and the International Society for Animal Genetics (ISAG). In total, 206 alleles were detected; the average allele number was 10.30; the polymorphism i...

  8. Simultaneous-Fault Diagnosis of Gas Turbine Generator Systems Using a Pairwise-Coupled Probabilistic Classifier

    Directory of Open Access Journals (Sweden)

    Zhixin Yang

    2013-01-01

    A reliable fault diagnostic system for a gas turbine generator system (GTGS), which is complicated and inherently subject to many types of component faults, is essential to avoid the interruption of electricity supply. However, the GTGS diagnosis faces challenges in terms of the existence of simultaneous-fault diagnosis and high cost in acquiring the exponentially increased simultaneous-fault vibration signals for constructing the diagnostic system. This research proposes a new diagnostic framework combining feature extraction, pairwise-coupled probabilistic classifier, and decision threshold optimization. The feature extraction module adopts wavelet packet transform and time-domain statistical features to extract vibration signal features. Kernel principal component analysis is then applied to further reduce the redundant features. The features of single faults in a simultaneous-fault pattern are extracted and then detected using a probabilistic classifier, namely, pairwise-coupled relevance vector machine, which is trained with single-fault patterns only. Therefore, the training dataset of simultaneous-fault patterns is unnecessary. To optimize the decision threshold, this research proposes using a grid search method, which can ensure a global solution as compared with traditional computational intelligence techniques. Experimental results show that the proposed framework performs well for both single-fault and simultaneous-fault diagnosis and is superior to the frameworks without feature extraction and pairwise coupling.

  9. Design of Long Period Pseudo-Random Sequences from the Addition of m-Sequences over

    Directory of Open Access Journals (Sweden)

    Ren Jian

    2004-01-01

    Pseudo-random sequences with good correlation properties and large linear span are widely used in code division multiple access (CDMA) communication systems and cryptology for reliable and secure information transmission. In this paper, sequences with long period, large complexity, balance statistics, and low cross-correlation property are constructed from the addition of m-sequences with pairwise-prime linear spans (AMPLS). Using m-sequences as building blocks, the proposed method proved to be an efficient and flexible approach to construct long period pseudo-random sequences with desirable properties from short period sequences. Applying the proposed method to , a signal set is constructed.

  10. Pairwise and higher-order correlations among drug-resistance mutations in HIV-1 subtype B protease

    Directory of Open Access Journals (Sweden)

    Morozov Alexandre V

    2009-08-01

    Background: The reaction of HIV protease to inhibitor therapy is characterized by the emergence of complex mutational patterns which confer drug resistance. The response of HIV protease to drugs often involves both primary mutations that directly inhibit the action of the drug, and a host of accessory resistance mutations that may occur far from the active site but may contribute to restoring the fitness or stability of the enzyme. Here we develop a probabilistic approach based on connected information that allows us to study residue, pair level and higher-order correlations within the same framework. Results: We apply our methodology to a database of approximately 13,000 sequences which have been annotated by the treatment history of the patients from which the samples were obtained. We show that including pair interactions is essential for agreement with the mutational data, since neglect of these interactions results in order-of-magnitude errors in the probabilities of the simultaneous occurrence of many mutations. The magnitude of these pair correlations changes dramatically between sequences obtained from patients that were or were not exposed to drugs. Higher-order effects make a contribution of as much as 10% for residues taken three at a time, but increase to more than twice that for 10 to 15-residue groups. The sequence data is insufficient to determine the higher-order effects for larger groups. We find that higher-order interactions have a significant effect on the predicted frequencies of sequences with large numbers of mutations. While relatively rare, such sequences are more prevalent after multi-drug therapy. The relative importance of these higher-order interactions increases with the number of drugs the patient had been exposed to. Conclusion: Correlations are critical for the understanding of mutation patterns in HIV protease. Pair interactions have substantial qualitative effects, while higher-order interactions are

  11. Mapping sequences by parts

    Directory of Open Access Journals (Sweden)

    Guziolowski Carito

    2007-09-01

    Background: We present the N-map method, a pairwise and asymmetrical approach which allows us to compare sequences by taking into account evolutionary events that produce shuffled, reversed or repeated elements. Basically, the optimal N-map of a sequence s over a sequence t is the best way of partitioning the first sequence into N parts and placing them, possibly complementary reversed, over the second sequence in order to maximize the sum of their gapless alignment scores. Results: We introduce an algorithm computing an optimal N-map with time complexity O(|s| × |t| × N) using O(|s| × |t| × N) memory space. Among all the numbers of parts taken in a reasonable range, we select the value N for which the optimal N-map has the most significant score. To evaluate this significance, we study the empirical distributions of the scores of optimal N-maps and show that they can be approximated by normal distributions with a reasonable accuracy. We test the functionality of the approach over random sequences on which we apply artificial evolutionary events. Practical Application: The method is illustrated with four case studies of pairs of sequences involving non-standard evolutionary events.

  12. The pairwise disconnectivity index as a new metric for the topological analysis of regulatory networks

    Directory of Open Access Journals (Sweden)

    Wingender Edgar

    2008-05-01

    Background: Currently, there is a gap between purely theoretical studies of the topology of large bioregulatory networks and the practical traditions and interests of experimentalists. While the theoretical approaches emphasize the global characterization of regulatory systems, the practical approaches focus on the role of distinct molecules and genes in regulation. To bridge the gap between these opposite approaches, one needs to combine 'general' with 'particular' properties and translate abstract topological features of large systems into testable functional characteristics of individual components. Here, we propose a new topological parameter – the pairwise disconnectivity index of a network's element – that is capable of such bridging. Results: The pairwise disconnectivity index quantifies how crucial an individual element is for sustaining the communication ability between connected pairs of vertices in a network that is displayed as a directed graph. Such an element might be a vertex (i.e., molecules, genes), an edge (i.e., reactions, interactions), as well as a group of vertices and/or edges. The index can be viewed as a measure of topological redundancy of regulatory paths which connect different parts of a given network and as a measure of sensitivity (robustness) of this network to the presence (absence) of each individual element. Accordingly, we introduce the notion of a path-degree of a vertex in terms of its corresponding incoming, outgoing and mediated paths, respectively. The pairwise disconnectivity index has been applied to the analysis of several regulatory networks from various organisms. The importance of an individual vertex or edge for the coherence of the network is determined by the particular position of the given element in the whole network. Conclusion: Our approach enables evaluation of the effect of removing each element (i.e., vertex, edge, or their combinations) from a network. The greatest potential value of
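
    The idea behind the index in the record above can be illustrated with a minimal sketch. It assumes one plausible formulation that the abstract does not spell out: the fraction of ordered vertex pairs joined by a directed path that lose their connection once a given vertex is removed. The function names and the toy graph are hypothetical, and the paper's exact definition may differ.

```python
from collections import deque


def reachable_pairs(graph, removed=None):
    """Count ordered pairs (u, v), u != v, joined by a directed path.

    graph maps every vertex to a list of its successors.
    The vertex `removed`, if given, is treated as deleted.
    """
    count = 0
    for start in graph:
        if start == removed:
            continue
        seen = {start}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt != removed and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        count += len(seen) - 1  # exclude the start vertex itself
    return count


def disconnectivity_index(graph, vertex):
    """Assumed definition: share of connected ordered pairs lost without `vertex`."""
    total = reachable_pairs(graph)
    if total == 0:
        return 0.0
    remaining = reachable_pairs(graph, removed=vertex)
    return (total - remaining) / total


# Hypothetical toy network: A -> B -> C and D -> B.
toy = {"A": ["B"], "B": ["C"], "C": [], "D": ["B"]}
print(disconnectivity_index(toy, "B"))  # B mediates every path in this toy graph
```

    In this toy network every directed path runs through B, so removing B disconnects all pairs, whereas removing a source such as A only loses the pairs that start at A.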

  13. Bistability, non-ergodicity, and inhibition in pairwise maximum-entropy models.

    Science.gov (United States)

    Rostami, Vahid; Porta Mana, PierGianLuca; Grün, Sonja; Helias, Moritz

    2017-10-01

    Pairwise maximum-entropy models have been used in neuroscience to predict the activity of neuronal populations, given only the time-averaged correlations of the neuron activities. This paper provides evidence that the pairwise model, applied to experimental recordings, would produce a bimodal distribution for the population-averaged activity, and for some population sizes the second mode would peak at high activities, that experimentally would be equivalent to 90% of the neuron population active within time-windows of a few milliseconds. Several problems are connected with this bimodality: 1. The presence of the high-activity mode is unrealistic in view of observed neuronal activity and on neurobiological grounds. 2. Boltzmann learning becomes non-ergodic, hence the pairwise maximum-entropy distribution cannot be found: in fact, Boltzmann learning would produce an incorrect distribution; similarly, common variants of mean-field approximations also produce an incorrect distribution. 3. The Glauber dynamics associated with the model is unrealistically bistable and cannot be used to generate realistic surrogate data. This bimodality problem is first demonstrated for an experimental dataset from 159 neurons in the motor cortex of a macaque monkey. Evidence is then provided that this problem affects typical neural recordings of population sizes of a couple of hundreds or more neurons. The cause of the bimodality problem is identified as the inability of standard maximum-entropy distributions with a uniform reference measure to model neuronal inhibition. To eliminate this problem a modified maximum-entropy model is presented, which reflects a basic effect of inhibition in the form of a simple but non-uniform reference measure. This model does not lead to unrealistic bimodalities, can be found with Boltzmann learning, and has an associated Glauber dynamics which incorporates a minimal asymmetric inhibition.
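
    For orientation, the generic textbook form of a pairwise maximum-entropy distribution with a reference measure can be written as below. The symbols are assumptions rather than the paper's own notation: binary activities s_i, biases h_i, couplings J_ij, normalization Z, and a reference measure g(s). The particular non-uniform g proposed in the paper is not given in the abstract and is not reproduced here.

```latex
P(\mathbf{s}) \;=\; \frac{1}{Z}\, g(\mathbf{s})\,
  \exp\!\Big( \sum_{i} h_i s_i \;+\; \sum_{i<j} J_{ij}\, s_i s_j \Big),
\qquad
Z \;=\; \sum_{\mathbf{s}} g(\mathbf{s})\,
  \exp\!\Big( \sum_{i} h_i s_i \;+\; \sum_{i<j} J_{ij}\, s_i s_j \Big)
```

    Setting g(s) = 1 for every state gives the standard model with a uniform reference measure, which is the case the abstract identifies as unable to model inhibition.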

  14. Effect of pairwise additivity on finite-temperature behavior of classical ideal gas

    Science.gov (United States)

    Shekaari, Ashkan; Jafari, Mahmoud

    2018-05-01

    Finite-temperature molecular dynamics simulations have been applied to inquire into the effect of pairwise additivity on the behavior of classical ideal gas within the temperature range of T = 250-4000 K via applying a variety of pair potentials and then examining the temperature dependence of a number of thermodynamical properties. Examining the compressibility factor reveals the most deviation from ideal-gas behavior for the Lennard-Jones system mainly due to the presence of both the attractive and repulsive terms. The systems with either attractive or repulsive intermolecular potentials are found to present no resemblance to real gases, but the most similarity to the ideal one as temperature rises.

  15. Computing the Skewness of the Phylogenetic Mean Pairwise Distance in Linear Time

    DEFF Research Database (Denmark)

    Tsirogiannis, Constantinos; Sandel, Brody Steven

    2014-01-01

    The phylogenetic Mean Pairwise Distance (MPD) is one of the most popular measures for computing the phylogenetic distance between a given group of species. More specifically, for a phylogenetic tree and for a set of species R represented by a subset of the leaf nodes of the tree, the MPD of R is equal...... to the average cost of all possible simple paths in the tree that connect pairs of nodes in R. Among other phylogenetic measures, the MPD is used as a tool for deciding if the species of a given group R are closely related. To do this, it is important to compute not only the value of the MPD for this group but also...
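
    The definition of the MPD quoted above can be made concrete with a short sketch. It is a naive quadratic-time illustration of the definition only, not the linear-time skewness algorithm the record refers to. The tree encoding (a dictionary from each node to its parent and edge weight) and the toy tree are assumptions made for this example.

```python
from itertools import combinations


def path_cost(parent, u, v):
    """Cost of the unique simple path between u and v in a rooted tree.

    parent maps each node to (parent_node, edge_weight); the root maps to None.
    """
    # Record the distance from u to each of its ancestors (including u).
    dist_from_u = {}
    node, dist = u, 0.0
    while True:
        dist_from_u[node] = dist
        step = parent[node]
        if step is None:  # reached the root
            break
        node, dist = step[0], dist + step[1]
    # Climb from v until the first common ancestor is found.
    node, dist = v, 0.0
    while node not in dist_from_u:
        up, weight = parent[node]
        node, dist = up, dist + weight
    return dist + dist_from_u[node]


def mean_pairwise_distance(parent, species):
    """Average path cost over all unordered pairs of the given leaf nodes."""
    pairs = list(combinations(species, 2))
    return sum(path_cost(parent, u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy tree: root -> x (1.0), root -> c (3.0), x -> a (1.0), x -> b (2.0).
tree = {"root": None, "x": ("root", 1.0), "c": ("root", 3.0),
        "a": ("x", 1.0), "b": ("x", 2.0)}
print(mean_pairwise_distance(tree, ["a", "b", "c"]))
```

    For the toy tree the three pairwise path costs are 3, 5 and 6, so the MPD of {a, b, c} is their average.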

  16. Benefits of Using Pairwise Trajectory Management in the Central East Pacific

    Science.gov (United States)

    Chartrand, Ryan; Ballard, Kathryn

    2017-01-01

    Pairwise Trajectory Management (PTM) is a concept that utilizes airborne and ground-based capabilities to enable airborne spacing operations in procedural airspace. This concept makes use of updated ground automation, Automatic Dependent Surveillance-Broadcast (ADS-B) and on board avionics generating real time guidance. An experiment was conducted to examine the potential benefits of implementing PTM in the Central East Pacific oceanic region. An explanation of the experiment and some of the results are included in this paper. The PTM concept allowed for an increase in the average time an aircraft is able to spend at its desired flight level and a reduction in fuel burn.

  17. Atomic pairwise distribution function analysis of the amorphous phase prepared by different manufacturing routes

    DEFF Research Database (Denmark)

    Boetker, Johan P.; Koradia, Vishal; Rades, Thomas

    2012-01-01

    was subjected to quench cooling thereby creating an amorphous form of the drug from both starting materials. The milled and quench cooled samples were, together with the crystalline starting materials, analyzed with X-ray powder diffraction (XRPD), Raman spectroscopy and atomic pair-wise distribution function...... (PDF) analysis of the XRPD pattern. When compared to XRPD and Raman spectroscopy, the PDF analysis was superior in displaying the difference between the amorphous samples prepared by milling and quench cooling approaches of the two starting materials....

  18. Exact method for the simulation of Coulombic systems by spherically truncated, pairwise r^-1 summation

    International Nuclear Information System (INIS)

    Wolf, D.; Keblinski, P.; Phillpot, S.R.; Eggebrecht, J.

    1999-01-01

    Based on a recent result showing that the net Coulomb potential in condensed ionic systems is rather short ranged, an exact and physically transparent method permitting the evaluation of the Coulomb potential by direct summation over the r^-1 Coulomb pair potential is presented. The key observation is that the problems encountered in determining the Coulomb energy by pairwise, spherically truncated r^-1 summation are a direct consequence of the fact that the system summed over is practically never neutral. A simple method is developed that achieves charge neutralization wherever the r^-1 pair potential is truncated. This enables the extraction of the Coulomb energy, forces, and stresses from a spherically truncated, usually charged environment in a manner that is independent of the grouping of the pair terms. The close connection of our approach with the Ewald method is demonstrated and exploited, providing an efficient method for the simulation of even highly disordered ionic systems by direct, pairwise r^-1 summation with spherical truncation at rather short range, i.e., a method which fully exploits the short-ranged nature of the interactions in ionic systems. The method is validated by simulations of crystals, liquids, and interfacial systems, such as free surfaces and grain boundaries. Copyright 1999 American Institute of Physics.
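
    The charge-neutralization idea described above can be sketched for an isolated cluster of point charges. This is a simplified illustration under stated assumptions: no damping, no periodic images, and units in which the Coulomb constant equals 1. The function name and the test configuration are hypothetical, and the formula used is the undamped, shifted pair sum commonly associated with this approach rather than a quotation from the paper.

```python
import math


def neutralized_truncated_energy(charges, positions, cutoff):
    """Spherically truncated 1/r sum with charge neutralization at the cutoff.

    Assumed form (Coulomb constant = 1, no damping, no periodic images):
      E = sum_{i<j, r_ij < Rc} q_i q_j (1/r_ij - 1/Rc)  -  sum_i q_i^2 / (2 Rc)
    """
    pair_sum = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            if r < cutoff:
                # Subtracting 1/cutoff acts like a compensating charge placed
                # on the truncation sphere for each pair inside it.
                pair_sum += charges[i] * charges[j] * (1.0 / r - 1.0 / cutoff)
    # Each ion also counts toward the net charge inside its own sphere.
    self_term = sum(q * q for q in charges) / (2.0 * cutoff)
    return pair_sum - self_term


# Hypothetical check: a neutral ion pair at unit separation, fully inside the cutoff.
energy = neutralized_truncated_energy(
    [1.0, -1.0], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], cutoff=2.0
)
print(energy)
```

    When every truncation sphere is already neutral, as in this two-ion check, the correction terms cancel and the bare Coulomb energy of the pair is recovered.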

  19. Pairwise registration of TLS point clouds using covariance descriptors and a non-cooperative game

    Science.gov (United States)

    Zai, Dawei; Li, Jonathan; Guo, Yulan; Cheng, Ming; Huang, Pengdi; Cao, Xiaofei; Wang, Cheng

    2017-12-01

    It is challenging to automatically register TLS point clouds with noise, outliers and varying overlap. In this paper, we propose a new method for pairwise registration of TLS point clouds. We first generate covariance matrix descriptors with an adaptive neighborhood size from point clouds to find candidate correspondences; we then construct a non-cooperative game to isolate mutually compatible correspondences, which are considered as true positives. The method was tested on three models acquired by two different TLS systems. Experimental results demonstrate that our proposed adaptive covariance (ACOV) descriptor is invariant to rigid transformation and robust to noise and varying resolutions. The average registration errors achieved on three models are 0.46 cm, 0.32 cm and 1.73 cm, respectively. The computational times on these models are about 288 s, 184 s and 903 s, respectively. Besides, our registration framework using ACOV descriptors and a game theoretic method is superior to the state-of-the-art methods in terms of both registration error and computational time. The experiment on a large outdoor scene further demonstrates the feasibility and effectiveness of our proposed pairwise registration framework.

  20. Multilevel summation with B-spline interpolation for pairwise interactions in molecular dynamics simulations

    International Nuclear Information System (INIS)

    Hardy, David J.; Schulten, Klaus; Wolff, Matthew A.; Skeel, Robert D.; Xia, Jianlin

    2016-01-01

    The multilevel summation method for calculating electrostatic interactions in molecular dynamics simulations constructs an approximation to a pairwise interaction kernel and its gradient, which can be evaluated at a cost that scales linearly with the number of atoms. The method smoothly splits the kernel into a sum of partial kernels of increasing range and decreasing variability with the longer-range parts interpolated from grids of increasing coarseness. Multilevel summation is especially appropriate in the context of dynamics and minimization, because it can produce continuous gradients. This article explores the use of B-splines to increase the accuracy of the multilevel summation method (for nonperiodic boundaries) without incurring additional computation other than a preprocessing step (whose cost also scales linearly). To obtain accurate results efficiently involves technical difficulties, which are overcome by a novel preprocessing algorithm. Numerical experiments demonstrate that the resulting method offers substantial improvements in accuracy and that its performance is competitive with an implementation of the fast multipole method in general and markedly better for Hamiltonian formulations of molecular dynamics. The improvement is great enough to establish multilevel summation as a serious contender for calculating pairwise interactions in molecular dynamics simulations. In particular, the method appears to be uniquely capable for molecular dynamics in two situations, nonperiodic boundary conditions and massively parallel computation, where the fast Fourier transform employed in the particle–mesh Ewald method falls short.

  1. A water market simulator considering pair-wise trades between agents

    Science.gov (United States)

    Huskova, I.; Erfani, T.; Harou, J. J.

    2012-04-01

    In many basins in England no further water abstraction licences are available. Trading water between water rights holders has been recognized as a potentially effective and economically efficient strategy to mitigate increasing scarcity. A screening tool that could assess the potential for trade through realistic simulation of individual water rights holders would help assess the solution's potential contribution to local water management. We propose an optimisation-driven water market simulator that predicts pair-wise trade in a catchment and represents its interaction with natural hydrology and engineered infrastructure. A model is used to emulate licence-holders' willingness to engage in short-term trade transactions. In their simplest form agents are represented using an economic benefit function. The working hypothesis is that trading behaviour can be partially predicted based on differences in marginal values of water over space and time and estimates of transaction costs on pair-wise trades. We discuss the further possibility of embedding rules, norms and preferences of the different water user sectors to more realistically represent the behaviours, motives and constraints of individual licence holders. The potential benefits and limitations of such a social simulation (agent-based) approach are contrasted with our simulator where agents are driven by economic optimization. A case study based on the Dove River Basin (UK) demonstrates model inputs and outputs. The ability of the model to suggest impacts of water rights policy reforms on trading is discussed.

  2. Extraction of tacit knowledge from large ADME data sets via pairwise analysis.

    Science.gov (United States)

    Keefer, Christopher E; Chang, George; Kauffman, Gregory W

    2011-06-15

    Pharmaceutical companies routinely collect data across multiple projects for common ADME endpoints. Although at the time of collection the data is intended for use in decision making within a specific project, knowledge can be gained by data mining the entire cross-project data set for patterns of structure-activity relationships (SAR) that may be applied to any project. One such data mining method is pairwise analysis. This method has the advantage of being able to identify small structural changes that lead to significant changes in activity. In this paper, we describe the process for full pairwise analysis of our high-throughput ADME assays routinely used for compound discovery efforts at Pfizer (microsomal clearance, passive membrane permeability, P-gp efflux, and lipophilicity). We also describe multiple strategies for the application of these transforms in a prospective manner during compound design. Finally, a detailed analysis of the activity patterns in pairs of compounds that share the same molecular transformation reveals multiple types of transforms from an SAR perspective. These include bioisosteres, additives, multiplicatives, and a type we call switches as they act to either turn on or turn off an activity. Copyright © 2011 Elsevier Ltd. All rights reserved.

  3. Hierarchical ordering with partial pairwise hierarchical relationships on the macaque brain data sets.

    Directory of Open Access Journals (Sweden)

    Woosang Lim

    Full Text Available Hierarchical organizations of information processing in the brain networks have been known to exist and widely studied. To find proper hierarchical structures in the macaque brain, the traditional methods need the entire pairwise hierarchical relationships between cortical areas. In this paper, we present a new method that discovers hierarchical structures of macaque brain networks by using partial information of pairwise hierarchical relationships. Our method uses a graph-based manifold learning to exploit inherent relationship, and computes pseudo distances of hierarchical levels for every pair of cortical areas. Then, we compute hierarchy levels of all cortical areas by minimizing the sum of squared hierarchical distance errors with the hierarchical information of few cortical areas. We evaluate our method on the macaque brain data sets whose true hierarchical levels are known as the FV91 model. The experimental results show that hierarchy levels computed by our method are similar to the FV91 model, and its errors are much smaller than the errors of hierarchical clustering approaches.

  4. A general transformation to canonical form for potentials in pairwise interatomic interactions.

    Science.gov (United States)

    Walton, Jay R; Rivera-Rivera, Luis A; Lucchese, Robert R; Bevan, John W

    2015-06-14

    A generalized formulation of explicit force-based transformations is introduced to investigate the concept of a canonical potential in both fundamental chemical and intermolecular bonding. Different classes of representative ground electronic state pairwise interatomic interactions are referenced to a chosen canonical potential illustrating application of such transformations. Specifically, accurately determined potentials of the diatomic molecules H2, H2(+), HF, LiH, argon dimer, and one-dimensional dissociative coordinates in Ar-HBr, OC-HF, and OC-Cl2 are investigated throughout their bound potentials. Advantages of the current formulation for accurately evaluating equilibrium dissociation energies and a fundamentally different unified perspective on nature of intermolecular interactions will be emphasized. In particular, this canonical approach has significance to previous assertions that there is no very fundamental distinction between van der Waals bonding and covalent bonding or for that matter hydrogen and halogen bonds.

  5. Market Competitiveness Evaluation of Mechanical Equipment with a Pairwise Comparisons Hierarchical Model.

    Science.gov (United States)

    Hou, Fujun

    2016-01-01

    This paper provides a description of how market competitiveness evaluations concerning mechanical equipment can be made in the context of multi-criteria decision environments. It is assumed that, when we are evaluating the market competitiveness, there are limited number of candidates with some required qualifications, and the alternatives will be pairwise compared on a ratio scale. The qualifications are depicted as criteria in hierarchical structure. A hierarchical decision model called PCbHDM was used in this study based on an analysis of its desirable traits. Illustration and comparison shows that the PCbHDM provides a convenient and effective tool for evaluating the market competitiveness of mechanical equipment. The researchers and practitioners might use findings of this paper in application of PCbHDM.

  6. Linear VSS and Distributed Commitments Based on Secret Sharing and Pairwise Checks

    DEFF Research Database (Denmark)

    Fehr, Serge; Maurer, Ueli M.

    2002-01-01

    We present a general treatment of all non-cryptographic (i.e., information-theoretically secure) linear verifiable-secret-sharing (VSS) and distributed-commitment (DC) schemes, based on an underlying secret sharing scheme, pairwise checks between players, complaints, and accusations of the dealer. VSS and DC are main building blocks for unconditionally secure multi-party computation protocols. This general approach covers all known linear VSS and DC schemes. The main theorem states that the security of a scheme is equivalent to a pure linear-algebra condition on the linear mappings (e.g. described as matrices and vectors) describing the scheme. The security of all known schemes follows as corollaries whose proofs are pure linear-algebra arguments, in contrast to some hybrid arguments used in the literature. Our approach is demonstrated for the CDM DC scheme, which we generalize to be secure...

  7. The pairwise phase consistency in cortical network and its relationship with neuronal activation

    Directory of Open Access Journals (Sweden)

    Wang Daming

    2017-01-01

    Full Text Available Gamma-band neuronal oscillation and synchronization in the range of 30-90 Hz are ubiquitous phenomena across numerous brain areas and various species, and are correlated with many cognitive functions. The phase of the oscillation, as one aspect of the CTC (Communication through Coherence) hypothesis, underlies various functions for feature coding, memory processing and behaviour performing. The PPC (Pairwise Phase Consistency), an improved coherence measure, statistically quantifies the strength of phase synchronization. In order to evaluate the PPC and its relationships with input stimulus, neuronal activation and firing rate, a simplified spiking neuronal network is constructed to simulate orientation columns in primary visual cortex. If the input orientation stimulus is preferred for a certain orientation column, neurons within this corresponding column will obtain a higher firing rate and stronger neuronal activation, which consequently engender higher PPC values, with higher PPC corresponding to higher firing rate. In addition, we investigate the PPC in time-resolved analysis with a sliding window.
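
    As an illustration of the measure named in this record, the following minimal sketch computes a PPC value under the assumption that PPC is the mean cosine of the phase difference taken over all distinct pairs of observed phases (the definition commonly given for the measure; the abstract itself does not state a formula). The function name and the sample phases are hypothetical.

```python
import math
from itertools import combinations


def pairwise_phase_consistency(phases):
    """Mean cosine of the phase difference over all distinct pairs (radians)."""
    n = len(phases)
    if n < 2:
        raise ValueError("need at least two phases")
    total = sum(math.cos(a - b) for a, b in combinations(phases, 2))
    return total / (n * (n - 1) / 2)


# Hypothetical spike phases: one tightly locked set, one evenly spread set.
locked = [0.10, 0.12, 0.08, 0.11]
spread = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]

ppc_locked = pairwise_phase_consistency(locked)
ppc_spread = pairwise_phase_consistency(spread)
print(ppc_locked, ppc_spread)
```

    Because the average runs over pairs rather than over a single resultant vector, the estimate can fall below zero when phases are spread evenly around the circle.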

  8. Video-based depression detection using local Curvelet binary patterns in pairwise orthogonal planes.

    Science.gov (United States)

    Pampouchidou, Anastasia; Marias, Kostas; Tsiknakis, Manolis; Simos, Panagiotis; Fan Yang; Lemaitre, Guillaume; Meriaudeau, Fabrice

    2016-08-01

    Depression is an increasingly prevalent mood disorder. This is the reason why the field of computer-based depression assessment has been gaining the attention of the research community during the past couple of years. The present work proposes two algorithms for depression detection, one Frame-based and the second Video-based, both employing Curvelet transform and Local Binary Patterns. The main advantage of these methods is that they have significantly lower computational requirements, as the extracted features are of very low dimensionality. This is achieved by modifying the previously proposed algorithm which considers Three-Orthogonal-Planes, to only Pairwise-Orthogonal-Planes. Performance of the algorithms was tested on the benchmark dataset provided by the Audio/Visual Emotion Challenge 2014, with the person-specific system achieving 97.6% classification accuracy, and the person-independent one yielding promising preliminary results of 74.5% accuracy. The paper concludes with open issues, proposed solutions, and future plans.

  9. Estimators of the Relations of Equivalence, Tolerance and Preference Based on Pairwise Comparisons with Random Errors

    Directory of Open Access Journals (Sweden)

    Leszek Klukowski

    2012-01-01

    Full Text Available This paper presents a review of results of the author in the area of estimation of the relations of equivalence, tolerance and preference within a finite set based on multiple, independent (in a stochastic way) pairwise comparisons with random errors, in binary and multivalent forms. These estimators require weaker assumptions than those used in the literature on the subject. Estimates of the relations are obtained based on solutions to problems from discrete optimization. They allow application of both types of comparisons - binary and multivalent (this fact relates to the tolerance and preference relations). The estimates can be verified in a statistical way; in particular, it is possible to verify the type of the relation. The estimates have been applied by the author to problems regarding forecasting, financial engineering and bio-cybernetics. (original abstract)

  10. Analysis of Geographic and Pairwise Distances among Chinese Cashmere Goat Populations

    Directory of Open Access Journals (Sweden)

    Jian-Bin Liu

    2013-03-01

    Full Text Available This study investigated the geographic and pairwise distances of nine Chinese local Cashmere goat populations through the analysis of 20 microsatellite DNA markers. Fluorescence PCR was used to identify the markers, which were selected based on their significance as identified by the Food and Agriculture Organization of the United Nations (FAO) and the International Society for Animal Genetics (ISAG). In total, 206 alleles were detected; the average allele number was 10.30; the polymorphism information content of loci ranged from 0.5213 to 0.7582; the number of effective alleles ranged from 4.0484 to 4.6178; the observed heterozygosity was from 0.5023 to 0.5602 for the practical sample; the expected heterozygosity ranged from 0.5783 to 0.6464; and allelic richness ranged from 4.7551 to 8.0693. These results indicated that Chinese Cashmere goat populations exhibited rich genetic diversity. Further, the Wright’s F-statistics of subpopulation within total (FST) was 0.1184; the genetic differentiation coefficient (GST) was 0.0940; and the average gene flow (Nm) was 2.0415. All pairwise FST values among the populations were highly significant (p<0.01 or p<0.001), suggesting that the populations studied should all be considered to be separate breeds. Finally, the clustering analysis divided the Chinese Cashmere goat populations into at least four clusters, with the Hexi and Yashan goat populations alone in one cluster. These results have provided useful, practical, and important information for the future of Chinese Cashmere goat breeding.

  11. Gene ontology analysis of pairwise genetic associations in two genome-wide studies of sporadic ALS

    Directory of Open Access Journals (Sweden)

    Kim Nora

    2012-07-01

    Full Text Available Abstract Background It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO). Results We first estimated all pairwise additive and nonadditive genetic effects using the multifactor dimensionality reduction (MDR) method that makes few assumptions about the underlying genetic model. Statistical significance was evaluated using permutation testing in two genome-wide association studies of ALS. The detection data consisted of 276 subjects with ALS and 271 healthy controls while the replication data consisted of 221 subjects with ALS and 211 healthy controls. Both studies included genotypes from approximately 550,000 single-nucleotide polymorphisms (SNPs). Each SNP was mapped to a gene if it was within 500 kb of the start or end. Each SNP was assigned a p-value based on its strongest joint effect with the other SNPs. We then used the Exploratory Visual Analysis (EVA) method and software to assign a p-value to each gene based on the overabundance of significant SNPs at the α = 0.05 level in the gene. We also used EVA to assign p-values to each GO group based on the overabundance of significant genes at the α = 0.05 level. A GO category was determined to replicate if that category was significant at the α = 0.05 level in both studies. We found two GO categories that replicated in both studies. The first, ‘Regulation of Cellular Component Organization and Biogenesis’, a GO Biological Process, had p-values of 0.010 and 0.014 in the detection and replication studies, respectively. The second, ‘Actin Cytoskeleton’, a GO Cellular Component, had p-values of 0.040 and 0.046 in the detection and replication studies, respectively. Conclusions Pathway

  12. Classification of forest-based ecotourism areas in Pocahontas County of West Virginia using GIS and pairwise comparison method

    Science.gov (United States)

    Ishwar Dhami; Jinyang Deng

    2012-01-01

    Many previous studies have examined ecotourism primarily from the perspective of tourists while largely ignoring ecotourism destinations. This study used geographical information system (GIS) and pairwise comparison to identify forest-based ecotourism areas in Pocahontas County, West Virginia. The study adopted the criteria and scores developed by Boyd and Butler (1994...

  13. Pair-Wise Trajectory Management-Oceanic (PTM-O) Concept of Operations, Version 3.9

    Science.gov (United States)

    Jones, Kenneth M.

    2014-01-01

    This document describes the Pair-wise Trajectory Management-Oceanic (PTM-O) Concept of Operations (ConOps). Pair-wise Trajectory Management (PTM) is a concept that includes airborne and ground-based capabilities designed to enable, and to benefit from, an airborne pair-wise distance-monitoring capability. PTM includes the capabilities needed for the controller to issue a PTM clearance that resolves a conflict for a specific pair of aircraft. PTM avionics include the capabilities needed for the flight crew to manage their trajectory relative to specific designated aircraft. Pair-wise Trajectory Management-Oceanic (PTM-O) is a region-specific application of the PTM concept. PTM is sponsored by the National Aeronautics and Space Administration (NASA) Concept and Technology Development Project (part of NASA's Airspace Systems Program). The goal of PTM is to use enhanced and distributed communications and surveillance along with airborne tools to permit reduced separation standards for given aircraft pairs, thereby increasing the capacity and efficiency of aircraft operations at a given altitude or volume of airspace.

  14. The structure of pairwise correlation in mouse primary visual cortex reveals functional organization in the absence of an orientation map.

    Science.gov (United States)

    Denman, Daniel J; Contreras, Diego

    2014-10-01

    Neural responses to sensory stimuli are not independent. Pairwise correlation can reduce coding efficiency, occur independent of stimulus representation, or serve as an additional channel of information, depending on the timescale of correlation and the method of decoding. Any role for correlation depends on its magnitude and structure. In sensory areas with maps, like the orientation map in primary visual cortex (V1), correlation is strongly related to the underlying functional architecture, but it is unclear whether this correlation structure is an essential feature of the system or arises from the arrangement of cells in the map. We assessed the relationship between functional architecture and pairwise correlation by measuring both synchrony and correlated spike count variability in mouse V1, which lacks an orientation map. We observed significant pairwise synchrony, which was organized by distance and relative orientation preference between cells. We also observed nonzero correlated variability in both the anesthetized (0.16) and awake states (0.18). Our results indicate that the structure of pairwise correlation is maintained in the absence of an underlying anatomical organization and may be an organizing principle of the mammalian visual system preserved by nonrandom connectivity within local networks. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. Theory of pairwise coupling embedded in more general local dispersion relations

    International Nuclear Information System (INIS)

    Fuchs, V.; Bers, A.; Harten, L.

    1985-01-01

    Earlier work on the mode conversion theory by Fuchs, Ko, and Bers is detailed and expanded upon, and its relation to energy conservation is discussed. Given a local dispersion relation, $D(\omega; k, z) = 0$, describing stable waves excited at an externally imposed frequency $\omega$, a pairwise mode-coupling event embedded therein is extracted by expanding $D(k, z)$ around a contour $k = k_c(z)$ given by $\partial D/\partial k = 0$. The branch points of $D(k, z) = 0$ are the turning points of a second-order differential-equation representation. In obtaining the fraction of mode-converted energy, the connection formula and conservation of energy must be used together. Also, proper attention must be given to distinguish cases for which the coupling disappears or persists upon confluence of the branches, a property which is shown to depend on the forward ($v_g v_{ph} > 0$) or backward ($v_g v_{ph} < 0$) nature of the waves. Examples occurring in ion-cyclotron and lower-hybrid heating are presented, illustrating the use of the theory.

  16. pyRMSD: a Python package for efficient pairwise RMSD matrix calculation and handling.

    Science.gov (United States)

    Gil, Víctor A; Guallar, Víctor

    2013-09-15

    We introduce pyRMSD, an open source standalone Python package that aims at offering an integrative and efficient way of performing Root Mean Square Deviation (RMSD)-related calculations of large sets of structures. It is specially tuned to do fast collective RMSD calculations, such as pairwise RMSD matrices, implementing up to three well-known superposition algorithms. pyRMSD provides its own symmetric distance matrix class that, besides the fact that it can be used as a regular matrix, helps to save memory and increases memory access speed. This last feature can dramatically improve the overall performance of any Python algorithm using it. In addition, its extensibility, testing suites and documentation make it a good choice for those in need of a workbench for developing or testing new algorithms. The source code (under MIT license), installer, test suites and benchmarks can be found at https://pele.bsc.es/ under the tools section. Contact: victor.guallar@bsc.es. Supplementary data are available at Bioinformatics online.
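
    To make the idea of a memory-saving symmetric distance matrix concrete, here is a minimal pure-Python sketch. It is not pyRMSD's API: the `rmsd` function and `CondensedMatrix` class below are hypothetical names, the RMSD is computed without any superposition step (an assumption made to keep the example short), and only the upper triangle is stored in a flat list.

```python
import math


def rmsd(a, b):
    """RMSD between two conformations given as equal-length lists of (x, y, z).

    No superposition is performed; coordinates are compared as given.
    """
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


class CondensedMatrix:
    """Symmetric pairwise matrix that stores only the n*(n-1)/2 upper-triangle entries."""

    def __init__(self, conformations):
        self.n = len(conformations)
        self.data = [
            rmsd(conformations[i], conformations[j])
            for i in range(self.n)
            for j in range(i + 1, self.n)
        ]

    def __getitem__(self, ij):
        i, j = ij
        if i == j:
            return 0.0  # the diagonal is implicit
        if i > j:
            i, j = j, i  # symmetry: (i, j) and (j, i) share one slot
        return self.data[i * self.n - i * (i + 1) // 2 + (j - i - 1)]


# Three hypothetical two-atom conformations shifted along z.
confs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
matrix = CondensedMatrix(confs)
print(matrix[0, 1], matrix[2, 0], matrix[1, 1])
```

    Storing one value per unordered pair roughly halves the memory of a full square matrix, at the cost of a small index computation on each lookup.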

  17. A pairwise residue contact area-based mean force potential for discrimination of native protein structure

    Directory of Open Access Journals (Sweden)

    Pezeshk Hamid

    2010-01-01

    Full Text Available Abstract Background An energy function that can distinguish a correct protein fold from incorrect ones is very important for protein structure prediction and protein folding. Knowledge-based mean force potentials are certainly the most popular type of interaction function for protein threading. They are derived from statistical analyses of interacting groups in experimentally determined protein structures. These potentials are developed at the atom or the amino acid level. Based on orientation dependent contact area, a new type of knowledge-based mean force potential has been developed. Results We developed a new approach to calculate a knowledge-based potential of mean-force, using pairwise residue contact area. To test the performance of our approach, we applied it to several decoy sets to measure its ability to discriminate native structure from decoys. This potential has been able to distinguish native structures from the decoys in most cases. Further, the calculated Z-scores were quite high for all protein datasets. Conclusions This knowledge-based potential of mean force can be used in protein structure prediction, fold recognition, comparative modelling and molecular recognition. The program is available at http://www.bioinf.cs.ipm.ac.ir/softwares/surfield

  18. Optimal Inconsistency Repairing of Pairwise Comparison Matrices Using Integrated Linear Programming and Eigenvector Methods

    Directory of Open Access Journals (Sweden)

    Haiqing Zhang

    2014-01-01

    Full Text Available Satisfying consistency requirements of a pairwise comparison matrix (PCM) is a critical step in decision making methodologies. An algorithm has been proposed to find a new modified consistent PCM that can replace the original inconsistent PCM in analytic hierarchy process (AHP) or in fuzzy AHP. This paper defines the modified consistent PCM as a combination of the original inconsistent PCM and an adjustable consistent PCM. The algorithm adopts a segment tree to gradually approach the greatest lower bound of the distance with the original PCM to obtain the middle value of an adjustable PCM. It also proposes a theorem to obtain the lower value and the upper value of an adjustable PCM based on two constraints. The experiments for crisp elements show that the proposed approach can preserve more of the original information than previous works of the same consistent value. The convergence rate of our algorithm is significantly faster than previous works with respect to different parameters. The experiments for fuzzy elements show that our method could obtain suitable modified fuzzy PCMs.
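
    The record above concerns repairing inconsistent PCMs; as background, the sketch below only measures inconsistency and does not reproduce the paper's repair algorithm. It assumes the standard AHP consistency index, CI = (λmax − n)/(n − 1), with λmax estimated by power iteration. The function names and both example matrices are hypothetical.

```python
def principal_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a positive square matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n  # start from a uniform vector whose entries sum to 1
    lam = float(n)
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        lam = sum(w)  # equals the eigenvalue once v has converged, since sum(v) == 1
        v = [x / lam for x in w]
    return lam


def consistency_index(matrix):
    """Consistency index CI = (lambda_max - n) / (n - 1); zero for a consistent PCM."""
    n = len(matrix)
    return (principal_eigenvalue(matrix) - n) / (n - 1)


# Consistent example: every entry satisfies a[i][k] == a[i][j] * a[j][k].
consistent = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
# Inconsistent example: a[0][2] contradicts a[0][1] * a[1][2].
inconsistent = [
    [1.0, 2.0, 0.25],
    [0.5, 1.0, 2.0],
    [4.0, 0.5, 1.0],
]

ci_consistent = consistency_index(consistent)
ci_inconsistent = consistency_index(inconsistent)
print(ci_consistent, ci_inconsistent)
```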

  19. Evaluation of advanced multiplex short tandem repeat systems in pairwise kinship analysis.

    Science.gov (United States)

    Tamura, Tomonori; Osawa, Motoki; Ochiai, Eriko; Suzuki, Takanori; Nakamura, Takashi

    2015-09-01

    The AmpFLSTR Identifiler Kit, comprising 15 autosomal short tandem repeat (STR) loci, is commonly employed in forensic practice for calculating match probabilities and parentage testing. The conventional system exhibits insufficient estimation for kinship analysis such as sibship testing because of shortness of examined loci. This study evaluated the power of the PowerPlex Fusion System, GlobalFiler Kit, and PowerPlex 21 System, which comprise more than 20 autosomal STR loci, to estimate pairwise blood relatedness (i.e., parent-child, full siblings, second-degree relatives, and first cousins). The genotypes of all 24 STR loci in 10,000 putative pedigrees were constructed by simulation. The likelihood ratio for each locus was calculated from joint probabilities for relatives and non-relatives. The combined likelihood ratio was calculated according to the product rule. The addition of STR loci improved separation between relatives and non-relatives. However, these systems were less effectively extended to the inference for first cousins. In conclusion, these advanced systems will be useful in forensic personal identification, especially in the evaluation of full siblings and second-degree relatives. Moreover, the additional loci may give rise to two major issues of more frequent mutational events and several pairs of linked loci on the same chromosome. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
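
    The product rule mentioned in this record can be sketched in a few lines. The per-locus likelihood ratios below are made-up illustrative values, not data from the study, and the function name is hypothetical. The multiplication treats loci as independent, which is exactly the assumption that the linked loci noted in the abstract would strain.

```python
import math


def combined_likelihood_ratio(per_locus_lrs):
    """Combine per-locus likelihood ratios by the product rule (assumes independent loci)."""
    return math.prod(per_locus_lrs)


# Hypothetical per-locus LRs for one tested pair (relatives vs. non-relatives).
per_locus = [2.0, 0.5, 4.0, 1.5]
combined = combined_likelihood_ratio(per_locus)
print(combined, math.log10(combined))
```

    With 20 or more loci the product can become very large or very small, so reporting its base-10 logarithm is a common convenience.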

  20. The Dynamics of Multiple Pair-Wise Collisions in a Chain for Designing Optimal Shock Amplifiers

    Directory of Open Access Journals (Sweden)

    Bryan Rodgers

    2009-01-01

    Full Text Available The major focus of this work is to examine the dynamics of velocity amplification through pair-wise collisions between multiple masses in a chain, in order to develop useful machines. For instance, low-cost machines based on this principle could be used for detailed, very-high acceleration shock-testing of MEMS devices. A theoretical basis for determining the number and mass of intermediate stages in such a velocity amplifier, based on simple rigid body mechanics, is proposed. The influence of mass ratios and the coefficient of restitution on the optimisation of the system is identified and investigated. In particular, two cases are examined: in the first, the velocity of the final mass in the chain (that would have the object under test mounted on it) is maximised by defining the ratio of adjacent masses according to a power law relationship; in the second, the energy transfer efficiency of the system is maximised by choosing the mass ratios such that all masses except the final mass come to rest following impact. Comparisons are drawn between both cases and the results are used in proposing design guidelines for optimal shock amplifiers. It is shown that for most practical systems, a shock amplifier with mass ratios based on a power law relationship is optimal and can easily yield velocity amplifications of a factor 5–8 times. A prototype shock testing machine that was made using the above principles is briefly introduced.
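
    The velocity gain of such a chain can be sketched from rigid-body mechanics. The example below assumes each collision is one-dimensional, that the struck mass is at rest, and that collisions happen strictly in sequence with no later re-collisions; under those assumptions a mass m1 moving at v hands velocity (1 + e)·m1/(m1 + m2)·v to a mass m2, where e is the coefficient of restitution. The function name, the masses and the constant mass ratio of 2 are illustrative choices, not the paper's design values.

```python
def final_velocity(masses, v0, e=1.0):
    """Velocity of the last mass after sequential pair-wise collisions down the chain.

    masses: chain masses, first (striker) to last; v0: initial striker velocity;
    e: coefficient of restitution (1.0 means perfectly elastic).
    """
    v = v0
    for m_in, m_out in zip(masses, masses[1:]):
        # Momentum conservation plus the restitution law, target initially at rest.
        v = (1 + e) * m_in / (m_in + m_out) * v
    return v


# Hypothetical four-mass chain, each mass half of the previous one.
chain = [8.0, 4.0, 2.0, 1.0]
gain_elastic = final_velocity(chain, 1.0)
gain_lossy = final_velocity(chain, 1.0, e=0.8)
print(gain_elastic, gain_lossy)
```

    Each elastic stage in this example multiplies the velocity by 4/3, so the gain compounds with the number of stages, while a lower coefficient of restitution shrinks every stage's factor.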

  1. Visualization of pairwise and multilocus linkage disequilibrium structure using latent forests.

    Directory of Open Access Journals (Sweden)

    Raphaël Mourad

    Full Text Available Linkage disequilibrium study represents a major issue in statistical genetics as it plays a fundamental role in gene mapping and helps us to learn more about human history. The linkage disequilibrium complex structure makes its exploratory data analysis essential yet challenging. Visualization methods, such as the triangular heat map implemented in Haploview, provide simple and useful tools to help understand complex genetic patterns, but remain insufficient to fully describe them. Probabilistic graphical models have been widely recognized as a powerful formalism allowing a concise and accurate modeling of dependences between variables. In this paper, we propose a method for short-range, long-range and chromosome-wide linkage disequilibrium visualization using forests of hierarchical latent class models. Thanks to its hierarchical nature, our method is shown to provide a compact view of both pairwise and multilocus linkage disequilibrium spatial structures for the geneticist. Besides, a multilocus linkage disequilibrium measure has been designed to evaluate linkage disequilibrium in hierarchy clusters. To learn the proposed model, a new scalable algorithm is presented. It constrains the dependence scope, relying on physical positions, and is able to deal with more than one hundred thousand single nucleotide polymorphisms. The proposed algorithm is fast and does not require phase genotypic data.

  2. Benefits of Using Pairwise Trajectory Management in the Central East Pacific

    Science.gov (United States)

    Chartrand, Ryan; Ballard, Kathryn

    2016-01-01

    Pairwise Trajectory Management (PTM) is a concept that utilizes airborne and ground-based capabilities to enable airborne spacing operations in oceanic regions. The goal of PTM is to use enhanced surveillance, along with airborne tools, to manage the spacing between aircraft. Due to the enhanced airborne surveillance of Automatic Dependent Surveillance-Broadcast (ADS-B) information and reduced communication, the PTM minimum spacing distance will be less than distances currently required of an air traffic controller. Reduced minimum distance will increase the capacity of aircraft operations at a given altitude or volume of airspace, thereby increasing time on desired trajectory and overall flight efficiency. PTM is designed to allow a flight crew to resolve a specific traffic conflict (or conflicts), identified by the air traffic controller, while maintaining the flight crew's desired altitude. The air traffic controller issues a PTM clearance to a flight crew authorized to conduct PTM operations in order to resolve a conflict for the pair (or pairs) of aircraft (i.e., the PTM aircraft and a designated target aircraft). This clearance requires the flight crew of the PTM aircraft to use their ADS-B-enabled onboard equipment to manage their spacing relative to the designated target aircraft to ensure spacing distances that are no closer than the PTM minimum distance. When the air traffic controller determines that PTM is no longer required, the controller issues a clearance to cancel the PTM operation.

  3. Prediction of microsleeps using pairwise joint entropy and mutual information between EEG channels.

    Science.gov (United States)

    Baseer, Abdul; Weddell, Stephen J; Jones, Richard D

    2017-07-01

    Microsleeps are involuntary and brief instances of complete loss of responsiveness, typically of 0.5-15 s duration. They adversely affect performance in extended attention-driven jobs and can be fatal. Our aim was to predict microsleeps from 16 channel EEG signals. Two information theoretic concepts - pairwise joint entropy and mutual information - were independently used to continuously extract features from EEG signals. k-nearest neighbor (kNN) with k = 3 was used to calculate both joint entropy and mutual information. Highly correlated features were discarded and the rest were ranked using Fisher score followed by an average of 3-fold cross-validation area under the curve of the receiver operating characteristic (AUCROC). Leave-one-out method (LOOM) was performed to test the performance of microsleep prediction system on independent data. The best prediction for 0.25 s ahead was AUCROC, sensitivity, precision, geometric mean (GM), and φ of 0.93, 0.68, 0.33, 0.75, and 0.38 respectively with joint entropy using single linear discriminant analysis (LDA) classifier.
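
    For readers unfamiliar with the two quantities, the sketch below computes pairwise joint entropy and mutual information between channels. It deliberately swaps in a simple plug-in (frequency-count) estimator on already-discretized signals rather than the kNN estimator used in the study, and the channel names and sample values are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(symbols):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


def joint_entropy(x, y):
    """Entropy of the paired sequence (x[t], y[t])."""
    return entropy(list(zip(x, y)))


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - joint_entropy(x, y)


# Hypothetical discretized channels: ch2 copies ch1, ch3 is unrelated to both.
channels = {
    "ch1": [0, 0, 1, 1],
    "ch2": [0, 0, 1, 1],
    "ch3": [0, 1, 0, 1],
}

# One feature per unordered channel pair.
mi_features = {
    (a, b): mutual_information(channels[a], channels[b])
    for a, b in combinations(channels, 2)
}
print(mi_features)
```

    With 16 channels this pairing yields 120 features per quantity, which is one reason a later feature-selection step is useful.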

  4. AlignMe—a membrane protein sequence alignment web server

    Science.gov (United States)

    Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

    2014-01-01

    We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425

  5. Pharmacological treatments in asthma-affected horses: A pair-wise and network meta-analysis.

    Science.gov (United States)

    Calzetta, L; Roncada, P; di Cave, D; Bonizzi, L; Urbani, A; Pistocchini, E; Rogliani, P; Matera, M G

    2017-11-01

    Equine asthma is a disease characterised by reversible airflow obstruction, bronchial hyper-responsiveness and airway inflammation following exposure of susceptible horses to specific airborne agents. Although clinical remission can be achieved in a low-airborne dust environment, repeated exacerbations may lead to irreversible airway remodelling. The available data on the pharmacotherapy of equine asthma result from several small studies, and no head-to-head clinical trials have been conducted among the available medications. To assess the impact of the pharmacological interventions in equine asthma and compare the effect of different classes of drugs on lung function. Pair-wise and network meta-analysis. Literature searches for clinical trials on the pharmacotherapy of equine asthma were performed. The risk of publication bias was assessed by funnel plots and Egger's test. Changes in maximum transpulmonary or pleural pressure, pulmonary resistance and dynamic lung compliance vs. control were analysed via random-effects models and Bayesian networks. The results obtained from 319 equine asthma-affected horses were extracted from 32 studies. Bronchodilators, corticosteroids and chromones improved maximum transpulmonary or pleural pressure (range: -8.0 to -21.4 cmH2O; Ptherapies. Long-term treatments were more effective than short-term treatments. Weak publication bias was detected. This study demonstrates that long-term treatments with inhaled corticosteroids and long-acting β2-AR agonists may represent the first choice for treating equine asthma. Further high quality clinical trials are needed to clarify whether inhaled bronchodilators should be preferred to inhaled corticosteroids or vice versa, and to investigate the potential superiority of combination therapy in equine asthma. © 2017 EVJ Ltd.

  6. Further investigations of the W-test for pairwise epistasis testing [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Richard Howey

    2017-07-01

    Background: In a recent paper, a novel W-test for pairwise epistasis testing was proposed that appeared, in computer simulations, to have higher power than competing alternatives. Application to genome-wide bipolar data detected significant epistasis between SNPs in genes of relevant biological function. Network analysis indicated that the implicated genes formed two separate interaction networks, each containing genes highly related to autism and neurodegenerative disorders. Methods: Here we investigate further the properties and performance of the W-test via theoretical evaluation, computer simulations and application to real data. Results: We demonstrate that, for common variants, the W-test is closely related to several existing tests of association allowing for interaction, including logistic regression on 8 degrees of freedom, although logistic regression can show inflated type I error for low minor allele frequencies, whereas the W-test shows good/conservative type I error control. Although in some situations the W-test can show higher power, logistic regression is not limited to tests on 8 degrees of freedom but can instead be tailored to impose greater structure on the assumed alternative hypothesis, offering a power advantage when the imposed structure matches the true structure. Conclusions: The W-test is a potentially useful method for testing for association - without necessarily implying interaction - between genetic variants and disease, particularly when one or more of the genetic variants are rare. For common variants, the advantages of the W-test are less clear, and, indeed, there are situations where existing methods perform better. In our investigations, we further uncover a number of problems with the practical implementation and application of the W-test (to bipolar disorder) previously described, apparently due to inadequate use of standard data quality-control procedures. This observation leads us to urge caution in

  7. Heisenberg coupling constant predicted for molecular magnets with pairwise spin-contamination correction

    Energy Technology Data Exchange (ETDEWEB)

    Masunov, Artëm E., E-mail: amasunov@ucf.edu [NanoScience Technology Center, Department of Chemistry, and Department of Physics, University of Central Florida, Orlando, FL 32826 (United States); Photochemistry Center RAS, ul. Novatorov 7a, Moscow 119421 (Russian Federation); Gangopadhyay, Shruba [Department of Physics, University of California, Davis, CA 95616 (United States); IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120 (United States)

    2015-12-15

    A new method to eliminate spin contamination in broken symmetry density functional theory (BS DFT) calculations is introduced. Unlike conventional spin-purification correction, this method is based on canonical Natural Orbitals (NO) for each high/low spin coupled electron pair. We derive an expression to extract the energy of the pure singlet state given in terms of the energy of the BS DFT solution, the occupation number of the bonding NO, and the energy of the higher spin state built on these bonding and antibonding NOs (not self-consistent Kohn–Sham orbitals of the high spin state). Compared to the other spin-contamination correction schemes, spin-correction is applied to each correlated electron pair individually. We investigate two binuclear Mn(IV) molecular magnets using this pairwise correction. While one of the molecules is described by magnetic orbitals strongly localized on the metal centers, and its spin gap is accurately predicted by the Noodleman and Yamaguchi schemes, for the other one the gap is predicted poorly by these schemes due to strong delocalization of the magnetic orbitals onto the ligands. We show our new correction to yield more accurate results in both cases. - Highlights: • Magnetic orbitals obtained for high and low spin states are not related. • Spin-purification correction becomes inaccurate for delocalized magnetic orbitals. • We use the natural orbitals of the broken symmetry state to build high spin state. • This new correction is made separately for each electron pair. • Our spin-purification correction is more accurate for delocalised magnetic orbitals.

  8. A discrete model of Ostwald ripening based on multiple pairwise interactions

    Science.gov (United States)

    Di Nunzio, Paolo Emilio

    2018-06-01

    A discrete multi-particle model of Ostwald ripening based on direct pairwise interactions is developed for particles with incoherent interfaces as an alternative to the classical LSW mean field theory. The rate of matter exchange depends on the average surface-to-surface interparticle distance, a characteristic feature of the system which naturally incorporates the effect of volume fraction of second phase. The multi-particle diffusion is described through the definition of an interaction volume containing all the particles involved in the exchange of solute. At small volume fractions this is proportional to the size of the central particle; at higher volume fractions it gradually reduces as a consequence of diffusion screening described on a geometrical basis. The topological noise present in real systems is also included. For volume fractions below about 0.1 the model predicts broad and right-skewed stationary size distributions resembling a lognormal function. Above this value, a transition to sharper, more symmetrical but still right-skewed shapes occurs. Excellent agreement with experiments is obtained for 3D particle size distributions of solid-solid and solid-liquid systems with volume fractions of 0.07, 0.30, 0.52 and 0.74. The kinetic constant of the model depends on the cube root of volume fraction up to about 0.1, then increases rapidly with an upward concavity. It is in good agreement with the available literature data on solid-liquid mixtures in the volume fraction range from 0.20 to about 0.75.

  9. Pairwise additivity in the nuclear magnetic resonance interactions of atomic xenon.

    Science.gov (United States)

    Hanni, Matti; Lantto, Perttu; Vaara, Juha

    2009-04-14

    Nuclear magnetic resonance (NMR) of atomic (129/131)Xe is used as a versatile probe of the structure and dynamics of various host materials, due to the sensitivity of the Xe NMR parameters to intermolecular interactions. The principles governing this sensitivity can be investigated using the prototypic system of interacting Xe atoms. In the pairwise additive approximation (PAA), the binary NMR chemical shift, nuclear quadrupole coupling (NQC), and spin-rotation (SR) curves for the xenon dimer are utilized for fast and efficient evaluation of the corresponding NMR tensors in small xenon clusters Xe(n) (n = 2-12). If accurate, the preparametrized PAA enables the analysis of the NMR properties of xenon clusters, condensed xenon phases, and xenon gas without having to resort to electronic structure calculations of instantaneous configurations for n > 2. The binary parameters for Xe(2) at different internuclear distances were obtained at the nonrelativistic Hartree-Fock level of theory. Quantum-chemical (QC) calculations at the corresponding level were used to obtain the NMR parameters of the Xe(n) (n = 2-12) clusters at the equilibrium geometries. Comparison of PAA and QC data indicates that the direct use of the binary property curves of Xe(2) can be expected to be well-suited for the analysis of Xe NMR in the gaseous phase dominated by binary collisions. For use in condensed phases where many-body effects should be considered, effective binary property functions were fitted using the principal components of QC tensors from Xe(n) clusters. Particularly, the chemical shift in Xe(n) is strikingly well-described by the effective PAA. The coordination number Z of the Xe site is found to be the most important factor determining the chemical shift, with the largest shifts being found for high-symmetry sites with the largest Z. This is rationalized in terms of the density of virtual electronic states available for response to magnetic perturbations.
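
    The pairwise additive approximation described in this abstract can be sketched for a scalar property (for example an isotropic shift) as a sum of one binary curve over all neighbours of a site. The sketch below is illustrative only: `toy_pair_curve` is a hypothetical placeholder, not a fitted Xe(2) property curve, and the full NMR tensors treated in the paper are not handled.

```python
import math


def paa_site_property(coords, site, pair_curve):
    """Pairwise additive estimate: sum a binary curve over all neighbours of one site."""
    return sum(
        pair_curve(math.dist(coords[site], other))
        for j, other in enumerate(coords)
        if j != site
    )


def toy_pair_curve(r):
    """HYPOTHETICAL binary property curve that decays with distance (not real Xe data)."""
    return math.exp(-r)


# Example: three atoms on an equilateral triangle with unit side length.
cluster = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, math.sqrt(3) / 2, 0.0)]
site_value = paa_site_property(cluster, 0, toy_pair_curve)  # sum of two equal pair terms
```

    Because each neighbour contributes one term, a site with more close neighbours accumulates a larger sum, which is in line with the abstract's observation that the coordination number of the Xe site is the dominant factor for the chemical shift.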

  10. Pair-Wise and Many-Body Dispersive Interactions Coupled to an Optimally Tuned Range-Separated Hybrid Functional.

    Science.gov (United States)

    Agrawal, Piyush; Tkatchenko, Alexandre; Kronik, Leeor

    2013-08-13

    We propose a nonempirical, pair-wise or many-body dispersion-corrected, optimally tuned range-separated hybrid functional. This functional retains the advantages of the optimal-tuning approach in the prediction of the electronic structure. At the same time, it gains accuracy in the prediction of binding energies for dispersively bound systems, as demonstrated on the S22 and S66 benchmark sets of weakly bound dimers.

  11. Detection of the pairwise kinematic Sunyaev-Zel'dovich effect with BOSS DR11 and the Atacama Cosmology Telescope

    Energy Technology Data Exchange (ETDEWEB)

    Bernardis, F. De; Vavagiakis, E.M.; Niemack, M.D.; Gallardo, P.A. [Department of Physics, Cornell University, Ithaca, NY 14853 (United States); Aiola, S. [Department of Physics and Astronomy, University of Pittsburgh, and Pittsburgh Particle Physics, Astrophysics, and Cosmology Center, 3941 O' Hara Street, Pittsburgh, PA 15260 (United States); Battaglia, N. [Department of Astrophysical Sciences, Peyton Hall, Princeton University, Princeton, NJ 08544 (United States); Beall, J.; Becker, D.T.; Cho, H.; Fox, A. [National Institute of Standards and Technology, Boulder, CO 80305 (United States); Bond, J.R. [CITA, University of Toronto, 60 St. George St., Toronto, ON M5S 3H8 (Canada); Calabrese, E.; Dunkley, J. [Sub-Department of Astrophysics, University of Oxford, Keble Road, Oxford, OX1 3RH (United Kingdom); Coughlin, K.; Datta, R. [Department of Physics, University of Michigan Ann Arbor, MI 48109 (United States); Devlin, M. [Department of Physics and Astronomy, University of Pennsylvania, 209 South 33rd Street, Philadelphia, PA 19104 (United States); Dunner, R. [Instituto de Astrofísica and Centro de Astro-Ingeniería, Facultad de Física, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, 7820436 Macul, Santiago (Chile); Ferraro, S. [Miller Institute for Basic Research in Science, University of California, Berkeley, CA 94720 (United States); Halpern, M. [University of British Columbia, Department of Physics and Astronomy, 6224 Agricultural Road, Vancouver BC V6T 1Z1 (Canada); Hand, N., E-mail: fdeberna@gmail.com [Astronomy Department, University of California, Berkeley, CA 94720 (United States); and others

    2017-03-01

    We present a new measurement of the kinematic Sunyaev-Zel'dovich effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). Using 600 square degrees of overlapping sky area, we evaluate the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog. A non-zero signal arises from the large-scale motions of halos containing the sample galaxies. The data fits an analytical signal model well, with the optical depth to microwave photon scattering as a free parameter determining the overall signal amplitude. We estimate the covariance matrix of the mean pairwise momentum as a function of galaxy separation, using microwave sky simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based errors give signal-to-noise estimates between 3.6 and 4.1 for varying galaxy luminosity cuts. We discuss how the other error determinations can lead to higher signal-to-noise values, and consider the impact of several possible systematic errors. Estimates of the optical depth from the average thermal Sunyaev-Zel'dovich signal at the sample galaxy positions are broadly consistent with those obtained from the mean pairwise momentum signal.

  12. Probing dark energy models with extreme pairwise velocities of galaxy clusters from the DEUS-FUR simulations

    Science.gov (United States)

    Bouillot, Vincent R.; Alimi, Jean-Michel; Corasaniti, Pier-Stefano; Rasera, Yann

    2015-06-01

    Observations of colliding galaxy clusters with high relative velocity probe the tail of the halo pairwise velocity distribution with the potential of providing a powerful test of cosmology. As an example it has been argued that the discovery of the Bullet Cluster challenges standard Λ cold dark matter (ΛCDM) model predictions. Halo catalogues from N-body simulations have been used to estimate the probability of Bullet-like clusters. However, due to simulation volume effects previous studies had to rely on a Gaussian extrapolation of the pairwise velocity distribution to high velocities. Here, we perform a detailed analysis using the halo catalogues from the Dark Energy Universe Simulation Full Universe Runs (DEUS-FUR), which enables us to resolve the high-velocity tail of the distribution and study its dependence on the halo mass definition, redshift and cosmology. Building upon these results, we estimate the probability of Bullet-like systems in the framework of Extreme Value Statistics. We show that the tail of extreme pairwise velocities significantly deviates from that of a Gaussian; moreover, it carries an imprint of the underlying cosmology. We find the Bullet Cluster probability to be two orders of magnitude larger than previous estimates, thus easing the tension with the ΛCDM model. Finally, the comparison of the inferred probabilities for the different DEUS-FUR cosmologies suggests that observations of extreme interacting clusters can provide constraints on dark energy models complementary to standard cosmological tests.

  13. Historical demography of common carp estimated from individuals collected from various parts of the world using the pairwise sequentially markovian coalescent approach.

    Science.gov (United States)

    Yuan, Zihao; Huang, Wei; Liu, Shikai; Xu, Peng; Dunham, Rex; Liu, Zhanjiang

    2018-04-01

    The inference of historical demography of a species is helpful for understanding species' differentiation and its population dynamics. However, such inference has been previously difficult due to the lack of proper analytical methods and availability of genetic data. A recently developed method called Pairwise Sequentially Markovian Coalescent (PSMC) offers the capability for estimation of the trajectories of historical populations over considerable time periods using genomic sequences. In this study, we applied this approach to infer the historical demography of the common carp using samples collected from Europe, Asia and the Americas. Comparison between Asian and European common carp populations showed that the last glacial period starting 100 ka BP likely caused a significant decline in population size of the wild common carp in Europe, while it did not have much of an impact on its counterparts in Asia. This was probably caused by differences in glacial activities in East Asia and Europe, and suggests a separation of the European and Asian clades before the last glacial maximum. The North American clade, which is an invasive population, shared a demographic history similar to that of the European clade, consistent with the idea that the North American common carp probably had European ancestral origins. Our analysis represents the first reconstruction of the historical population demography of the common carp, which is important to elucidate the separation of European and Asian common carp clades during the Quaternary glaciation, as well as the dispersal of common carp across the world.

  14. High-throughput detection of prostate cancer in histological sections using probabilistic pairwise Markov models.

    Science.gov (United States)

    Monaco, James P; Tomaszewski, John E; Feldman, Michael D; Hagemann, Ian; Moradi, Mehdi; Mousavi, Parvin; Boag, Alexander; Davidson, Chris; Abolmaesumi, Purang; Madabhushi, Anant

    2010-08-01

    In this paper we present a high-throughput system for detecting regions of carcinoma of the prostate (CaP) in histological sections (HSs) from radical prostatectomies (RPs) using probabilistic pairwise Markov models (PPMMs), a novel type of Markov random field (MRF). At diagnostic resolution a digitized HS can contain 80K x 70K pixels - far too many for current automated Gleason grading algorithms to process. However, grading can be separated into two distinct steps: (1) detecting cancerous regions and (2) then grading these regions. The detection step does not require diagnostic resolution and can be performed much more quickly. Thus, we introduce a CaP detection system capable of analyzing an entire digitized whole-mount HS (2 x 1.75 cm^2) in under three minutes (on a desktop computer) while achieving a CaP detection sensitivity and specificity of 0.87 and 0.90, respectively. We obtain this high throughput by tailoring the system to analyze the HSs at low resolution (8 µm per pixel). This motivates the following algorithm: (Step 1) glands are segmented, (Step 2) the segmented glands are classified as malignant or benign, and (Step 3) the malignant glands are consolidated into continuous regions. The classification of individual glands leverages two features: gland size and the tendency for proximate glands to share the same class. The latter feature describes a spatial dependency which we model using a Markov prior. Typically, Markov priors are expressed as the product of potential functions. Unfortunately, potential functions are mathematical abstractions, and constructing priors through their selection becomes an ad hoc procedure, resulting in simplistic models such as the Potts. Addressing this problem, we introduce PPMMs which formulate priors in terms of probability density functions, allowing the creation of more sophisticated models. To demonstrate the efficacy of our CaP detection system and assess the advantages of using a PPMM prior instead of the Potts, we alternately

  15. Pairwise protein expression classifier for candidate biomarker discovery for early detection of human disease prognosis

    Directory of Open Access Journals (Sweden)

    Kaur Parminder

    2012-08-01

    spectrometry data from “bottom up” proteomics methods, functionally related proteins/peptide pairs exhibiting co-ordinated changes in expression profile are discovered, which represent a signature for patients progressing to various disease conditions. The method has been tested against clinical data from patients progressing to idiopathic pneumonia syndrome (IPS) following a bone marrow transplant. The data indicate that patients with improper regulation in the concentration of specific acute phase response proteins at the time of bone marrow transplant are highly likely to develop IPS within a few weeks. The results lead to a specific set of protein pairs that can be efficiently verified by investigating the pairwise abundance change in independent cohorts using ELISA or targeted mass spectrometry techniques. This generalized classifier can be extended to other clinical problems in a variety of contexts.

  16. Living network meta-analysis compared with pairwise meta-analysis in comparative effectiveness research: empirical study.

    Science.gov (United States)

    Nikolakopoulou, Adriani; Mavridis, Dimitris; Furukawa, Toshi A; Cipriani, Andrea; Tricco, Andrea C; Straus, Sharon E; Siontis, George C M; Egger, Matthias; Salanti, Georgia

    2018-02-28

    To examine whether the continuous updating of networks of prospectively planned randomised controlled trials (RCTs) ("living" network meta-analysis) provides strong evidence against the null hypothesis in comparative effectiveness of medical interventions earlier than the updating of conventional, pairwise meta-analysis. Empirical study of the accumulating evidence about the comparative effectiveness of clinical interventions. Database of network meta-analyses of RCTs identified through searches of Medline, Embase, and the Cochrane Database of Systematic Reviews until 14 April 2015. Network meta-analyses published after January 2012 that compared at least five treatments and included at least 20 RCTs. Clinical experts were asked to identify in each network the treatment comparison of greatest clinical interest. Comparisons were excluded for which direct and indirect evidence disagreed, based on side, or node, splitting test (Pmeta-analyses were performed for each selected comparison. Monitoring boundaries of statistical significance were constructed and the evidence against the null hypothesis was considered to be strong when the monitoring boundaries were crossed. A significance level was defined as α=5%, power of 90% (β=10%), and an anticipated treatment effect to detect equal to the final estimate from the network meta-analysis. The frequency and time to strong evidence was compared against the null hypothesis between pairwise and network meta-analyses. 49 comparisons of interest from 44 networks were included; most (n=39, 80%) were between active drugs, mainly from the specialties of cardiology, endocrinology, psychiatry, and rheumatology. 29 comparisons were informed by both direct and indirect evidence (59%), 13 by indirect evidence (27%), and 7 by direct evidence (14%). Both network and pairwise meta-analysis provided strong evidence against the null hypothesis for seven comparisons, but for an additional 10 comparisons only network meta-analysis provided

  17. Living network meta-analysis compared with pairwise meta-analysis in comparative effectiveness research: empirical study

    Science.gov (United States)

    Nikolakopoulou, Adriani; Mavridis, Dimitris; Furukawa, Toshi A; Cipriani, Andrea; Tricco, Andrea C; Straus, Sharon E; Siontis, George C M; Egger, Matthias

    2018-01-01

    Objective: To examine whether the continuous updating of networks of prospectively planned randomised controlled trials (RCTs) (“living” network meta-analysis) provides strong evidence against the null hypothesis in comparative effectiveness of medical interventions earlier than the updating of conventional, pairwise meta-analysis. Design: Empirical study of the accumulating evidence about the comparative effectiveness of clinical interventions. Data sources: Database of network meta-analyses of RCTs identified through searches of Medline, Embase, and the Cochrane Database of Systematic Reviews until 14 April 2015. Eligibility criteria for study selection: Network meta-analyses published after January 2012 that compared at least five treatments and included at least 20 RCTs. Clinical experts were asked to identify in each network the treatment comparison of greatest clinical interest. Comparisons were excluded for which direct and indirect evidence disagreed, based on side, or node, splitting test (Pmeta-analysis. The frequency and time to strong evidence was compared against the null hypothesis between pairwise and network meta-analyses. Results: 49 comparisons of interest from 44 networks were included; most (n=39, 80%) were between active drugs, mainly from the specialties of cardiology, endocrinology, psychiatry, and rheumatology. 29 comparisons were informed by both direct and indirect evidence (59%), 13 by indirect evidence (27%), and 7 by direct evidence (14%). Both network and pairwise meta-analysis provided strong evidence against the null hypothesis for seven comparisons, but for an additional 10 comparisons only network meta-analysis provided strong evidence against the null hypothesis (P=0.002). The median time to strong evidence against the null hypothesis was 19 years with living network meta-analysis and 23 years with living pairwise meta-analysis (hazard ratio 2.78, 95% confidence interval 1.00 to 7.72, P=0.05). Studies directly comparing

  18. Bispectral pairwise interacting source analysis for identifying systems of cross-frequency interacting brain sources from electroencephalographic or magnetoencephalographic signals

    Science.gov (United States)

    Chella, Federico; Pizzella, Vittorio; Zappasodi, Filippo; Nolte, Guido; Marzetti, Laura

    2016-05-01

    Brain cognitive functions arise through the coordinated activity of several brain regions, which actually form complex dynamical systems operating at multiple frequencies. These systems often consist of interacting subsystems, whose characterization is of importance for a complete understanding of the brain interaction processes. To address this issue, we present a technique, namely the bispectral pairwise interacting source analysis (biPISA), for analyzing systems of cross-frequency interacting brain sources when multichannel electroencephalographic (EEG) or magnetoencephalographic (MEG) data are available. Specifically, the biPISA makes it possible to identify one or many subsystems of cross-frequency interacting sources by decomposing the antisymmetric components of the cross-bispectra between EEG or MEG signals, based on the assumption that interactions are pairwise. Thanks to the properties of the antisymmetric components of the cross-bispectra, biPISA is also robust to spurious interactions arising from mixing artifacts, i.e., volume conduction or field spread, which always affect EEG or MEG functional connectivity estimates. This method is an extension of the pairwise interacting source analysis (PISA), which was originally introduced for investigating interactions at the same frequency, to the study of cross-frequency interactions. The effectiveness of this approach is demonstrated in simulations for up to three interacting source pairs and for real MEG recordings of spontaneous brain activity. Simulations show that the performances of biPISA in estimating the phase difference between the interacting sources are affected by the increasing level of noise rather than by the number of the interacting subsystems. The analysis of real MEG data reveals an interaction between two pairs of sources of central mu and beta rhythms, localizing in the proximity of the left and right central sulci.

  19. Multilevel summation methods for efficient evaluation of long-range pairwise interactions in atomistic and coarse-grained molecular simulation.

    Energy Technology Data Exchange (ETDEWEB)

    Bond, Stephen D.

    2014-01-01

    The availability of efficient algorithms for long-range pairwise interactions is central to the success of numerous applications, ranging in scale from atomic-level modeling of materials to astrophysics. This report focuses on the implementation and analysis of the multilevel summation method for approximating long-range pairwise interactions. The computational cost of the multilevel summation method is proportional to the number of particles, N, which is an improvement over FFT-based methods whose cost is asymptotically proportional to N log N. In addition to approximating electrostatic forces, the multilevel summation method can be used to efficiently approximate convolutions with long-range kernels. As an application, we apply the multilevel summation method to a discretized integral equation formulation of the regularized generalized Poisson equation. Numerical results are presented using an implementation of the multilevel summation method in the LAMMPS software package. Preliminary results show that the computational cost of the method scales as expected, but there is still a need for further optimization.

  20. ChIP-PIT: Enhancing the Analysis of ChIP-Seq Data Using Convex-Relaxed Pair-Wise Interaction Tensor Decomposition.

    Science.gov (United States)

    Zhu, Lin; Guo, Wei-Li; Deng, Su-Ping; Huang, De-Shuang

    2016-01-01

    In recent years, thanks to the efforts of individual scientists and research consortiums, a huge amount of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experimental data has been accumulated. Instead of investigating them independently, several recent studies have convincingly demonstrated that a wealth of scientific insights can be gained by integrative analysis of these ChIP-seq data. However, when used for the purpose of integrative analysis, a serious drawback of the current ChIP-seq technique is that it is still expensive and time-consuming to generate ChIP-seq datasets of high standard. Most researchers are therefore unable to obtain complete ChIP-seq data for several TFs in a wide variety of cell lines, which considerably limits the understanding of transcriptional regulation patterns. In this paper, we propose a novel method called ChIP-PIT to overcome the aforementioned limitation. In ChIP-PIT, ChIP-seq data corresponding to a diverse collection of cell types, TFs and genes are fused together using the three-mode pair-wise interaction tensor (PIT) model, and the prediction of unperformed ChIP-seq experimental results is formulated as a tensor completion problem. Computationally, we propose an efficient first-order method based on extensions of the coordinate descent method to learn the optimal solution of ChIP-PIT, which makes it particularly suitable for the analysis of massive scale ChIP-seq data. Experimental evaluation on the ENCODE data illustrates the usefulness of the proposed model.

  1. Analysis of pairwise correlations in multi-parametric PET/MR data for biological tumor characterization and treatment individualization strategies

    Energy Technology Data Exchange (ETDEWEB)

    Leibfarth, Sara; Moennich, David; Thorwarth, Daniela [University Hospital Tuebingen, Section for Biomedical Physics, Department of Radiation Oncology, Tuebingen (Germany); Simoncic, Urban [University Hospital Tuebingen, Section for Biomedical Physics, Department of Radiation Oncology, Tuebingen (Germany); University of Ljubljana, Faculty of Mathematics and Physics, Ljubljana (Slovenia); Jozef Stefan Institute, Ljubljana (Slovenia); Welz, Stefan; Zips, Daniel [University Hospital Tuebingen, Department of Radiation Oncology, Tuebingen (Germany); Schmidt, Holger; Schwenzer, Nina [University Hospital Tuebingen, Department of Diagnostic and Interventional Radiology, Tuebingen (Germany)

    2016-07-15

    The aim of this pilot study was to explore simultaneous functional PET/MR for biological characterization of tumors and potential future treatment adaptations. To investigate the extent of complementarity between different PET/MR-based functional datasets, a pairwise correlation analysis was performed. Functional datasets of N=15 head and neck (HN) cancer patients were evaluated. For patients of group A (N=7), combined PET/MR datasets including FDG-PET and ADC maps were available. Patients of group B (N=8) had FMISO-PET, DCE-MRI and ADC maps from combined PET/MRI, an additional dynamic FMISO-PET/CT acquired directly after FMISO tracer injection as well as an FDG-PET/CT acquired a few days earlier. From DCE-MR, parameter maps K^trans, v_e and v_p were obtained with the extended Tofts model. Moreover, parameter maps of mean DCE enhancement, ΔS_DCE, and mean FMISO signal 0-4 min p.i., Ā_FMISO, were derived. Pairwise correlations were quantified using the Spearman correlation coefficient (r) on both a voxel and a regional level within the gross tumor volume. Between some pairs of functional imaging modalities moderate correlations were observed with respect to the median over all patient datasets, whereas distinct correlations were only present on an individual basis. Highest inter-modality median correlations on the voxel level were obtained for FDG/FMISO (r = 0.56), FDG/Ā_FMISO (r = 0.55), Ā_FMISO/ΔS_DCE (r = 0.46), and FDG/ADC (r = -0.39). Correlations on the regional level showed comparable results. The results of this study suggest that the examined functional datasets provide complementary information. However, only pairwise correlations were examined, and correlations could still exist between combinations of three or more datasets. These results might contribute to the future design of individually adapted treatment approaches based on multiparametric functional imaging.
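The voxel-level analysis described in this record can be illustrated with a small sketch. It is not the authors' pipeline: the function names, the flat-list representation of the parameter maps, and the toy values are all assumptions made for illustration. It computes the Spearman coefficient as the Pearson correlation of tie-averaged ranks, restricted to voxels inside a mask that stands in for the gross tumor volume.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def voxelwise_spearman(map_a, map_b, mask):
    """Correlate two flattened parameter maps over voxels where mask is truthy."""
    a = [v for v, m in zip(map_a, mask) if m]
    b = [v for v, m in zip(map_b, mask) if m]
    return spearman(a, b)


# Hypothetical toy data: five voxels, the last one outside the tumor mask.
fdg_toy = [1.0, 2.5, 3.0, 4.2, 9.9]
adc_toy = [5.0, 4.0, 3.5, 1.0, 7.0]
mask_toy = [1, 1, 1, 1, 0]
r_toy = voxelwise_spearman(fdg_toy, adc_toy, mask_toy)
```

Inside the mask the toy ADC values fall strictly as the toy FDG values rise, so the coefficient is at the negative extreme. Real maps would first need co-registration and resampling to a common voxel grid, which this sketch does not cover.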

  2. Design of Long Period Pseudo-Random Sequences from the Addition of m-Sequences over 𝔽_p

    Directory of Open Access Journals (Sweden)

    Ren Jian

    2004-01-01

    Pseudo-random sequences with good correlation properties and large linear span are widely used in code division multiple access (CDMA) communication systems and cryptology for reliable and secure information transmission. In this paper, sequences with long period, large complexity, balanced statistics, and low cross-correlation are constructed from the addition of m-sequences with pairwise-prime linear spans (AMPLS). Using m-sequences as building blocks, the proposed method proved to be an efficient and flexible approach to constructing long-period pseudo-random sequences with desirable properties from short-period sequences. Applying the proposed method to 𝔽_2, a signal set $\left((2^n-1)(2^m-1),\ (2^n+1)(2^m+1),\ (2^{(n+1)/2}+1)(2^{(m+1)/2}+1)\right)$ is constructed.
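A minimal sketch of the basic idea over 𝔽_2, assuming the smallest coprime linear spans n = 2 and m = 3: two m-sequences are generated from the primitive polynomials x^2 + x + 1 and x^3 + x + 1 and added term by term. The helper names and the choice of polynomials are illustrative and are not taken from the paper. The sum has period (2^2 - 1)(2^3 - 1) = 21, matching the first component of the signal set.

```python
def lfsr_sequence(taps, state, length):
    """Binary linear recurrence: s[t+n] is the XOR of s[t+k] for k in taps,
    where n = len(state) and state holds the first n terms."""
    seq = list(state)
    n = len(state)
    while len(seq) < length:
        t = len(seq) - n
        bit = 0
        for k in taps:
            bit ^= seq[t + k]
        seq.append(bit)
    return seq[:length]


def minimal_period(seq, max_period):
    """Smallest T <= max_period with seq[i] == seq[i+T] over the whole window."""
    for T in range(1, max_period + 1):
        if all(seq[i] == seq[i + T] for i in range(len(seq) - T)):
            return T
    return None


LENGTH = 63  # three full periods of the summed sequence

# x^2 + x + 1  ->  s[t+2] = s[t+1] + s[t]  (period 2^2 - 1 = 3)
seq_a = lfsr_sequence((0, 1), (1, 0), LENGTH)
# x^3 + x + 1  ->  s[t+3] = s[t+1] + s[t]  (period 2^3 - 1 = 7)
seq_b = lfsr_sequence((0, 1), (1, 0, 0), LENGTH)

# Term-by-term addition over F_2 is XOR.
seq_sum = [x ^ y for x, y in zip(seq_a, seq_b)]

period_a = minimal_period(seq_a, 21)
period_b = minimal_period(seq_b, 21)
period_sum = minimal_period(seq_sum, 21)
```

The window of 63 terms is deliberately longer than the candidate periods so that the period check covers at least one full cycle of the summed sequence.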

  3. Ancestral sequence alignment under optimal conditions

    Directory of Open Access Journals (Sweden)

    Brown Daniel G

    2005-11-01

    Abstract Background: Multiple genome alignment is an important problem in bioinformatics. An important subproblem used by many multiple alignment approaches is that of aligning two multiple alignments. Many popular alignment algorithms for DNA use the sum-of-pairs heuristic, where the score of a multiple alignment is the sum of its induced pairwise alignment scores. However, the biological meaning of the sum-of-pairs heuristic is not obvious. Additionally, many algorithms based on the sum-of-pairs heuristic are complicated and slow compared to pairwise alignment algorithms. An alternative approach to aligning alignments is to first infer ancestral sequences for each alignment, and then align the two ancestral sequences. In addition to being fast, this method has a clear biological basis that takes into account the evolution implied by an underlying phylogenetic tree. In this study we explore the accuracy of aligning alignments by ancestral sequence alignment. We examine the use of both maximum likelihood and parsimony to infer ancestral sequences. Additionally, we investigate the effect on accuracy of allowing ambiguity in our ancestral sequences. Results: We use synthetic sequence data that we generate by simulating evolution on a phylogenetic tree. We use two different types of phylogenetic trees: trees with a period of rapid growth followed by a period of slow growth, and trees with a period of slow growth followed by a period of rapid growth. We examine the alignment accuracy of four ancestral sequence reconstruction and alignment methods: parsimony, maximum likelihood, ambiguous parsimony, and ambiguous maximum likelihood. Additionally, we compare against the alignment accuracy of two sum-of-pairs algorithms: ClustalW and the heuristic of Ma, Zhang, and Wang. Conclusion: We find that allowing ambiguity in ancestral sequences does not lead to better multiple alignments.
Regardless of whether we use parsimony or maximum likelihood, the
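The sum-of-pairs score mentioned in this record can be sketched as follows. The scoring values (match +1, mismatch -1, residue against gap -2, gap against gap ignored) are assumptions chosen for illustration; they are not the parameters of ClustalW or of any method named in the abstract.

```python
from itertools import combinations


def pair_score(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Score the pairwise alignment induced by two rows of a multiple alignment."""
    score = 0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue  # a column that is gap in both rows is not part of the induced alignment
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def sum_of_pairs(alignment):
    """Sum the induced pairwise scores over all unordered pairs of rows."""
    return sum(pair_score(a, b) for a, b in combinations(alignment, 2))


# Hypothetical three-row alignment with equal-length rows.
toy_alignment = ["AC-GT", "ACGGT", "A--GT"]
toy_score = sum_of_pairs(toy_alignment)
```

For the toy alignment the three induced pairwise scores are 2, 1 and -1, giving a total of 2. The number of pairs grows quadratically with the number of rows.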

  4. Socio-economic scenario development for the assessment of climate change impacts on agricultural land use: a pairwise comparison approach

    DEFF Research Database (Denmark)

    Abildtrup, Jens; Audsley, E.; Fekete-Farkas, M.

    2006-01-01

    Assessment of the vulnerability of agriculture to climate change is strongly dependent on concurrent changes in socio-economic development pathways. This paper presents an integrated approach to the construction of socio-economic scenarios required for the analysis of climate change impacts on European agricultural land use. The scenarios are interpreted from the storylines described in the Intergovernmental Panel on Climate Change (IPCC) Special Report on Emission Scenarios (SRES), which ensures internal consistency between the evolution of socio-economics and climate change. A stepwise downscaling procedure based on expert judgement and pairwise comparison is presented to obtain quantitative socio-economic parameters, e.g. prices and productivity estimates, that are input to the ACCELERATES integrated land use model. In the first step, the global driving forces are identified and quantified...

  5. Dynamical pairwise entanglement and two-point correlations in the three-ligand spin-star structure

    Science.gov (United States)

    Motamedifar, M.

    2017-10-01

    We consider the three-ligand spin-star structure through homogeneous Heisenberg interactions (XXX-3LSSS) in the framework of dynamical pairwise entanglement. It is shown that the time evolution of the central qubit "one-particle" state (COPS) brings about the generation of quantum W states at periodic time instants. In contrast, W states cannot be generated from the time evolution of a ligand "one-particle" state (LOPS). We also investigate the dynamical behavior of two-point quantum correlations as well as the expectation values of the different spin components for each element in the XXX-3LSSS. It is found that when a W state is generated, the same value of the concurrence between any two arbitrary qubits arises from the xx and yy two-point quantum correlations. By contrast, the zz quantum correlation between any two qubits vanishes at these time instants.

  6. Pairwise NMR experiments for the determination of protein backbone dihedral angle Φ based on cross-correlated spin relaxation

    International Nuclear Information System (INIS)

    Takahashi, Hideo; Shimada, Ichio

    2007-01-01

    Novel cross-correlated spin relaxation (CCR) experiments are described, which measure pairwise CCR rates for obtaining peptide dihedral angles Φ. The experiments utilize intra-HNCA type coherence transfer to refocus 2-bond J_NCα coupling evolution and generate the N(i)-Cα(i) or C'(i-1)-Cα(i) multiple quantum coherences which are required for measuring the desired CCR rates. The contribution from other coherences is also discussed and an appropriate setting of the evolution delays is presented. These CCR experiments were applied to 15N- and 13C-labeled human ubiquitin. The relevant CCR rates showed a high degree of correlation with the Φ angles observed in the X-ray structure. By utilizing these CCR experiments in combination with those previously established for obtaining dihedral angle Ψ, we can determine high resolution structures of peptides that bind weakly to large target molecules.

  7. Assessment of crystalline disorder in cryo-milled samples of indomethacin using atomic pair-wise distribution functions

    DEFF Research Database (Denmark)

    Bøtker, Johan P; Karmwar, Pranav; Strachan, Clare J

    2011-01-01

    The aim of this study was to investigate the usefulness of the atomic pair-wise distribution function (PDF) to detect the extent of disorder/amorphousness induced into a crystalline drug using a cryo-milling technique, and to determine the optimal milling times to achieve amorphisation. The PDF... to analyse the cryo-milled samples. The high similarity between the γ-indomethacin cryogenic ball milled samples and the crude γ-indomethacin indicated that milled samples retained residual order of the γ-form. The PDF analysis encompassed the capability of achieving a correlation with the physical properties determined from DSC, ss-NMR and stability experiments. Multivariate data analysis (MVDA) was used to visualize the differences in the PDF and XRPD data. The MVDA approach revealed that PDF is more efficient in assessing the introduced degree of disorder in γ-indomethacin after cryo-milling than...

  8. Plant lock and ant key: pairwise coevolution of an exclusion filter in an ant-plant mutualism.

    Science.gov (United States)

    Brouat, C; Garcia, N; Andary, C; McKey, D

    2001-10-22

    Although observations suggest pairwise coevolution in specific ant-plant symbioses, coevolutionary processes have rarely been demonstrated. We report on, what is to the authors' knowledge, the strongest evidence yet for reciprocal adaptation of morphological characters in a species-specific ant-plant mutualism. The plant character is the prostoma, which is a small unlignified organ at the apex of the domatia in which symbiotic ants excavate an entrance hole. Each myrmecophyte in the genus Leonardoxa has evolved a prostoma with a different shape. By performing precise measurements on the prostomata of three related myrmecophytes, on their specific associated ants and on the entrance holes excavated by symbiotic ants at the prostomata, we showed that correspondence of the plant and ant traits forms a morphological and behavioural filter. We have strong evidence for coevolution between the dimensions and shape of the symbiotic ants and the prostoma in one of the three ant-Leonardoxa associations.

  9. Galaxy and Mass Assembly (GAMA): small-scale anisotropic galaxy clustering and the pairwise velocity dispersion of galaxies

    Science.gov (United States)

    Loveday, J.; Christodoulou, L.; Norberg, P.; Peacock, J. A.; Baldry, I. K.; Bland-Hawthorn, J.; Brown, M. J. I.; Colless, M.; Driver, S. P.; Holwerda, B. W.; Hopkins, A. M.; Kafle, P. R.; Liske, J.; Lopez-Sanchez, A. R.; Taylor, E. N.

    2018-03-01

    The galaxy pairwise velocity dispersion (PVD) can provide important tests of non-standard gravity and galaxy formation models. We describe measurements of the PVD of galaxies in the Galaxy and Mass Assembly (GAMA) survey as a function of projected separation and galaxy luminosity. Due to the faint magnitude limit (r PVD to smaller scales (r⊥ = 0.01 h^-1 Mpc) than previous work. The measured PVD at projected separations r⊥ ≲ 1 h^-1 Mpc increases near monotonically with increasing luminosity from σ_12 ≈ 200 km s^-1 at M_r = -17 mag to σ_12 ≈ 600 km s^-1 at M_r ≈ -22 mag. Analysis of the Gonzalez-Perez et al. (2014) GALFORM semi-analytic model yields no such trend of PVD with luminosity: the model overpredicts the PVD for faint galaxies. This is most likely a result of the model placing too many low-luminosity galaxies in massive haloes.

  10. Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments

    Energy Technology Data Exchange (ETDEWEB)

    Pollard, Daniel A.; Moses, Alan M.; Iyer, Venky N.; Eisen, Michael B.

    2006-08-14

    Background: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. Results: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool-specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. Conclusions: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation.
These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and

  11. Surface feeding and aggressive behaviour of diploid and triploid brown trout Salmo trutta during allopatric pair-wise matchings.

    Science.gov (United States)

    Preston, A C; Taylor, J F; Adams, C E; Migaud, H

    2014-09-01

    Diploid and triploid brown trout Salmo trutta were acclimated for 6 weeks on two feeding regimes (floating and sinking). Thereafter, aggression and surface feeding response were compared between pairs of all diploid, all triploid and diploid and triploid S. trutta in an experimental stream. In each pair-wise matching, fish of similar size were placed in allopatry and rank was determined by the total number of aggressive interactions recorded. Dominant individuals initiated more aggression than subordinates, spent more time defending a territory and positioned themselves closer to the surface food source (Gammarus pulex), whereas subordinates occupied the peripheries. In cross ploidy trials, diploid S. trutta were more aggressive than triploid, and dominated their sibling when placed in pair-wise matchings. Surface feeding, however, did not differ statistically between ploidy irrespective of feeding regime. Triploids adopted a sneak feeding strategy while diploids expended more time defending a territory. In addition, we also tested whether triploids exhibit a similar social dominance to diploids when placed in allopatry. Although aggression was lower in triploid pairs than in the diploid and triploid pairs, a dominance hierarchy was also observed between individuals of the same ploidy. Dominant triploid fish were more aggressive and consumed more feed items than subordinate individuals. Subordinate fish displayed a darker colour index than dominant fish suggesting increased stress levels. Dominant triploid fish, however, appeared to be more tolerant of subordinate individuals and did not display the same degree of invasive aggression as seen in the diploid and diploid or diploid and triploid matchings. These novel findings suggest that sterile triploid S. trutta feed similarly but are less aggressive than diploid trout. Future studies should determine the habitat choice of triploid S. 
trutta after release and the interaction between wild fish and triploids during

  12. Synchronization of pairwise-coupled, identical, relaxation oscillators based on metal-insulator phase transition devices: A model study

    Science.gov (United States)

    Parihar, Abhinav; Shukla, Nikhil; Datta, Suman; Raychowdhury, Arijit

    2015-02-01

    Computing with networks of synchronous oscillators has attracted wide-spread attention as novel materials and device topologies have enabled realization of compact, scalable and low-power coupled oscillatory systems. Of particular interest are compact and low-power relaxation oscillators that have been recently demonstrated using MIT (metal-insulator-transition) devices using properties of correlated oxides. Further the computational capability of pairwise coupled relaxation oscillators has also been shown to outperform traditional Boolean digital logic circuits. This paper presents an analysis of the dynamics and synchronization of a system of two such identical coupled relaxation oscillators implemented with MIT devices. We focus on two implementations of the oscillator: (a) a D-D configuration where complementary MIT devices (D) are connected in series to provide oscillations and (b) a D-R configuration where it is composed of a resistor (R) in series with a voltage-triggered state changing MIT device (D). The MIT device acts like a hysteresis resistor with different resistances in the two different states. The synchronization dynamics of such a system has been analyzed with purely charge based coupling using a resistive (RC) and a capacitive (CC) element in parallel. It is shown that in a D-D configuration symmetric, identical and capacitively coupled relaxation oscillator system synchronizes to an anti-phase locking state, whereas when coupled resistively the system locks in phase. Further, we demonstrate that for certain range of values of RC and CC, a bistable system is possible which can have potential applications in associative computing. In D-R configuration, we demonstrate the existence of rich dynamics including non-monotonic flows and complex phase relationship governed by the ratios of the coupling impedance. Finally, the developed theoretical formulations have been shown to explain experimentally measured waveforms of such pairwise coupled

  13. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remained unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies and...... in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html.

  14. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  15. A time warping approach to multiple sequence alignment.

    Science.gov (United States)

    Arribas-Gil, Ana; Matias, Catherine

    2017-04-25

    We propose an approach for multiple sequence alignment (MSA) derived from the dynamic time warping viewpoint and recent techniques of curve synchronization developed in the context of functional data analysis. Starting from pairwise alignments of all the sequences (viewed as paths in a certain space), we construct a median path that represents the MSA we are looking for. We establish a proof of concept that our method could be an interesting ingredient to include in refined MSA techniques. We present a simple synthetic experiment as well as the study of a benchmark dataset, together with comparisons with two widely used MSA software packages.

  16. Multicoil2: predicting coiled coils and their oligomerization states from sequence in the twilight zone.

    Directory of Open Access Journals (Sweden)

    Jason Trigg

    The alpha-helical coiled coil can adopt a variety of topologies, among the most common of which are parallel and antiparallel dimers and trimers. We present Multicoil2, an algorithm that predicts both the location and oligomerization state (two versus three helices) of coiled coils in protein sequences. Multicoil2 combines the pairwise correlations of the previous Multicoil method with the flexibility of Hidden Markov Models (HMMs) in a Markov Random Field (MRF). The resulting algorithm integrates sequence features, including pairwise interactions, through multinomial logistic regression to devise an optimized scoring function for distinguishing dimer, trimer and non-coiled-coil oligomerization states; this scoring function is used to produce Markov Random Field potentials that incorporate pairwise correlations localized in sequence. Multicoil2 significantly improves both coiled-coil detection and dimer versus trimer state prediction over the original Multicoil algorithm retrained on a newly constructed database of coiled-coil sequences. The new database, comprising 2,105 sequences containing 124,088 residues, includes reliable structural annotations based on experimental data in the literature. Notably, the enhanced performance of Multicoil2 is evident when tested in stringent leave-family-out cross-validation on the new database, reflecting expected performance on challenging new prediction targets that have minimal sequence similarity to known coiled-coil families. The Multicoil2 program and training database are available for download from http://multicoil2.csail.mit.edu.

  17. Sequence embedding for fast construction of guide trees for multiple sequence alignment

    LENUS (Irish Health Repository)

    Blackshields, Gordon

    2010-05-14

    Abstract Background: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N^2 for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results: In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions: We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.

  18. On generalized fixed sequence procedures for controlling the FWER.

    Science.gov (United States)

    Qiu, Zhiying; Guo, Wenge; Lynch, Gavin

    2015-12-30

    Testing a sequence of pre-ordered hypotheses to decide which of these can be rejected or accepted while controlling the familywise error rate (FWER) is of importance in many scientific studies such as clinical trials. In this paper, we first introduce a generalized fixed sequence procedure whose critical values are defined by using a function of the numbers of rejections and acceptances, and which allows follow-up hypotheses to be tested even if some earlier hypotheses are not rejected. We then construct the least favorable configuration for this generalized fixed sequence procedure and present a sufficient condition for the FWER control under arbitrary dependence. Based on the condition, we develop three new generalized fixed sequence procedures controlling the FWER under arbitrary dependence. We also prove that each generalized fixed sequence procedure can be described as a specific closed testing procedure. Through simulation studies and a clinical trial example, we compare the power performance of these proposed procedures with those of the existing FWER controlling procedures. Finally, when the pairwise joint distributions of the true null p-values are known, we further improve these procedures by incorporating pairwise correlation information while maintaining the control of the FWER. Copyright © 2015 John Wiley & Sons, Ltd.
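For orientation, the conventional fixed sequence procedure that this record generalizes can be sketched in a few lines. This is the classical baseline, not the generalized procedure proposed by the authors, whose critical values depend on the numbers of rejections and acceptances and are not given in the abstract. The function name and the example p-values are hypothetical.

```python
def fixed_sequence(p_values, alpha=0.05):
    """Conventional fixed sequence test.

    Hypotheses are tested in their pre-specified order, each at the full
    level alpha. Testing stops at the first hypothesis that is not rejected,
    and every later hypothesis is left unrejected regardless of its p-value.
    Returns the indices of the rejected hypotheses.
    """
    rejected = []
    for index, p in enumerate(p_values):
        if p <= alpha:
            rejected.append(index)
        else:
            break  # the conventional procedure never tests past a non-rejection
    return rejected


# Hypothetical p-values in their pre-specified testing order.
toy_p_values = [0.01, 0.03, 0.20, 0.001]
toy_rejected = fixed_sequence(toy_p_values)
```

In the toy example the fourth hypothesis has a very small p-value but is never tested, because the third one was not rejected. That stopping behaviour is the limitation the generalized procedures described above are designed to relax.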

  19. Optimal definition of inter-residual contact in globular proteins based on pairwise interaction energy calculations, its robustness, and applications.

    Science.gov (United States)

    Fačkovec, Boris; Vondrášek, Jiří

    2012-10-25

    Although a contact is an essential measurement for the topology as well as strength of non-covalent interactions in biomolecules and their complexes, there is no general agreement in the definition of this feature. Most of the definitions work with simple geometric criteria which do not fully reflect the energy content or ability of the biomolecular building blocks to arrange their environment. We offer a reasonable solution to this problem by distinguishing between "productive" and "non-productive" contacts based on their interaction energy strength and properties. We have proposed a method which converts the protein topology into a contact map that represents interactions with statistically significant high interaction energies. We do not prove that these contacts are exclusively stabilizing, but they represent a gateway to thermodynamically important rather than geometry-based contacts. The process is based on protein fragmentation and calculation of interaction energies using the OPLS force field and relies on pairwise additivity of amino acid interactions. Our approach integrates the treatment of different types of interactions, avoiding the problems resulting from different contributions to the overall stability and the different effect of the environment. The first applications on a set of homologous proteins have shown the usefulness of this classification for a sound estimate of protein stability.

  20. The use of a modified pairwise comparison method in evaluating critical success factors for community-based rural homestay programmes

    Science.gov (United States)

    Daud, Shahidah Md; Ramli, Razamin; Kasim, Maznah Mat; Kayat, Kalsom; Razak, Rafidah Abd

    2014-12-01

    The tourism industry has become a prominent sector that has substantially increased the national income level. Although tourism is one of the highest income-generating sectors, the Homestay Programme, a Community-Based Tourism (CBT) product in Malaysia, has not absorbed much of the incoming wealth. Homestay Programme refers to a programme in a community where a tourist stays together with a host family and experiences the everyday way of life of the family in both a direct and an indirect manner. More than 100 Homestay Programmes are currently registered with the Ministry of Culture and Tourism Malaysia, most of them located in rural areas, but only a few excel and enjoy the fruits of the booming industry. Hence, this article seeks to identify the critical success factors for a Community-Based Rural Homestay Programme in Malaysia. A modified pairwise method is utilized to further evaluate the identified success factors in a more meaningful way. The findings will help the Homestay Programme function as a community development tool that manages tourism resources, thereby helping the community to improve the local economy and create job opportunities.

  1. Evaluation of criteria for sustainability of community-based rural homestay programs via a modified pairwise comparison method

    Science.gov (United States)

    Ramli, Rohaini; Kasim, Maznah Mat; Ramli, Razamin; Kayat, Kalsom; Razak, Rafidah Abd

    2014-12-01

    The Ministry of Tourism and Culture Malaysia has long since introduced homestay programs across the country to enhance the quality of life of people, especially those living in rural areas. This type of program is classified as community-based tourism (CBT), as it is expected to improve livelihoods economically through cultural and community-associated activities. It is the aspiration of the ministry to see the income imbalance between people in rural and urban areas reduced, which would contribute towards creating more developed states in Malaysia. Since the 1970s, 154 homestay programs have been registered with the ministry. However, the performance and sustainability of the programs are still not satisfactory. Only a few homestay programs perform well and are able to sustain themselves. Thus, the aim of this paper is to identify relevant criteria contributing to the sustainability of a homestay program. The criteria are evaluated for their levels of importance via a modified pairwise method and analyzed for other potentials. The findings will help homestay operators to focus on the necessary criteria and thus perform effectively as a CBT business initiative.

  2. Randomized Approaches for Nearest Neighbor Search in Metric Space When Computing the Pairwise Distance Is Extremely Expensive

    Science.gov (United States)

    Wang, Lusheng; Yang, Yong; Lin, Guohui

    Finding the closest object for a query in a database is a classical problem in computer science. For some modern biological applications, computing the similarity between two objects might be very time consuming. For example, it takes a long time to compute the edit distance between two whole chromosomes and the alignment cost of two 3D protein structures. In this paper, we study the nearest neighbor search problem in metric space, where the pair-wise distance between two objects in the database is known and we want to minimize the number of distances computed on-line between the query and objects in the database in order to find the closest object. We have designed two randomized approaches for indexing metric space databases, where objects are purely described by their distances with each other. Analysis and experiments show that our approaches only need to compute O(log n) objects in order to find the closest object, where n is the total number of objects in the database.

  3. Remarkable sequence conservation of the last intron in the PKD1 gene.

    Science.gov (United States)

    Rodova, Marianna; Islam, M Rafiq; Peterson, Kenneth R; Calvet, James P

    2003-10-01

    The last intron of the PKD1 gene (intron 45) was found to have exceptionally high sequence conservation across four mammalian species: human, mouse, rat, and dog. This conservation did not extend to the comparable intron in pufferfish. Pairwise comparisons for intron 45 showed 91% identity (human vs. dog) to 100% identity (mouse vs. rat) for an average for all four species of 94% identity. In contrast, introns 43 and 44 of the PKD1 gene had average pairwise identities of 57% and 54%, and exons 43, 44, and 45 and the coding region of exon 46 had average pairwise identities of 80%, 84%, 82%, and 80%. Intron 45 is 90 to 95 bp in length, with the major region of sequence divergence being in a central 4-bp to 9-bp variable region. RNA secondary structure analysis of intron 45 predicts a branching stem-loop structure in which the central variable region lies in one loop and the putative branch point sequence lies in another loop, suggesting that the intron adopts a specific stem-loop structure that may be important for its removal. Although intron 45 appears to conform to the class of small, G-triplet-containing introns that are spliced by a mechanism utilizing intron definition, its high sequence conservation may be a reflection of constraints imposed by a unique mechanism that coordinates splicing of this last PKD1 intron with polyadenylation.
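The pairwise identity figures in this record follow a simple computation that can be sketched as below, assuming the sequences are already aligned to equal length. The toy sequences and species labels are invented for illustration and are not PKD1 data, and the convention of counting a residue opposite a gap as a mismatch is an assumption, since the abstract does not state how gaps were handled.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns that are gaps in both sequences are skipped; a residue opposite
    a gap counts as a mismatch (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def average_pairwise_identity(sequences):
    """Mean percent identity over all unordered pairs in a name -> sequence dict."""
    pairs = list(combinations(sorted(sequences), 2))
    total = sum(percent_identity(sequences[p], sequences[q]) for p, q in pairs)
    return total / len(pairs)


# Hypothetical aligned toy sequences for four species.
toy_sequences = {
    "sp1": "ACGTACGTAC",
    "sp2": "ACGTACGTAC",
    "sp3": "ACGTTCGTAC",
    "sp4": "ACGAACGTAC",
}
toy_average = average_pairwise_identity(toy_sequences)
```

With four species there are six pairs. In the toy data they score 100, 90, 90, 90, 90 and 80 percent, so the average is 90 percent.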

  4. Improving the modelling of redshift-space distortions - I. A bivariate Gaussian description for the galaxy pairwise velocity distributions

    Science.gov (United States)

    Bianchi, Davide; Chiesa, Matteo; Guzzo, Luigi

    2015-01-01

    As a step towards a more accurate modelling of redshift-space distortions (RSD) in galaxy surveys, we develop a general description of the probability distribution function of galaxy pairwise velocities within the framework of the so-called streaming model. For a given galaxy separation r, such function can be described as a superposition of virtually infinite local distributions. We characterize these in terms of their moments and then consider the specific case in which they are Gaussian functions, each with its own mean μ and dispersion σ. Based on physical considerations, we make the further crucial assumption that these two parameters are in turn distributed according to a bivariate Gaussian, with its own mean and covariance matrix. Tests using numerical simulations explicitly show that with this compact description one can correctly model redshift-space distortions on all scales, fully capturing the overall linear and non-linear dynamics of the galaxy flow at different separations. In particular, we naturally obtain Gaussian/exponential, skewed/unskewed distribution functions, depending on separation as observed in simulations and data. Also, the recently proposed single-Gaussian description of RSD is included in this model as a limiting case, when the bivariate Gaussian is collapsed to a two-dimensional Dirac delta function. We also show how this description naturally allows for the Taylor expansion of 1 + ξ_S(s) around 1 + ξ_R(r), which leads to the Kaiser linear formula when truncated to second order, explicating its connection with the moments of the velocity distribution functions. More work is needed, but these results indicate a very promising path to make definitive progress in our programme to improve RSD estimators.

  5. Post-Hartree-Fock studies of the He/Mg(0001) interaction: Anti-corrugation, screening, and pairwise additivity

    Energy Technology Data Exchange (ETDEWEB)

    Lara-Castells, María Pilar de, E-mail: Pilar.deLara.Castells@csic.es [Instituto de Física Fundamental (CSIC), Serrano 123, E-28006 Madrid (Spain); Fernández-Perea, Ricardo [Instituto de Estructura de la Materia (CSIC), Serrano 123, E-28006 Madrid (Spain); Madzharova, Fani; Voloshina, Elena, E-mail: elena.voloshina@hu-berlin.de [Humboldt-Universität zu Berlin, Institut für Chemie, Unter den Linden 6, 10099 Berlin (Germany)

    2016-06-28

    The adsorption of noble gases on metallic surfaces represents a paradigmatic case of van-der-Waals (vdW) interaction due to the role of screening effects on the corrugation of the interaction potential [J. L. F. Da Silva et al., Phys. Rev. Lett. 90, 066104 (2003)]. The extremely small adsorption energy of He atoms on the Mg(0001) surface (below 3 meV) and the delocalized nature and mobility of the surface electrons make the He/Mg(0001) system particularly challenging, even for state-of-the-art vdW-corrected density functional-based (vdW-DFT) approaches [M. P. de Lara-Castells et al., J. Chem. Phys. 143, 194701 (2015)]. In this work, we meet this challenge by applying two different procedures. First, the dispersion-corrected second-order Møller-Plesset perturbation theory (MP2C) approach is adopted, using bare metal clusters of increasing size. Second, the method of increments [H. Stoll, J. Chem. Phys. 97, 8449 (1992)] is applied at coupled cluster singles and doubles and perturbative triples level, using embedded cluster models of the metal surface. Both approaches provide clear evidence of the anti-corrugation of the interaction potential: the He atom prefers on-top sites, instead of the expected hollow sites. This is interpreted as a signature of the screening of the He atom by the metal for the on-top configuration. The strong screening in the metal is clearly reflected in the relative contribution of successively deeper surface layers to the main dispersion contribution. To assist future dynamical simulations, a pairwise potential model for the He/surface interaction as a sum of effective He–Mg pair potentials is also presented, as an improvement of the approximation using isolated He–Mg pairs.

  6. Acupuncture-Related Techniques for Psoriasis: A Systematic Review with Pairwise and Network Meta-Analyses of Randomized Controlled Trials.

    Science.gov (United States)

    Yeh, Mei-Ling; Ko, Shu-Hua; Wang, Mei-Hua; Chi, Ching-Chi; Chung, Yu-Chu

    2017-12-01

    There has been a large body of evidence on the pharmacological treatments for psoriasis, but whether nonpharmacological interventions are effective in managing psoriasis remains largely unclear. This systematic review conducted pairwise and network meta-analyses to determine the effects of acupuncture-related techniques on acupoint stimulation for the treatment of psoriasis and to determine the order of effectiveness of these remedies. This study searched the following databases from inception to March 15, 2016: Medline, PubMed, Cochrane Central Register of Controlled Trials, EBSCO (including Academic Search Premier, American Doctoral Dissertations, and CINAHL), Airiti Library, and China National Knowledge Infrastructure. Randomized controlled trials (RCTs) on the effects of acupuncture-related techniques on acupoint stimulation as intervention for psoriasis were independently reviewed by two researchers. A total of 13 RCTs with 1,060 participants were included. The methodological quality of included studies was not rigorous. Acupoint stimulation, compared with nonacupoint stimulation, had a significant treatment effect on psoriasis. However, the most common adverse events were thirst and dry mouth. Subgroup analysis further showed that the short-term treatment effect was superior to the long-term effect in treating psoriasis. Network meta-analysis identified that acupressure or acupoint catgut embedding, compared with medication, had a significant effect on improving psoriasis. It was noted that acupressure was the most effective treatment. Acupuncture-related techniques could be considered as an alternative or adjuvant therapy for psoriasis in the short term, especially acupressure and acupoint catgut embedding. This study recommends further well-designed, methodologically rigorous, and more head-to-head randomized trials to explore the effects of acupuncture-related techniques for treating psoriasis.

  7. The effect of fiscal incentives on market penetration of electric vehicles: A pairwise comparison of total cost of ownership

    International Nuclear Information System (INIS)

    Lévay, Petra Zsuzsa; Drossinos, Yannis; Thiel, Christian

    2017-01-01

    An important barrier to electric vehicle (EV) sales is their high purchase price compared to internal combustion engine (ICE) vehicles. We conducted total cost of ownership (TCO) calculations to study how costs and sales of EVs relate to each other and to examine the role of fiscal incentives in reducing TCO and increasing EV sales. We composed EV-ICE vehicle pairs that allowed cross-segment and cross-country comparison in eight European countries. Actual car prices were used to calculate the incentives for each model in each country. We found a negative TCO-sales relationship that differs across car segments. Compared to their ICE vehicle pair, big EVs have lower TCO, higher sales, and seem to be less price responsive than small EVs. Three country groups can be distinguished according to the level of fiscal incentives and their impact on TCO and EV sales. In Norway, incentives led to the lowest TCO for the EVs. In the Netherlands, France, and UK the TCO of EVs is close to the TCO of the ICE pairs. In the other countries the TCO of EVs exceeds that of the ICE vehicles. We found that exemptions from flat taxes favour big EVs, while lump-sum subsidies favour small EVs. - Highlights: • Pairwise comparison of EV and ICE vehicle TCO and sales in eight European countries. • In NO, EV TCO is lower than ICE TCO; in NL, FR, and UK, EV TCO is slightly higher. • Compared to ICE vehicles, big EVs have lower TCO and higher sales than small EVs. • Exemptions from flat taxes favour big EVs, lump-sum subsidies favour small EVs. • Most popular EV models: Tesla Model S, Nissan Leaf, Mitsubishi Outlander PHEV.

  8. RNA-Pareto: interactive analysis of Pareto-optimal RNA sequence-structure alignments.

    Science.gov (United States)

    Schnattinger, Thomas; Schöning, Uwe; Marchfelder, Anita; Kestler, Hans A

    2013-12-01

    Incorporating secondary structure information into the alignment process improves the quality of RNA sequence alignments. Instead of using fixed weighting parameters, sequence and structure components can be treated as different objectives and optimized simultaneously. The result is not a single, but a Pareto-set of equally optimal solutions, which all represent different possible weighting parameters. We now provide the interactive graphical software tool RNA-Pareto, which allows a direct inspection of all feasible results to the pairwise RNA sequence-structure alignment problem and greatly facilitates the exploration of the optimal solution set.
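
    To make the notion of a Pareto set concrete, here is a minimal sketch that filters candidate alignments scored on two objectives (a sequence score and a structure score, both to be maximized) down to the non-dominated ones. It is not RNA-Pareto's algorithm, which the abstract does not describe; the function name and the score tuples are hypothetical.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    Each point is a (sequence_score, structure_score) tuple and both
    objectives are maximized. A point q dominates p when q is at least
    as good in both objectives and differs from p.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical scores of five candidate alignments.
candidates = [(10, 2), (8, 5), (7, 4), (3, 9), (3, 3)]
front = pareto_front(candidates)
```

    Each point on the front corresponds to a different trade-off between the two objectives, which is why a fixed weighting would pick out only one of them.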

  9. Ensemble survival tree models to reveal pairwise interactions of variables with time-to-events outcomes in low-dimensional setting

    Science.gov (United States)

    Dazard, Jean-Eudes; Ishwaran, Hemant; Mehlotra, Rajeev; Weinberg, Aaron; Zimmerman, Peter

    2018-01-01

    Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understand the development of common and complex diseases. To increase the power to detect such variables interactions associated with clinical time-to-events outcomes, we borrowed established concepts from random survival forest (RSF) models. We introduce a novel RSF-based pairwise interaction estimator and derive a randomization method with bootstrap confidence intervals for inferring interaction significance. Using various linear and nonlinear time-to-events survival models in simulation studies, we first show the efficiency of our approach: true pairwise interaction-effects between variables are uncovered, while they may not be accompanied with their corresponding main-effects, and may not be detected by standard semi-parametric regression modeling and test statistics used in survival analysis. Moreover, using a RSF-based cross-validation scheme for generating prediction estimators, we show that informative predictors may be inferred. We applied our approach to an HIV cohort study recording key host gene polymorphisms and their association with HIV change of tropism or AIDS progression. Altogether, this shows how linear or nonlinear pairwise statistical interactions of variables may be efficiently detected with a predictive value in observational studies with time-to-event outcomes. PMID:29453930

  10. Comparison of sputum collection methods for tuberculosis diagnosis: a systematic review and pairwise and network meta-analysis.

    Science.gov (United States)

    Datta, Sumona; Shah, Lena; Gilman, Robert H; Evans, Carlton A

    2017-08-01

    The performance of laboratory tests to diagnose pulmonary tuberculosis is dependent on the quality of the sputum sample tested. The relative merits of sputum collection methods to improve tuberculosis diagnosis are poorly characterised. We therefore aimed to investigate the effects of sputum collection methods on tuberculosis diagnosis. We did a systematic review and meta-analysis to investigate whether non-invasive sputum collection methods in people aged at least 12 years improve the diagnostic performance of laboratory testing for pulmonary tuberculosis. We searched PubMed, Google Scholar, ProQuest, Web of Science, CINAHL, and Embase up to April 14, 2017, to identify relevant experimental, case-control, or cohort studies. We analysed data by pairwise meta-analyses with a random-effects model and by network meta-analysis. All diagnostic performance data were calculated at the sputum-sample level, except where authors only reported data at the individual patient-level. Heterogeneity was assessed, with potential causes identified by logistic meta-regression. We identified 23 eligible studies published between 1959 and 2017, involving 8967 participants who provided 19 252 sputum samples. Brief, on-demand spot sputum collection was the main reference standard. Pooled sputum collection increased tuberculosis diagnosis by microscopy (odds ratio [OR] 1·6, 95% CI 1·3-1·9, p [...] meta-analysis confirmed these findings, and revealed that both pooled and instructed spot sputum collections were similarly effective techniques for increasing the diagnostic performance of microscopy. Tuberculosis diagnoses were substantially increased by either pooled collection or by providing instruction on how to produce a sputum sample taken at any time of the day. Both interventions had a similar effect to that reported for the introduction of new, expensive laboratory tests, and therefore warrant further exploration in the drive to end the global tuberculosis epidemic. Wellcome Trust

  11. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Full Text Available Abstract Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  12. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    Science.gov (United States)

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature ( http://hivdb.Stanford.edu ). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if they have >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or 15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.

  13. Improved scFv Anti-HIV-1 p17 Binding Affinity Guided from the Theoretical Calculation of Pairwise Decomposition Energies and Computational Alanine Scanning

    Directory of Open Access Journals (Sweden)

    Panthip Tue-ngeun

    2013-01-01

    Full Text Available Computational approaches have been used to evaluate and define important residues for protein-protein interactions, especially antigen-antibody complexes. In our previous study, pairwise decomposition of residue interaction energies of single chain Fv with HIV-1 p17 epitope variants has indicated the key specific residues in the complementary determining regions (CDRs) of scFv anti-p17. In the present investigation, computational alanine scanning has been applied in order to determine whether a specific side chain group of a residue in the CDRs plays an important role in bioactivity. Molecular dynamics simulations were done with several complexes of original scFv anti-p17 and scFv anti-p17 mutants with HIV-1 p17 epitope variants with a production run up to 10 ns. With the combination of pairwise decomposition residue interaction and alanine scanning calculations, the point mutation has been initially selected at the position MET100 to improve the residue binding affinity. The calculated docking interaction energy for a single mutation from methionine to either arginine or glycine showed improved binding affinity compared to the wild type, contributed by the electrostatic interaction with a favorable negative interaction energy. Theoretical calculations agreed well with the peptide ELISA results.

  14. Leaf habit does not determine the investment in both physical and chemical defences and pair-wise correlations between these defensive traits.

    Science.gov (United States)

    Moreira, X; Pearse, I S

    2017-05-01

    Plant life-history strategies associated with resource acquisition and economics (e.g. leaf habit) are thought to be fundamental determinants of the traits and mechanisms that drive herbivore pressure, resource allocation to plant defensive traits, and the simultaneous expression (positive correlations) or trade-offs (negative correlations) between these defensive traits. In particular, it is expected that evergreen species - which usually grow slower and support constant herbivore pressure in comparison with deciduous species - will exhibit higher levels of both physical and chemical defences and a higher predisposition to the simultaneous expression of physical and chemical defensive traits. Here, by using a dataset which included 56 oak species (Quercus genus), we investigated whether leaf habit of plant species governs the investment in both physical and chemical defences and pair-wise correlations between these defensive traits. Our results showed that leaf habit does not determine the production of most leaf physical and chemical defences. Although evergreen oak species had higher levels of leaf toughness and specific leaf mass (physical defences) than deciduous oak species, both traits are essentially prerequisites for evergreenness. Similarly, our results also showed that leaf habit does not determine pair-wise correlations between defensive traits because most physical and chemical defensive traits were simultaneously expressed in both evergreen and deciduous oak species. Our findings indicate that leaf habit does not substantially contribute to oak species differences in plant defence investment. © 2017 German Botanical Society and The Royal Botanical Society of the Netherlands.

  15. Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci

    Science.gov (United States)

    Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold

    2004-01-01

    Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377

  16. Sequence determination and analysis of the NSs genes of two tospoviruses.

    Science.gov (United States)

    Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

    2012-03-01

    The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequence of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of TSWV, while the NSs sequence of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.

  17. On calculating the probability of a set of orthologous sequences

    Directory of Open Access Journals (Sweden)

    Junfeng Liu

    2009-02-01

    Full Text Available Junfeng Liu1,2, Liang Chen3, Hongyu Zhao4, Dirk F Moore1,2, Yong Lin1,2, Weichung Joe Shih1,2. 1Biometrics Division, The Cancer Institute of New Jersey, New Brunswick, NJ, USA; 2Department of Biostatistics, School of Public Health, University of Medicine and Dentistry of New Jersey, Piscataway, NJ, USA; 3Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA; 4Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, USA. Abstract: Probabilistic DNA sequence models have been intensively applied to genome research. Within the evolutionary biology framework, this article investigates the feasibility of rigorously estimating the probability of a set of orthologous DNA sequences which evolve from a common progenitor. We propose Monte Carlo integration algorithms to sample the unknown ancestral and/or root sequences a posteriori conditional on a reference sequence and apply pairwise Needleman–Wunsch alignment between the sampled and nonreference species sequences to estimate the probability. We test our algorithms on both simulated and real sequences and compare calculated probabilities from Monte Carlo integration to those induced by a single multiple alignment. Keywords: evolution, Jukes–Cantor model, Monte Carlo integration, Needleman–Wunsch alignment, orthologous
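
    For readers unfamiliar with the pairwise Needleman–Wunsch step mentioned in the abstract, the following is a minimal sketch of the standard global-alignment dynamic program, returning only the optimal score. The scoring parameters (match +1, mismatch -1, linear gap -1) are illustrative assumptions and are not taken from the paper, which may use different scores or a probabilistic formulation.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b with a linear gap penalty.

    Keeps only two rows of the dynamic-programming matrix, so memory is
    O(len(b)); the traceback that recovers the alignment is omitted.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]


# Toy usage with made-up sequences.
score = needleman_wunsch_score("ACGT", "AGT")
```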

  18. Whole-Genome Sequences of Two Carbapenem-Resistant Klebsiella quasipneumoniae Strains Isolated from a Tertiary Hospital in Johor, Malaysia.

    Science.gov (United States)

    Gan, Han Ming; Rajasekaram, Ganeswrie; Eng, Wilhelm Wei Han; Kaniappan, Priyatharisni; Dhanoa, Amreeta

    2017-08-10

    We report the whole-genome sequences of two carbapenem-resistant clinical isolates of Klebsiella quasipneumoniae subsp. similipneumoniae obtained from two different patients. Both strains contained three different extended-spectrum β-lactamase genes and showed strikingly high pairwise average nucleotide identity of 99.99% despite being isolated 3 years apart from the same hospital. Copyright © 2017 Gan et al.

  19. Scoring protein relationships in functional interaction networks predicted from sequence data.

    Directory of Open Access Journals (Sweden)

    Gaston K Mazandu

    Full Text Available The abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods in order to integrate and exploit these data effectively and elucidate local and genome wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict protein pair-wise functional relationships, and thus contribute to the function prediction process of uncharacterized proteins in order to ensure that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic based approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with a high confidence and coverage. We use the network for predicting functions of uncharacterised proteins. AVAILABILITY: Protein pair-wise functional relationship scores for Mycobacterium tuberculosis strain CDC1551 sequence data and python scripts to compute these scores are available at http://web.cbio.uct.ac.za/~gmazandu/scoringschemes.

  20. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.

  1. A pairwise unit-root-test based approach to investigating convergence of household debts in South Africa and the United States

    Directory of Open Access Journals (Sweden)

    Ntebogang Dinah Moroke

    2015-05-01

    Full Text Available The purpose of this paper was to test convergence of household debts in the United States and South Africa using an approach based on pairwise unit root tests. A substantial number of studies have dealt with convergence of several macroeconomic variables, but to my knowledge no study has considered this subject with respect to household debts of the identified countries. Quarterly data on household debts, consisting of 88 observations for South Africa and the United States spanning the period 1990 to 2013, were collected from the South African and St. Louis Federal Reserve Banks. Focusing on the absolute value of household debts, this study showed that South Africa is far from catching up with the United States in terms of overcoming household debts for the selected period. The findings of this study can be used by relevant authorities to help improve ways and means of dealing with household debts in South Africa.

  2. A spreadsheet template compatible with Microsoft Excel and iWork Numbers that returns the simultaneous confidence intervals for all pairwise differences between multiple sample means.

    Science.gov (United States)

    Brown, Angus M

    2010-04-01

    The objective of the method described in this paper is to develop a spreadsheet template for the purpose of comparing multiple sample means. An initial analysis of variance (ANOVA) test on the data returns F--the test statistic. If F is larger than the critical F value drawn from the F distribution at the appropriate degrees of freedom, convention dictates rejection of the null hypothesis and allows subsequent multiple comparison testing to determine where the inequalities between the sample means lie. A variety of multiple comparison methods are described that return the 95% confidence intervals for differences between means using an inclusive pairwise comparison of the sample means. 2009 Elsevier Ireland Ltd. All rights reserved.

  3. Multimodal sequence learning.

    Science.gov (United States)

    Kemény, Ferenc; Meier, Beat

    2016-02-01

    While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Sequence Read Archive (SRA)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome...

  5. Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

    Science.gov (United States)

    Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut

    2018-05-03

    Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable to a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, and AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. rene.rahn@fu-berlin.de.

  6. Enthalpic pairwise self-association of L-carnitine in aqueous solutions of some alkali halides at T = 298.15 K

    International Nuclear Information System (INIS)

    Wang, Hua-Qin; Cheng, Wei-Na; Zhu, Li-Yuan; Hu, Xin-Gen

    2016-01-01

    Highlights: • Dilution enthalpies of L-carnitine in aqueous alkali halide solutions by ITC. • The second virial coefficients of enthalpy (h_2) have been calculated. • The values of h_2 increase with increasing molalities of aqueous salt solutions. • The signs of h_2 turn from negative in pure water to positive in salt solutions. • The trend is ascribed to the salt effects on pairwise self-associations. - Abstract: Knowledge of the influence of ions of various nature on intermolecular hydrophilic and hydrophobic interactions in solutions is required in many research fields. In this paper, dilution enthalpies of zwitterionic L-carnitine in aqueous NaCl, KCl and NaBr solutions of various molalities (b = 0 to 3.0 mol·kg⁻¹) have been determined at T = (298.15 ± 0.01) K and p = (0.100 ± 0.005) MPa by isothermal titration calorimetry (ITC). In light of the McMillan–Mayer theory, the second virial enthalpic coefficients (h_2) have been calculated. The h_2 coefficients increase gradually with increasing molality (b) of the three aqueous alkali halide solutions, from small negative values in pure water to relatively larger positive values in salt solutions. The trends of the h_2 coefficients are ascribed to the salt effects on the balance between hydrophilic and hydrophobic interactions in pairwise self-associations. It is considered that the sizes of cations and anions exert influences on the h_2 coefficients through their surface charge densities and hydration (or dehydration) abilities.

  7. Fast-GPU-PCC: A GPU-Based Technique to Compute Pairwise Pearson's Correlation Coefficients for Time Series Data-fMRI Study.

    Science.gov (United States)

    Eslami, Taban; Saeed, Fahad

    2018-04-20

    Functional magnetic resonance imaging (fMRI) is a non-invasive brain imaging technique, which has been regularly used for studying the brain’s functional activities in the past few years. A widely used measure for capturing functional associations in the brain is Pearson’s correlation coefficient. Pearson’s correlation is widely used for constructing functional networks and studying dynamic functional connectivity of the brain. These are useful measures for understanding the effects of brain disorders on connectivities among brain regions. fMRI scanners produce a huge number of voxels, and using traditional central processing unit (CPU)-based techniques for computing pairwise correlations is very time consuming, especially when a large number of subjects are being studied. In this paper, we propose a graphics processing unit (GPU)-based algorithm called Fast-GPU-PCC for computing pairwise Pearson’s correlation coefficients. Based on the symmetric property of Pearson’s correlation, this approach returns the N(N − 1)/2 correlation coefficients located in the strictly upper triangular part of the correlation matrix. Storing correlations in a one-dimensional array with the order as proposed in this paper is useful for further usage. Our experiments on real and synthetic fMRI data for different numbers of voxels and varying lengths of time series show that the proposed approach outperformed state-of-the-art GPU-based techniques as well as the sequential CPU-based versions. We show that Fast-GPU-PCC runs 62 times faster than the CPU-based version and about 2 to 3 times faster than two other state-of-the-art GPU-based methods.
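
    The storage scheme described above can be sketched on the CPU in a few lines. This is a minimal illustration, not the Fast-GPU-PCC kernel: the function names are hypothetical, the row-major ordering of the upper triangle is an assumption (the abstract does not state the order used in the paper), and every input series is assumed to be non-constant.

```python
import math


def pearson_upper_triangle(series):
    """Return the N*(N-1)/2 pairwise correlations in row-major upper-triangle order."""
    n = len(series)
    # Centre each series and scale it to unit norm, so that the correlation
    # of two series reduces to a dot product of their normalised versions.
    normed = []
    for x in series:
        mean = sum(x) / len(x)
        centred = [v - mean for v in x]
        norm = math.sqrt(sum(v * v for v in centred))  # assumed non-zero
        normed.append([v / norm for v in centred])
    out = []
    for i in range(n):
        for j in range(i + 1, n):
            out.append(sum(p * q for p, q in zip(normed[i], normed[j])))
    return out


def flat_index(i, j, n):
    """Position of pair (i, j), with i < j, in the flat array."""
    return i * n - i * (i + 1) // 2 + (j - i - 1)
```

    For example, with N = 4 series the pair (1, 3) is stored at `flat_index(1, 3, 4)`, which is position 4 of the six-element result.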

  8. Utility of RNA Sequencing for Analysis of Maize Reproductive Transcriptomes

    Directory of Open Access Journals (Sweden)

    Rebecca M. Davidson

    2011-11-01

    Transcriptome sequencing is a powerful method for studying global expression patterns in large, complex genomes. Evaluation of sequence-based expression profiles during reproductive development would provide functional annotation to genes underlying agronomic traits. We generated transcriptome profiles for 12 diverse maize (Zea mays L.) reproductive tissues representing male, female, developing seed, and leaf tissues using high throughput transcriptome sequencing. Overall, ∼80% of annotated genes were expressed. Comparative analysis between sequence and hybridization-based methods demonstrated the utility of ribonucleic acid sequencing (RNA-seq) for expression determination and differentiation of paralogous genes (∼85% of maize genes). Analysis of 4975 gene families across reproductive tissues revealed expression divergence is proportional to family size. In all pairwise comparisons between tissues, 7 (pre- vs. postemergence cobs) to 48% (pollen vs. ovule) of genes were differentially expressed. Genes with expression restricted to a single tissue within this study were identified with the highest numbers observed in leaves, endosperm, and pollen. Coexpression network analysis identified 17 gene modules with complex and shared expression patterns containing many previously described maize genes. The data and analyses in this study provide valuable tools through improved gene annotation, gene family characterization, and a core set of candidate genes to further characterize maize reproductive development and improve grain yield potential.

  9. Solving Classification Problems for Large Sets of Protein Sequences with the Example of Hox and ParaHox Proteins

    Directory of Open Access Journals (Sweden)

    Stefanie D. Hueber

    2016-02-01

    Phylogenetic methods are key to providing models for how a given protein family evolved. However, these methods run into difficulties when sequence divergence is either too low or too high. Here, we provide a case study of Hox and ParaHox proteins so that additional insights can be gained using a new computational approach to help solve old classification problems. For two (Gsx and Cdx) out of three ParaHox proteins the assignments differ between the currently most established view and four alternative scenarios. We use a non-phylogenetic, pairwise-sequence-similarity-based method to assess which of the previous predictions, if any, are best supported by the sequence-similarity relationships between Hox and ParaHox proteins. The overall sequence similarities show Gsx to be most similar to Hox2–3, and Cdx to be most similar to Hox4–8. The results indicate that a purely pairwise-sequence-similarity-based approach can provide additional information not only when phylogenetic inference methods have insufficient information to provide reliable classifications (as was shown previously for central Hox proteins), but also when the sequence variation is so high that the resulting phylogenetic reconstructions are likely plagued by long-branch-attraction artifacts.

  10. The complete chloroplast genome sequence of Abies nephrolepis (Pinaceae: Abietoideae)

    Directory of Open Access Journals (Sweden)

    Dong-Keun Yi

    2016-06-01

    The plant chloroplast (cp) genome has maintained a relatively conserved structure and gene content throughout evolution. Cp genome sequences have been used widely for resolving evolutionary and phylogenetic issues at various taxonomic levels of plants. Here, we report the complete cp genome of Abies nephrolepis. The A. nephrolepis cp genome is 121,336 base pairs (bp) in length, including a pair of short inverted repeat regions (IRa and IRb) of 139 bp each, separated by a small single copy (SSC) region of 54,323 bp and a large single copy (LSC) region of 66,735 bp. It contains 114 genes: 68 protein coding genes, 35 tRNA and four rRNA genes, six open reading frames, and one pseudogene. Seventeen repeat units and 64 simple sequence repeats (SSR) have been detected in the A. nephrolepis cp genome. Large IR sequences locate in 42-kb inversion points (1186 bp). The A. nephrolepis cp genome is identical to Abies koreana’s which is closely related to taxa. Pairwise comparison between two cp genomes revealed 140 polymorphic sites in each. The complete cp genome sequence of A. nephrolepis has significant potential to provide information on the evolutionary pattern of Abietoideae and valuable data for the development of DNA markers for easy identification and classification.

  11. Generation and analysis of expressed sequence tags from Botrytis cinerea

    Directory of Open Access Journals (Sweden)

    EVELYN SILVA

    2006-01-01

    Botrytis cinerea is a filamentous plant pathogen of a wide range of plant species, and its infection may cause enormous damage both during plant growth and in the post-harvest phase. We have constructed a cDNA library from an isolate of B. cinerea and have sequenced 11,482 expressed sequence tags that were assembled into 1,003 contig sequences and 3,032 singletons. Approximately 81% of the unigenes showed significant similarity to genes coding for proteins with known functions: more than 50% of the sequences code for genes involved in cellular metabolism, 12% for transport of metabolites, and approximately 10% for cellular organization. Other functional categories include responses to biotic and abiotic stimuli, cell communication, cell homeostasis, and cell development. We carried out pair-wise comparisons with fungal databases to determine the B. cinerea unisequence set with relevant similarity to genes in other fungal pathogenic counterparts. Among the 4,035 non-redundant B. cinerea unigenes, 1,338 (23%) have significant homology with Fusarium verticillioides unigenes. Similar values were obtained for Saccharomyces cerevisiae and Aspergillus nidulans (22% and 24%, respectively). The lowest percentages of homology were with Magnaporthe grisea and Neurospora crassa (13% and 19%, respectively). Several genes involved in putative and known fungal virulence and general pathogenicity were identified. The results provide important information for future research on this fungal pathogen.

  12. Conservation patterns in different functional sequence categories of divergent Drosophila species

    Energy Technology Data Exchange (ETDEWEB)

    Papatsenko, Dmitri; Kislyuk, Andrey; Levine, Michael; Dubchak, Inna

    2005-10-01

    We have explored the distributions of fully conserved ungapped blocks in genome-wide pairwise alignments of recently completed species of Drosophila: D. yakuba, D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis. Based on these distributions we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs and non-functional sequences. At the same time, the blocks in the coding regions carried a 3N+2 signature characteristic of synonymous substitutions in the 3rd codon positions. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We also have shown that the longest ungapped blocks, or 'ultraconserved' sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discussed how restrained conservation patterns may help in mapping functional sequence categories and improving genome annotation.

  13. Nonparametric combinatorial sequence models.

    Science.gov (United States)

    Wauthier, Fabian L; Jordan, Michael I; Jojic, Nebojsa

    2011-11-01

    This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This article presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three biological sequence families which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution over sequence representations induced by the prior. By integrating out the posterior, our method compares favorably to leading binding predictors.

  14. Evolutionary analysis of hepatitis C virus gene sequences from 1953

    Science.gov (United States)

    Gray, Rebecca R.; Tanaka, Yasuhito; Takebe, Yutaka; Magiorkinis, Gkikas; Buskell, Zelma; Seeff, Leonard; Alter, Harvey J.; Pybus, Oliver G.

    2013-01-01

    Reconstructing the transmission history of infectious diseases in the absence of medical or epidemiological records often relies on the evolutionary analysis of pathogen genetic sequences. The precision of evolutionary estimates of epidemic history can be increased by the inclusion of sequences derived from ‘archived’ samples that are genetically distinct from contemporary strains. Historical sequences are especially valuable for viral pathogens that circulated for many years before being formally identified, including HIV and the hepatitis C virus (HCV). However, surprisingly few HCV isolates sampled before discovery of the virus in 1989 are currently available. Here, we report and analyse two HCV subgenomic sequences obtained from infected individuals in 1953, which represent the oldest genetic evidence of HCV infection. The pairwise genetic diversity between the two sequences indicates a substantial period of HCV transmission prior to the 1950s, and their inclusion in evolutionary analyses provides new estimates of the common ancestor of HCV in the USA. To explore and validate the evolutionary information provided by these sequences, we used a new phylogenetic molecular clock method to estimate the date of sampling of the archived strains, plus the dates of four more contemporary reference genomes. Despite the short fragments available, we conclude that the archived sequences are consistent with a proposed sampling date of 1953, although statistical uncertainty is large. Our cross-validation analyses suggest that the bias and low statistical power observed here likely arise from a combination of high evolutionary rate heterogeneity and an unstructured, star-like phylogeny. We expect that attempts to date other historical viruses under similar circumstances will meet similar problems. PMID:23938759

  15. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different lengths (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in the analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  16. Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data.

    Science.gov (United States)

    Polanski, A; Kimmel, M; Chakraborty, R

    1998-05-12

    Distribution of pairwise differences of nucleotides from data on a sample of DNA sequences from a given segment of the genome has been used in the past to draw inferences about the past history of population size changes. However, all earlier methods assume a given model of population size changes (such as sudden expansion), parameters of which (e.g., time and amplitude of expansion) are fitted to the observed distributions of nucleotide differences among pairwise comparisons of all DNA sequences in the sample. Our theory indicates that for any time-dependent population size, N(tau) (in which time tau is counted backward from present), a time-dependent coalescence process yields the distribution, p(tau), of the time of coalescence between two DNA sequences randomly drawn from the population. Prediction of p(tau) and N(tau) requires the use of a reverse Laplace transform known to be unstable. Nevertheless, simulated data obtained from three models of monotone population change (stepwise, exponential, and logistic) indicate that the pattern of a past population size change leaves its signature on the pattern of DNA polymorphism. Application of the theory to the published mtDNA sequences indicates that the current mtDNA sequence variation is not inconsistent with a logistic growth of the human population.
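
    The forward relationship between N(tau) and p(tau) mentioned above can be sketched numerically. The sketch assumes the standard form for two lineages, p(tau) = exp(-Lambda(tau)) / N(tau) with Lambda(tau) the integral of 1/N(u) from 0 to tau, with time measured in generations and N counted in haploid copies; these scaling conventions and the function name are assumptions, not details taken from the abstract. It does not attempt the unstable inverse problem of recovering N(tau) from data.

```python
import math


def coalescence_density(pop_size, taus):
    """Evaluate p(tau) = exp(-Lambda(tau)) / N(tau) on an increasing grid `taus`.

    `pop_size` is a function returning N(tau) for tau counted backward from
    the present; Lambda(tau) is accumulated with the trapezoid rule.
    """
    densities = []
    cumulative = 0.0              # running value of Lambda(tau)
    prev_tau = 0.0
    prev_rate = 1.0 / pop_size(0.0)
    for tau in taus:
        rate = 1.0 / pop_size(tau)  # instantaneous coalescence rate for a pair
        cumulative += 0.5 * (prev_rate + rate) * (tau - prev_tau)
        densities.append(rate * math.exp(-cumulative))
        prev_tau, prev_rate = tau, rate
    return densities
```

    For a constant population of size N the result reduces to the familiar exponential density exp(-tau/N)/N; a stepwise, exponential or logistic N(tau) can be passed in as an ordinary Python function, with a grid fine enough for the trapezoid rule to be accurate.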

  17. Structured prediction models for RNN based sequence labeling in clinical text.

    Science.gov (United States)

    Jagannatha, Abhyuday N; Yu, Hong

    2016-11-01

    Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.

  18. Long sequence correlation coprocessor

    Science.gov (United States)

    Gage, Douglas W.

    1994-09-01

    A long sequence correlation coprocessor (LSCC) accelerates the bitwise correlation of arbitrarily long digital sequences by calculating in parallel the correlation score for 16, for example, adjacent bit alignments between two binary sequences. The LSCC integrated circuit is incorporated into a computer system with memory storage buffers and a separate general purpose computer processor which serves as its controller. Each of the LSCC's set of sequential counters simultaneously tallies a separate correlation coefficient. During each LSCC clock cycle, computer enable logic associated with each counter compares one bit of a first sequence with one bit of a second sequence to increment the counter if the bits are the same. A shift register assures that the same bit of the first sequence is simultaneously compared to different bits of the second sequence to simultaneously calculate the correlation coefficient by the different counters to represent different alignments of the two sequences.
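
    The counting scheme described above can be mimicked in software. This is a sequential sketch only: the hardware updates all counters in the same clock cycle, whereas the inner loop here visits them one after another. The function name and the list-of-bits representation are assumptions made for illustration.

```python
def correlation_scores(first, second, offsets=16):
    """Count matching bits between `first` and `second` shifted by 0..offsets-1."""
    counters = [0] * offsets
    for pos, bit in enumerate(first):
        # The same bit of the first sequence is compared against a different
        # bit of the second sequence for each counter (one counter per alignment).
        for k in range(offsets):
            if pos + k < len(second) and second[pos + k] == bit:
                counters[k] += 1
    return counters
```

    Each entry of the returned list is the number of agreeing bit positions for one alignment, so the best-matching shift is the index of the largest counter.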

  19. Roles of repetitive sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological function, but also because they provide information on genome organization, evolution and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences will have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences, notably minisatellite sequences and telomere sequences as examples of tandem repeats, without and with respectively known function, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  20. Anomaly Detection in Sequences

    Data.gov (United States)

    National Aeronautics and Space Administration — We present a set of novel algorithms which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that...

  1. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  2. sequenceMiner algorithm

    Data.gov (United States)

    National Aeronautics and Space Administration — Detecting and describing anomalies in large repositories of discrete symbol sequences. sequenceMiner has been open-sourced! Download the file below to try it out....

  3. Pairwise harmonics for shape analysis

    KAUST Repository

    Zheng, Youyi; Tai, Chiewlan; Zhang, Eugene; Xu, Pengfei

    2013-01-01

    efficient algorithms than the state-of-the-art methods for three applications: intrinsic reflectional symmetry axis computation, matching shape extremities, and simultaneous surface segmentation and skeletonization. © 2012 IEEE.

  4. Risk of Breast Cancer with CXCR4-using HIV Defined by V3-Loop Sequencing

    Science.gov (United States)

    Goedert, James J.; Swenson, Luke C.; Napolitano, Laura A.; Haddad, Mojgan; Anastos, Kathryn; Minkoff, Howard; Young, Mary; Levine, Alexandra; Adeyemi, Oluwatoyin; Seaberg, Eric C.; Aouizerat, Bradley; Rabkin, Charles S.; Harrigan, P. Richard; Hessol, Nancy A.

    2014-01-01

    Objective: Evaluate the risk of female breast cancer associated with HIV-CXCR4 (X4) tropism as determined by various genotypic measures. Methods: A breast cancer case-control study, with pairwise comparisons of tropism determination methods, was conducted. From the Women's Interagency HIV Study repository, one stored plasma specimen was selected from 25 HIV-infected cases near the breast cancer diagnosis date and 75 HIV-infected control women matched for age and calendar date. HIVgp120-V3 sequences were derived by Sanger population sequencing (PS) and 454-pyro deep sequencing (DS). Sequencing-based HIV-X4 tropism was defined using the geno2pheno algorithm, with both high-stringency DS [False-Positive-Rate (FPR 3.5) and 2% X4 cutoff], and lower stringency DS (FPR 5.75, 15% X4 cut-off). Concordance of tropism results by PS, DS, and previously performed phenotyping was assessed with kappa (κ) statistics. Case-control comparisons used exact P-values and conditional logistic regression. Results: In 74 women (19 cases, 55 controls) with complete results, prevalence of HIV-X4 by PS was 5% in cases vs 29% in controls (P=0.06, odds ratio 0.14, confidence interval 0.003-1.03). Smaller case-control prevalence differences were found with high-stringency DS (21% vs 36%, P=0.32), lower-stringency DS (16% vs 35%, P=0.18), and phenotyping (11% vs 31%, P=0.10). HIV-X4-tropism concordance was best between PS and lower-stringency DS (93%, κ=0.83). Other pairwise concordances were 82%-92% (κ=0.56-0.81). Concordance was similar among cases and controls. Conclusions: HIV-X4 defined by population sequencing (PS) had good agreement with lower stringency deep sequencing and was significantly associated with lower odds of breast cancer. PMID:25321183

  5. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    Science.gov (United States)

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared to the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package and are freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Contact: yueljiang@163.com. Supplementary data are available at Bioinformatics online.

  6. Determination of genetic relatedness from low-coverage human genome sequences using pedigree simulations.

    Science.gov (United States)

    Martin, Michael D; Jay, Flora; Castellano, Sergi; Slatkin, Montgomery

    2017-08-01

    We develop and evaluate methods for inferring relatedness among individuals from low-coverage DNA sequences of their genomes, with particular emphasis on sequences obtained from fossil remains. We suggest the major factors complicating the determination of relatedness among ancient individuals are sequencing depth, the number of overlapping sites, the sequencing error rate and the presence of contamination from present-day genetic sources. We develop a theoretical model that facilitates the exploration of these factors and their relative effects, via measurement of pairwise genetic distances, without calling genotypes, and determine the power to infer relatedness under various scenarios of varying sequencing depth, present-day contamination and sequencing error. The model is validated by a simulation study as well as the analysis of aligned sequences from present-day human genomes. We then apply the method to the recently published genome sequences of ancient Europeans, developing a statistical treatment to determine confidence in assigned relatedness that is, in some cases, more precise than previously reported. As the majority of ancient specimens are from animals, this method would be applicable to investigate kinship in nonhuman remains. The developed software grups (Genetic Relatedness Using Pedigree Simulations) is implemented in Python and freely available. © 2017 John Wiley & Sons Ltd.

  7. BioWord: A sequence manipulation suite for Microsoft Word

    Directory of Open Access Journals (Sweden)

    Anzaldi Laura J

    2012-06-01

    Background: The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. Results: BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. Conclusions: BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms.

  8. BioWord: A sequence manipulation suite for Microsoft Word

    Science.gov (United States)

    2012-01-01

    Background: The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. Results: BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. Conclusions: BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms. PMID:22676326

  9. Statistical method to compare massive parallel sequencing pipelines.

    Science.gov (United States)

    Elsensohn, M H; Leblay, N; Dimassi, S; Campan-Fournier, A; Labalme, A; Roucher-Boulez, F; Sanlaville, D; Lesca, G; Bardel, C; Roy, P

    2017-03-01

    Today, sequencing is frequently carried out by Massive Parallel Sequencing (MPS), which drastically cuts sequencing time and expenses. Nevertheless, Sanger sequencing remains the main validation method to confirm the presence of variants. The analysis of MPS data involves the development of several bioinformatic tools, academic or commercial. We present here a statistical method to compare MPS pipelines and test it in a comparison between an academic (BWA-GATK) and a commercial pipeline (TMAP-NextGENe®), with and without reference to a gold standard (here, Sanger sequencing), on a panel of 41 genes in 43 epileptic patients. This method used the number of variants to fit log-linear models for pairwise agreements between pipelines. To assess the heterogeneity of the margins and the odds ratios of agreement, four log-linear models were used: a full model, a homogeneous-margin model, a model with single odds ratio for all patients, and a model with single intercept. Then a log-linear mixed model was fitted considering the biological variability as a random effect. Among the 390,339 base-pairs sequenced, TMAP-NextGENe® and BWA-GATK found, on average, 2253.49 and 1857.14 variants (single nucleotide variants and indels), respectively. Against the gold standard, the pipelines had similar sensitivities (63.47% vs. 63.42%) and close but significantly different specificities (99.57% vs. 99.65%; p < 0.001). Same-trend results were obtained when only single nucleotide variants were considered (99.98% specificity and 76.81% sensitivity for both pipelines). The method thus allows pipeline comparison and selection. It is generalizable to all types of MPS data and all pipelines.
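
    The sensitivities and specificities quoted above are the usual ratios computed against a gold standard. A minimal sketch of that arithmetic follows; the function name and the counts in the usage lines are hypothetical illustrations and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: gold-standard variants the pipeline found / missed.
    tn/fp: gold-standard non-variant sites the pipeline left alone / called.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative site")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=90, fp=10)
print(f"sensitivity={sens:.2%} specificity={spec:.2%}")
```

    The log-linear agreement models described in the abstract go beyond this per-pipeline summary and are not reproduced here.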

  10. BioWord: a sequence manipulation suite for Microsoft Word.

    Science.gov (United States)

    Anzaldi, Laura J; Muñoz-Fernández, Daniel; Erill, Ivan

    2012-06-07

    The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms.

  11. The complete genome sequence of the Atlantic salmon paramyxovirus (ASPV)

    International Nuclear Information System (INIS)

    Nylund, Stian; Karlsen, Marius; Nylund, Are

    2008-01-01

    The complete RNA genome of the Atlantic salmon paramyxovirus (ASPV), isolated from Atlantic salmon suffering from proliferative gill inflammation (PGI), has been determined. The genome is 16,965 nucleotides in length and consists of six nonoverlapping genes in the order 3'- N - P/C/V - M - F - HN - L -5', coding for the nucleocapsid, phospho-, matrix, fusion, hemagglutinin-neuraminidase and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of other Paramyxoviridae. The ASPV P-gene expression strategy is like that of the respiro- and morbilliviruses, which express the phosphoprotein from the primary transcript, and edit a portion of the mRNA to encode the accessory proteins V and W. It also encodes the C-protein by ribosomal choice of translation initiation. Pairwise comparisons of amino acid identities, and phylogenetic analysis of deduced ASPV protein sequences with homologous sequences from other Paramyxoviridae, show that ASPV has an affinity for the genus Respirovirus, but may represent a new genus within the subfamily Paramyxovirinae

  12. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    Science.gov (United States)

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-07-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
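
    The abstract states that the output is wrapped in standard PHYLIP format. As a rough sketch of what a square PHYLIP distance matrix looks like (taxon count on the first line, then one row per taxon with the name padded to ten characters), the following helper may be useful. The function `to_phylip` and the sample names are hypothetical; this is not CAFE's own code.

```python
def to_phylip(names, matrix):
    """Format a square distance matrix in the classic PHYLIP layout."""
    n = len(names)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square and match the number of names")
    lines = [str(n)]  # first line: number of taxa
    for name, row in zip(names, matrix):
        label = name[:10].ljust(10)  # strict PHYLIP names are 10 characters wide
        values = " ".join(f"{d:.4f}" for d in row)
        lines.append(f"{label} {values}")
    return "\n".join(lines)


# Hypothetical dissimilarities between two genomes.
print(to_phylip(["genomeA", "genomeB"], [[0.0, 0.25], [0.25, 0.0]]))
```

    Names longer than ten characters are truncated here, which is a simplifying assumption; relaxed PHYLIP variants accepted by some tools allow longer names.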

  13. Investigation of the range of validity of the pairwise summation method applied to the calculation of the surface roughness correction to the van der Waals force

    Science.gov (United States)

    Gusso, André; Burnham, Nancy A.

    2016-09-01

    It has long been recognized that stochastic surface roughness can considerably change the van der Waals (vdW) force between interacting surfaces and particles. However, few analytical expressions for the vdW force between rough surfaces have been presented in the literature. Because they have been derived using perturbative methods or the proximity force approximation the expressions are valid when the roughness correction is small and for a limited range of roughness parameters and surface separation. In this work, a nonperturbative approach, the effective density method (EDM) is proposed to circumvent some of these limitations. The method simplifies the calculations of the roughness correction based on pairwise summation (PWS), and allows us to derive simple expressions for the vdW force and energy between two semispaces covered with stochastic rough surfaces. Because the range of applicability of PWS and, therefore, of our results, are not known a priori, we compare the predictions based on the EDM with those based on the multilayer effective medium model, whose range of validity can be defined more properly and which is valid when the roughness correction is comparatively large. We conclude that the PWS can be used for roughness characterized by a correlation length of the order of its rms amplitude, when this amplitude is of the order of or smaller than a few nanometers, and only for typically insulating materials such as silicon dioxide, silicon nitride, diamond, and certain glasses, polymers and ceramics. The results are relevant for the correct modeling of systems where the vdW force can play a significant role such as micro and nanodevices, for the calculation of the tip-sample force in atomic force microscopy, and in problems involving adhesion.

  14. Enhancing pairwise state-transition weights: A new weighting scheme in simulated tempering that can minimize transition time between a pair of conformational states

    Science.gov (United States)

    Qiao, Qin; Zhang, Hou-Dao; Huang, Xuhui

    2016-04-01

    Simulated tempering (ST) is a widely used enhancing sampling method for Molecular Dynamics simulations. As one expanded ensemble method, ST is a combination of canonical ensembles at different temperatures and the acceptance probability of cross-temperature transitions is determined by both the temperature difference and the weights of each temperature. One popular way to obtain the weights is to adopt the free energy of each canonical ensemble, which achieves uniform sampling among temperature space. However, this uniform distribution in temperature space may not be optimal since high temperatures do not always speed up the conformational transitions of interest, as anti-Arrhenius kinetics are prevalent in protein and RNA folding. Here, we propose a new method: Enhancing Pairwise State-transition Weights (EPSW), to obtain the optimal weights by minimizing the round-trip time for transitions among different metastable states at the temperature of interest in ST. The novelty of the EPSW algorithm lies in explicitly considering the kinetics of conformation transitions when optimizing the weights of different temperatures. We further demonstrate the power of EPSW in three different systems: a simple two-temperature model, a two-dimensional model for protein folding with anti-Arrhenius kinetics, and the alanine dipeptide. The results from these three systems showed that the new algorithm can substantially accelerate the transitions between conformational states of interest in the ST expanded ensemble and further facilitate the convergence of thermodynamics compared to the widely used free energy weights. We anticipate that this algorithm is particularly useful for studying functional conformational changes of biological systems where the initial and final states are often known from structural biology experiments.
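
    The dependence of the cross-temperature acceptance probability on both the temperature difference and the weights can be sketched with the standard Metropolis criterion for an expanded ensemble. The sketch assumes each temperature i is sampled with probability proportional to exp(-beta_i * E + g_i), where g_i is a dimensionless weight; that sign convention, the function name, and the numbers in the usage lines are assumptions for illustration. It does not implement the EPSW weight optimization itself.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def st_acceptance(energy, temp_i, temp_j, weight_i, weight_j, k_b=K_B):
    """Metropolis probability of switching from temperature i to j.

    energy is the current potential energy (kcal/mol); weight_i and
    weight_j are the dimensionless weights g_i and g_j.
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    log_ratio = -(beta_j - beta_i) * energy + (weight_j - weight_i)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


# Hypothetical values: an upward move with no weights is rarely accepted,
# while a suitable weight difference compensates for the energy term.
print(st_acceptance(-100.0, 300.0, 310.0, 0.0, 0.0))
print(st_acceptance(-100.0, 300.0, 310.0, 0.0, 10.0))
```

    With free-energy weights, g_i would be set to beta_i times the free energy of ensemble i; EPSW instead chooses the weights to minimize round-trip time between the states of interest.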

  15. Enhancing pairwise state-transition weights: A new weighting scheme in simulated tempering that can minimize transition time between a pair of conformational states

    International Nuclear Information System (INIS)

    Qiao, Qin; Zhang, Hou-Dao; Huang, Xuhui

    2016-01-01

    Simulated tempering (ST) is a widely used enhancing sampling method for Molecular Dynamics simulations. As one expanded ensemble method, ST is a combination of canonical ensembles at different temperatures and the acceptance probability of cross-temperature transitions is determined by both the temperature difference and the weights of each temperature. One popular way to obtain the weights is to adopt the free energy of each canonical ensemble, which achieves uniform sampling among temperature space. However, this uniform distribution in temperature space may not be optimal since high temperatures do not always speed up the conformational transitions of interest, as anti-Arrhenius kinetics are prevalent in protein and RNA folding. Here, we propose a new method: Enhancing Pairwise State-transition Weights (EPSW), to obtain the optimal weights by minimizing the round-trip time for transitions among different metastable states at the temperature of interest in ST. The novelty of the EPSW algorithm lies in explicitly considering the kinetics of conformation transitions when optimizing the weights of different temperatures. We further demonstrate the power of EPSW in three different systems: a simple two-temperature model, a two-dimensional model for protein folding with anti-Arrhenius kinetics, and the alanine dipeptide. The results from these three systems showed that the new algorithm can substantially accelerate the transitions between conformational states of interest in the ST expanded ensemble and further facilitate the convergence of thermodynamics compared to the widely used free energy weights. We anticipate that this algorithm is particularly useful for studying functional conformational changes of biological systems where the initial and final states are often known from structural biology experiments.

  16. Enhancing pairwise state-transition weights: A new weighting scheme in simulated tempering that can minimize transition time between a pair of conformational states

    Energy Technology Data Exchange (ETDEWEB)

    Qiao, Qin, E-mail: qqiao@ust.hk; Zhang, Hou-Dao [Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon (Hong Kong); Huang, Xuhui, E-mail: xuhuihuang@ust.hk [Department of Chemistry, Division of Biomedical Engineering, Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon (Hong Kong); The HKUST Shenzhen Research Institute, Shenzhen (China)

    2016-04-21

    Simulated tempering (ST) is a widely used enhancing sampling method for Molecular Dynamics simulations. As one expanded ensemble method, ST is a combination of canonical ensembles at different temperatures and the acceptance probability of cross-temperature transitions is determined by both the temperature difference and the weights of each temperature. One popular way to obtain the weights is to adopt the free energy of each canonical ensemble, which achieves uniform sampling among temperature space. However, this uniform distribution in temperature space may not be optimal since high temperatures do not always speed up the conformational transitions of interest, as anti-Arrhenius kinetics are prevalent in protein and RNA folding. Here, we propose a new method: Enhancing Pairwise State-transition Weights (EPSW), to obtain the optimal weights by minimizing the round-trip time for transitions among different metastable states at the temperature of interest in ST. The novelty of the EPSW algorithm lies in explicitly considering the kinetics of conformation transitions when optimizing the weights of different temperatures. We further demonstrate the power of EPSW in three different systems: a simple two-temperature model, a two-dimensional model for protein folding with anti-Arrhenius kinetics, and the alanine dipeptide. The results from these three systems showed that the new algorithm can substantially accelerate the transitions between conformational states of interest in the ST expanded ensemble and further facilitate the convergence of thermodynamics compared to the widely used free energy weights. We anticipate that this algorithm is particularly useful for studying functional conformational changes of biological systems where the initial and final states are often known from structural biology experiments.

  17. Pegasys: software for executing and integrating analyses of biological sequences

    Directory of Open Access Journals (Sweden)

    Lett Drew

    2004-04-01

    Background: We present Pegasys – a flexible, modular and customizable software system that facilitates the execution and data integration from heterogeneous biological sequence analysis tools. Results: The Pegasys system includes numerous tools for pair-wise and multiple sequence alignment, ab initio gene prediction, RNA gene detection, masking repetitive sequences in genomic DNA as well as filters for database formatting and processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store its results. The software allows users to dynamically create analysis workflows at run-time by manipulating a graphical user interface. All non-serial dependent analyses are executed in parallel on a compute cluster for efficiency of data generation. The uniform data model and backend relational database management system of Pegasys allow for results of heterogeneous programs included in the workflow to be integrated and exported into General Feature Format for further analyses in GFF-dependent tools, or GAME XML for import into the Apollo genome editor. The modularity of the design allows for new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries. Conclusions: The Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License. All source code and documentation is available for download at http://bioinformatics.ubc.ca/pegasys/.

  18. Constructing disease-specific gene networks using pair-wise relevance metric: Application to colon cancer identifies interleukin 8, desmin and enolase 1 as the central elements

    Directory of Open Access Journals (Sweden)

    Jiang Wei

    2008-08-01

    Background: With the advance of large-scale omics technologies, it is now feasible to reversely engineer the underlying genetic networks that describe the complex interplays of molecular elements that lead to complex diseases. Current networking approaches are mainly focusing on building genetic networks at large without probing the interaction mechanisms specific to a physiological or disease condition. The aim of this study was thus to develop such a novel networking approach based on the relevance concept, which is ideal to reveal integrative effects of multiple genes in the underlying genetic circuit for complex diseases. Results: The approach started with identification of multiple disease pathways, called a gene forest, in which the genes extracted from the decision forest constructed by supervised learning of the genome-wide transcriptional profiles for patients and normal samples. Based on the newly identified disease mechanisms, a novel pair-wise relevance metric, adjusted frequency value, was used to define the degree of genetic relationship between two molecular determinants. We applied the proposed method to analyze a publicly available microarray dataset for colon cancer. The results demonstrated that the colon cancer-specific gene network captured the most important genetic interactions in several cellular processes, such as proliferation, apoptosis, differentiation, mitogenesis and immunity, which are known to be pivotal for tumourigenesis. Further analysis of the topological architecture of the network identified three known hub cancer genes [interleukin 8 (IL8) (p ≈ 0), desmin (DES) (p = 2.71 × 10^-6) and enolase 1 (ENO1) (p = 4.19 × 10^-5)], while two novel hub genes [RNA binding motif protein 9 (RBM9) (p = 1.50 × 10^-4) and ribosomal protein L30 (RPL30) (p = 1.50 × 10^-4)] may define new central elements in the gene network specific to colon cancer. Gene Ontology (GO) based analysis of the colon cancer-specific gene network and

  19. Constructing disease-specific gene networks using pair-wise relevance metric: application to colon cancer identifies interleukin 8, desmin and enolase 1 as the central elements.

    Science.gov (United States)

    Jiang, Wei; Li, Xia; Rao, Shaoqi; Wang, Lihong; Du, Lei; Li, Chuanxing; Wu, Chao; Wang, Hongzhi; Wang, Yadong; Yang, Baofeng

    2008-08-10

    With the advance of large-scale omics technologies, it is now feasible to reversely engineer the underlying genetic networks that describe the complex interplays of molecular elements that lead to complex diseases. Current networking approaches are mainly focusing on building genetic networks at large without probing the interaction mechanisms specific to a physiological or disease condition. The aim of this study was thus to develop such a novel networking approach based on the relevance concept, which is ideal to reveal integrative effects of multiple genes in the underlying genetic circuit for complex diseases. The approach started with identification of multiple disease pathways, called a gene forest, in which the genes extracted from the decision forest constructed by supervised learning of the genome-wide transcriptional profiles for patients and normal samples. Based on the newly identified disease mechanisms, a novel pair-wise relevance metric, adjusted frequency value, was used to define the degree of genetic relationship between two molecular determinants. We applied the proposed method to analyze a publicly available microarray dataset for colon cancer. The results demonstrated that the colon cancer-specific gene network captured the most important genetic interactions in several cellular processes, such as proliferation, apoptosis, differentiation, mitogenesis and immunity, which are known to be pivotal for tumourigenesis. Further analysis of the topological architecture of the network identified three known hub cancer genes [interleukin 8 (IL8) (p approximately 0), desmin (DES) (p = 2.71 x 10(-6)) and enolase 1 (ENO1) (p = 4.19 x 10(-5))], while two novel hub genes [RNA binding motif protein 9 (RBM9) (p = 1.50 x 10(-4)) and ribosomal protein L30 (RPL30) (p = 1.50 x 10(-4))] may define new central elements in the gene network specific to colon cancer. Gene Ontology (GO) based analysis of the colon cancer-specific gene network and the sub-network that

  20. Sequences for Student Investigation

    Science.gov (United States)

    Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette

    2004-01-01

    We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…

  1. Sequence History Update Tool

    Science.gov (United States)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable time and effort savings. With the use of The Sequence History Update Tool what previously took minutes is now done in less than 30 seconds, and now provides a more accurate archival record of the sequence commanding for MRO.

  2. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  3. Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

    Science.gov (United States)

    Soares, Inês; Goios, Ana; Amorim, António

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.
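
    The profile-and-distance step described above can be sketched as follows. Note the substitution: for brevity this sketch counts words with a plain sliding window and `collections.Counter` rather than the generalized suffix tree the authors use, so it illustrates the distance computation only. The function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def lword_profile(seq, L):
    """Relative frequency of every length-L word occurring in seq."""
    n_words = len(seq) - L + 1
    if n_words <= 0:
        raise ValueError("sequence is shorter than the word length")
    counts = Counter(seq[i:i + L] for i in range(n_words))
    return {word: c / n_words for word, c in counts.items()}


def euclidean(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    words = set(p) | set(q)
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in words))


def distance_matrix(seqs, L):
    """Symmetric matrix (dict of dicts) of pairwise profile distances."""
    names = list(seqs)
    profiles = {name: lword_profile(seqs[name], L) for name in names}
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = euclidean(profiles[a], profiles[b])
    return dist


# Toy sequences, for illustration only.
toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAC", "s3": "TTTTGGGGCC"}
print(distance_matrix(toy, L=3))
```

    The resulting matrix is the kind of input a neighbor-joining or multidimensional-scaling routine would take; choosing the single optimal word length, which the abstract highlights, is not covered by this sketch.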

  4. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

    Directory of Open Access Journals (Sweden)

    Inês Soares

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.

  5. Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences

    Directory of Open Access Journals (Sweden)

    Robert C. Edgar

    2018-04-01

    Prediction of taxonomy for marker gene sequences such as 16S ribosomal RNA (rRNA) is a fundamental task in microbiology. Most experimentally observed sequences are diverged from reference sequences of authoritatively named organisms, creating a challenge for prediction methods. I assessed the accuracy of several algorithms using cross-validation by identity, a new benchmark strategy which explicitly models the variation in distances between query sequences and the closest entry in a reference database. When the accuracy of genus predictions was averaged over a representative range of identities with the reference database (100%, 99%, 97%, 95% and 90%), all tested methods had ≤50% accuracy on the currently-popular V4 region of 16S rRNA. Accuracy was found to fall rapidly with identity; for example, better methods were found to have V4 genus prediction accuracy of ∼100% at 100% identity but ∼50% at 97% identity. The relationship between identity and taxonomy was quantified as the probability that a rank is the lowest shared by a pair of sequences with a given pair-wise identity. With the V4 region, 95% identity was found to be a twilight zone where taxonomy is highly ambiguous because the probabilities that the lowest shared rank between pairs of sequences is genus, family, order or class are approximately equal.

  6. The Colliding Beams Sequencer

    International Nuclear Information System (INIS)

    Johnson, D.E.; Johnson, R.P.

    1989-01-01

    The Colliding Beam Sequencer (CBS) is a computer program used to operate the pbar-p Collider by synchronizing the applications programs and simulating the activities of the accelerator operators during filling and storage. The Sequencer acts as a meta-program, running otherwise stand alone applications programs, to do the set-up, beam transfers, acceleration, low beta turn on, and diagnostics for the transfers and storage. The Sequencer and its operational performance will be described along with its special features which include a periodic scheduler and command logger. 14 refs., 3 figs

  7. Phylogenetic Trees From Sequences

    Science.gov (United States)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data. We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.

  8. Variability among the Most Rapidly Evolving Plastid Genomic Regions is Lineage-Specific: Implications of Pairwise Genome Comparisons in Pyrus (Rosaceae) and Other Angiosperms for Marker Choice

    Science.gov (United States)

    Ter-Voskanyan, Hasmik; Allgaier, Martin; Borsch, Thomas

    2014-01-01

    Plastid genomes exhibit different levels of variability in their sequences, depending on the respective kinds of genomic regions. Genes are usually more conserved while noncoding introns and spacers evolve at a faster pace. While a set of about thirty maximum variable noncoding genomic regions has been suggested to provide universally promising phylogenetic markers throughout angiosperms, applications often require several regions to be sequenced for many individuals. Our project aims to illuminate evolutionary relationships and species-limits in the genus Pyrus (Rosaceae)—a typical case with very low genetic distances between taxa. In this study, we have sequenced the plastid genome of Pyrus spinosa and aligned it to the already available P. pyrifolia sequence. The overall p-distance of the two Pyrus genomes was 0.00145. The intergenic spacers between ndhC–trnV, trnR–atpA, ndhF–rpl32, psbM–trnD, and trnQ–rps16 were the most variable regions, also comprising the highest total numbers of substitutions, indels and inversions (potentially informative characters). Our comparative analysis of further plastid genome pairs with similar low p-distances from Oenothera (representing another rosid), Olea (asterids) and Cymbidium (monocots) showed in each case a different ranking of genomic regions in terms of variability and potentially informative characters. Only two intergenic spacers (ndhF–rpl32 and trnK–rps16) were consistently found among the 30 top-ranked regions. We have mapped the occurrence of substitutions and microstructural mutations in the four genome pairs. High AT content in specific sequence elements seems to foster frequent mutations. We conclude that the variability among the fastest evolving plastid genomic regions is lineage-specific and thus cannot be precisely predicted across angiosperms. The often lineage-specific occurrence of stem-loop elements in the sequences of introns and spacers also governs lineage-specific mutations
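
    The p-distance reported above is the proportion of aligned sites at which two sequences differ. A minimal sketch is given below; skipping any column that contains a gap is an assumption made here for simplicity and may not match how the authors treated indels. The function name and toy alignment are hypothetical.

```python
def p_distance(aligned_a, aligned_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are excluded (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


# Toy alignment: 5 gap-free columns, 1 substitution.
print(p_distance("ACGT-A", "ACGA-A"))
```

    Under this definition, a value of 0.00145 corresponds to roughly 1.45 substitutions per 1000 compared sites.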

  9. Variability among the most rapidly evolving plastid genomic regions is lineage-specific: implications of pairwise genome comparisons in Pyrus (Rosaceae) and other angiosperms for marker choice.

    Directory of Open Access Journals (Sweden)

    Nadja Korotkova

    Plastid genomes exhibit different levels of variability in their sequences, depending on the respective kinds of genomic regions. Genes are usually more conserved while noncoding introns and spacers evolve at a faster pace. While a set of about thirty maximum variable noncoding genomic regions has been suggested to provide universally promising phylogenetic markers throughout angiosperms, applications often require several regions to be sequenced for many individuals. Our project aims to illuminate evolutionary relationships and species-limits in the genus Pyrus (Rosaceae), a typical case with very low genetic distances between taxa. In this study, we have sequenced the plastid genome of Pyrus spinosa and aligned it to the already available P. pyrifolia sequence. The overall p-distance of the two Pyrus genomes was 0.00145. The intergenic spacers between ndhC-trnV, trnR-atpA, ndhF-rpl32, psbM-trnD, and trnQ-rps16 were the most variable regions, also comprising the highest total numbers of substitutions, indels and inversions (potentially informative characters). Our comparative analysis of further plastid genome pairs with similar low p-distances from Oenothera (representing another rosid), Olea (asterids) and Cymbidium (monocots) showed in each case a different ranking of genomic regions in terms of variability and potentially informative characters. Only two intergenic spacers (ndhF-rpl32 and trnK-rps16) were consistently found among the 30 top-ranked regions. We have mapped the occurrence of substitutions and microstructural mutations in the four genome pairs. High AT content in specific sequence elements seems to foster frequent mutations. We conclude that the variability among the fastest evolving plastid genomic regions is lineage-specific and thus cannot be precisely predicted across angiosperms. The often lineage-specific occurrence of stem-loop elements in the sequences of introns and spacers also governs lineage-specific mutations.

  10. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  11. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  12. Dynamic Sequence Assignment.

    Science.gov (United States)

    1983-12-01

    D-136 548 DYNAMIC SEQUENCE ASSIGNMENT(U) ADVANCED INFORMATION AND DECISION SYSTEMS MOUNTAIN VIEW CA C A O'REILLY ET AL. UNCLASSIFIED DEC 83 AI/DS... ADVANCED INFORMATION & DECISION SYSTEMS Mountain View, CA 94040 ... Unclassified SECURITY CLASSIFICATION OF THIS PAGE REPORT...reviews some important heuristic algorithms developed for faster solution of the sequence assignment problem. 3.1. DYNAMIC PROGRAMMING FORMULATION FOR

  13. HIV Sequence Compendium 2010

    Energy Technology Data Exchange (ETDEWEB)

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  14. General LTE Sequence

    OpenAIRE

    Billal, Masum

    2015-01-01

    In this paper, we characterize sequences that maintain the same property described in the Lifting the Exponent Lemma. The Lifting the Exponent Lemma is a very powerful tool in olympiad number theory and has recently become very popular. We generalize it to all sequences that maintain a similar property, i.e., if $p^{\alpha} \| a_k$ and $p^{\beta} \| n$, then $p^{\alpha+\beta} \| a_{nk}$.
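    The property stated in this abstract can be checked numerically for a candidate sequence. The sketch below is not the paper's method. It reads "p^α || a_k" as "α is the exact power of p dividing a_k", assumes α ≥ 1 (p actually divides a_k), and tests a finite range only, so a True result is evidence rather than proof. The function names and example sequences are illustrative assumptions.

```python
from typing import Callable


def p_adic_valuation(n: int, p: int) -> int:
    """Largest v such that p**v divides n (n must be nonzero)."""
    if n == 0:
        raise ValueError("valuation of 0 is undefined here")
    n = abs(n)
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v


def has_lte_property(a: Callable[[int], int], p: int,
                     k_max: int, n_max: int) -> bool:
    """Check v_p(a(n*k)) == v_p(a(k)) + v_p(n) for 1 <= k <= k_max and
    1 <= n <= n_max, restricted to k with p dividing a(k) (assumed)."""
    for k in range(1, k_max + 1):
        alpha = p_adic_valuation(a(k), p)
        if alpha == 0:
            continue  # property is only tested when p divides a_k
        for n in range(1, n_max + 1):
            beta = p_adic_valuation(n, p)
            if p_adic_valuation(a(n * k), p) != alpha + beta:
                return False
    return True


# a_k = 4**k - 1 with p = 3 satisfies the classical lemma (odd prime).
print(has_lte_property(lambda k: 4**k - 1, 3, k_max=6, n_max=9))
# a_k = 3**k - 1 with p = 2 fails: a_1 = 2 but a_2 = 8, not 4.
print(has_lte_property(lambda k: 3**k - 1, 2, k_max=6, n_max=9))
```

    The second call shows why the classical lemma treats p = 2 separately: the exponent jumps by more than the valuation of n.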

  15. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

    2012-01-01

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
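    The partitioning step described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the splitting ratio is taken to be a fraction of the sequence count (the abstract does not say whether it is measured in sequences or residues), and the names `split_database` and `cpu_fraction` are hypothetical. Alignment scoring on each device is omitted.

```python
from typing import List, Tuple


def split_database(database: List[str],
                   cpu_fraction: float) -> Tuple[List[str], List[str]]:
    """Partition database sequences into a CPU portion and a GPU portion.

    Assumption: cpu_fraction is the share of sequences (by count) sent to
    the CPU. Sequences are sorted by length so that every CPU sequence is
    no longer than any GPU sequence, as in the abstract.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    ordered = sorted(database, key=len)  # shortest first
    cut = int(len(ordered) * cpu_fraction)
    return ordered[:cut], ordered[cut:]


# Toy database (hypothetical sequences).
db = ["ACGTACGT", "AC", "ACGTAC", "ACG"]
cpu_part, gpu_part = split_database(db, cpu_fraction=0.5)
print(cpu_part)  # shorter sequences
print(gpu_part)  # longer sequences
```

    Each portion would then be aligned against the query on its own device, which is the part an actual implementation would need to supply.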

  16. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.

  17. Enthalpic discrimination of homochiral pairwise interactions: Enantiomers of proline and hydroxyproline in (dimethyl formamide (DMF) + H2O) and (dimethylsulfoxide (DMSO) + H2O) mixtures at 298.15 K

    International Nuclear Information System (INIS)

    Hu, Xin-Gen; Liu, Jia-Min; Guo, Zheng; Liang, Hong-Yu; Jia, Zhao-Peng; Cheng, Wei-Na; Guo, Ai-Di; Zhang, He-Juan

    2013-01-01

    Highlights: • The h_xx values of each α-amino acid decrease gradually with the mass fractions of cosolvents. • The absolute values of h_xx of L-enantiomers are larger than those of D-enantiomers in the range w_COS = 0 to 0.30. • The h_xx values of the two proline enantiomers are all positive at each composition of mixed solvents. • When a hydrophilic hydroxyl group is introduced into proline enantiomers, the values of h_xx become negative. Abstract: Dilution enthalpies of two pairs of α-amino acid enantiomers, namely L-proline vs D-proline and L-hydroxyproline vs D-hydroxyproline, in water-rich regions of dimethyl formamide (DMF) + H2O and dimethylsulfoxide (DMSO) + H2O mixtures (mass fractions of cosolvents w_COS = 0 to 0.30) have been determined at 298.15 K by isothermal titration calorimetry (ITC). The successive values of dilution enthalpy obtained in a single run of ITC determination were used to calculate homochiral enthalpic pairwise interaction coefficients (h_xx) at the corresponding composition of mixed solvents according to the McMillan–Mayer statistical thermodynamic approach. The sign and magnitude of h_xx were interpreted in terms of solute–solute interactions mediated by solvent and cosolvent molecules, and preferential configurations of homochiral pairwise interactions (L–L or D–D pairs) in aqueous solutions. The variations of h_xx with w_COS were considered to depend greatly on the competition equilibrium between hydrophobic and hydrophilic interactions, as well as the structural alteration of water caused by the two highly polar aprotic cosolvents (DMF and DMSO). In particular, it was found that when one of the two kinds of interactions (hydrophobic or hydrophilic) preponderates over the other in solution, the enthalpic effect of homochiral pairwise interactions is always remarkable, and is characterized by a large absolute value of h_xx, positive or negative, which corresponds respectively to the

  18. Development of Genetic Markers in Eucalyptus Species by Target Enrichment and Exome Sequencing

    Science.gov (United States)

    Dasgupta, Modhumita Ghosh; Dharanishanthi, Veeramuthu; Agarwal, Ishangi; Krutovsky, Konstantin V.

    2015-01-01

    The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high density genotyping. The present study was undertaken to identify markers in genes supposedly related to wood property traits in three Eucalyptus species. Ninety four genes involved in xylogenesis were selected for hybridization probe based nuclear genomic DNA target enrichment and exome sequencing. Genomic DNA was isolated from the leaf tissues and used for on-array probe hybridization followed by Illumina sequencing. The raw sequence reads were trimmed, high-quality reads were mapped to the E. grandis reference sequence, and single nucleotide variants (SNVs) and insertions/deletions (InDels) were identified across the three species. The average read coverage was 216X and a total of 2294 SNVs and 479 InDels were discovered in E. camaldulensis, 2383 SNVs and 518 InDels in E. tereticornis, and 1228 SNVs and 409 InDels in E. grandis. Additionally, SNV calling and InDel detection were conducted in pair-wise comparisons of E. tereticornis vs. E. grandis, E. camaldulensis vs. E. tereticornis and E. camaldulensis vs. E. grandis. This study presents an efficient and high throughput method for developing genetic markers for family-based QTL and association analysis in Eucalyptus. PMID:25602379

  19. Development of genetic markers in Eucalyptus species by target enrichment and exome sequencing.

    Directory of Open Access Journals (Sweden)

    Modhumita Ghosh Dasgupta

    The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high density genotyping. The present study was undertaken to identify markers in genes supposedly related to wood property traits in three Eucalyptus species. Ninety four genes involved in xylogenesis were selected for hybridization probe based nuclear genomic DNA target enrichment and exome sequencing. Genomic DNA was isolated from the leaf tissues and used for on-array probe hybridization followed by Illumina sequencing. The raw sequence reads were trimmed and high-quality reads were mapped to the E. grandis reference sequence and the presence of single nucleotide variants (SNVs) and insertions/deletions (InDels) were identified across the three species. The average read coverage was 216X and a total of 2294 SNVs and 479 InDels were discovered in E. camaldulensis, 2383 SNVs and 518 InDels in E. tereticornis, and 1228 SNVs and 409 InDels in E. grandis. Additionally, SNV calling and InDel detection were conducted in pair-wise comparisons of E. tereticornis vs. E. grandis, E. camaldulensis vs. E. tereticornis and E. camaldulensis vs. E. grandis. This study presents an efficient and high throughput method on development of genetic markers for family-based QTL and association analysis in Eucalyptus.

  20. Main sequence mass loss

    International Nuclear Information System (INIS)

    Brunish, W.M.; Guzik, J.A.; Willson, L.A.; Bowen, G.

    1987-01-01

    It has been hypothesized that variable stars may experience mass loss, driven, at least in part, by oscillations. The class of stars we are discussing here is the δ Scuti variables. These are variable stars with masses between about 1.2 and 2.25 M☉, lying on or very near the main sequence. According to this theory, high rotation rates enhance the rate of mass loss, so main sequence stars born in this mass range would have a range of mass loss rates, depending on their initial rotation velocity and the amplitude of the oscillations. The stars would evolve rapidly down the main sequence until (at about 1.25 M☉) a surface convection zone began to form. The presence of this convective region would slow the rotation, perhaps allowing magnetic braking to occur, and thus sharply reduce the mass loss rate. 7 refs

  1. Amino acid sequences of ribosomal proteins S11 from Bacillus stearothermophilus and S19 from Halobacterium marismortui. Comparison of the ribosomal protein S11 family.

    Science.gov (United States)

    Kimura, M; Kimura, J; Hatakeyama, T

    1988-11-21

    The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and of S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences of these proteins revealed that they belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a slightly lower homology (52%) with the homologous chloroplast protein. The halophilic protein S19 is more related to the eukaryotic (45-49%) than to the eubacterial counterparts (35%).

  2. Electricity sequence control

    International Nuclear Information System (INIS)

    Shin, Heung Ryeol

    2010-03-01

    The book covers an introduction to control systems, including their classification and control signals; an introduction to electrical power switches, such as push-button switches and inductive- and capacitive-type detection sensors; control machinery such as solenoid valves; the representation of sequences and types of electrical circuits using diagrams, time charts, markings and terms; logic circuits such as YES, NO, AND, OR and equivalence logic; basic electrical circuits; electrical sequence control; added conditions; special program control for program selection and jumps; motor control; additional circuits such as repeat circuits and pause circuits in a conveyor; and safety regulations and rules covering the classification of electrical accidents and protective devices for insulation.

  3. Next-generation sequencing

    DEFF Research Database (Denmark)

    Rieneck, Klaus; Bak, Mads; Jønson, Lars

    2013-01-01

    , Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence...... information obtained allows well for statistical analysis of the data. This general approach can be integrated into current laboratory practice and has numerous applications. Besides DNA-based predictions of blood group phenotypes, platelet phenotypes, or sickle cell anemia, and the determination of zygosity...

  4. Sequence imputation of HPV16 genomes for genetic association studies.

    Directory of Open Access Journals (Sweden)

    Benjamin Smith

    Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancer and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we neither know which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity. A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. To interrogate the oncogenic risk of determined and imputed HPV16 SNPs, odds-ratios for each SNP were calculated in a case-control viral genome-wide association study (VWAS) using biopsy confirmed high-grade cervix neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica. HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5±1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution. Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data due to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, to determine the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis. The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provides a valuable resource for future studies of HPV16

  5. THE RHIC SEQUENCER

    International Nuclear Information System (INIS)

    VAN ZEIJTS, J.; DOTTAVIO, T.; FRAK, B.; MICHNOFF, R.

    2001-01-01

    The Relativistic Heavy Ion Collider (RHIC) has a high level asynchronous time-line driven by a controlling program called the "Sequencer". Most high-level magnet and beam related issues are orchestrated by this system. The system also plays an important role in coordinated data acquisition and saving. We present the program, operator interface, operational impact and experience.

  6. Twin anemia polycythemia sequence

    NARCIS (Netherlands)

    Slaghekke, Femke

    2014-01-01

    In this thesis we describe that Twin Anemia Polycythemia Sequence (TAPS) is a form of chronic feto-fetal transfusion in monochorionic (identical) twins based on a small amount of blood transfusion through very small anastomoses. For the antenatal diagnosis of TAPS, Middle Cerebral Artery – Peak

  7. simple sequence repeat (SSR)

    African Journals Online (AJOL)

    In the present study, 78 mapped simple sequence repeat (SSR) markers representing 11 linkage groups of adzuki bean were evaluated for transferability to mungbean and related Vigna spp. 41 markers amplified characteristic bands in at least one Vigna species. The transferability percentage across the genotypes ranged ...

  8. Filling gaps in biodiversity knowledge for macrofungi: contributions and assessment of an herbarium collection DNA barcode sequencing project.

    Science.gov (United States)

    Osmundson, Todd W; Robert, Vincent A; Schoch, Conrad L; Baker, Lydia J; Smith, Amy; Robich, Giovanni; Mizzan, Luca; Garbelotto, Matteo M

    2013-01-01

    Despite recent advances spearheaded by molecular approaches and novel technologies, species description and DNA sequence information are significantly lagging for fungi compared to many other groups of organisms. Large scale sequencing of vouchered herbarium material can aid in closing this gap. Here, we describe an effort to obtain broad ITS sequence coverage of the approximately 6000 macrofungal-species-rich herbarium of the Museum of Natural History in Venice, Italy. Our goals were to investigate issues related to large sequencing projects, develop heuristic methods for assessing the overall performance of such a project, and evaluate the prospects of such efforts to reduce the current gap in fungal biodiversity knowledge. The effort generated 1107 sequences submitted to GenBank, including 416 previously unrepresented taxa and 398 sequences exhibiting a best BLAST match to an unidentified environmental sequence. Specimen age and taxon affected sequencing success, and subsequent work on failed specimens showed that an ITS1 mini-barcode greatly increased sequencing success without greatly reducing the discriminating power of the barcode. Similarity comparisons and nonmetric multidimensional scaling ordinations based on pairwise distance matrices proved to be useful heuristic tools for validating the overall accuracy of specimen identifications, flagging potential misidentifications, and identifying taxa in need of additional species-level revision. Comparison of within- and among-species nucleotide variation showed a strong increase in species discriminating power at 1-2% dissimilarity, and identified potential barcoding issues (same sequence for different species and vice-versa). All sequences are linked to a vouchered specimen, and results from this study have already prompted revisions of species-sequence assignments in several taxa.

  9. TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction.

    Science.gov (United States)

    Chang, Jia-Ming; Di Tommaso, Paolo; Notredame, Cedric

    2014-06-01

    Multiple sequence alignment (MSA) is a key modeling procedure when analyzing biological sequences. Homology and evolutionary modeling are the most common applications of MSAs. Both are known to be sensitive to the underlying MSA accuracy. In this work, we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction using both an established simulated data set and a novel empirical yeast data set. For this purpose, we describe a novel lossless alternative to site filtering that involves overweighting the trustworthy columns. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third party MSA. Pairwise projections can be produced using fast or slow methods, thus allowing a trade-off between speed and accuracy. We compared TCS with Heads-or-Tails, GUIDANCE, Gblocks, and trimAl and found it to lead to significantly better estimates of structural accuracy and more accurate phylogenetic trees. The software is available from www.tcoffee.org/Projects/tcs. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  10. Targeted sequencing of plant genomes

    Science.gov (United States)

    Mark D. Huynh

    2014-01-01

    Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, whole-genome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...

  11. Almost convergence of triple sequences

    OpenAIRE

    Ayhan Esi; M.Necdet Catalbas

    2013-01-01

    In this paper we introduce and study the concepts of almost convergence and almost Cauchy for triple sequences. We show that the set of almost convergent triple sequences of 0's and 1's is of the first category and also almost every triple sequence of 0's and 1's is not almost convergent. Keywords: almost convergence, P-convergent, triple sequence.

  12. A few Smarandache Integer Sequences

    OpenAIRE

    Ibstedt, Henry

    2010-01-01

    This paper deals with the analysis of a few Smarandache Integer Sequences which first appeared in Properties of the Numbers, F. Smarandache, University of Craiova Archives, 1975. The first four sequences are recurrence generated sequences while the last three are concatenation sequences.

  13. An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

    Science.gov (United States)

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command line interface, massive computational resources and expertise, which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for members of the plant genetics and breeding community with no computational expertise who want to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone

  14. Allele Re-sequencing Technologies

    DEFF Research Database (Denmark)

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    2013-01-01

    The development of next-generation sequencing technologies has made sequencing an affordable approach for detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing still remains too high to be feasible for many plant species with large...... alternative to whole genome re-sequencing to identify causative genetic variations in plants. One challenge, however, will be efficient bioinformatics strategies for data handling and analysis from the increasing amount of sequence information....

  15. An Outbreak of Streptococcus pyogenes in a Mental Health Facility: Advantage of Well-Timed Whole-Genome Sequencing Over emm Typing.

    Science.gov (United States)

    Bergin, Sarah M; Periaswamy, Balamurugan; Barkham, Timothy; Chua, Hong Choon; Mok, Yee Ming; Fung, Daniel Shuen Sheng; Su, Alex Hsin Chuan; Lee, Yen Ling; Chua, Ming Lai Ivan; Ng, Poh Yong; Soon, Wei Jia Wendy; Chu, Collins Wenhan; Tan, Siyun Lucinda; Meehan, Mary; Ang, Brenda Sze Peng; Leo, Yee Sin; Holden, Matthew T G; De, Partha; Hsu, Li Yang; Chen, Swaine L; de Sessions, Paola Florez; Marimuthu, Kalisvar

    2018-05-09

    OBJECTIVE: We report the utility of whole-genome sequencing (WGS) conducted in a clinically relevant time frame (i.e., sufficient for guiding management decisions), in managing a Streptococcus pyogenes outbreak, and present a comparison of its performance with emm typing. SETTING: A 2,000-bed tertiary-care psychiatric hospital. METHODS: Active surveillance was conducted to identify new cases of S. pyogenes. WGS guided targeted epidemiological investigations, and infection control measures were implemented. Single-nucleotide polymorphism (SNP)-based genome phylogeny, emm typing, and multilocus sequence typing (MLST) were performed. We compared the ability of WGS and emm typing to correctly identify person-to-person transmission and to guide the management of the outbreak. RESULTS: The study included 204 patients and 152 staff. We identified 35 patients and 2 staff members with S. pyogenes. WGS revealed polyclonal S. pyogenes infections with 3 genetically distinct phylogenetic clusters (C1-C3). Cluster C1 isolates were all emm type 4, sequence type 915 and had pairwise SNP differences of 0-5, which suggested recent person-to-person transmissions. Epidemiological investigation revealed that cluster C1 was mediated by dermal colonization and transmission of S. pyogenes in a male residential ward. Clusters C2 and C3 were genomically diverse, with pairwise SNP differences of 21-45 and 26-58, and emm 11 and mostly emm120, respectively. Clusters C2 and C3, which may have been considered person-to-person transmissions by emm typing, were shown by WGS to be unlikely by integrating pairwise SNP differences with epidemiology. CONCLUSIONS: WGS had higher resolution than emm typing in identifying clusters with recent and ongoing person-to-person transmissions, which allowed implementation of targeted intervention to control the outbreak. Infect Control Hosp Epidemiol 2018;1-9.

  16. Multilocus Sequence Typing

    OpenAIRE

    Belén, Ana; Pavón, Ibarz; Maiden, Martin C.J.

    2009-01-01

    Multilocus sequence typing (MLST) was first proposed in 1998 as a typing approach that enables the unambiguous characterization of bacterial isolates in a standardized, reproducible, and portable manner using the human pathogen Neisseria meningitidis as the exemplar organism. Since then, the approach has been applied to a large and growing number of organisms by public health laboratories and research institutions. MLST data, shared by investigators over the world via the Internet, have been ...

  17. Achalasia Carcinoma Sequence

    OpenAIRE

    Makmun, Dadang

    2001-01-01

    We report a case of carcinoma of the esophagus in a 58-year-old woman with achalasia, diagnosed 30 years ago and initially treated surgically (myotomy), whose symptoms recurred 3 years ago. Given the progress of the disease, malignancy was strongly suspected due to prolonged stasis and mucosal irritation caused by achalasia (achalasia carcinoma sequence). Because of these contributing factors for the development of serious complications such as Malignan...

  18. Sequencing BPS spectra

    Energy Technology Data Exchange (ETDEWEB)

    Gukov, Sergei [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Max-Planck-Institut für Mathematik,Vivatsgasse 7, D-53111 Bonn (Germany); Nawata, Satoshi [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Centre for Quantum Geometry of Moduli Spaces, University of Aarhus,Nordre Ringgade 1, DK-8000 (Denmark); Saberi, Ingmar [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Stošić, Marko [CAMGSD, Departamento de Matemática, Instituto Superior Técnico,Av. Rovisco Pais, 1049-001 Lisbon (Portugal); Mathematical Institute SANU,Knez Mihajlova 36, 11000 Belgrade (Serbia); Sułkowski, Piotr [Walter Burke Institute for Theoretical Physics, California Institute of Technology,1200 E California Blvd, Pasadena, CA 91125 (United States); Faculty of Physics, University of Warsaw,ul. Pasteura 5, 02-093 Warsaw (Poland)

    2016-03-02

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel “sliding” property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  19. Sequencing BPS spectra

    International Nuclear Information System (INIS)

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; Stošić, Marko; Sułkowski, Piotr

    2016-01-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel “sliding” property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  20. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  1. mESAdb: microRNA expression and sequence analysis database.

    Science.gov (United States)

    Kaya, Koray D; Karakülah, Gökhan; Yakicier, Cengiz M; Acar, Aybar C; Konu, Ozlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.

  2. Sequence diversity and copy number variation of Mutator-like transposases in wheat

    Directory of Open Access Journals (Sweden)

    Nobuaki Asakura

    2008-01-01

    Partial transposase-coding sequences of Mutator-like elements (MULEs) were isolated from a wild einkorn wheat, Triticum urartu, by degenerate PCR. The isolated sequences were classified into a MuDR or Class I clade and divided into two distinct subclasses (subclass I and subclass II). The average pair-wise identity between members of both subclasses was 58.8% at the nucleotide sequence level. Sequence diversity of subclass I was larger than that of subclass II. DNA gel blot analysis showed that subclass I was present as low copy number elements in the genomes of all Triticum and Aegilops accessions surveyed, while subclass II was present as high copy number elements. These two subclasses seemed incapable of recognizing each other for transposition. The number of copies of subclass II elements was much higher in Aegilops with the S, Sl and D genomes and polyploid Triticum species than in diploid Triticum with the A genome, indicating that active transposition occurred in S, Sl and D genomes before polyploidization. DNA gel blot analysis of six species selected from three subfamilies of Poaceae demonstrated that only the tribe Triticeae possessed both subclasses. These results suggest that the differentiation of these two subclasses occurred before or immediately after the establishment of the tribe Triticeae.

  3. Functional brain activation differences in stuttering identified with a rapid fMRI sequence

    Science.gov (United States)

    Kraft, Shelly Jo; Choo, Ai Leen; Sharma, Harish; Ambrose, Nicoline G.

    2011-01-01

    The purpose of this study was to investigate whether brain activity related to the presence of stuttering can be identified with rapid functional MRI (fMRI) sequences that involved overt and covert speech processing tasks. The long-term goal is to develop sensitive fMRI approaches with developmentally appropriate tasks to identify deviant speech motor and auditory brain activity in children who stutter closer to the age at which recovery from stuttering is documented. Rapid sequences may be preferred for individuals or populations who do not tolerate long scanning sessions. In this report, we document the application of a picture naming and phoneme monitoring task in three-minute fMRI sequences with adults who stutter (AWS). If relevant brain differences are found in AWS with these approaches that conform to previous reports, then these approaches can be extended to younger populations. Pairwise contrasts of brain BOLD activity between AWS and normally fluent adults indicated the AWS showed higher BOLD activity in the right inferior frontal gyrus (IFG), right temporal lobe and sensorimotor cortices during picture naming and higher activity in the right IFG during phoneme monitoring. The right lateralized pattern of BOLD activity together with higher activity in sensorimotor cortices is consistent with previous reports, which indicates rapid fMRI sequences can be considered for investigating stuttering in younger participants. PMID:22133409

  4. GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data

    Directory of Open Access Journals (Sweden)

    Li Chen

    2018-04-01

    Normalization is the first critical step in microbiome sequencing data analysis used to account for variable library sizes. Current RNA-Seq based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address the zero-inflation remain largely undeveloped. Here we propose geometric mean of pairwise ratios, a simple but effective normalization method, for zero-inflated sequencing data such as microbiome data. Simulation studies and real datasets analyses demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.
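
    The abstract names the method but does not spell out the computation. The following is a minimal sketch under a stated assumption: the size factor of a sample is taken as the geometric mean, over all other samples, of the median count ratio on taxa that are non-zero in both samples. The function name and the toy counts are hypothetical, and this is not the authors' reference implementation.

```python
import math
from statistics import median


def gmpr_size_factors(counts):
    """Sketch of geometric-mean-of-pairwise-ratios size factors.

    counts: list of samples, each a list of taxon counts in the same
    taxon order. Returns one size factor per sample (NaN if a sample
    shares no non-zero taxon with any other sample).
    """
    n = len(counts)
    factors = []
    for i in range(n):
        log_medians = []
        for j in range(n):
            if i == j:
                continue
            # Ratios only on taxa observed (non-zero) in both samples,
            # so zeros never enter a ratio.
            ratios = [
                ci / cj
                for ci, cj in zip(counts[i], counts[j])
                if ci > 0 and cj > 0
            ]
            if ratios:
                log_medians.append(math.log(median(ratios)))
        if log_medians:
            # Geometric mean computed as exp of the mean of logs.
            factors.append(math.exp(sum(log_medians) / len(log_medians)))
        else:
            factors.append(float("nan"))
    return factors


# Toy count table: 3 samples x 4 taxa (hypothetical).
toy_counts = [
    [10, 0, 20, 4],
    [20, 6, 40, 0],
    [5, 3, 0, 2],
]
size_factors = gmpr_size_factors(toy_counts)
normalized = [
    [c / s for c in sample] for sample, s in zip(toy_counts, size_factors)
]
```

    Dividing each sample's counts by its size factor gives the normalized table; because every ratio is restricted to shared non-zero taxa, zero counts do not distort the factors.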

  5. GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data.

    Science.gov (United States)

    Chen, Li; Reeve, James; Zhang, Lujun; Huang, Shengbing; Wang, Xuefeng; Chen, Jun

    2018-01-01

    Normalization is the first critical step in microbiome sequencing data analysis used to account for variable library sizes. Current RNA-Seq based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address the zero-inflation remain largely undeveloped. Here we propose geometric mean of pairwise ratios, a simple but effective normalization method, for zero-inflated sequencing data such as microbiome data. Simulation studies and real datasets analyses demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.

  6. Genomic Selection Using Genotyping-By-Sequencing Data with Different Coverage Depth in Perennial Ryegrass

    DEFF Research Database (Denmark)

    Cericola, Fabio; Fé, Dario; Janss, Luc

    2015-01-01

    Genotyping by sequencing (GBS) allows generating up to millions of molecular markers with a cost per sample which is proportional to the level of multiplexing. Increasing the sample multiplexing decreases the genotyping price but also reduces the numbers of reads per marker. In this work we investigated how this reduction of the coverage depth affects the genomic relationship matrices used to estimate breeding values of F2 family pools in perennial ryegrass. A total of 995 families were genotyped via GBS providing more than 1.8M allele frequency estimates for each family with an average coverage... the diagonal elements by estimating the amount of genetic variance caused by the reduction of the coverage depth. Secondly we developed a method to scale the relationship matrix by taking into account the overall amount of pairwise non-missing loci between all families. Rust resistance and heading date were...

  7. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    2015-01-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected...... with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index...... itself. Depending on the trait’s economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage...

  8. The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies.

    Directory of Open Access Journals (Sweden)

    Patrick D Schloss

    Pyrosequencing of PCR-amplified fragments that target variable regions within the 16S rRNA gene has quickly become a powerful method for analyzing the membership and structure of microbial communities. This approach has revealed and introduced questions that were not fully appreciated by those carrying out traditional Sanger sequencing-based methods. These include the effects of alignment quality, the best method of calculating pairwise genetic distances for 16S rRNA genes, whether it is appropriate to filter variable regions, and how the choice of variable region relates to the genetic diversity observed in full-length sequences. I used a diverse collection of 13,501 high-quality full-length sequences to assess each of these questions. First, alignment quality had a significant impact on distance values and downstream analyses. Specifically, the greengenes alignment, which does a poor job of aligning variable regions, predicted higher genetic diversity, richness, and phylogenetic diversity than the SILVA and RDP-based alignments. Second, the effect of different gap treatments in determining pairwise genetic distances was strongly affected by the variation in sequence length for a region; however, the effect of different calculation methods was subtle when determining the sample's richness or phylogenetic diversity for a region. Third, applying a sequence mask to remove variable positions had a profound impact on genetic distances by muting the observed richness and phylogenetic diversity. Finally, the genetic distances calculated for each of the variable regions did a poor job of correlating with the full-length gene. Thus, while it is tempting to apply traditional cutoff levels derived for full-length sequences to these shorter sequences, it is not advisable. Analysis of beta-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure. Taken together, these results

  9. Complete mitochondrial genome sequences from five Eimeria species (Apicomplexa; Coccidia; Eimeriidae) infecting domestic turkeys.

    Science.gov (United States)

    Ogedengbe, Mosun E; El-Sherry, Shiem; Whale, Julia; Barta, John R

    2014-07-17

    Clinical and subclinical coccidiosis is cosmopolitan and inflicts significant losses to the poultry industry globally. Seven named Eimeria species are responsible for coccidiosis in turkeys: Eimeria dispersa; Eimeria meleagrimitis; Eimeria gallopavonis; Eimeria meleagridis; Eimeria adenoeides; Eimeria innocua; and Eimeria subrotunda. Although attempts have been made to characterize these parasites molecularly at the nuclear 18S rDNA and ITS loci, the maternally-derived and mitotically replicating mitochondrial genome may be more suited for species level molecular work; however, only limited sequence data are available for Eimeria spp. infecting turkeys. The purpose of this study was to sequence and annotate the complete mitochondrial genomes from 5 Eimeria species that commonly infect the domestic turkey (Meleagris gallopavo). Six single-oocyst derived cultures of five Eimeria species infecting turkeys were PCR-amplified and sequenced completely prior to detailed annotation. Resulting sequences were aligned and used in phylogenetic analyses (BI, ML, and MP) that included complete mitochondrial genomes from 16 Eimeria species or concatenated CDS sequences from each genome. Complete mitochondrial genome sequences were obtained for Eimeria adenoeides Guelph, 6211 bp; Eimeria dispersa Briston, 6238 bp; Eimeria meleagridis USAR97-01, 6212 bp; Eimeria meleagrimitis USMN08-01, 6165 bp; Eimeria gallopavonis Weybridge, 6215 bp; and Eimeria gallopavonis USKS06-01, 6215 bp. The order, orientation and CDS lengths of the three protein coding genes (COI, COIII and CytB) as well as rDNA fragments encoding ribosomal large and small subunit rRNA were conserved among all sequences. Pairwise sequence identities between species ranged from 88.1% to 98.2%; sequence variability was concentrated within CDS or between rDNA fragments (where indels were common). No phylogenetic reconstruction supported monophyly of Eimeria species infecting turkeys; Eimeria dispersa may have arisen

  10. Thermodynamic Molecular Switch in Sequence-Specific Hydrophobic Interaction: Two Computational Models Compared

    Directory of Open Access Journals (Sweden)

    Paul Chun

    2003-01-01

    We have shown in our published work the existence of a thermodynamic switch in biological systems wherein a change of sign in ΔCp°(T)reaction leads to a true negative minimum in the Gibbs free energy change of reaction, and hence, a maximum in the related Keq. We have examined 35 pair-wise, sequence-specific hydrophobic interactions over the temperature range of 273–333 K, based on data reported by Nemethy and Scheraga in 1962. A closer look at a single example, the pair-wise hydrophobic interaction of leucine-isoleucine, will demonstrate the significant differences when the data are analyzed using the Nemethy-Scheraga model or treated by the Planck-Benzinger methodology which we have developed. The change in inherent chemical bond energy at 0 K, ΔH°(T0), is 7.53 kcal mol⁻¹ compared with 2.4 kcal mol⁻¹, while ‹ts› is 365 K as compared with 355 K, for the Nemethy-Scheraga and Planck-Benzinger model, respectively. At ‹tm›, the thermal agitation energy is about five times greater than ΔH°(T0) in the Planck-Benzinger model, that is 465 K compared to 497 K in the Nemethy-Scheraga model. The results imply that the negative Gibbs free energy minimum at a well-defined ‹ts›, where TΔS° = 0 at about 355 K, has its origin in the sequence-specific hydrophobic interactions, which are highly dependent on details of molecular structure. The Nemethy-Scheraga model shows no evidence of the thermodynamic molecular switch that we have found to be a universal feature of biological interactions. The Planck-Benzinger method is the best known for evaluating the innate temperature-invariant enthalpy, ΔH°(T0), and provides for better understanding of the heat of reaction for biological molecules.

  11. Comparative Genomics in Switchgrass Using 61,585 High-Quality Expressed Sequence Tags

    Directory of Open Access Journals (Sweden)

    Christian M. Tobias

    2008-11-01

    The development of genomic resources for switchgrass (Panicum virgatum L.), a perennial NAD-malic enzyme type C4 grass, is required to enable molecular breeding and biotechnological approaches for improving its value as a forage and bioenergy crop. Expressed sequence tag (EST) sequencing is one method that can quickly sample gene inventories and produce data suitable for marker development or analysis of tissue-specific patterns of expression. Toward this goal, three cDNA libraries from callus, crown, and seedling tissues of ‘Kanlow’ switchgrass were end-sequenced to generate a total of 61,585 high-quality ESTs from 36,565 separate clones. Seventy-three percent of the assembled consensus sequences could be aligned with the sorghum [Sorghum bicolor (L.) Moench] genome at an E-value of <1 × 10, indicating a high degree of similarity. Sixty-five percent of the ESTs matched with gene ontology molecular terms, and 3.3% of the sequences were matched with genes that play potential roles in cell-wall biogenesis. The representation in the three libraries of gene families known to be associated with C4 photosynthesis, cellulose and β-glucan synthesis, phenylpropanoid biosynthesis, and peroxidase activity indicated likely roles for individual family members. Pairwise comparisons of synonymous codon substitutions were used to assess genome sequence diversity and indicated an overall similarity between the two genome copies present in the tetraploid. Identification of EST–simple sequence repeat markers and amplification on two individual parents of a mapping population yielded an average of 2.18 amplicons per individual, and 35% of the markers produced fragment length polymorphisms.

  12. Phylogenetic analysis of Demodex caprae based on mitochondrial 16S rDNA sequence.

    Science.gov (United States)

    Zhao, Ya-E; Hu, Li; Ma, Jun-Xian

    2013-11-01

    Demodex caprae infests the hair follicles and sebaceous glands of goats worldwide, which not only seriously impairs goat farming, but also causes a big economic loss. However, there are few reports on the DNA level of D. caprae. To reveal the taxonomic position of D. caprae within the genus Demodex, the present study conducted phylogenetic analysis of D. caprae based on mt16S rDNA sequence data. D. caprae adults and eggs were obtained from a skin nodule of the goat suffering demodicidosis. The mt16S rDNA sequences of individual mite were amplified using specific primers, and then cloned, sequenced, and aligned. The sequence divergence, genetic distance, and transition/transversion rate were computed, and the phylogenetic trees in Demodex were reconstructed. Results revealed the 339-bp partial sequences of six D. caprae isolates were obtained, and the sequence identity was 100% among isolates. The pairwise divergences between D. caprae and Demodex canis or Demodex folliculorum or Demodex brevis were 22.2-24.0%, 24.0-24.9%, and 22.9-23.2%, respectively. The corresponding average genetic distances were 2.840, 2.926, and 2.665, and the average transition/transversion rates were 0.70, 0.55, and 0.54, respectively. The divergences, genetic distances, and transition/transversion rates of D. caprae versus the other three species all reached interspecies level. The five phylogenetic trees all presented that D. caprae clustered with D. brevis first, and then with D. canis, D. folliculorum, and Demodex injai in sequence. In conclusion, D. caprae is an independent species, and it is closer to D. brevis than to D. canis, D. folliculorum, or D. injai.
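
    The divergence and transition/transversion figures in this record come from pairwise comparisons of aligned sequences. A minimal sketch of the uncorrected versions of those quantities follows, assuming two pre-aligned DNA strings; the function name and example sequences are hypothetical. The genetic distances quoted in the abstract are presumably model-corrected, which this sketch does not attempt.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def ts_tv_counts(a: str, b: str):
    """Return (transitions, transversions, compared_sites) for aligned DNA.

    Columns containing a gap or any non-ACGT symbol are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in BASES or y not in BASES:
            continue
        sites += 1
        if x == y:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv, sites


# Toy aligned pair with one gap column (hypothetical sequences).
seq_a = "ACGTTGCAAC-"
seq_b = "GCGTCGAAATA"
ts, tv, sites = ts_tv_counts(seq_a, seq_b)
p_distance = (ts + tv) / sites              # uncorrected divergence
ts_tv_ratio = ts / tv if tv else float("inf")
```

    Here `p_distance` is the proportion of differing sites; at high divergence, multiple substitutions at the same site make it an underestimate, which is why corrected distances are usually reported alongside it.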

  13. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    Directory of Open Access Journals (Sweden)

    Shade Larry L

    2006-06-01

    Background: Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results: Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however, substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10^-9 change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10^-9 change/site/year) was approximately half of the overall rate (1.9–2.0 × 10^-9 change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution as compared to dog and that this difference is about 6%. Conclusion: This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.

  14. Dynamically heterogenous partitions and phylogenetic inference: an evaluation of analytical strategies with cytochrome b and ND6 gene sequences in cranes.

    Science.gov (United States)

    Krajewski, C; Fain, M G; Buckley, L; King, D G

    1999-11-01

    Debates over whether molecular sequence data should be partitioned for phylogenetic analysis often confound two types of heterogeneity among partitions. We distinguish historical heterogeneity (i.e., different partitions have different evolutionary relationships) from dynamic heterogeneity (i.e., different partitions show different patterns of sequence evolution) and explore the impact of the latter on phylogenetic accuracy and precision with a two-gene, mitochondrial data set for cranes. The well-established phylogeny of cranes allows us to contrast tree-based estimates of relevant parameter values with estimates based on pairwise comparisons and to ascertain the effects of incorporating different amounts of process information into phylogenetic estimates. We show that codon positions in the cytochrome b and NADH dehydrogenase subunit 6 genes are dynamically heterogenous under both Poisson and invariable-sites + gamma-rates versions of the F84 model and that heterogeneity includes variation in base composition and transition bias as well as substitution rate. Estimates of transition-bias and relative-rate parameters from pairwise sequence comparisons were comparable to those obtained as tree-based maximum likelihood estimates. Neither rate-category nor mixed-model partitioning strategies resulted in a loss of phylogenetic precision relative to unpartitioned analyses. We suggest that weighted-average distances provide a computationally feasible alternative to direct maximum likelihood estimates of phylogeny for mixed-model analyses of large, dynamically heterogenous data sets. Copyright 1999 Academic Press.

  15. Sequence variation in mitochondrial cox1 and nad1 genes of ascaridoid nematodes in cats and dogs from Iran.

    Science.gov (United States)

    Mikaeili, F; Mirhendi, H; Mohebali, M; Hosseini, M; Sharbatkhori, M; Zarei, Z; Kia, E B

    2015-07-01

    The study was conducted to determine the sequence variation in two mitochondrial genes, namely cytochrome c oxidase 1 (pcox1) and NADH dehydrogenase 1 (pnad1) within and among isolates of Toxocara cati, Toxocara canis and Toxascaris leonina. Genomic DNA was extracted from 32 isolates of T. cati, 9 isolates of T. canis and 19 isolates of T. leonina collected from cats and dogs in different geographical areas of Iran. Mitochondrial genes were amplified by polymerase chain reaction (PCR) and sequenced. Sequence data were aligned using the BioEdit software and compared with published sequences in GenBank. Phylogenetic analysis was performed using Bayesian inference and maximum likelihood methods. Based on pairwise comparison, intra-species genetic diversity within Iranian isolates of T. cati, T. canis and T. leonina amounted to 0-2.3%, 0-1.3% and 0-1.0% for pcox1 and 0-2.0%, 0-1.7% and 0-2.6% for pnad1, respectively. Inter-species sequence variation among the three ascaridoid nematodes was significantly higher, being 9.5-16.6% for pcox1 and 11.9-26.7% for pnad1. Sequence and phylogenetic analysis of the pcox1 and pnad1 genes indicated that there is significant genetic diversity within and among isolates of T. cati, T. canis and T. leonina from different areas of Iran, and these genes can be used for studying genetic variation of ascaridoid nematodes.

  16. Foundations of Sequence-to-Sequence Modeling for Time Series

    OpenAIRE

    Kuznetsov, Vitaly; Mariet, Zelda

    2018-01-01

    The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practiti...

  17. Novel expressed sequence tag- simple sequence repeats (EST ...

    African Journals Online (AJOL)

    Using different bioinformatic criteria, the SUCEST database was used to mine for simple sequence repeat (SSR) markers. Among 42,189 clusters, 1,425 expressed sequence tag- simple sequence repeats (EST-SSRs) were identified in silico. Trinucleotide repeats were the most abundant SSRs detected. Of 212 primer pairs ...

  18. Infinite sequences and series

    CERN Document Server

    Knopp, Konrad

    1956-01-01

    One of the finest expositors in the field of modern mathematics, Dr. Konrad Knopp here concentrates on a topic that is of particular interest to 20th-century mathematicians and students. He develops the theory of infinite sequences and series from its beginnings to a point where the reader will be in a position to investigate more advanced stages on his own. The foundations of the theory are therefore presented with special care, while the developmental aspects are limited by the scope and purpose of the book. All definitions are clearly stated; all theorems are proved with enough detail to ma

  19. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal; Salama, Khaled N.

    2011-01-01

    fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known sequence alignment algorithms, the 'GAP' (which is heuristic) and the 'Needleman

  20. Next-Generation Sequencing Platforms

    Science.gov (United States)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  1. Rapid Polymer Sequencer

    Science.gov (United States)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.

  2. The advantages of SMRT sequencing

    OpenAIRE

    Roberts, Richard J; Carneiro, Mauricio O; Schatz, Michael C

    2013-01-01

    Of the current next-generation sequencing technologies, SMRT sequencing is sometimes overlooked. However, attributes such as long reads, modified base detection and high accuracy make SMRT a useful technology and an ideal approach to the complete sequencing of small genomes.

  3. Putting instruction sequences into effect

    NARCIS (Netherlands)

    Bergstra, J.A.

    2011-01-01

    An attempt is made to define the concept of execution of an instruction sequence. It is found to be a special case of directly putting into effect of an instruction sequence. Directly putting into effect of an instruction sequence comprises interpretation as well as execution. Directly putting into

  4. Region segmentation along image sequence

    International Nuclear Information System (INIS)

    Monchal, L.; Aubry, P.

    1995-01-01

    A method to extract regions in a sequence of images is proposed. Regions are not matched from one image to the following one. The result of a region segmentation is used as an initialization to segment the following image and to track the region along the sequence. The image sequence is exploited as a spatio-temporal event. (authors). 12 refs., 8 figs

  5. Log-balanced combinatorial sequences

    Directory of Open Access Journals (Sweden)

    Tomislav Došlic

    2005-01-01

    We consider log-convex sequences that satisfy an additional constraint imposed on their rate of growth. We call such sequences log-balanced. It is shown that all such sequences satisfy a pair of double inequalities. Sufficient conditions for log-balancedness are given for the case when the sequence satisfies a two- (or more-) term linear recurrence. It is shown that many combinatorially interesting sequences belong to this class, and, as a consequence, that the above-mentioned double inequalities are valid for all of them.
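
    The abstract does not spell out the definition or the inequalities. Assuming the definition usually given for this notion (a positive sequence a_n is log-balanced when a_n is log-convex and a_n/n! is log-concave), one resulting double inequality is a_n^2 <= a_{n-1} a_{n+1} <= (1 + 1/n) a_n^2. The sketch below checks that inequality numerically with exact integer arithmetic; the function name and the choice of the Catalan numbers as a test case are illustrative assumptions, not claims taken from the record.

```python
from math import comb


def is_log_balanced(a):
    """Check a_n^2 <= a_{n-1} * a_{n+1} <= (1 + 1/n) * a_n^2 for n = 1 .. len(a) - 2.

    `a` holds positive integers a_0, a_1, ...; the upper bound is tested in
    the cross-multiplied form n * a_{n-1} * a_{n+1} <= (n + 1) * a_n^2 so
    that no floating-point rounding is involved.
    """
    for n in range(1, len(a) - 1):
        prod = a[n - 1] * a[n + 1]
        sq = a[n] * a[n]
        if not (sq <= prod and n * prod <= (n + 1) * sq):
            return False
    return True


# Catalan numbers C_0 .. C_29, used here only as a convenient test sequence.
catalan = [comb(2 * n, n) // (n + 1) for n in range(30)]
# A sequence that is log-convex but grows too fast to satisfy the upper bound.
too_fast = [2 ** (n * n) for n in range(10)]
```

    The lower inequality is plain log-convexity; the upper one is the growth constraint, which follows from log-concavity of a_n/n! because (n+1)!(n-1)!/(n!)^2 = 1 + 1/n.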

  6. New MR pulse sequence

    International Nuclear Information System (INIS)

    Harms, S.E.; Flamig, D.P.; Griffey, R.H.

    1990-01-01

    This paper describes a method for fat suppression for three-dimensional MR imaging. The FATS (fat-suppressed acquisition with echo time shortened) sequence employs a pair of opposing adiabatic half-passage RF pulses tuned on fat resonance. The imaging parameters are as follows: TR, 20 msec; TE, 21.7-3.2 msec; 1,024 x 128 x 128 acquired matrix; imaging time, approximately 11 minutes. A series of 54 examinations were performed. Excellent fat suppression with water excitation is achieved in all cases. The orbital images demonstrate superior resolution of small orbital lesions. The high signal-to-noise ratio (SNR) in cranial studies demonstrates excellent petrous bone and internal auditory canal anatomy

  7. Complete nuclear ribosomal DNA sequence amplification and molecular analyses of Bangia (Bangiales, Rhodophyta) from China

    Science.gov (United States)

    Xu, Jiajie; Jiang, Bo; Chai, Sanming; He, Yuan; Zhu, Jianyi; Shen, Zonggen; Shen, Songdong

    2016-09-01

    Filamentous Bangia, which are distributed extensively throughout the world, have simple and similar morphological characteristics. Scientists can classify these organisms using molecular markers in combination with morphology. We successfully sequenced the complete nuclear ribosomal DNA, approximately 13 kb in length, from a marine Bangia population. We further analyzed the small subunit ribosomal DNA gene (nrSSU) and the internal transcribed spacer (ITS) sequence regions along with nine other marine and two freshwater Bangia samples from China. Pairwise distances of the nrSSU and 5.8S ribosomal DNA gene sequences show the marine samples grouping together with low divergences (0-0.003; 0-0.006, respectively) from each other, but high divergences (0.123-0.126; 0.198, respectively) from freshwater samples. An exception is the marine sample collected from Weihai, which shows high divergence from both other marine samples (0.063-0.065; 0.129, respectively) and the freshwater samples (0.097; 0.120, respectively). A maximum likelihood phylogenetic tree based on a combined SSU-ITS dataset shows the samples divided into three clades, with the two marine sample clades containing Bangia spp. from North America, Europe, Asia, and Australia; and one freshwater clade, containing Bangia atropurpurea from North America and China.

  8. ViCTree: An automated framework for taxonomic classification from protein sequences.

    Science.gov (United States)

    Modha, Sejal; Thanki, Anil; Cotmore, Susan F; Davison, Andrew J; Hughes, Joseph

    2018-02-20

    The increasing rate of submission of genetic sequences into public databases is providing a growing resource for classifying the organisms that these sequences represent. To aid viral classification, we have developed ViCTree, which automatically integrates the relevant sets of sequences in NCBI GenBank and transforms them into an interactive maximum likelihood phylogenetic tree that can be updated automatically. ViCTree incorporates ViCTreeView, which is a JavaScript-based visualisation tool that enables the tree to be explored interactively in the context of pairwise distance data. To demonstrate utility, ViCTree was applied to subfamily Densovirinae of family Parvoviridae. This led to the identification of six new species of insect virus. ViCTree is open-source and can be run on any Linux- or Unix-based computer or cluster. A tutorial, the documentation and the source code are available under a GPL3 license, and can be accessed at http://bioinformatics.cvr.ac.uk/victree_web/. Contact: sejal.modha@glasgow.ac.uk.

  9. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of a unique "electronic fingerprint" of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shifts depending on the pH, with the most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of the beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  10. LZW-Kernel: fast kernel utilizing variable length code blocks from LZW compressors for protein sequence classification.

    Science.gov (United States)

    Filatov, Gleb; Bauwens, Bruno; Kertész-Farkas, Attila

    2018-05-07

    Bioinformatics studies often rely on similarity measures between sequence pairs, which often pose a bottleneck in large-scale sequence analysis. Here, we present a new convolutional kernel function for protein sequences called the LZW-Kernel. It is based on code words identified with the Lempel-Ziv-Welch (LZW) universal text compressor. The LZW-Kernel is an alignment-free method; it is always symmetric and positive, always returns 1.0 for self-similarity, and can be used directly with Support Vector Machines (SVMs) in classification problems, contrary to normalized compression distance (NCD), which often violates the distance metric properties in practice and requires further techniques to be used with SVMs. The LZW-Kernel is a one-pass algorithm, which makes it particularly suitable for big data applications. Our experimental studies on remote protein homology detection and protein classification tasks reveal that the LZW-Kernel closely approaches the performance of the Local Alignment Kernel (LAK) and the SVM-pairwise method combined with Smith-Waterman (SW) scoring at a fraction of the time. Moreover, the LZW-Kernel outperforms the SVM-pairwise method when combined with BLAST scores, which indicates that the LZW code words might be a better basis for similarity measures than local alignment approximations found with BLAST. In addition, the LZW-Kernel outperforms n-gram based mismatch kernels, hidden Markov model based SAM and Fisher kernel, and protein family based PSI-BLAST, among others. Further advantages include the LZW-Kernel's reliance on a simple idea, its ease of implementation, and its high speed: three times faster than BLAST and several orders of magnitude faster than SW or LAK in our tests. LZW-Kernel is implemented as standalone C code and is a free open-source program distributed under the GPLv3 license; it can be downloaded from https://github.com/kfattila/LZW-Kernel. Contact: akerteszfarkas@hse.ru. Supplementary data are available at Bioinformatics Online.
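
    To make the idea of "code words identified with the LZW compressor" concrete, the sketch below performs a single LZW-style parse of each sequence, collects the dictionary entries it creates, and scores two sequences by the normalized overlap of those sets. This is an illustrative stand-in, not the kernel defined by the authors: the abstract gives no formula, and the set-overlap normalization and all function names here are assumptions. It does share the properties listed above (symmetric, non-negative, 1.0 for self-similarity, one pass per sequence).

```python
from math import sqrt


def lzw_code_words(sequence):
    """Return the set of dictionary entries built by one LZW pass over `sequence`.

    The dictionary starts with the single symbols that occur in the sequence;
    each time the current phrase plus the next symbol is unknown, that longer
    phrase is added as a new code word.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    words = set(sequence)  # initial single-symbol entries
    current = ""
    for symbol in sequence:
        candidate = current + symbol
        if candidate in words:
            current = candidate   # keep extending the longest known phrase
        else:
            words.add(candidate)  # learn a new code word
            current = symbol
    return words


def code_word_similarity(x, y):
    """Normalized overlap of LZW code-word sets (illustrative, see lead-in)."""
    wx, wy = lzw_code_words(x), lzw_code_words(y)
    return len(wx & wy) / sqrt(len(wx) * len(wy))
```

    Because the score is a normalized set-intersection count, it can be fed to an SVM as a precomputed similarity matrix; whether it behaves like the published kernel on real protein data is not something this sketch establishes.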

  11. Universal sequence map (USM) of arbitrary discrete sequences

    Directory of Open Access Journals (Sweden)

    Almeida Jonas S

    2002-02-01

    Background: For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale-independent sequence analysis, without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results: We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order-free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. Conclusions: USM is shown to enable a statistical mechanics approach to sequence analysis. The scale-independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.
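
    The Chaos Game Representation that USM builds on can be sketched in a few lines: each nucleotide is assigned a corner of the unit square, and every symbol moves the current point halfway toward its corner. The sketch below covers only classical CGR for DNA, not the USM generalization itself; the corner assignment and the starting point at the centre are common conventions assumed here, not details taken from the abstract.

```python
# Assumed corner layout; other assignments are equally valid.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after each symbol of a DNA sequence.

    Raises KeyError for symbols other than A, C, G, T.
    """
    x, y = start
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # move halfway to the corner
        points.append((x, y))
    return points


def max_coordinate_gap(p, q):
    """Largest per-axis difference between two CGR points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


# Two made-up sequences that share the 5-symbol suffix "TTGCA".
p = cgr_points("ACGTTGCA")[-1]
q = cgr_points("TTTTTGCA")[-1]
gap = max_coordinate_gap(p, q)
```

    Because every step halves the distance to the previous position, two sequences that end in the same k symbols land within 2^-k of each other on each axis. This is the sense in which map distance reflects sequence similarity.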

  12. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov,Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by

  13. Genomic sequencing in clinical trials

    OpenAIRE

    Mestan, Karen K; Ilkhanoff, Leonard; Mouli, Samdeep; Lin, Simon

    2011-01-01

    Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing is rapidly becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to fin...

  14. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  15. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost all computational biology research. It processes large database sequences and is considered a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 76% enhancement in alignment score when it is compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  16. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal; Salama, Khaled N.

    2011-01-01

    Sequence alignment is an essential tool in almost all computational biology research. It processes large database sequences and is considered a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 76% enhancement in alignment score when it is compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  17. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics databases are growing exponentially in size. Processing these large amounts of data may take hours even if supercomputers are used. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known sequence alignment algorithms 'GAP' (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 51% enhancement in alignment score when it is compared with the GAP algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.
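
    For reference, the optimal baseline named in these ABS records, Needleman-Wunsch global alignment, can be sketched as a small dynamic program. The scoring values (match +1, mismatch -1, linear gap -2) are arbitrary assumptions for illustration; the abstracts do not state the scoring scheme used in the evaluations, and this sketch says nothing about how ABS itself works.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring parameters are
    illustrative defaults, not values from the ABS papers.
    """
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

    The table has (len(a)+1) x (len(b)+1) cells, so time and memory grow with the product of the sequence lengths. That cost is the motivation the abstracts give for heuristic and approximate methods.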

  18. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...

  19. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    Directory of Open Access Journals (Sweden)

    Kirkness Ewen

    2006-10-01

    Background: Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y-chromosome sequence and the search for polymorphisms therein. The genome has been only partially sequenced for one male dog, disallowing mapping of the sequence into specific chromosomes. However, by comparing the male genome sequence to the complete female dog genome sequence, candidate Y-chromosome sequence may be identified by exclusion. Results: The male dog genome sequence was analysed by BLAST search against the human genome to identify sequences with a best match to the human Y chromosome and to the female dog genome to identify those absent in the female genome. Candidate sequences were then tested for male specificity by PCR of five male and five female dogs. 32 sequences from the male genome, with a total length of 24 kbp, were identified as male specific, based on a match to the human Y chromosome, absence in the female dog genome and male-specific PCR results. 14437 bp were then sequenced for 10 male dogs originating from Europe, Southwest Asia, Siberia, East Asia, Africa and America. Nine haplotypes were found, which were defined by 14 substitutions. The genetic distance between the haplotypes indicates that they originate from at least five wolf haplotypes. There was no obvious trend in the geographic distribution of the haplotypes. Conclusion: We have identified 24159 bp of dog Y-chromosome sequence to be used for population genetic studies. We sequenced 14437 bp in a worldwide collection of dogs, identifying 14 SNPs for future SNP analyses, and

  20. Complete genome sequence of Fer-de-Lance Virus reveals a novel gene in reptilian Paramyxoviruses

    Science.gov (United States)

    Kurath, G.; Batts, W.N.; Ahne, W.; Winton, J.R.

    2004-01-01

    The complete RNA genome sequence of the archetype reptilian paramyxovirus, Fer-de-Lance virus (FDLV), has been determined. The genome is 15,378 nucleotides in length and consists of seven nonoverlapping genes in the order 3′ N-U-P-M-F-HN-L 5′, coding for the nucleocapsid, unknown, phospho-, matrix, fusion, hemagglutinin-neuraminidase, and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and tri-nucleotide intergenic regions similar to those of other Paramyxoviridae. The FDLV P gene expression strategy is like that of rubulaviruses, which express the accessory V protein from the primary transcript and edit a portion of the mRNA to encode P and I proteins. There is also an overlapping open reading frame potentially encoding a small basic protein in the P gene. The gene designated U (unknown) encodes a deduced protein of 19.4 kDa that has no counterpart in other paramyxoviruses and has no similarity with sequences in the National Center for Biotechnology Information database. Active transcription of the U gene in infected cells was demonstrated by Northern blot analysis, and bicistronic N-U mRNA was also evident. The genomes of two other snake paramyxovirus genotypes were also found to have U genes, with 11 to 16% nucleotide divergence from the FDLV U gene. Pairwise comparisons of amino acid identities and phylogenetic analyses of all deduced FDLV protein sequences with homologous sequences from other Paramyxoviridae indicate that FDLV represents a new genus within the subfamily Paramyxovirinae. We suggest the name Ferlavirus for the new genus, with FDLV as the type species.

  1. Testing statistical significance scores of sequence comparison methods with structure similarity

    Directory of Open Access Journals (Sweden)

    Leunissen Jack AM

    2006-10-01

    Background: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is determined not only by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. Results: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. Conclusion: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
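
    A Monte-Carlo Z-score of the kind discussed here is typically obtained by scoring the real pair, then re-scoring against many shuffled copies of one sequence and expressing the real score in standard deviations above the shuffled mean. The sketch below illustrates that procedure with a simple Smith-Waterman scorer. The scoring values, the number of shuffles, the choice to shuffle only the subject, and all names are assumptions; the abstract does not describe the exact protocol used by CluSTr or Protein World.

```python
import random
from statistics import mean, stdev


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (illustrative scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            cell = max(
                0,                                               # start a new local alignment
                prev[j - 1] + (match if ca == cb else mismatch),  # (mis)match
                prev[j] + gap,                                   # gap in b
                curr[j - 1] + gap,                               # gap in a
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def z_score(query, subject, shuffles=200, seed=0):
    """(observed - mean of shuffled scores) / standard deviation of shuffled scores."""
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    observed = smith_waterman_score(query, subject)
    residues = list(subject)
    null_scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)          # preserves composition, destroys order
        null_scores.append(smith_waterman_score(query, "".join(residues)))
    spread = stdev(null_scores)
    if spread == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mean(null_scores)) / spread
```

    Each Z-score costs `shuffles` extra alignments per pair, which is why the abstract calls the approach compute-intensive compared with an analytically derived e-value.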

  2. SVX Sequencer Board

    International Nuclear Information System (INIS)

    Utes, M.

    1997-01-01

    The SVX Sequencer boards are 9U by 280mm circuit boards that reside in slots 2 through 21 of each of eight Eurocard crates in the D0 Detector Platform. The basic purpose is to control the SVX chips for data acquisition and when a trigger occurs, to gather the SVX data and relay the data to the VRB boards in the Movable Counting House. Functions and features are as follows: (1) Initialization of eight SVX chip strings using the MIL-STD-1553 data bus; (2) Real time manipulation of the SVX control lines to effect data acquisition, digitization, and readout based on the NRZ/Clock signals from the Controller; (3) Conversion of 8-bit electrical SVX readout data to an optical signal operating at 1.062 Gbit/sec, sent to the VRB. Eight HDIs will be serviced per board; (4) Built-in logic analyzer which can record the most important control and data lines during a data acquisition cycle and put this recorded information onto the 1553 bus; (5) Identification header and end of data trailer tacked onto data stream; (6) 1553 register which can read the current values of the control and data lines; (7) 1553 register which can test the optical link; (8) 1553 registers for crossing pulse width, calibration pulse voltage, and calibration pipeline select; (9) 1553 register for reading the optical drivers status link; (10) 1553 register for power control of SVX chips and ignoring bad SVX strings; (11) Front panel displays and LEDs show the board status at a glance; (12) In-system programmable EPLDs are programmed via 1553 or Altera's 'Bitblaster'; (13) Automatic readout abort after 45us; (14) Supplies BUSY signal back to Trigger Framework; (15) Supports a heartbeat system to prevent excessive SVX current draw; and (16) Supports a SVX power trip feature if heartbeat failure occurs.

  3. Sequence Algebra, Sequence Decision Diagrams and Dynamic Fault Trees

    International Nuclear Information System (INIS)

    Rauzy, Antoine B.

    2011-01-01

    Much attention has been focused on Dynamic Fault Trees in the past few years. By adding new gates to static (regular) Fault Trees, Dynamic Fault Trees aim to take into account dependencies among events. Merle et al. recently proposed an algebraic framework to give a formal interpretation to these gates. In this article, we extend Merle et al.'s work by adopting a slightly different perspective. We introduce Sequence Algebras that can be seen as Algebras of Basic Events, representing failures of non-repairable components. We show how to interpret Dynamic Fault Trees within this framework. Finally, we propose a new data structure to encode sets of sequences of Basic Events: Sequence Decision Diagrams. Sequence Decision Diagrams are very much inspired by Minato's Zero-Suppressed Binary Decision Diagrams. We show that all operations of Sequence Algebras can be performed on this data structure.

  4. TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences

    Directory of Open Access Journals (Sweden)

    Sharma Gaurav

    2011-04-01

    Background: The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results: TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a

  5. Chameleon sequences in neurodegenerative diseases

    International Nuclear Information System (INIS)

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-01-01

    Chameleon sequences can adopt an alpha-helix, a beta-sheet, or a coil conformation. Defining chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins responsible for neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on chameleons, for which we developed an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity of less than 25%. Our data were classified into “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest numbers of chameleons (640 and 405, respectively). These proteins have known roles in neurodegenerative diseases. It can therefore be inferred that other CFPs may serve as key proteins in neurodegeneration, and that studying them can shed light on curing and preventing neurodegenerative diseases.
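
    The extraction step described above (identical peptide segments observed in different structures) can be sketched as a k-mer index keyed by sequence, recording the secondary-structure state in which each k-mer is seen. Everything specific in this sketch is an assumption: the input format, the one-letter H/E/C state strings, the segment length k, and the requirement that a segment be in a single uniform state. The abstract does not describe the authors' actual algorithm or thresholds.

```python
from collections import defaultdict


def find_chameleons(entries, k=6):
    """Find k-mers that occur in at least two different uniform conformations.

    `entries` is an iterable of (protein_id, sequence, secondary_structure),
    where secondary_structure has one state letter per residue
    (H = helix, E = strand, C = coil). Returns {kmer: set_of_states}.
    """
    seen = defaultdict(set)
    for protein_id, seq, ss in entries:
        if len(seq) != len(ss):
            raise ValueError(f"length mismatch in {protein_id}")
        for i in range(len(seq) - k + 1):
            states = set(ss[i:i + k])
            if len(states) == 1:              # segment is entirely one state
                seen[seq[i:i + k]].add(states.pop())
    return {kmer: s for kmer, s in seen.items() if len(s) > 1}


# Made-up entries for illustration; not PDB records.
toy_entries = [
    ("p1", "MKVLAAGIVT", "CCHHHHHHCC"),
    ("p2", "GSVLAAGIEE", "CCEEEEEECC"),
]
chameleons = find_chameleons(toy_entries, k=6)
```

    In this toy input the segment VLAAGI is helical in one entry and a strand in the other, so it would fall into the "helix to strand (HE)" class named in the abstract.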

  6. Direct, rapid RNA sequence analysis

    International Nuclear Information System (INIS)

    Peattie, D.A.

    1987-01-01

    The original methods of RNA sequence analysis were based on enzymatic production and chromatographic separation of overlapping oligonucleotide fragments from within an RNA molecule followed by identification of the mononucleotides comprising the oligomer. Over the past decade the field of nucleic acid sequencing has changed dramatically, however, and RNA molecules now can be sequenced in a variety of more streamlined fashions. Most of the more recent advances in RNA sequencing have involved one-dimensional electrophoretic separation of 32P-end-labeled oligoribonucleotides on polyacrylamide gels. In this chapter the author discusses two of these methods for determining the nucleotide sequences of RNA molecules rapidly: the chemical method and the enzymatic method. Both methods are direct and degradative, i.e., they rely on fragmatic and chemical approaches should be utilized. The single-strand-specific ribonucleases (A, T1, T2, and S1) provide an efficient means to locate double-helical regions rapidly, and the chemical reactions provide a means to determine the RNA sequence within these regions. In addition, the chemical reactions allow one to assign interactions to specific atoms and to distinguish secondary interactions from tertiary ones. If the RNA molecule is small enough to be sequenced directly by the enzymatic or chemical method, the probing reactions can be done easily at the same time as sequencing reactions

  7. Chameleon sequences in neurodegenerative diseases.

    Science.gov (United States)

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-03-25

    Chameleon sequences can adopt either an alpha helix, a beta sheet, or a coil conformation. Identifying chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins responsible for neurodegeneration. In this research, we took advantage of the large PDB and performed a sequence analysis on chameleons, developing an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data were classified into "helix to strand (HE)", "helix to coil (HC)" and "strand to coil (CE)" alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest number of chameleons (640 and 405, respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study of them can shed light on curing and preventing neurodegenerative diseases. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Farey sequences and resistor networks

    Indian Academy of Sciences (India)

    Green's function, while the perturbation of a network is investigated in [3]. ... In Theorem 1 below, we employ the Farey sequence to establish a strict .... We next show that the Farey sequence method is applicable for circuits with n or fewer.

  9. DNA Sequencing by Capillary Electrophoresis

    Science.gov (United States)

    Karger, Barry L.; Guttman, Andras

    2009-01-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from the labor intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization based, nanopore and single molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes was only possible through the advent of modern sequencing technologies, which resulted from step by step advances with contributions from academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at break-neck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this prospective, we wish to overview the role of capillary electrophoresis in DNA sequencing, based in part on several of our articles in this journal. PMID: 19517496

  10. Graphene nanodevices for DNA sequencing

    NARCIS (Netherlands)

    Heerema, S.J.; Dekker, C.

    2016-01-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with

  11. Chameleon sequences in neurodegenerative diseases

    Energy Technology Data Exchange (ETDEWEB)

    Bahramali, Golnaz [Institute of Biochemistry and Biophysics, University of Tehran, Tehran (Iran, Islamic Republic of); Goliaei, Bahram, E-mail: goliaei@ut.ac.ir [Institute of Biochemistry and Biophysics, University of Tehran, Tehran (Iran, Islamic Republic of); Minuchehr, Zarrin, E-mail: minuchehr@nigeb.ac.ir [Department of Systems Biotechnology, National Institute of Genetic Engineering and Biotechnology, (NIGEB), Tehran (Iran, Islamic Republic of); Salari, Ali [Department of Systems Biotechnology, National Institute of Genetic Engineering and Biotechnology, (NIGEB), Tehran (Iran, Islamic Republic of)

    2016-03-25

    Chameleon sequences can adopt either an alpha helix, a beta sheet, or a coil conformation. Identifying chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins responsible for neurodegeneration. In this research, we took advantage of the large PDB and performed a sequence analysis on chameleons, developing an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data were classified into “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest number of chameleons (640 and 405, respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study of them can shed light on curing and preventing neurodegenerative diseases.

  12. Commercial Art: Scope and Sequence.

    Science.gov (United States)

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a commercial art vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  13. Rapid Diagnostics of Onboard Sequences

    Science.gov (United States)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command

  14. Accident sequence quantification with KIRAP

    International Nuclear Information System (INIS)

    Kim, Tae Un; Han, Sang Hoon; Kim, Kil You; Yang, Jun Eon; Jeong, Won Dae; Chang, Seung Cheol; Sung, Tae Yong; Kang, Dae Il; Park, Jin Hee; Lee, Yoon Hwan; Hwang, Mi Jeong.

    1997-01-01

    The tasks of probabilistic safety assessment (PSA) consist of the identification of initiating events, the construction of an event tree for each initiating event, the construction of fault trees for event tree logics, the analysis of reliability data and, finally, the accident sequence quantification. In the PSA, the accident sequence quantification comprises calculation of the core damage frequency, importance analysis and uncertainty analysis. Accident sequence quantification requires understanding the whole model of the PSA because it has to combine all event tree and fault tree models, and it requires an efficient computer code because the computation takes a long time. The Advanced Research Group of the Korea Atomic Energy Research Institute (KAERI) has developed the PSA workstation KIRAP (Korea Integrated Reliability Analysis Code Package) for the PSA work. This report describes the procedures to perform accident sequence quantification, the method to use KIRAP's cut set generator, and the method to perform the accident sequence quantification with KIRAP. (author). 6 refs

  15. Accident sequence quantification with KIRAP

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Tae Un; Han, Sang Hoon; Kim, Kil You; Yang, Jun Eon; Jeong, Won Dae; Chang, Seung Cheol; Sung, Tae Yong; Kang, Dae Il; Park, Jin Hee; Lee, Yoon Hwan; Hwang, Mi Jeong

    1997-01-01

    The tasks of probabilistic safety assessment (PSA) consist of the identification of initiating events, the construction of an event tree for each initiating event, the construction of fault trees for event tree logics, the analysis of reliability data and, finally, the accident sequence quantification. In the PSA, the accident sequence quantification comprises calculation of the core damage frequency, importance analysis and uncertainty analysis. Accident sequence quantification requires understanding the whole model of the PSA because it has to combine all event tree and fault tree models, and it requires an efficient computer code because the computation takes a long time. The Advanced Research Group of the Korea Atomic Energy Research Institute (KAERI) has developed the PSA workstation KIRAP (Korea Integrated Reliability Analysis Code Package) for the PSA work. This report describes the procedures to perform accident sequence quantification, the method to use KIRAP's cut set generator, and the method to perform the accident sequence quantification with KIRAP. (author). 6 refs.

  16. Repeated DNA sequences in fungi

    Energy Technology Data Exchange (ETDEWEB)

    Dutta, S K

    1974-11-01

    Several fungal species, representatives of all broad groups like basidiomycetes, ascomycetes and phycomycetes, were examined for the nature of repeated DNA sequences by DNA:DNA reassociation studies using hydroxyapatite chromatography. All of the fungal species tested contained 10 to 20 percent repeated DNA sequences. There are approximately 100 to 110 copies of repeated DNA sequences of approximately 4 × 10^7 daltons piece size of each. Repeated DNA sequence homoduplexes showed on average 5 °C difference of T_e50 (temperature at which 50 percent duplexes dissociate) values from the corresponding homoduplexes of unfractionated whole DNA. It is suggested that a part of repetitive sequences in fungi constitutes mitochondrial DNA and a part of it constitutes nuclear DNA. (auth)

  17. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study is to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and to design more reasonable vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regionally low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting the PCR reaction conditions, and sequencing constructed clones, all the gaps were closed. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with a GC content of 65.65%. The problems and strategies during the finishing step of BCG Tice sequencing are described here, with the hope of offering some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.

  18. Pairwise Network Information and Nonlinear Correlations

    Czech Academy of Sciences Publication Activity Database

    Martin, E.A.; Hlinka, Jaroslav; Davidsen, J.

    2016-01-01

    Vol. 94, No. 4 (2016), article No. 040301. ISSN 2470-0045. R&D Projects: GA ČR GA13-23940S; GA MZd(CZ) NV15-29835A. Grant - others: GA MŠk(CZ) LO1611. Institutional support: RVO:67985807. Keywords: mutual information * correlation * information theory * redundancy. Subject RIV: BD - Theory of Information. Impact factor: 2.366, year: 2016

  19. On Solving Intransitivities in Repeated Pairwise Choices

    NARCIS (Netherlands)

    A. Maas (Arne); Th.G.G. Bezembinder (Thom); P.P. Wakker (Peter)

    1995-01-01

    An operational method is presented for deriving a linear ranking of alternatives from repeated paired comparisons of the alternatives. Intransitivities in the observed preferences are cleared away by the introduction of decision errors of varying importance. An observed preference

  20. Whole-Genome Sequencing and Comparative Genome Analysis Provided Insight into the Predatory Features and Genetic Diversity of Two Bdellovibrio Species Isolated from Soil

    Directory of Open Access Journals (Sweden)

    Omotayo Opemipo Oyedara

    2018-01-01

    Full Text Available. Bdellovibrio spp. are predatory bacteria with great potential as antimicrobial agents. Studies have shown that members of the genus Bdellovibrio exhibit peculiar characteristics that influence their ecological adaptations. In this study, whole genomes of two different Bdellovibrio spp. designated SKB1291214 and SSB218315 isolated from soil were sequenced. The core genes shared by all the Bdellovibrio spp. considered for the pangenome analysis including the epibiotic B. exovorus were 795. The number of unique genes identified in Bdellovibrio spp. SKB1291214, SSB218315, W, and B. exovorus JJS was 1343, 113, 857, and 1572, respectively. These unique genes encode hydrolytic, chemotaxis, and transporter proteins which might be useful for predation in the Bdellovibrio strains. Furthermore, the two Bdellovibrio strains exhibited differences based on the % GC content, amino acid identity, and 16S rRNA gene sequence. The 16S rRNA gene sequence of Bdellovibrio sp. SKB1291214 shared 99% identity with that of an uncultured Bdellovibrio sp. clone 12L 106 (a pairwise distance of 0.008) and 95–97% identity (a pairwise distance of 0.043) with that of other culturable terrestrial Bdellovibrio spp., including strain SSB218315. In Bdellovibrio sp. SKB1291214, a 174 bp sequence was inserted at the host interaction (hit) locus region usually attributed to prey attachment, invasion, and development of host independent Bdellovibrio phenotypes. Also, a gene equivalent to Bd0108 in B. bacteriovorus HD100 was not conserved in Bdellovibrio sp. SKB1291214. The results of this study provided information on the genetic characteristics and diversity of the genus Bdellovibrio that can contribute to their successful applications as a biocontrol agent.

  1. GROUPING WEB ACCESS SEQUENCES USING SEQUENCE ALIGNMENT METHOD

    OpenAIRE

    BHUPENDRA S CHORDIA; KRISHNAKANT P ADHIYA

    2011-01-01

    In web usage mining, grouping of web access sequences can be used to determine the behavior or intent of a set of users. A central question in grouping web sessions is how to measure the similarity between web sessions. There are many shortcomings in traditional measurement methods. The task of grouping web sessions based on similarity, which consists of maximizing the intra-group similarity while minimizing the inter-group similarity, is done using the sequence alignment method. This paper introduces a new method to group we...

  2. GLASSgo – Automated and Reliable Detection of sRNA Homologs From a Single Input Sequence

    Directory of Open Access Journals (Sweden)

    Steffen C. Lott

    2018-04-01

    Full Text Available. Bacterial small RNAs (sRNAs) are important post-transcriptional regulators of gene expression. The functional and evolutionary characterization of sRNAs requires the identification of homologs, which is frequently challenging due to their heterogeneity, short length and partly, little sequence conservation. We developed the GLobal Automatic Small RNA Search go (GLASSgo) algorithm to identify sRNA homologs in complex genomic databases starting from a single sequence. GLASSgo combines an iterative BLAST strategy with pairwise identity filtering and a graph-based clustering method that utilizes RNA secondary structure information. We tested the specificity, sensitivity and runtime of GLASSgo, BLAST and the combination RNAlien/cmsearch in a typical use case scenario on 40 bacterial sRNA families. The sensitivity of the tested methods was similar, while the specificity of GLASSgo and RNAlien/cmsearch was significantly higher than that of BLAST. GLASSgo was on average ∼87 times faster than RNAlien/cmsearch, and only ∼7.5 times slower than BLAST, which shows that GLASSgo optimizes the trade-off between speed and accuracy in the task of finding sRNA homologs. GLASSgo is fully automated, whereas BLAST often recovers only parts of homologs and RNAlien/cmsearch requires extensive additional bioinformatic work to get a comprehensive set of homologs. GLASSgo is available as an easy-to-use web server to find homologous sRNAs in large databases.

  3. Population-based statistical inference for temporal sequence of somatic mutations in cancer genomes.

    Science.gov (United States)

    Rhee, Je-Keun; Kim, Tae-Min

    2018-04-20

    It is well recognized that accumulation of somatic mutations in cancer genomes plays a role in carcinogenesis; however, the temporal sequence and evolutionary relationship of somatic mutations remain largely unknown. In this study, we built a population-based statistical framework to infer the temporal sequence of acquisition of somatic mutations. Using the model, we analyzed the mutation profiles of 1954 tumor specimens across eight tumor types. As a result, we identified tumor type-specific directed networks composed of 2-15 cancer-related genes (nodes) and their mutational orders (edges). The most common ancestors identified in pairwise comparison of somatic mutations were TP53 mutations in breast, head/neck, and lung cancers. The known relationship of KRAS to TP53 mutations in colorectal cancers was identified, as well as potential ancestors of TP53 mutation such as NOTCH1, EGFR, and PTEN mutations in head/neck, lung and endometrial cancers, respectively. We also identified apoptosis-related genes enriched with ancestor mutations in lung cancers and a relationship between APC hotspot mutations and TP53 mutations in colorectal cancers. While evolutionary analysis of cancers has focused on clonal versus subclonal mutations identified in individual genomes, our analysis aims to further discriminate ancestor versus descendant mutations in population-scale mutation profiles that may help select cancer drivers with clinical relevance.

  4. Introducing difference recurrence relations for faster semi-global alignment of long sequences.

    Science.gov (United States)

    Suzuki, Hajime; Kasahara, Masahiro

    2018-02-19

    The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. We proposed a faster semi-global alignment algorithm, "difference recurrence relations," that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba.
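
The key observation in this abstract, that differences between adjacent DP cells stay small even when the cell values themselves outgrow 8 bits, can be checked with a small sketch. This is not libgaba's algorithm or API: it fills an ordinary global-alignment matrix with a linear gap penalty (a simplification of the affine-gap SWG scheme), and the function names and scoring values are assumptions chosen for illustration. For this simplified scheme every horizontal or vertical difference lies between `gap` and `match - gap`.

```python
def nw_matrix(a, b, match=2, mismatch=-3, gap=-4):
    """Fill a global-alignment DP matrix with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,  # substitution
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
    return H


def adjacent_difference_range(H):
    """Return (min, max) over all vertical and horizontal cell differences."""
    diffs = []
    for i in range(len(H)):
        for j in range(len(H[0])):
            if i > 0:
                diffs.append(H[i][j] - H[i - 1][j])
            if j > 0:
                diffs.append(H[i][j] - H[i][j - 1])
    return min(diffs), max(diffs)


if __name__ == "__main__":
    H = nw_matrix("ACGTTGCA" * 40, "ACGATGCA" * 40)
    print("smallest cell value:", min(min(row) for row in H))
    print("difference range:", adjacent_difference_range(H))
```

With sequences of a few hundred bases the border cells already fall far outside the signed 8-bit range, while the differences remain inside a window of width `match - 2 * gap`, which is why a difference-based layout can use narrow integer lanes.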

  5. Global Carrier Rates of Rare Inherited Disorders Using Population Exome Sequences.

    Directory of Open Access Journals (Sweden)

    Kohei Fujikura

    Full Text Available. Exome sequencing has revealed the causative mutations behind numerous rare, inherited disorders, but it is challenging to find reliable epidemiological values for rare disorders. Here, I provide a genetic epidemiology method to identify the causative mutations behind rare, inherited disorders using two population exome sequences (1000 Genomes and NHLBI). I created global maps of carrier rate distribution for 18 recessive disorders in 16 diverse ethnic populations. Out of a total of 161 mutations associated with 18 recessive disorders, I detected 24 mutations in either or both exome studies. The genetic mapping revealed strong international spatial heterogeneities in the carrier patterns of the inherited disorders. I next validated this methodology by statistically evaluating the carrier rate of one well-understood disorder, sickle cell anemia (SCA). The population exome-based epidemiology of SCA [African (allele frequency (AF) = 0.0454, N = 2447), Asian (AF = 0, N = 286), European (AF = 0.000214, N = 4677), and Hispanic (AF = 0.0111, N = 362)] was not significantly different from that obtained from a clinical prevalence survey. A pair-wise proportion test revealed no significant differences between the two exome projects in terms of AF (46/48 cases; P > 0.05). I conclude that population exome-based carrier rates can form the foundation for a prospectively maintained database of use to clinical geneticists. Similar modeling methods can be applied to many inherited disorders.
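
A comparison of allele frequencies between two projects, as in the pair-wise proportion test mentioned above, can be sketched with a pooled two-proportion z-test. The abstract does not say which test variant or software was used, so this is only one plausible form; the function name and the counts in the demo are hypothetical and do not come from the study.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for equality of two proportions.

    x1, x2 are counts of the allele of interest; n1, n2 are the numbers of
    alleles sampled. Returns (z, p_value). Illustrative only.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both samples are all-zero or all-one: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical allele counts for one mutation in two exome projects.
    z, p = two_proportion_z_test(12, 2000, 9, 1800)
    print(f"z = {z:.3f}, p = {p:.3f}")
```

For very rare alleles the normal approximation is weak, so an exact test would usually be preferred in practice.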

  6. LPTAU, Quasi Random Sequence Generator

    International Nuclear Information System (INIS)

    Sobol, Ilya M.

    1993-01-01

    1 - Description of program or function: LPTAU generates quasi random sequences. These are uniformly distributed sets of L=M N points in the N-dimensional unit cube: I^N = [0,1] x ... x [0,1]. These sequences are used as nodes for multidimensional integration; as searching points in global optimization; as trial points in multi-criteria decision making; as quasi-random points for quasi Monte Carlo algorithms. 2 - Method of solution: Uses LP-TAU sequence generation (see references). 3 - Restrictions on the complexity of the problem: The number of points that can be generated is L 30 . The dimension of the space cannot exceed 51
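
The use of quasi-random points as integration nodes can be illustrated with a short sketch. It uses the Halton sequence, a simpler low-discrepancy construction than the LP-tau sequences that LPTAU generates, so it is a stand-in for the idea rather than a reimplementation of the program. All function names are hypothetical.

```python
def van_der_corput(index, base):
    """Radical inverse of a positive integer in the given base, in (0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


# One prime base per coordinate; this sketch supports up to six dimensions.
PRIMES = (2, 3, 5, 7, 11, 13)


def halton_points(count, dim):
    """Return the first `count` Halton points in the dim-dimensional unit cube."""
    if dim > len(PRIMES):
        raise ValueError("this sketch supports at most %d dimensions" % len(PRIMES))
    return [[van_der_corput(i, PRIMES[d]) for d in range(dim)]
            for i in range(1, count + 1)]


def quasi_monte_carlo(f, count, dim):
    """Estimate the integral of f over the unit cube as a mean over Halton nodes."""
    points = halton_points(count, dim)
    return sum(f(p) for p in points) / count


if __name__ == "__main__":
    # The integral of x*y over the unit square is 1/4.
    print(quasi_monte_carlo(lambda p: p[0] * p[1], 4096, 2))
```

Because the points fill the cube more evenly than pseudo-random samples, the estimate typically converges faster than plain Monte Carlo for smooth integrands in moderate dimension.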

  7. Weak disorder in Fibonacci sequences

    Energy Technology Data Exchange (ETDEWEB)

    Ben-Naim, E [Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545 (United States); Krapivsky, P L [Department of Physics and Center for Molecular Cybernetics, Boston University, Boston, MA 02215 (United States)

    2006-05-19

    We study how weak disorder affects the growth of the Fibonacci series. We introduce a family of stochastic sequences that grow by the normal Fibonacci recursion with probability 1 - ε, but follow a different recursion rule with a small probability ε. We focus on the weak disorder limit and obtain the Lyapunov exponent that characterizes the typical growth of the sequence elements, using perturbation theory. The limiting distribution for the ratio of consecutive sequence elements is obtained as well. A number of variations to the basic Fibonacci recursion including shift, doubling and copying are considered. (letter to the editor)
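
A numerical estimate of the Lyapunov exponent for such a sequence can be sketched as follows. The abstract does not define its alternative recursion rules, so the rule used here, x_n = x_{n-1} + x_{n-3}, is only an assumed example of a "shift"; the function name and parameters are likewise hypothetical. The state is rescaled at every step so that long runs do not overflow, and the logarithms of the scale factors are accumulated.

```python
import math
import random


def lyapunov_estimate(epsilon, steps, seed=0):
    """Estimate (1/n) * ln(x_n) for a randomly perturbed Fibonacci recursion.

    With probability 1 - epsilon: x_n = x_{n-1} + x_{n-2}  (normal rule).
    With probability epsilon:     x_n = x_{n-1} + x_{n-3}  (assumed rule).
    """
    rng = random.Random(seed)
    x3, x2, x1 = 1.0, 1.0, 1.0  # x_{n-3}, x_{n-2}, x_{n-1}
    log_scale = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            new = x1 + x3
        else:
            new = x1 + x2
        # The recursion is linear, so dividing the whole state by the newest
        # term leaves later ratios unchanged; keep the factor as a logarithm.
        log_scale += math.log(new)
        x3, x2, x1 = x2 / new, x1 / new, 1.0
    return log_scale / steps


if __name__ == "__main__":
    golden = math.log((1 + math.sqrt(5)) / 2)
    print("ln(golden ratio):", golden)
    for eps in (0.0, 0.01, 0.1):
        print(eps, lyapunov_estimate(eps, 100000))
```

With epsilon set to zero the estimate approaches the logarithm of the golden ratio, the growth rate of the unperturbed Fibonacci sequence; under the assumed rule, any positive epsilon lowers it.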

  8. Sequencing and Characterization of the Invasive Sycamore Lace Bug Corythucha ciliata (Hemiptera: Tingidae) Transcriptome

    Science.gov (United States)

    Qu, Cheng; Fu, Ningning; Xu, Yihua

    2016-01-01

    The sycamore lace bug, Corythucha ciliata (Hemiptera: Tingidae), is an invasive forestry pest rapidly expanding in many countries. This pest poses a considerable threat to the urban forestry ecosystem, especially to Platanus spp. However, its molecular biology and biochemistry are poorly understood. This study reports the first C. ciliata transcriptome, encompassing three different life stages (nymphs, adult females (AF), and adult males (AM)). In total, 26.53 GB of clean data and 60,879 unigenes were obtained from three RNA-seq libraries. These unigenes were annotated and classified by Nr (NCBI non-redundant protein sequences), Nt (NCBI non-redundant nucleotide sequences), Pfam (Protein family), KOG/COG (Clusters of Orthologous Groups of proteins), Swiss-Prot (a manually annotated and reviewed protein sequence database), and KO (KEGG Ortholog database). All pairwise comparisons between these three samples revealed a large number of differentially expressed genes. Dramatic differences in global gene expression profiles were found between distinct life stages (nymphs and AF, nymphs and AM) and between sexes (AF and AM), with some of the significantly differentially expressed genes (DEGs) being related to metamorphosis, digestion, immunity, and sex differences. The differential expression of unigenes was validated through quantitative Real-Time PCR (qRT-PCR) for 16 randomly selected unigenes. In addition, 17,462 potential simple sequence repeat molecular markers were identified in these transcriptome resources. This comprehensive C. ciliata transcriptomic information can be utilized to promote the development of environmentally friendly methodologies to disrupt the processes of metamorphosis, digestion, immunity, and sex differentiation. PMID: 27494615

  9. Variations in CCL3L gene cluster sequence and non-specific gene copy numbers

    Directory of Open Access Journals (Sweden)

    Edberg Jeffrey C

    2010-03-01

    Full Text Available. Background: Copy number variations (CNVs) of the gene CC chemokine ligand 3-like1 (CCL3L1) have been implicated in HIV-1 susceptibility, but the association has been inconsistent. CCL3L1 shares homology with a cluster of genes localized to chromosome 17q12, namely CCL3, CCL3L2, and CCL3L3. These genes are involved in host defense and inflammatory processes. Several CNV assays have been developed for the CCL3L1 gene. Findings: Through pairwise and multiple alignments of these genes, we have shown that the homology between these genes ranges from 50% to 99% in complete gene sequences and from 70% to 100% in the exonic regions, with CCL3L1 and CCL3L3 being identical. By use of MEGA 4 and BioEdit, we aligned sense primers, anti-sense primers, and probes used in several previously described assays against pre-multiple alignments of all four chemokine genes. Each set of probes and primers aligned and matched with overlapping sequences in at least two of the four genes, indicating that previously utilized RT-PCR based CNV assays are not specific for only CCL3L1. The four available assays measured median copies of 2 and 3-4 in European and African American, respectively. The concordance between the assays ranged from 0.44 to 0.83, suggesting individual discordant calls and inconsistencies between the assays and the gene coverage expected from the known sequence. Conclusions: This indicates that some of the inconsistencies in the association studies could be due to assays that provide heterogeneous results. Sequence information to determine the CNV of the three genes separately would make it possible to test whether their association with the pathogenesis of a human disease or phenotype is affected by an individual gene or by a combination of these genes.

  10. Population genetic implications from sequence variation in four Y chromosome genes.

    Science.gov (United States)

    Shen, P; Wang, F; Underhill, P A; Franco, C; Yang, W H; Roxas, A; Sung, R; Lin, A A; Hyman, R W; Vollrath, D; Davis, R W; Cavalli-Sforza, L L; Oefner, P J

    2000-06-20

    Some insight into human evolution has been gained from the sequencing of four Y chromosome genes. Primary genomic sequencing determined gene SMCY to be composed of 27 exons that comprise 4,620 bp of coding sequence. The unfinished sequencing of the 5' portion of gene UTY1 was completed by primer walking, and a total of 20 exons were found. By using denaturing HPLC, these two genes, as well as DBY and DFFRY, were screened for polymorphic sites in 53-72 representatives of the five continents. A total of 98 variants were found, yielding nucleotide diversity estimates of 2.45 × 10^-5, 5.07 × 10^-5, and 8.54 × 10^-5 for the coding regions of SMCY, DFFRY, and UTY1, respectively, with no variant having been observed in DBY. In agreement with most autosomal genes, diversity estimates for the noncoding regions were about 2- to 3-fold higher and ranged from 9.16 × 10^-5 to 14.2 × 10^-5 for the four genes. Analysis of the frequencies of derived alleles for all four genes showed that they more closely fit the expectation of a Luria-Delbrück distribution than a distribution expected under a constant population size model, providing evidence for exponential population growth. Pairwise nucleotide mismatch distributions date the occurrence of population expansion to approximately 28,000 years ago. This estimate is in accord with the spread of Aurignacian technology and the disappearance of the Neanderthals.
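
The two summary statistics named in this abstract, nucleotide diversity and the pairwise mismatch distribution, can be sketched for a set of pre-aligned sequences of equal length. The study derived its estimates from denaturing HPLC screening of real samples; the toy sequences and function names below are hypothetical and only show how the quantities are defined.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(x, y):
    """Count the sites at which two aligned, equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for p, q in zip(x, y) if p != q)


def mismatch_distribution(seqs):
    """Map each number of differences to how many sequence pairs show it."""
    return Counter(pairwise_differences(x, y) for x, y in combinations(seqs, 2))


def nucleotide_diversity(seqs):
    """Average number of differences per site over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_differences(x, y) for x, y in pairs)
    return total / (len(pairs) * len(seqs[0]))


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]  # hypothetical haplotypes
    print(mismatch_distribution(toy))
    print(nucleotide_diversity(toy))
```

A unimodal mismatch distribution is the pattern usually read as a signature of past population expansion, which is how such distributions are used to date growth events.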

  11. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad Isa

    2018-03-01

    Cancer is a very deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected by analyzing DNA in a laboratory test. This study focused on the local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI using the Smith-Waterman algorithm. The Smith-Waterman algorithm was introduced by T. F. Smith and M. S. Waterman in 1981. The algorithm tries to find the greatest possible similarity between a pair of sequences by assigning a negative value to unequal base pairs (mismatches) and a positive value to equal base pairs (matches). The maximum positive value marks the end of the alignment, and the minimum value marks its start. This study will use sequences of leukemia and 3 sequences of non-leukemia.
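
    The local alignment procedure described in this abstract can be illustrated with a minimal Smith-Waterman sketch. The scoring values (match = 2, mismatch = -1, linear gap = -2) and the function name smith_waterman are assumptions chosen for illustration; the abstract does not state which parameters or implementation the study used.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring parameters are illustrative assumptions, not the study's values.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new alignment
                          H[i - 1][j - 1] + sub,  # match or mismatch
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGGG"))
```

    The maximum cell of the score matrix gives the end of the local alignment, and the traceback stops at the first zero cell, which gives its start.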

  12. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  13. Optimization of sequence alignment for simple sequence repeat regions

    Directory of Open Access Journals (Sweden)

    Ogbonnaya Francis C

    2011-07-01

    Background: Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSRs have been used as molecular markers because they are easy to detect and are used in a range of applications, including genetic diversity, genome mapping, and marker-assisted selection. They are also very mutable because of slippage of the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDEL) mutation frequency to a high ratio, more than in other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units, which result in false evolutionary relationships. Findings: To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using a Perl script with a Tk graphical interface. This program is based on aligning sequences after first determining the repeated units and the last SSR nucleotide positions. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenetic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different lineages was observed after applying the new algorithm. Conclusions: The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic

  14. ADDRESS SEQUENCES FOR MULTI RUN RAM TESTING

    Directory of Open Access Journals (Sweden)

    V. N. Yarmolik

    2014-01-01

    A universal approach for generation of address sequences with specified properties is proposed and analyzed. A modified version of the Antonov and Saleev algorithm for Sobol sequence generation is chosen as a mathematical description of the proposed method. Within the framework of the proposed universal approach, the Sobol sequences form a subset of the address sequences. Other subsets are also formed, which are Gray sequences, anti-Gray sequences, counter sequences and sequences with specified properties.
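
    Two of the address-sequence families named in this abstract, counter sequences and Gray sequences, can be sketched as follows. This is only an illustration of what such sequences look like for an n-bit address space; it uses the standard reflected binary Gray code and is not the modified Antonov and Saleev generator proposed by the author. The function names are hypothetical.

```python
def counter_sequence(bits):
    """Addresses in plain counting order: 0, 1, 2, ..., 2**bits - 1."""
    return list(range(1 << bits))


def gray_sequence(bits):
    """Addresses in reflected binary Gray code order.

    Consecutive addresses differ in exactly one bit.
    """
    return [i ^ (i >> 1) for i in range(1 << bits)]


def hamming_distance(x, y):
    """Number of bit positions in which two addresses differ."""
    return bin(x ^ y).count("1")


if __name__ == "__main__":
    for addr in gray_sequence(3):
        print(format(addr, "03b"))
```

    Both sequences visit every address exactly once; they differ only in the order of visits, which is the property a multi-run memory test varies between runs.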

  15. Using sobol sequences for planning computer experiments

    Science.gov (United States)

    Statnikov, I. N.; Firsov, G. I.

    2017-12-01

    The paper discusses the use of the Planning LP-search (PLP-search) method for studying problems of multicriteria synthesis of dynamic systems. On the basis of simulation-model experiments, the method not only allows the parameter space to be reviewed within specified ranges of parameter variation, but also, through the special randomized planning of these experiments, allows a quantitative statistical evaluation of the influence of changes in the varied parameters and their pairwise combinations on the properties of the dynamic system.

  16. Fast and secure retrieval of DNA sequences

    NARCIS (Netherlands)

    2014-01-01

    Sequence models are retrieved from a sequences index. The sequence models model DNA or RNA sequences stored in a database, and each comprises a finite memory tree source model and parameters for the finite memory tree source model. One or more DNA or RNA sequences stored in the database are

  17. Decidability of uniform recurrence of morphic sequences

    OpenAIRE

    Durand , Fabien

    2012-01-01

    We prove that the uniform recurrence of morphic sequences is decidable. For this we show that the number of derived sequences of uniformly recurrent morphic sequences is bounded. As a corollary we obtain that uniformly recurrent morphic sequences are primitive substitutive sequences.

  18. Sequence Factorization with Multiple References.

    Directory of Open Access Journals (Sweden)

    Sebastian Wandelt

    The success of high-throughput sequencing has led to an increasing number of projects which sequence large populations of a species. Storage and analysis of sequence data is a key challenge in these projects, because of the sheer size of the datasets. Compression is one simple technology to deal with this challenge. Referential factorization and compression schemes, which store only the differences between an input sequence and a reference sequence, have gained a lot of interest in this field. Highly similar sequences, e.g., human genomes, can be compressed with a compression ratio of 1,000:1 and more, up to two orders of magnitude better than with standard compression techniques. Recently, it was shown that compression against multiple references from the same species can boost the compression ratio up to 4,000:1. However, a detailed analysis of using multiple references is lacking, e.g., for main memory consumption and optimality. In this paper, we describe one key technique for referential compression against multiple references: the factorization of sequences. Based on the notion of an optimal factorization, we propose optimization heuristics and identify parameter settings which greatly influence (1) the size of the factorization, (2) the time for factorization, and (3) the required amount of main memory. We evaluate a total of 30 setups with a varying number of references on data from three different species. Our results show a wide range of factorization sizes (optimal to an overhead of up to 300%), factorization speeds (0.01 MB/s to more than 600 MB/s), and main memory usage (a few dozen MB to dozens of GB). Based on our evaluation, we identify the best configurations for common use cases. Our evaluation shows that multi-reference factorization is much better than single-reference factorization.
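
    The idea of factorizing a sequence against several references can be sketched with a naive greedy scheme: at each position, take the longest match found in any reference, otherwise emit a literal character. This is an assumption-laden toy, not the authors' algorithm or heuristics; the names factorize and reconstruct, the tuple layout, and the min_len threshold are hypothetical, and the brute-force search is far slower than an index-based implementation would be.

```python
def factorize(target, references, min_len=2):
    """Greedy factorization of target against a list of reference strings.

    Returns a list of factors, each either
    ("match", ref_index, position, length) or ("literal", character).
    """
    factors = []
    i = 0
    while i < len(target):
        best = (0, -1, -1)  # (length, ref_index, position)
        for r, ref in enumerate(references):
            for p in range(len(ref)):
                length = 0
                while (i + length < len(target)
                       and p + length < len(ref)
                       and target[i + length] == ref[p + length]):
                    length += 1
                if length > best[0]:
                    best = (length, r, p)
        if best[0] >= min_len:
            length, r, p = best
            factors.append(("match", r, p, length))
            i += length
        else:
            # No sufficiently long match in any reference: store the raw symbol.
            factors.append(("literal", target[i]))
            i += 1
    return factors


def reconstruct(factors, references):
    """Invert factorize(): rebuild the original string from its factors."""
    parts = []
    for factor in factors:
        if factor[0] == "match":
            _, r, p, length = factor
            parts.append(references[r][p:p + length])
        else:
            parts.append(factor[1])
    return "".join(parts)


if __name__ == "__main__":
    refs = ["ACGTACGT", "TTTTGGGG"]
    print(factorize("ACGTTTTGGA", refs))
```

    In this toy example, adding the second reference reduces the number of factors, which mirrors the abstract's point that multiple references can shrink the factorization at the cost of more search time and memory.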

  19. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    Science.gov (United States)

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  20. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    Directory of Open Access Journals (Sweden)

    Dobbs Drena

    2011-06-01

    Background: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results: We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (non-partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI, can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of

  1. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko; Tanaka, Tsuyoshi; Ohyanagi, Hajime; Hsing, Yue-Ie C.; Itoh, Takeshi

    2018-01-01

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  2. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko

    2018-02-14

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  3. Transformed composite sequences for improved qubit addressing

    Science.gov (United States)

    Merrill, J. True; Doret, S. Charles; Vittorini, Grahame; Addison, J. P.; Brown, Kenneth R.

    2014-10-01

    Selective laser addressing of a single atom or atomic ion qubit can be improved using narrow-band composite pulse sequences. We describe a Lie-algebraic technique to generalize known narrow-band sequences and introduce sequences related by dilation and rotation of sequence generators. Our method improves known narrow-band sequences by decreasing both the pulse time and the residual error. Finally, we experimentally demonstrate these composite sequences using 40Ca+ ions trapped in a surface-electrode ion trap.

  4. Complete nucleotide sequences of a new bipartite begomovirus from Malvastrum sp. plants with bright yellow mosaic symptoms in South Texas.

    Science.gov (United States)

    Alabi, Olufemi J; Villegas, Cecilia; Gregg, Lori; Murray, K Daniel

    2016-06-01

    Two isolates of a novel bipartite begomovirus, tentatively named malvastrum bright yellow mosaic virus (MaBYMV), were molecularly characterized from naturally infected plants of the genus Malvastrum showing bright yellow mosaic disease symptoms in South Texas. Six complete DNA-A and five DNA-B genome sequences of MaBYMV obtained from the isolates ranged in length from 2,608 to 2,609 nucleotides (nt) and 2,578 to 2,605 nt, respectively. Both genome segments shared a 178- to 180-nt common region. In pairwise comparisons, the complete DNA-A and DNA-B sequences of MaBYMV were most similar (87-88 % and 79-81 % identity, respectively) and phylogenetically related to the corresponding sequences of sida mosaic Sinaloa virus-[MX-Gua-06]. Further analysis revealed that MaBYMV is a putative recombinant virus, thus supporting the notion that malvaceous hosts may be influencing the evolution of several begomoviruses. The design of new diagnostic primers enabled the detection of MaBYMV in cohorts of Bemisia tabaci collected from symptomatic Malvastrum sp. plants, thus implicating whiteflies as potential vectors of the virus.

  5. Grateloupia ramosa Wang & Luan sp. nov. (Halymeniaceae, Rhodophyta), a new species from China based on morphological evidence and comparative rbcL sequences

    Science.gov (United States)

    Cao, Cuicui; Liu, Miao; Guo, Shaoru; Zhao, Dan; Luan, Rixiao; Wang, Hongwei

    2016-03-01

    Grateloupia ramosa Wang & Luan sp. nov. (Halymeniaceae, Rhodophyta) is newly described from Hainan Province, southern China. The organism has the following morphological features: (1) purplish red, cartilaginous and lubricous thalli 5-10 cm in height; (2) compressed percurrent axes bearing abundant branches with opposite arrangement; (3) claw-like apices on top, constricted to 2-4 cm at the base; (4) cortex consisting of 3-6 layers of elliptical or anomalous cells and a medulla covered by compact medullary filaments; (5) reproductive structures distributed throughout the thallus, especially centralized at the bottom of the end portion of the branches; and (6) 4-celled carpogonial branches and 3-celled auxiliary-cell branches, both of the Grateloupia-type. The morphological differences were supported by molecular phylogenetics based on ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL) gene sequence analysis. There was only a 1 bp divergence between specimens collected from Wenchang and Lingshui of Hainan Province. The new species was embedded in the large Grateloupia clade of the Halymeniaceae. The pairwise distances between G. ramosa and other species within Grateloupia ranged from 26 to 105 bp, within the pairwise distances of 13-111 bp between species of the large genus Grateloupia in Halymeniaceae. Thus, we propose this new species as G. ramosa Wang & Luan sp. nov.

  6. Genome Sequence Analysis of New Isolates of the Winona Strain of Plum pox virus and the First Definitive Evidence of Intrastrain Recombination Events.

    Science.gov (United States)

    James, Delano; Sanderson, Dan; Varga, Aniko; Sheveleva, Anna; Chirkov, Sergei

    2016-04-01

    Plum pox virus (PPV) is genetically diverse with nine different strains identified. Mutations, indel events, and interstrain recombination events are known to contribute to the genetic diversity of PPV. This is the first report of intrastrain recombination events that contribute to PPV's genetic diversity. Fourteen isolates of the PPV strain Winona (W) were analyzed including nine new strain W isolates sequenced completely in this study. Isolates of other strains of PPV with more than one isolate with the complete genome sequence available in GenBank were included also in this study for comparison and analysis. Five intrastrain recombination events were detected among the PPV W isolates, one among PPV C strain isolates, and one among PPV M strain isolates. Four (29%) of the PPV W isolates analyzed are recombinants; one of which (P2-1) is a mosaic, with three recombination events identified. A new interstrain recombinant event was identified between a strain M isolate and a strain Rec isolate, a known recombinant. In silico recombination studies and pairwise distance analyses of PPV strain D isolates indicate that a threshold of genetic diversity exists for the detectability of recombination events, in the range of approximately 0.78×10(-2) to 1.33×10(-2) mean pairwise distance. RDP4 analyses indicate that in the case of PPV Rec isolates there may be a recombinant breakpoint distinct from the obvious transition point of strain sequences. Evidence was obtained that indicates that the frequency of PPV recombination is underestimated, which may be true for other RNA viruses where low genetic diversity exists.

  7. Sequences, groups, and number theory

    CERN Document Server

    Rigo, Michel

    2018-01-01

    This collaborative book presents recent trends on the study of sequences, including combinatorics on words and symbolic dynamics, and new interdisciplinary links to group theory and number theory. Other chapters branch out from those areas into subfields of theoretical computer science, such as complexity theory and theory of automata. The book is built around four general themes: number theory and sequences, word combinatorics, normal numbers, and group theory. Those topics are rounded out by investigations into automatic and regular sequences, tilings and theory of computation, discrete dynamical systems, ergodic theory, numeration systems, automaton semigroups, and amenable groups. This volume is intended for use by graduate students or research mathematicians, as well as computer scientists who are working in automata theory and formal language theory. With its organization around unified themes, it would also be appropriate as a supplemental text for graduate level courses.

  8. Explaining the harmonic sequence paradox.

    Science.gov (United States)

    Schmidt, Ulrich; Zimper, Alexander

    2012-05-01

    According to the harmonic sequence paradox, an expected utility decision maker's willingness to pay for a gamble whose expected payoffs evolve according to the harmonic series is finite if and only if his marginal utility of additional income becomes zero for rather low payoff levels. Since the assumption of zero marginal utility is implausible for finite payoff levels, expected utility theory - as well as its standard generalizations such as cumulative prospect theory - are apparently unable to explain a finite willingness to pay. This paper presents first an experimental study of the harmonic sequence paradox. Additionally, it demonstrates that the theoretical argument of the harmonic sequence paradox only applies to time-patient decision makers, whereas the paradox is easily avoided if time-impatience is introduced. ©2011 The British Psychological Society.

  9. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  10. Matrix transformations and sequence spaces

    International Nuclear Information System (INIS)

    Nanda, S.

    1983-06-01

    In most cases the most general linear operator from one sequence space into another is actually given by an infinite matrix and therefore the theory of matrix transformations has always been of great interest in the study of sequence spaces. The study of general theory of matrix transformations was motivated by the special results in summability theory. This paper is a review article which gives almost all known results on matrix transformations. This also suggests a number of open problems for further study and will be very useful for research workers. (author)

  11. Green's theorem and Gorenstein sequences

    OpenAIRE

    Ahn, Jeaman; Migliore, Juan C.; Shin, Yong-Su

    2016-01-01

    We study consequences, for a standard graded algebra, of extremal behavior in Green's Hyperplane Restriction Theorem. First, we extend his Theorem 4 from the case of a plane curve to the case of a hypersurface in a linear space. Second, assuming a certain Lefschetz condition, we give a connection to extremal behavior in Macaulay's theorem. We apply these results to show that $(1,19,17,19,1)$ is not a Gorenstein sequence, and as a result we classify the sequences of the form $(1,a,a-2,a,1)$ th...

  12. Sequences in language and text

    CERN Document Server

    Mikros, George K

    2015-01-01

    The aim of this volume is to present the diverse but highly interesting area of the quantitative analysis of the sequence of various linguistic structures. The collected articles present a wide spectrum of quantitative analyses of linguistic syntagmatic structures and explore novel sequential linguistic entities. This volume will be interesting to all researchers studying linguistics using quantitative methods.

  13. Probabilistic studies of accident sequences

    International Nuclear Information System (INIS)

    Villemeur, A.; Berger, J.P.

    1986-01-01

    For several years, Electricite de France has carried out probabilistic assessment of accident sequences for nuclear power plants. In the framework of this program many methods were developed. As the interest in these studies was increasing and as adapted methods were developed, Electricite de France has undertaken a probabilistic safety assessment of a nuclear power plant.

  14. MRI sequences and their parameters

    International Nuclear Information System (INIS)

    Teissier, J.M.

    1993-01-01

    Listing basic sequences and their present variants makes a synthetic classification of the various acquisition modes possible. The knowledge of the advantages of each of them, as well as of their disadvantages and restraints, seems to be an essential prerequisite to an optimal utilization of each magnetic resonance imaging system. (author)

  15. Degree sequence in message transfer

    Science.gov (United States)

    Yamuna, M.

    2017-11-01

    Message encryption is always an issue in current communication scenario. Methods are being devised using various domains. Graphs satisfy numerous unique properties which can be used for message transfer. In this paper, I propose a message encryption method based on degree sequence of graphs.

  16. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance, and even biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how to distinguish coding and noncoding sequences, and the classification and evolutionary relationships of organisms, are key problems in bioinformatics. Although much research has been carried out taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other researchers, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view) to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  17. On primes in Lucas sequences

    Czech Academy of Sciences Publication Activity Database

    Křížek, Michal; Somer, L.

    2015-01-01

    Vol. 53, No. 1 (2015), pp. 2-23. ISSN 0015-0517. R&D Projects: GA ČR GA14-02067S. Institutional support: RVO:67985840. Keywords: Lucas sequence; primes. Subject RIV: BA - General Mathematics. http://www.fq.math.ca/Abstracts/53-1/somer.pdf

  18. Is sequence awareness mandatory for perceptual sequence learning: An assessment using a pure perceptual sequence learning design.

    Science.gov (United States)

    Deroost, Natacha; Coomans, Daphné

    2018-02-01

    We examined the role of sequence awareness in a pure perceptual sequence learning design. Participants had to react to the target's colour that changed according to a perceptual sequence. By varying the mapping of the target's colour onto the response keys, motor responses changed randomly. The effect of sequence awareness on perceptual sequence learning was determined by manipulating the learning instructions (explicit versus implicit) and assessing the amount of sequence awareness after the experiment. In the explicit instruction condition (n = 15), participants were instructed to intentionally search for the colour sequence, whereas in the implicit instruction condition (n = 15), they were left uninformed about the sequenced nature of the task. Sequence awareness after the sequence learning task was tested by means of a questionnaire and the process-dissociation-procedure. The results showed that the instruction manipulation had no effect on the amount of perceptual sequence learning. Based on their report to have actively applied their sequence knowledge during the experiment, participants were subsequently regrouped in a sequence strategy group (n = 14, of which 4 participants from the implicit instruction condition and 10 participants from the explicit instruction condition) and a no-sequence strategy group (n = 16, of which 11 participants from the implicit instruction condition and 5 participants from the explicit instruction condition). Only participants of the sequence strategy group showed reliable perceptual sequence learning and sequence awareness. These results indicate that perceptual sequence learning depends upon the continuous employment of strategic cognitive control processes on sequence knowledge. Sequence awareness is suggested to be a necessary but not sufficient condition for perceptual learning to take place. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. Teaching Task Sequencing via Verbal Mediation.

    Science.gov (United States)

    Rusch, Frank R.; And Others

    1987-01-01

    Verbal sequence training was used to teach a moderately mentally retarded woman to sequence job-related tasks. Learning to say the tasks in the proper sequence resulted in the employee performing her tasks in that sequence, and the employee was capable of mediating her own work behavior when scheduled changes occurred. (Author/JDD)

  20. Repdigits in k-Lucas sequences

    Indian Academy of Sciences (India)

    57(2) 2000 243-254) proved that 11 is the largest number with only one distinct digit (the so-called repdigit) in the sequence (L_n^(2))_n. In this paper, we address a similar problem in the family of k-Lucas sequences. We also show that the k-Lucas sequences have similar properties to those of k-Fibonacci sequences ...
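
    The repdigit claim for the classical case can be checked numerically. The sketch below assumes the usual Lucas numbers (L_0 = 2, L_1 = 1, each later term the sum of the previous two) and scans only a finite prefix, so it illustrates the statement rather than proving it. The function names are hypothetical.

```python
def lucas_numbers(count):
    """Yield the first `count` Lucas numbers: 2, 1, 3, 4, 7, 11, ..."""
    a, b = 2, 1
    for _ in range(count):
        yield a
        a, b = b, a + b


def is_repdigit(n):
    """True if the decimal expansion of n uses a single distinct digit."""
    return len(set(str(n))) == 1


# Repdigits among the first 500 Lucas numbers (a finite check only).
repdigits = [x for x in lucas_numbers(500) if is_repdigit(x)]
largest_repdigit = max(repdigits)
```

    Single-digit terms count as repdigits here, so they appear in the list alongside 11.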

  1. AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.

    Science.gov (United States)

    Wang, Sheng; Sun, Siqi; Xu, Jinbo

    2016-09-01

    Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, whose label distributions are highly imbalanced, and performs similarly to the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
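
    The pairwise ranking view of AUC mentioned above can be illustrated for the binary case: the empirical AUC is the fraction of (positive, negative) pairs in which the positive example receives the higher score. The sketch below shows only this counting form; it is not the DeepCNF training code, the polynomial approximation is omitted, and the function name and the tie convention (ties count one half) are assumptions.

```python
def pairwise_auc(scores, labels):
    """Empirical AUC as the fraction of correctly ranked positive/negative pairs.

    scores: predicted scores, higher means more likely positive.
    labels: 1 for positive, 0 for negative.
    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ranked correctly.
example_auc = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

    Because the value depends only on the ordering of pairs, it is unaffected by how many negatives there are relative to positives, which is why it suits imbalanced labels.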

  2. Nonparametric Inference for Periodic Sequences

    KAUST Repository

    Sun, Ying

    2012-02-01

    This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
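
    One way to read the "leave-out-one-cycle" idea is sketched below: for a candidate period p, each observation is predicted by the mean of the observations at the same phase in all other cycles, and the period with the smallest mean squared prediction error is chosen. This is an assumed, simplified reading of the abstract rather than the article's exact estimator, and the function names are hypothetical.

```python
def cv_score(y, p):
    """Leave-out-one-cycle CV error for candidate period p (simplified sketch)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        cycle = i // p
        # Observations at the same phase as i, taken from every other cycle.
        others = [y[j] for j in range(i % p, n, p) if j // p != cycle]
        if not others:
            return float("inf")  # p too long to leave a cycle out
        prediction = sum(others) / len(others)
        total += (y[i] - prediction) ** 2
    return total / n


def estimate_period(y, candidates):
    """Return the candidate period with the smallest CV error."""
    return min(candidates, key=lambda p: cv_score(y, p))


# Noise-free toy sequence with true period 3.
toy = [1.0, 5.0, 2.0] * 20
estimated = estimate_period(toy, range(2, 6))
```

    With noisy data, a multiple of the true period uses fewer observations per phase mean, which is the sense in which such multiples tend to be penalized.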

  3. Multi-qubit compensation sequences

    International Nuclear Information System (INIS)

    Tomita, Y; Merrill, J T; Brown, K R

    2010-01-01

    The Hamiltonian control of n qubits requires precision control of both the strength and timing of interactions. Compensation pulses relax the precision requirements by reducing unknown but systematic errors. Using composite pulse techniques designed for single qubits, we show that systematic errors for n-qubit systems can be corrected to arbitrary accuracy given either two non-commuting control Hamiltonians with identical systematic errors or one error-free control Hamiltonian. We also examine composite pulses in the context of quantum computers controlled by two-qubit interactions. For quantum computers based on the XY interaction, single-qubit composite pulse sequences naturally correct systematic errors. For quantum computers based on the Heisenberg or exchange interaction, the composite pulse sequences reduce the logical single-qubit gate errors but increase the errors for logical two-qubit gates.

  4. Cassini Mission Sequence Subsystem (MSS)

    Science.gov (United States)

    Alland, Robert

    2011-01-01

    This paper describes my work with the Cassini Mission Sequence Subsystem (MSS) team during the summer of 2011. It gives some background on the motivation for this project and describes the expected benefit to the Cassini program. It then introduces the two tasks that I worked on - an automatic system auditing tool and a series of corrections to the Cassini Sequence Generator (SEQ_GEN) - and the specific objectives these tasks were to accomplish. Next, it details the approach I took to meet these objectives and the results of this approach, followed by a discussion of how the outcome of the project compares with my initial expectations. The paper concludes with a summary of my experience working on this project, lists what the next steps are, and acknowledges the help of my Cassini colleagues.

  5. Sequence complexity and work extraction

    International Nuclear Information System (INIS)

    Merhav, Neri

    2015-01-01

    We consider a simplified version of a solvable model by Mandal and Jarzynski, which constructively demonstrates the interplay between work extraction and the increase of the Shannon entropy of an information reservoir which is in contact with a physical system. We extend Mandal and Jarzynski’s main findings in several directions: first, we allow sequences of correlated bits rather than just independent bits. Secondly, at least for the case of binary information, we show that, in fact, the Shannon entropy is only one measure of complexity of the information that must increase in order for work to be extracted. The extracted work can also be upper bounded in terms of the increase in other quantities that measure complexity, like the predictability of future bits from past ones. Third, we provide an extension to the case of non-binary information (i.e. a larger alphabet), and finally, we extend the scope to the case where the incoming bits (before the interaction) form an individual sequence, rather than a random one. In this case, the entropy before the interaction can be replaced by the Lempel–Ziv (LZ) complexity of the incoming sequence, a fact that gives rise to an entropic meaning of the LZ complexity, not only in information theory, but also in physics. (paper)

  6. Entropic fluctuations in DNA sequences

    Science.gov (United States)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that, despite this reduction of information, LSE allows meaningful information to be extracted for detecting repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
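
    A block-wise entropy profile of this kind can be sketched as follows. The sketch assumes non-overlapping blocks and entropy computed from single-base frequencies; the paper's exact definition (for example, whether longer words are counted) is not given in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter


def block_entropy(block):
    """Shannon entropy (in bits) of the symbol frequencies within one block."""
    n = len(block)
    counts = Counter(block)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def local_shannon_entropy(sequence, block_size):
    """Entropy of each consecutive, non-overlapping block of the sequence."""
    return [
        block_entropy(sequence[i:i + block_size])
        for i in range(0, len(sequence) - block_size + 1, block_size)
    ]


# A repetitive stretch followed by a mixed one, in blocks of four bases.
profile = local_shannon_entropy("AAAAAAAAACGTACGT", 4)
```

    Here the repetitive blocks score 0 bits and the mixed blocks score 2 bits, so runs of low, nearly constant values mark repeat-rich regions.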

  7. Two-stage clustering (TSC): a pipeline for selecting operational taxonomic units for the high-throughput sequencing of PCR amplicons.

    Directory of Open Access Journals (Sweden)

    Xiao-Tao Jiang

    Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs) is a critical step for the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with a relatively low computational demand and a high degree of accuracy. This pipeline is referred to as two-stage clustering (TSC) because it divides tags into two groups according to their abundance and clusters them sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which has a high degree of accuracy but is computationally costly for large datasets. The rarer group, which includes the majority of tags, is then heuristically clustered to improve efficiency. To further improve the computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into an OTU depending on their pairwise Needleman-Wunsch distance. This method not only improved the computational efficiency but also mitigated the spurious OTU estimation from 'noise' sequences. In addition, OTUs clustered using TSC showed comparable or improved performance in beta-diversity comparisons compared to existing OTU selection methods. This study suggests that the distribution of sequencing datasets is a useful property for improving the computational efficiency and increasing the clustering accuracy of the high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/.
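
    The Needleman-Wunsch comparison that underlies such a pairwise distance is a global alignment computed by dynamic programming. The sketch below returns only the optimal alignment score, with illustrative scoring values (match +1, mismatch -1, linear gap -1) that are assumptions; how TSC converts an alignment into a distance is not stated in the abstract and is not reproduced here.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal Needleman-Wunsch global alignment score of strings a and b.

    Uses a linear gap penalty and keeps only two rows of the DP table.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Three matches and one gap: ACGT aligned against A-GT.
example_score = nw_score("ACGT", "AGT")
```

    The table has (len(a)+1) x (len(b)+1) cells, so all-against-all comparison of many tags is costly, which is consistent with the abstract's use of preclustering and heuristics for the large, rare group.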

  8. Grateloupia tenuis Wang et Luan sp. nov. (Halymeniaceae, Rhodophyta): A New Species from South China Sea Based on Morphological Observation and rbcL Gene Sequences Analysis

    Directory of Open Access Journals (Sweden)

    Ling Yu

    2013-01-01

    Grateloupia tenuis Wang et Luan sp. nov. is a new species described from Lingshui, Hainan Province, South China Sea. Based on the external form and internal structure, combined with rbcL gene sequence analysis, Grateloupia tenuis is distinct from other Grateloupia species as follows: (1) thalli is slippery and cartilaginous in texture; possess fewer branches, relatively slight main axes, and two or three dichotomous branches; (2) cortex is 5-6 layers; medulla is solid when young, but hollow in old branches; reproductive structures are dispersed in main axes of thalli and lower portions of branchlets; exhibits Grateloupia-type auxiliary cell ampullae; (3) the four studied G. tenuis sequences were positioned in a large Grateloupia clade of Halymeniaceae, which included sister group generitype G. filicina with 68 bp differences; G. tenuis was determined to be a sister taxon to the G. catenata, G. ramosissima, G. orientalis, and G. filiformis subclade. The pairwise distances between G. tenuis and these species were 39 to 50 bp. The sequences of G. tenuis differed by 81–108 bp from the sequences of other samples in Grateloupia; there are 114–133 bp changes between G. tenuis and other genera of Halymeniaceae. In final analysis, we considered Grateloupia tenuis Wang et Luan sp. nov. to be a new species of genus Grateloupia.

  9. Grateloupia tenuis Wang et Luan sp. nov. (Halymeniaceae, Rhodophyta): a new species from South China Sea based on morphological observation and rbcL gene sequences analysis.

    Science.gov (United States)

    Yu, Ling; Wang, Hongwei; Luan, Rixiao

    2013-01-01

    Grateloupia tenuis Wang et Luan sp. nov. is a new species described from Lingshui, Hainan Province, South China Sea. Based on the external form and internal structure, combined with rbcL gene sequence analysis, Grateloupia tenuis is distinct from other Grateloupia species as follows: (1) thalli is slippery and cartilaginous in texture; possess fewer branches, relatively slight main axes, and two or three dichotomous branches; (2) cortex is 5-6 layers; medulla is solid when young, but hollow in old branches; reproductive structures are dispersed in main axes of thalli and lower portions of branchlets; exhibits Grateloupia-type auxiliary cell ampullae; (3) the four studied G. tenuis sequences were positioned in a large Grateloupia clade of Halymeniaceae, which included sister group generitype G. filicina with 68 bp differences; G. tenuis was determined to be a sister taxon to the G. catenata, G. ramosissima, G. orientalis, and G. filiformis subclade. The pairwise distances between G. tenuis and these species were 39 to 50 bp. The sequences of G. tenuis differed by 81-108 bp from the sequences of other samples in Grateloupia; there are 114-133 bp changes between G. tenuis and other genera of Halymeniaceae. In final analysis, we considered Grateloupia tenuis Wang et Luan sp. nov. to be a new species of genus Grateloupia.

  10. Grateloupia tenuis Wang et Luan sp. nov. (Halymeniaceae, Rhodophyta): A New Species from South China Sea Based on Morphological Observation and rbcL Gene Sequences Analysis

    Science.gov (United States)

    Wang, Hongwei; Luan, Rixiao

    2013-01-01

    Grateloupia tenuis Wang et Luan sp. nov. is a new species described from Lingshui, Hainan Province, South China Sea. Based on the external form and internal structure, combined with rbcL gene sequence analysis, Grateloupia tenuis is distinct from other Grateloupia species as follows: (1) thalli is slippery and cartilaginous in texture; possess fewer branches, relatively slight main axes, and two or three dichotomous branches; (2) cortex is 5-6 layers; medulla is solid when young, but hollow in old branches; reproductive structures are dispersed in main axes of thalli and lower portions of branchlets; exhibits Grateloupia-type auxiliary cell ampullae; (3) the four studied G. tenuis sequences were positioned in a large Grateloupia clade of Halymeniaceae, which included sister group generitype G. filicina with 68 bp differences; G. tenuis was determined to be a sister taxon to the G. catenata, G. ramosissima, G. orientalis, and G. filiformis subclade. The pairwise distances between G. tenuis and these species were 39 to 50 bp. The sequences of G. tenuis differed by 81–108 bp from the sequences of other samples in Grateloupia; there are 114–133 bp changes between G. tenuis and other genera of Halymeniaceae. In final analysis, we considered Grateloupia tenuis Wang et Luan sp. nov. to be a new species of genus Grateloupia. PMID:24455703

  11. Method and apparatus for biological sequence comparison

    Science.gov (United States)

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  12. Memory and learning with rapid audiovisual sequences

    Science.gov (United States)

    Keller, Arielle S.; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  13. Memory and learning with rapid audiovisual sequences.

    Science.gov (United States)

    Keller, Arielle S; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed.

  14. Multineuronal Spike Sequences Repeat with Millisecond Precision

    Directory of Open Access Journals (Sweden)

    Koki Matsumoto

    2013-06-01

    Cortical microcircuits are nonrandomly wired by neurons. As a natural consequence, spikes emitted by microcircuits are also nonrandomly patterned in time and space. One of the prominent spike organizations is a repetition of fixed patterns of spike series across multiple neurons. However, several questions remain unsolved, including how precisely spike sequences repeat, how the sequences are spatially organized, how many neurons participate in sequences, and how different sequences are functionally linked. To address these questions, we monitored spontaneous spikes of hippocampal CA3 neurons ex vivo using a high-speed functional multineuron calcium imaging technique that allowed us to monitor spikes with millisecond resolution and to record the location of spiking and nonspiking neurons. Multineuronal spike sequences were overrepresented in spontaneous activity compared to the statistical chance level. Approximately 75% of neurons participated in at least one sequence during our observation period. The participants were sparsely dispersed and did not show specific spatial organization. The number of sequences relative to the chance level decreased when larger time frames were used to detect sequences. Thus, sequences were precise at the millisecond level. Sequences often shared common spikes with other sequences; parts of sequences were subsequently relayed by following sequences, generating complex chains of multiple sequences.

  15. Origin and spread of photosynthesis based upon conserved sequence features in key bacteriochlorophyll biosynthesis proteins.

    Science.gov (United States)

    Gupta, Radhey S

    2012-11-01

    The origin of photosynthesis and how this capability has spread to other bacterial phyla remain important unresolved questions. I describe here a number of conserved signature indels (CSIs) in key proteins involved in bacteriochlorophyll (Bchl) biosynthesis that provide important insights in these regards. The proteins BchL and BchX, which are essential for Bchl biosynthesis, are derived by gene duplication in a common ancestor of all phototrophs. More ancient gene duplication gave rise to the BchX-BchL proteins and the NifH protein of the nitrogenase complex. The sequence alignment of NifH-BchX-BchL proteins contain two CSIs that are uniquely shared by all NifH and BchX homologs, but not by any BchL homologs. These CSIs and phylogenetic analysis of NifH-BchX-BchL protein sequences strongly suggest that the BchX homologs are ancestral to BchL and that the Bchl-based anoxygenic photosynthesis originated prior to the chlorophyll (Chl)-based photosynthesis in cyanobacteria. Another CSI in the BchX-BchL sequence alignment that is uniquely shared by all BchX homologs and the BchL sequences from Heliobacteriaceae, but absent in all other BchL homologs, suggests that the BchL homologs from Heliobacteriaceae are primitive in comparison to all other photosynthetic lineages. Several other identified CSIs in the BchN homologs are commonly shared by all proteobacterial homologs and a clade consisting of the marine unicellular Cyanobacteria (Clade C). These CSIs in conjunction with the results of phylogenetic analyses and pair-wise sequence similarity on the BchL, BchN, and BchB proteins, where the homologs from Clade C Cyanobacteria and Proteobacteria exhibited close relationship, provide strong evidence that these two groups have incurred lateral gene transfers. Additionally, phylogenetic analyses and several CSIs in the BchL-N-B proteins that are uniquely shared by all Chlorobi and Chloroflexi homologs provide evidence that the genes for these proteins have also been

  16. Loss of genetic variability in a hatchery strain of Senegalese sole (Solea senegalensis) revealed by sequence data of the mitochondrial DNA control region and microsatellite markers

    Directory of Open Access Journals (Sweden)

    Pablo Sánchez

    2012-06-01

    Comparisons of the levels of genetic variation within and between a hatchery F1 (FAR, n=116) of Senegalese sole, Solea senegalensis, and its wild donor population (ATL, n=26), both native to the SW Atlantic coast of the Iberian peninsula, as well as between the wild donor population and a wild western Mediterranean sample (MED, n=18), were carried out by characterizing 412 base pairs of the nucleotide sequence of the mitochondrial DNA control region I, and six polymorphic microsatellite loci. FAR showed a substantial loss of genetic variability (haplotypic diversity, h=0.49±0.066; nucleotide diversity, π=0.006±0.004; private allelic richness, pAg=0.28) relative to its donor population ATL (h=0.69±0.114; π=0.009±0.006; pAg=1.21). Pairwise FST values of microsatellite data were highly significant (P < 0.0001) between FAR and ATL (0.053) and FAR and MED (0.055). The comparison of wild samples revealed higher values of genetic variability in MED than in ATL, but only with mtDNA CR-I sequence data (h=0.948±0.033; π=0.030±0.016). However, pairwise ΦST and FST values between ATL and MED were highly significant (P < 0.0001) with mtDNA CR-I (0.228) and with microsatellite data (0.095), respectively. While loss of genetic variability in FAR could be associated with the sampling error when the broodstock was established, the results of parental and sibship inference suggest that most of these losses can be attributed to a high variance in reproductive success among members of the broodstock, particularly among females.
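
    For readers unfamiliar with the haplotypic diversity h reported above, a common estimator is Nei's h = n/(n-1) * (1 - sum of squared haplotype frequencies). The sketch below uses that formula; whether the study used exactly this estimator is an assumption, and the function name and toy haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    # Sum of squared sample frequencies of each distinct haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy sample: one haplotype seen twice, two haplotypes seen once.
h_toy = haplotype_diversity(["H1", "H1", "H2", "H3"])
```

    A sample dominated by a few haplotypes gives a lower h, which is the pattern the abstract reports for the hatchery stock relative to its wild donor population.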

  17. Static multiplicities in heterogeneous azeotropic distillation sequences

    DEFF Research Database (Denmark)

    Esbjerg, Klavs; Andersen, Torben Ravn; Jørgensen, Sten Bay

    1998-01-01

    In this paper the results of a bifurcation analysis on heterogeneous azeotropic distillation sequences are given. Two sequences suitable for ethanol dehydration are compared: The 'direct' and the 'indirect' sequence. It is shown, that the two sequences, despite their similarities, exhibit very...... different static behavior. The method of Petlyuk and Avet'yan (1971), Bekiaris et al. (1993), which assumes infinite reflux and infinite number of stages, is extended to and applied on heterogeneous azeotropic distillation sequences. The predictions are substantiated through simulations. The static sequence...

  18. Blind sequence-length estimation of low-SNR cyclostationary sequences

    CSIR Research Space (South Africa)

    Vlok, JD

    2014-06-01

    Several existing direct-sequence spread spectrum (DSSS) detection and estimation algorithms assume prior knowledge of the symbol period or sequence length, although very few sequence-length estimation techniques are available in the literature...

  19. Measuring fit of sequence data to phylogenetic model: gain of power using marginal tests.

    Science.gov (United States)

    Waddell, Peter J; Ota, Rissa; Penny, David

    2009-10-01

    Testing fit of data to model is fundamentally important to any science, but publications in the field of phylogenetics rarely do this. Such analyses discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (Unended quest: an intellectual autobiography. Fontana, London, 1976) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (Nature 297:197-200, 1982) to the present. We compare the general log-likelihood ratio statistic (the G or G^2 statistic) between the evolutionary tree model and the multinomial model with that of marginalized tests applied to an alignment (using placental mammal coding sequence data). It is seen that the most general test does not reject the fit of data to model (P approximately 0.5), but the marginalized tests do. Tests on pairwise frequency (F) matrices strongly (P < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (P < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa; not necessarily those expected from examining other regions of the genome. By marginalizing the 4^t patterns of the i.i.d. model to observed and expected parsimony counts, that is, from constant sites, to singletons, to parsimony informative characters of a minimum possible length, then the likelihood ratio test regains power, and it too rejects the evolutionary model with P < 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published trees may really be far larger than the analytical methods (e.g., bootstrap) report.

  20. Complete sequence analysis of 18S rDNA based on genomic DNA extraction from individual Demodex mites (Acari: Demodicidae).

    Science.gov (United States)

    Zhao, Ya-E; Xu, Ji-Ru; Hu, Li; Wu, Li-Ping; Wang, Zheng-Hang

    2012-05-01

    The study for the first time attempted to accomplish 18S ribosomal DNA (rDNA) complete sequence amplification and analysis for three Demodex species (Demodex folliculorum, Demodex brevis and Demodex canis) based on gDNA extraction from individual mites. The mites were treated by DNA Release Additive and Hot Start II DNA Polymerase so as to promote mite disruption and increase PCR specificity. Determination of D. folliculorum gDNA showed that the gDNA yield reached the highest at 1 mite, tending to descend with the increase of mite number. The individual mite gDNA was successfully used for 18S rDNA fragment (about 900 bp) amplification examination. The alignments of 18S rDNA complete sequences of individual mite samples and those of pooled mite samples (≥ 1000 mites/sample) showed over 97% identities for each species, indicating that the gDNA extracted from a single individual mite was as satisfactory as that from pooled mites for PCR amplification. Further pairwise sequence analyses showed that average divergence, genetic distance, transition/transversion or phylogenetic tree could not effectively identify the three Demodex species, largely due to the differentiation in the D. canis isolates. It can be concluded that the individual Demodex mite gDNA can satisfy the molecular study of Demodex. 18S rDNA complete sequence is suitable for interfamily identification in Cheyletoidea, but whether it is suitable for intrafamily identification cannot be confirmed until the ascertainment of the types of Demodex mites parasitizing in dogs. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. Genetic variability of Echinococcus granulosus complex in various geographical populations of Iran inferred by mitochondrial DNA sequences.

    Science.gov (United States)

    Spotin, Adel; Mahami-Oskouei, Mahmoud; Harandi, Majid Fasihi; Baratchian, Mehdi; Bordbar, Ali; Ahmadpour, Ehsan; Ebrahimi, Sahar

    2017-01-01

    To investigate the genetic variability and population structure of the Echinococcus granulosus complex, 79 isolates were sequenced from different host species (human, dog, camel, goat, sheep and cattle) from various geographical sub-populations of Iran (Northwestern, Northern, and Southeastern). In addition, 36 sequences of the mitochondrial cytochrome c oxidase subunit 1 (cox1) gene from other geographical populations (Western, Southeastern and Central Iran) were retrieved directly from the GenBank database. The confirmed isolates were grouped as G1 genotype (n=92), G6 genotype (n=14), G3 genotype (n=8) and G2 genotype (n=1). 50 unique haplotypes were identified based on the analyzed sequences of cox1. A parsimonious network of the sequence haplotypes displayed star-like features in the overall population, with IR23 (22: 19.1%) as the most common haplotype. According to the analysis of molecular variance (AMOVA) test, the high haplotype diversity of the E. granulosus complex reflected the total genetic variability within populations, while nucleotide diversity was low in all populations. Neutrality indices of cox1 (Tajima's D and Fu's Fs tests) showed negative values in the Western-Northwestern, Northern and Southeastern populations, indicating significant divergence from neutrality, and were positive but not significant in Central isolates. The pairwise fixation index (Fst), as a measure of gene flow, was generally low for all populations (0.00647-0.15198). Statistically, the Fst values indicate that Echinococcus sensu stricto (genotype G1-G3) populations are not genetically well differentiated in various geographical regions of Iran. To appraise the hypothetical evolutionary scenario, further study is needed to analyze concatenated mitogenomes, and a panel of single locus nuclear markers should be considered in wider areas of Iran and neighboring countries. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. Infinite matrices and sequence spaces

    CERN Document Server

    Cooke, Richard G

    2014-01-01

    This clear and correct summation of basic results from a specialized field focuses on the behavior of infinite matrices in general, rather than on properties of special matrices. Three introductory chapters guide students to the manipulation of infinite matrices, covering definitions and preliminary ideas, reciprocals of infinite matrices, and linear equations involving infinite matrices. From the fourth chapter onward, the author treats the application of infinite matrices to the summability of divergent sequences and series from various points of view. Topics include consistency, mutual consi

  3. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed; Mansour, Essam; Kalnis, Panos

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern

  4. Computational analysis of sequence selection mechanisms.

    Science.gov (United States)

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  5. The recurrence sequences via Sylvester matrices

    Science.gov (United States)

    Karaduman, Erdal; Deveci, Ömür

    2017-07-01

    In this work, we define the Pell-Jacobsthal-Sylvester sequence and the Jacobsthal-Pell-Sylvester sequence by using the Sylvester matrices obtained from the characteristic polynomials of the Pell and Jacobsthal sequences, and we then study these sequences modulo m. We also obtain the cyclic groups and the semigroups from the generating matrices of these sequences when read modulo m, and we derive the relationships between the orders of the cyclic groups and the periods of the sequences. Furthermore, we redefine the Pell-Jacobsthal-Sylvester sequence and the Jacobsthal-Pell-Sylvester sequence by means of the elements of groups, and we then examine them in finite groups.

  6. ON SOME RECURRENCE TYPE SMARANDACHE SEQUENCES

    OpenAIRE

    MAJUMDAR, A.A.K.; GUNARTO, H.

    2000-01-01

    In this paper, we study some properties of ten recurrence type Smarandache sequences, namely, the Smarandache odd, even, prime product, square product, higher-power product, permutation, consecutive, reverse, symmetric, and pierced chain sequences.

  7. "First generation" automated DNA sequencing technology.

    Science.gov (United States)

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  8. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...... diseases in Europe. As part of the EURL proficiency test for fish diseases it is required to sequence any RANA virus isolates found in any of the samples. It is also highly recommended to sequence the ISA virus to determine whether it be HPRΔ or HPR0. Furthermore, it is recommended that any VHSV and IHNV...... isolates be genotyped. As part of the evaluation of the proficiency results it was decided this year to look into the quality and similarity of the sequence results for selected viruses. Ampoule III in the proficiency test 2013 contained an EHNV isolate. The EURL received 43 sequences from 41 laboratories...

  9. Perfect sequences over the real quaternions

    OpenAIRE

    Kuznetsov, Oleg

    2017-01-01

    In this Thesis, perfect sequences over the real quaternions are first considered. Definitions for the right and left periodic autocorrelation functions are given, and right and left perfect sequences introduced. It is shown that the right (left) perfection of any sequence implies the left (right) perfection, so concepts of right and left perfect sequences over the real quaternions are equivalent. Unitary transformations of the quaternion space ℍ are then considered. Using the equivalence of t...

  10. Information decomposition method to analyze symbolical sequences

    International Nuclear Information System (INIS)

    Korotkov, E.V.; Korotkova, M.A.; Kudryashov, N.A.

    2003-01-01

    The information decomposition (ID) method for analyzing symbolical sequences is presented. This method allows us to reveal a latent periodicity in any symbolical sequence. The ID method is shown to have advantages over the Fourier transform, the wavelet transform and the dynamic programming method in the search for latent periodicity. Examples of latent periods in poetic texts, DNA sequences and amino acid sequences are presented. The possible origin of latent periodicity in different symbolical sequences is discussed.

  11. Parallel sequencing lives, or what makes large sequencing projects successful.

    Science.gov (United States)

    Quilez, Javier; Vidal, Enrique; Dily, François Le; Serra, François; Cuartero, Yasmina; Stadhouders, Ralph; Graf, Thomas; Marti-Renom, Marc A; Beato, Miguel; Filion, Guillaume

    2017-11-01

    T47D_rep2 and b1913e6c1_51720e9cf were 2 Hi-C samples. They were born and processed at the same time, yet their fates were very different. The life of b1913e6c1_51720e9cf was simple and fruitful, while that of T47D_rep2 was full of accidents and sorrow. At the heart of these differences lies the fact that b1913e6c1_51720e9cf was born under a lab culture of Documentation, Automation, Traceability, and Autonomy and compliance with the FAIR Principles. Their lives are a lesson for those who wish to embark on the journey of managing high-throughput sequencing data. © The Author 2017. Published by Oxford University Press.

  12. MatrixPlot: visualizing sequence constraints

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Stærfeldt, Hans Henrik; Lund, Ole

    1999-01-01

    Summary: MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information...

  13. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  14. DNA sequence modeling based on context trees

    NARCIS (Netherlands)

    Kusters, C.J.; Ignatenko, T.; Roland, J.; Horlin, F.

    2015-01-01

    Genomic sequences contain instructions for protein and cell production. Therefore understanding and identification of biologically and functionally meaningful patterns in DNA sequences is of paramount importance. Modeling of DNA sequences in its turn can help to better understand and identify such

  15. Compact flow diagrams for state sequences

    NARCIS (Netherlands)

    Buchin, K.A.; Buchin, M.E.; Gudmundsson, J.; Horton, M.J.; Sijben, S.

    2016-01-01

    We introduce the concept of compactly representing a large number of state sequences, e.g., sequences of activities, as a flow diagram. We argue that the flow diagram representation gives an intuitive summary that allows the user to detect patterns among large sets of state sequences. Simplified,

  16. Blazar Sequence in Fermi Era

    Indian Academy of Sciences (India)

    Chen, Liang

    In this paper, we review the latest research results on the topic of the blazar sequence. It seems that the blazar sequence is phenomenologically ruled out, while the theoretical blazar sequence still holds. We point out that black hole mass is a dominant parameter accounting for high-power-high-synchrotron-peaked and ...

  17. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of its sequences of mycorrhizal fungi.

    Science.gov (United States)

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  18. Permutation Entropy for Random Binary Sequences

    Directory of Open Access Journals (Sweden)

    Lingfeng Liu

    2015-12-01

    In this paper, we generalize the permutation entropy (PE) measure, which is based on Shannon’s entropy, to binary sequences, and theoretically analyze this measure for random binary sequences. We deduce the theoretical value of PE for random binary sequences, which can be used to measure the randomness of binary sequences. We also reveal the relationship between this PE measure and other randomness measures, such as Shannon’s entropy and Lempel–Ziv complexity. The results show that PE is consistent with these two measures. Furthermore, we use PE as one of the randomness measures to evaluate the randomness of chaotic binary sequences.
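
    As a rough illustration of the kind of measure described above, the sketch below computes the standard ordinal-pattern permutation entropy of a sequence. It is not the generalized binary measure from the paper: the function name `permutation_entropy`, the `order` parameter, and the rule of breaking ties by position are assumptions made here for illustration only.

```python
import math
from collections import Counter

def permutation_entropy(seq, order=3):
    """Normalized Shannon entropy of the ordinal patterns of length `order`.

    Assumption: equal values are ranked by their position in the window,
    which is one common convention and not necessarily the paper's.
    """
    if len(seq) < order:
        raise ValueError("sequence is shorter than the pattern order")
    patterns = Counter()
    for i in range(len(seq) - order + 1):
        window = seq[i:i + order]
        # Ordinal pattern: window indices sorted by value, ties by index.
        pattern = tuple(sorted(range(order), key=lambda k: (window[k], k)))
        patterns[pattern] += 1
    total = sum(patterns.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in patterns.values())
    # Divide by the maximum possible entropy, log2(order!).
    return entropy / math.log2(math.factorial(order))
```

    With this tie-breaking rule a binary sequence cannot realize every ordinal pattern, so the normalized value stays below 1 even for random bits; this is one reason a dedicated binary generalization is of interest.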

  19. Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences

    Directory of Open Access Journals (Sweden)

    Holland Barbara R

    2006-07-01

    Background: Phylogenetic methods which do not rely on multiple sequence alignments are important tools in inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute phylogenetic trees from all completely sequenced plastid genomes currently available and from a selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN, TBLASTX, or combinations of both are used to locate high-scoring segment pairs (HSPs) between two sequences, from which pairwise similarities and distances are computed in different ways, resulting in a total of 96 GBDP variants. The suitability of these distance formulae for phylogeny reconstruction is directly estimated by computing a recently described measure of "treelikeness", the so-called δ value, from the respective distance matrices. Additionally, we compare the trees inferred from these matrices using UPGMA, NJ, BIONJ, FastME, or STC, respectively, with the NCBI taxonomy tree of the taxa under study. Results: Our results indicate that, at this taxonomic level, plastid genomes are much more valuable for inferring phylogenies than are mitochondrial genomes, and that distances based on breakpoints are of little use. Distances based on the proportion of "matched" HSP length to average genome length were best for tree estimation. Additionally, we found that using TBLASTX instead of BLASTN and, particularly, combining TBLASTX and BLASTN leads to a small but significant increase in accuracy. Other factors do not significantly affect the phylogenetic outcome. The BIONJ algorithm results in phylogenies most in accordance with the current NCBI taxonomy, with NJ and FastME performing insignificantly worse, and STC performing as well if applied to high quality distance matrices. δ values are found to be a reliable predictor of phylogenetic accuracy. Conclusion: Using the most treelike distance matrices, as

  20. PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids.

    Science.gov (United States)

    Kuznetsov, Igor B; McDuffie, Michael

    2015-05-07

    Alignment of amino acid sequences is the main sequence comparison method used in computational molecular biology. The selection of the amino acid substitution matrix best suited to a given alignment problem is one of the most important decisions the user has to make. In a conventional amino acid substitution matrix all elements are fixed and their values cannot be easily adjusted. Moreover, most existing amino acid substitution matrices account for the average (dis)similarities between amino acid types and do not distinguish the contribution of a specific biochemical property to these (dis)similarities. PR2ALIGN is a stand-alone software program and a web-server that provide the functionality for implementing flexible user-specified alignment scoring functions and aligning pairs of amino acid sequences based on the comparison of the profiles of biochemical properties of these sequences. Unlike the conventional sequence alignment methods that use fixed 20×20 amino acid substitution matrices, PR2ALIGN uses a set of weighted biochemical properties of amino acids to measure the distance between pairs of aligned residues and to find an optimal minimal-distance global alignment. The user can provide any number of amino acid properties and specify a weight for each property. The higher the weight for a given property, the more this property affects the final alignment. We show that in many cases the approach implemented in PR2ALIGN produces better quality pairwise alignments than the conventional matrix-based approach. PR2ALIGN will be helpful for researchers who wish to align amino acid sequences by using flexible user-specified alignment scoring functions based on the biochemical properties of amino acids instead of the amino acid substitution matrix. To the best of the authors' knowledge, there are no existing stand-alone software programs or web-servers analogous to PR2ALIGN. The software is freely available from http://pr2align.rit.albany.edu.
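
    The idea of a weighted-property residue distance combined with a minimal-distance global alignment can be sketched as follows. This is not PR2ALIGN's code or scoring scheme: the property table `TOY_PROPS`, its values, the weighted absolute-difference distance, and the linear gap cost are all hypothetical choices made for illustration.

```python
# Hypothetical property table: residue -> (property_1, property_2).
# The values are placeholders, not measured biochemical data.
TOY_PROPS = {
    "A": (0.6, 0.2),
    "K": (0.1, 0.7),
    "L": (0.9, 0.6),
    "D": (0.1, 0.3),
}

def residue_distance(a, b, props, weights):
    """Weighted sum of absolute property differences between two residues."""
    return sum(w * abs(pa - pb) for w, pa, pb in zip(weights, props[a], props[b]))

def global_min_distance(s, t, props, weights, gap=1.0):
    """Needleman-Wunsch-style dynamic program that minimizes total distance.

    Assumption: each gap position costs a fixed `gap` (linear gap cost).
    """
    n, m = len(s), len(t)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * gap
    for j in range(1, m + 1):
        dist[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j - 1] + residue_distance(s[i - 1], t[j - 1], props, weights),
                dist[i - 1][j] + gap,   # gap in t
                dist[i][j - 1] + gap,   # gap in s
            )
    return dist[n][m]
```

    Raising the weight of one property makes differences in that property more expensive, so the optimal alignment shifts toward pairing residues that agree on it, which mirrors the role of the weights described in the abstract.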

  1. The 2016 Kumamoto earthquake sequence.

    Science.gov (United States)

    Kato, Aitaro; Nakamura, Kouji; Hiyama, Yohei

    2016-01-01

    Beginning in April 2016, a series of shallow, moderate to large earthquakes with associated strong aftershocks struck the Kumamoto area of Kyushu, SW Japan. An Mj 7.3 mainshock occurred on 16 April 2016, close to the epicenter of an Mj 6.5 foreshock that occurred about 28 hours earlier. The intense seismicity released the accumulated elastic energy by right-lateral strike slip, mainly along two known, active faults. The mainshock rupture propagated along multiple fault segments with different geometries. The faulting style is reasonably consistent with regional deformation observed on geologic timescales and with the stress field estimated from seismic observations. One striking feature of this sequence is intense seismic activity, including a dynamically triggered earthquake in the Oita region. Following the mainshock rupture, postseismic deformation has been observed, as well as expansion of the seismicity front toward the southwest and northwest.

  2. Data selector group sequencer interface

    International Nuclear Information System (INIS)

    Zizka, G.; Turko, B.

    1984-01-01

    A CAMAC-based module for high-rate data selection and transfer to the Tracor Northern TN-1700 multichannel analysis system is described. The module can select any group of 4096 consecutive event addresses within a 24-bit range. This module solves the problem of connecting a number of time digitizing systems to the memory of a multichannel analyzer. A continuous processing rate of up to 200,000 events per second, along with the live display, makes the testing of the above systems very efficient and relatively inexpensive. The module can also be programmed to store the preset group of addresses in more than one section of the memory. The events are analyzed in each section of the memory during the preset time. Multiple spectra can thus be taken automatically in a sequence.

  3. A main sequence for quasars

    Science.gov (United States)

    Marziani, Paola; Dultzin, Deborah; Sulentic, Jack W.; Del Olmo, Ascensión; Negrete, C. A.; Martínez-Aldama, Mary L.; D'Onofrio, Mauro; Bon, Edi; Bon, Natasa; Stirpe, Giovanna M.

    2018-03-01

    The last 25 years saw a major step forward in the analysis of optical and UV spectroscopic data of large quasar samples. Multivariate statistical approaches have led to the definition of systematic trends in observational properties that are the basis of physical and dynamical modeling of quasar structure. We discuss the empirical correlates of the so-called “main sequence” associated with the quasar Eigenvector 1, its governing physical parameters and several implications on our view of the quasar structure, as well as some luminosity effects associated with the virialized component of the line emitting regions. We also briefly discuss quasars in a segment of the main sequence that includes the strongest FeII emitters. These sources show a small dispersion around a well-defined Eddington ratio value, a property which makes them potential Eddington standard candles.

  4. The 2016 Kumamoto earthquake sequence

    Science.gov (United States)

    KATO, Aitaro; NAKAMURA, Kouji; HIYAMA, Yohei

    2016-01-01

    Beginning in April 2016, a series of shallow, moderate to large earthquakes with associated strong aftershocks struck the Kumamoto area of Kyushu, SW Japan. An Mj 7.3 mainshock occurred on 16 April 2016, close to the epicenter of an Mj 6.5 foreshock that occurred about 28 hours earlier. The intense seismicity released the accumulated elastic energy by right-lateral strike slip, mainly along two known, active faults. The mainshock rupture propagated along multiple fault segments with different geometries. The faulting style is reasonably consistent with regional deformation observed on geologic timescales and with the stress field estimated from seismic observations. One striking feature of this sequence is intense seismic activity, including a dynamically triggered earthquake in the Oita region. Following the mainshock rupture, postseismic deformation has been observed, as well as expansion of the seismicity front toward the southwest and northwest. PMID:27725474

  5. A Main Sequence for Quasars

    Directory of Open Access Journals (Sweden)

    Paola Marziani

    2018-03-01

    The last 25 years saw a major step forward in the analysis of optical and UV spectroscopic data of large quasar samples. Multivariate statistical approaches have led to the definition of systematic trends in observational properties that are the basis of physical and dynamical modeling of quasar structure. We discuss the empirical correlates of the so-called “main sequence” associated with the quasar Eigenvector 1, its governing physical parameters and several implications on our view of the quasar structure, as well as some luminosity effects associated with the virialized component of the line emitting regions. We also briefly discuss quasars in a segment of the main sequence that includes the strongest FeII emitters. These sources show a small dispersion around a well-defined Eddington ratio value, a property which makes them potential Eddington standard candles.

  6. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful for verifying the significance of data. Genomic regularities, such as nucleotide correlations or the non-uniform distribution of motifs throughout genomic or mature mRNA sequences, exist, and their significance can be checked by means of a Monte Carlo test. The test needs good-quality random sequences in order to work; moreover, they should have the same nucleotide distribution as the sequences in which the regularities were found. Random DNA sequences are also useful for estimating the background score of an alignment, that is, a threshold below which the resulting score is merely due to chance. We have developed RANDNA, free software that produces random DNA or RNA sequences with a user-defined length and percentage nucleotide composition. Sequences having the same nucleotide distribution as exonic, intronic or intergenic sequences can be generated. Its graphical interface makes it easy to set the parameters that characterize the sequences, which are saved in a text file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees good randomness, a long cycle length and high speed. We have checked the quality of the sequences generated by the software by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.
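
    Generating a random sequence with a chosen length and nucleotide composition can be sketched in a few lines. This is not RANDNA itself: the function name `random_dna` and its parameters are hypothetical, and Python's built-in generator is used here in place of the Borland Delphi 6 generator mentioned in the abstract.

```python
import random

def random_dna(length, composition, seed=None):
    """Draw `length` bases independently according to `composition`.

    `composition` maps each base to a relative weight (for example
    percentages). Because bases are drawn independently, the requested
    composition holds in expectation, not exactly, for any one sequence.
    """
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    bases = list(composition)
    weights = [composition[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))

# Example: a 60-base sequence with 30% A, 20% C, 20% G, 30% T.
example = random_dna(60, {"A": 30, "C": 20, "G": 20, "T": 30}, seed=1)
```

    For a Monte Carlo test, many such sequences would be generated with the composition of the real data and scored in the same way, giving a null distribution against which the observed regularity can be compared.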

  7. Locomotor sequence learning in visually guided walking

    DEFF Research Database (Denmark)

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-01-01

    walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 years, N = 20) could learn a specific sequence...... of step lengths over 300 training steps. Younger children (age 6-10 years, N = 8) have lower baseline performance, but their magnitude and rate of sequence learning was the same compared to older children (11-16 years, N = 10) and healthy adults. In addition, learning capacity may be more limited...... to modify step length from one trial to the next. Our sequence learning paradigm is derived from the serial reaction-time (SRT) task that has been used in upper limb studies. Both random and ordered sequences of step lengths were used to measure sequence-specific and sequence non-specific learning during...

  8. The RNA world, automatic sequences and oncogenetics

    Energy Technology Data Exchange (ETDEWEB)

    Tahir Shah, K

    1993-04-01

    We construct a model of the RNA world in terms of naturally evolving nucleotide sequences, assuming only Watson-Crick base pairing and self-cleaving/splicing capability. These sequences have the following properties. (1) They are recognizable by an automaton (or automata). That is, for each k-sequence there exists a k-automaton which accepts, recognizes or generates the k-sequence. These are known as automatic sequences. Fibonacci and Morse-Thue sequences are the most natural outcome of pre-biotic chemical conditions. (2) Infinite (resp. large) sequences are self-similar (resp. nearly self-similar) under certain rewrite rules and consequently give rise to fractal (resp. fractal-like) structures. Computationally, such sequences can also be generated by their corresponding deterministic parallel rewrite system, known as a D0L system. The self-similar sequences are fixed points of their respective rewrite rules. Some of these automatic sequences can read or "accept" other sequences, while others can detect errors and trigger error-correcting mechanisms. They can be enlarged and have block and/or palindrome structure. Linear recurring sequences such as the Fibonacci sequence are simply feedback shift registers, a well-known model of information processing machines. We show that a mutation of any rewrite rule can cause a combinatorial explosion of errors and relate this to oncogenetic behavior. On the other hand, a mutation of sequences that are not rewrite rules leads to normal evolutionary change. Known experimental results support our hypothesis. (author). Refs.
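
    The deterministic parallel rewriting mentioned above can be illustrated with a minimal sketch in which every symbol of a word is replaced simultaneously according to a fixed rule table. The function name `iterate_d0l` and the two rule tables are illustrative assumptions; they are the textbook rules for the Thue-Morse and Fibonacci words, not rules taken from the paper's RNA model.

```python
def iterate_d0l(axiom, rules, steps):
    """Apply the rewrite rules to every symbol in parallel, `steps` times."""
    word = axiom
    for _ in range(steps):
        word = "".join(rules[ch] for ch in word)
    return word

# Textbook rewrite rules (assumed here for illustration).
THUE_MORSE = {"0": "01", "1": "10"}
FIBONACCI = {"a": "ab", "b": "a"}

tm = iterate_d0l("0", THUE_MORSE, 3)   # "01101001"
fib = iterate_d0l("a", FIBONACCI, 4)   # "abaababa"
```

    Each iterate is a prefix of the next one, so the infinite limit word is unchanged by the rule table; this is the sense in which a self-similar sequence is a fixed point of its rewrite rules.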

  9. The RNA world, automatic sequences and oncogenetics

    International Nuclear Information System (INIS)

    Tahir Shah, K.

    1993-04-01

    We construct a model of the RNA world in terms of naturally evolving nucleotide sequences, assuming only Watson-Crick base pairing and self-cleaving/splicing capability. These sequences have the following properties. (1) They are recognizable by an automaton (or automata). That is, for each k-sequence there exists a k-automaton which accepts, recognizes or generates the k-sequence. These are known as automatic sequences. Fibonacci and Morse-Thue sequences are the most natural outcome of pre-biotic chemical conditions. (2) Infinite (resp. large) sequences are self-similar (resp. nearly self-similar) under certain rewrite rules and consequently give rise to fractal (resp. fractal-like) structures. Computationally, such sequences can also be generated by their corresponding deterministic parallel rewrite system, known as a D0L system. The self-similar sequences are fixed points of their respective rewrite rules. Some of these automatic sequences can read or 'accept' other sequences, while others can detect errors and trigger error-correcting mechanisms. They can be enlarged and have block and/or palindrome structure. Linear recurring sequences such as the Fibonacci sequence are simply feedback shift registers, a well-known model of information processing machines. We show that a mutation of any rewrite rule can cause a combinatorial explosion of errors and relate this to oncogenetic behavior. On the other hand, a mutation of sequences that are not rewrite rules leads to normal evolutionary change. Known experimental results support our hypothesis. (author). Refs

  10. Targeted assembly of short sequence reads.

    Directory of Open Access Journals (Sweden)

    René L Warren

    As next-generation sequence (NGS) production continues to increase, analysis is becoming a significant bottleneck. However, in situations where information is required only for specific sequence variants, it is not necessary to assemble or align whole genome data sets in their entirety. Rather, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is a faster, easier, and more accurate approach. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants by only considering reads within the sequence space of input target sequences provided by the user. The NGS data set is searched for reads with an exact match to any of the possible short words within the target sequence, and these reads are then assembled stringently to generate a consensus of the target and flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of the variant in the data set being interrogated is revealed by a successful assembly outcome. However, TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR has utility in finding or confirming genomic mutations, polymorphisms, fusions and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest. TASR is a fast, flexible and easy to use tool for targeted assembly.
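
    The read-recruitment step described above (keeping only reads that share an exact short word with a target) can be sketched as follows. This is not TASR's implementation: the function names, the word length `k`, and the decision to ignore reverse complements and the subsequent assembly step are simplifying assumptions.

```python
def kmers(seq, k):
    """Return the set of all length-k words occurring in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def recruit_reads(reads, target, k=15):
    """Keep the reads that contain at least one exact k-mer of `target`.

    Assumption: only the forward strand is checked; a real tool would
    typically also consider reverse complements.
    """
    words = kmers(target, k)
    return [
        read for read in reads
        if any(read[i:i + k] in words for i in range(len(read) - k + 1))
    ]

# Toy example with a short word length so the matches are easy to follow.
target = "ACGTACGTGGA"
reads = ["TTACGTACC", "GGGGGGGG", "CGTGGATT"]
hits = recruit_reads(reads, target, k=5)
```

    Because the word set is built once from the target and each read is checked with set lookups, the cost grows with the total read length rather than with the size of a whole-genome assembly, which is the motivation for targeting given in the abstract.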

  11. A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

    Science.gov (United States)

    Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

    2016-09-02

    Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, guanine-cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, along with the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. 
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal

  12. The chaperonin-60 universal target is a barcode for bacteria that enables de novo assembly of metagenomic sequence data.

    Science.gov (United States)

    Links, Matthew G; Dumonceaux, Tim J; Hemmingsen, Sean M; Hill, Janet E

    2012-01-01

    Barcoding with molecular sequences is widely used to catalogue eukaryotic biodiversity. Studies investigating the community dynamics of microbes have relied heavily on gene-centric metagenomic profiling using two genes (16S rRNA and cpn60) to identify and track Bacteria. While there have been criteria formalized for barcoding of eukaryotes, these criteria have not been used to evaluate gene targets for other domains of life. Using the framework of the International Barcode of Life we evaluated DNA barcodes for Bacteria. Candidates from the 16S rRNA gene and the protein coding cpn60 gene were evaluated. Within complete bacterial genomes in the public domain representing 983 species from 21 phyla, the largest difference between median pairwise inter- and intra-specific distances ("barcode gap") was found from cpn60. Distribution of sequence diversity along the ∼555 bp cpn60 target region was remarkably uniform. The barcode gap of the cpn60 universal target facilitated the faithful de novo assembly of full-length operational taxonomic units from pyrosequencing data from a synthetic microbial community. Analysis supported the recognition of both 16S rRNA and cpn60 as DNA barcodes for Bacteria. The cpn60 universal target was found to have a much larger barcode gap than 16S rRNA suggesting cpn60 as a preferred barcode for Bacteria. A large barcode gap for cpn60 provided a robust target for species-level characterization of data. The assembly of consensus sequences for barcodes was shown to be a reliable method for the identification and tracking of novel microbes in metagenomic studies.
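
    The "barcode gap" described in this record (the difference between median pairwise inter- and intra-specific distances) can be illustrated with a minimal sketch. The record does not say which distance measure was used, so the simple proportion of differing sites (p-distance) below is an assumption, as are the function names and the toy sequences.

```python
from itertools import combinations
from statistics import median


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences.

    Assumption: the record does not name its distance measure; p-distance is
    used here only for illustration.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records):
    """Median inter-specific distance minus median intra-specific distance.

    `records` is a list of (species_label, aligned_sequence) pairs (hypothetical layout).
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        # Same label -> intra-specific pair, otherwise inter-specific pair.
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need at least one intra- and one inter-specific pair")
    return median(inter) - median(intra)


# Toy data: two sequences for each of two made-up species.
records = [
    ("A", "ACGTACGT"),
    ("A", "ACGTACGA"),
    ("B", "TCGAACGG"),
    ("B", "TCGAACGT"),
]
print(barcode_gap(records))
```

    A larger positive value means that sequences from different species are more clearly separated from sequences within the same species, which is the property the record reports for cpn60 relative to 16S rRNA.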

  13. Single-nucleotide variant in multiple copies of a deleted in azoospermia (DAZ) sequence - a human Y chromosome quantitative polymorphism.

    Science.gov (United States)

    Szmulewicz, Martin N; Ruiz, Luis M; Reategui, Erika P; Hussini, Saeed; Herrera, Rene J

    2002-01-01

    The evolution of the deleted in azoospermia (DAZ) gene family supports prevalent theories on the origin and development of sex chromosomes and sexual dimorphism. The ancestral DAZL gene in human chromosome 3 is known to be involved in germline development of both males and females. The available phylogenetic data suggest that some time after the divergence of the New World and Old World monkey lineages, the DAZL gene, which is found in all mammals, was copied to the Y chromosome of an ancestor to the Old World monkeys, but not New World monkeys. In modern man, the Y-linked DAZ gene complex is located on the distal part of the q arm. It is thought that after being copied to the Y chromosome, and after the divergence of the human and great ape lineages, the DAZ gene in the former underwent internal rearrangements. This included tandem duplications as well as a T > C transition altering an MboI restriction enzyme site in a duplicated sequence. In this study, we report on the ratios of MboI-/MboI+ variant sequences in individuals from seven worldwide human populations (Basque, Benin, Egypt, Formosa, Kungurtug, Oman and Rwanda) in the DAZ complex. The ratio of PCR MboI- and MboI+ amplicons can be used to characterize individuals and populations. Our results show a nonrandom distribution of MboI-/MboI+ sequence ratios in all populations examined, as well as significant differences in ratios between populations when compared pairwise. The multiple ratios imply that there has been more than one recent reorganization event at this locus. Considering the dynamic nature of this locus and its involvement in male fertility, we investigated the extent and distribution of this polymorphism. Copyright 2002 S. Karger AG, Basel

  14. Hierarchically nested river landform sequences

    Science.gov (United States)

    Pasternack, G. B.; Weber, M. D.; Brown, R. A.; Baig, D.

    2017-12-01

    River corridors exhibit landforms nested within landforms repeatedly down spatial scales. In this study we developed, tested, and implemented a new way to create river classifications by mapping domains of fluvial processes with respect to the hierarchical organization of topographic complexity that drives fluvial dynamism. We tested this approach on flow convergence routing, a morphodynamic mechanism with different states depending on the structure of nondimensional topographic variability. Five nondimensional landform types with unique functionality (nozzle, wide bar, normal channel, constricted pool, and oversized) represent this process at any flow. When this typology is nested at base flow, bankfull, and floodprone scales it creates a system with up to 125 functional types. This shows how a single mechanism produces complex dynamism via nesting. Given the classification, we answered nine specific scientific questions to investigate the abundance, sequencing, and hierarchical nesting of these new landform types using a 35-km gravel/cobble river segment of the Yuba River in California. The nested structure of flow convergence routing landforms found in this study revealed that bankfull landforms are nested within specific floodprone valley landform types, and these types control bankfull morphodynamics during moderate to large floods. As a result, this study calls into question the prevailing theory that the bankfull channel of a gravel/cobble river is controlled by in-channel, bankfull, and/or small flood flows. Such flows are too small to initiate widespread sediment transport in a gravel/cobble river with topographic complexity.

  15. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  16. Sequencing Intractable DNA to Close Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  17. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    Science.gov (United States)

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by GENSCAN (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  18. cis sequence effects on gene expression

    Directory of Open Access Journals (Sweden)

    Jacobs Kevin

    2007-08-01

    Background: Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects) in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature. Results: We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning) to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p [...] cis sequence effects in our study, respectively. Conclusion: Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.

  19. Multiple tag labeling method for DNA sequencing

    Science.gov (United States)

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  20. Exome sequencing and genetic testing for MODY.

    Directory of Open Access Journals (Sweden)

    Stefan Johansson

    Genetic testing for monogenic diabetes is important for patient care. Given the extensive genetic and clinical heterogeneity of diabetes, exome sequencing might provide additional diagnostic potential when standard Sanger sequencing-based diagnostics is inconclusive. The aim of the study was to examine the performance of exome sequencing for a molecular diagnosis of MODY in patients who have undergone conventional diagnostic sequencing of candidate genes with negative results. We performed exome enrichment followed by high-throughput sequencing in nine patients with suspected MODY. They were Sanger sequencing-negative for mutations in the HNF1A, HNF4A, GCK, HNF1B and INS genes. We excluded common, non-coding and synonymous gene variants, and performed in-depth analysis on filtered sequence variants in a pre-defined set of 111 genes implicated in glucose metabolism. On average, we obtained 45× median coverage of the entire targeted exome and found 199 rare coding variants per individual. We identified 0-4 rare non-synonymous and nonsense variants per individual in our a priori list of 111 candidate genes. Three of the variants were considered pathogenic (in ABCC8, HNF4A and PPARG, respectively); thus exome sequencing led to a genetic diagnosis in at least three of the nine patients. Approximately 91% of known heterozygous SNPs in the target exomes were detected, but we also found low coverage in some key diabetes genes using our current exome sequencing approach. Novel variants in the genes ARAP1, GLIS3, MADD, NOTCH2 and WFS1 need further investigation to reveal their possible role in diabetes. Our results demonstrate that exome sequencing can improve molecular diagnostics of MODY when used as a complement to Sanger sequencing. However, improvements will be needed, especially concerning coverage, before the full potential of exome sequencing can be realized.

  1. Identifying driver mutations in sequenced cancer genomes

    DEFF Research Database (Denmark)

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, nois...... patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer....

  2. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Background: Molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows the generation of sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results: The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions: We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm allows the generation of larger sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.

  3. Genome Sequencing and Analysis Conference IV

    Energy Technology Data Exchange (ETDEWEB)

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  4. FRESCO: Referential compression of highly similar sequences.

    Science.gov (United States)

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
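
    The idea of referential compression mentioned in this record (storing only the differences between an input and a known reference) can be sketched as follows. This is a brute-force illustration under stated assumptions, not the FRESCO algorithm: the operation format, the `min_match` threshold, and the function names are all hypothetical, and the quadratic search would be far too slow for real genomes.

```python
def compress(reference, target, min_match=4):
    """Encode `target` as matches into `reference` plus literal text.

    Output is a list of ("M", start, length) and ("L", text) operations.
    Assumption: greedy longest-match search, chosen only for clarity.
    """
    ops, literal, i = [], [], 0
    while i < len(target):
        best_start, best_len = -1, 0
        # Brute-force search for the longest reference substring matching target[i:].
        for start in range(len(reference)):
            length = 0
            while (start + length < len(reference)
                   and i + length < len(target)
                   and reference[start + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        if best_len >= min_match:
            if literal:  # flush pending literal characters first
                ops.append(("L", "".join(literal)))
                literal = []
            ops.append(("M", best_start, best_len))
            i += best_len
        else:
            literal.append(target[i])
            i += 1
    if literal:
        ops.append(("L", "".join(literal)))
    return ops


def decompress(reference, ops):
    """Rebuild the target from the reference and the operation list."""
    parts = []
    for op in ops:
        if op[0] == "M":
            _, start, length = op
            parts.append(reference[start:start + length])
        else:
            parts.append(op[1])
    return "".join(parts)


reference = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCA"  # differs from the reference at one position
ops = compress(reference, target)
print(ops)
print(decompress(reference, ops) == target)
```

    For two highly similar sequences the operation list stays short, which is why a reference that is close to the input matters so much for the achievable compression ratio.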

  5. Robustness analysis of chiller sequencing control

    International Nuclear Information System (INIS)

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights: • Uncertainties with chiller sequencing control were systematically quantified. • Robustness of chiller sequencing control was systematically analyzed. • Different sequencing control strategies were sensitive to different uncertainties. • A numerical method was developed for easy selection of chiller sequencing control. Abstract: A multiple-chiller plant is commonly employed in heating, ventilating and air-conditioning systems to increase operational feasibility and energy efficiency under part-load conditions. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency without sacrificing the cooling sufficiency needed for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Based on the observation that (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on control performance and may cause the control to fail to achieve the expected control and/or energy performance; and (ii) few studies in the current literature have systematically addressed this issue, this paper presents a study on robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies.

  6. Multiplexed microsatellite recovery using massively parallel sequencing

    Science.gov (United States)

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).
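
    Screening reads for di- or trinucleotide microsatellites, as this record describes, can be approximated with a regular expression. The sketch below is only an illustration: the record does not describe its detection pipeline, so the minimum repeat count, the function name, and the decision to skip single-base runs are assumptions.

```python
import re


def find_microsatellites(seq, min_repeats=4):
    """Return (start, motif, repeat_count) for di-/trinucleotide tandem repeats.

    Assumptions: uppercase-normalised DNA, perfect (uninterrupted) repeats only,
    and homopolymer runs such as "AAAA" are not reported.
    """
    # A 2- or 3-base motif followed by at least (min_repeats - 1) more copies of itself.
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # a single repeated base is not a di-/trinucleotide repeat
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


read = "TTGACACACACAGGCAGCAGCAGCAGTT"
print(find_microsatellites(read))
```

    On the toy read above, the function reports one dinucleotide and one trinucleotide repeat; real pipelines would typically also handle sequencing errors and flanking-region quality, which this sketch ignores.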

  7. Digital Recovery Sequencer - Advanced Concept Ejection Seats

    National Research Council Canada - National Science Library

    Ross, David A; Cotter, Lee; Culhane, David; Press, Matthew J

    2005-01-01

    .... Continued usage of the Analog Sequencer is undesirable due to limitations with respect to its installed life, electronic component obsolescence, flexibility to accommodate seat safety improvements...

  8. Quantitative phenotyping via deep barcode sequencing.

    Science.gov (United States)

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.

  9. Hardware Accelerated Sequence Alignment with Traceback

    Directory of Open Access Journals (Sweden)

    Scott Lloyd

    2009-01-01

    in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop computer is demonstrated on sequence lengths of 16000. For greater performance, the architecture is scalable to more processing elements.
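
    The forward scan and traceback that this record accelerates in hardware correspond to the two phases of a global alignment. The sketch below shows a plain full-matrix Needleman-Wunsch alignment in software; it is not the space-efficient architecture described in the record, and the scoring values (match +1, mismatch -1, gap -1) and function name are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    # Forward scan: fill the (n+1) x (m+1) score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Traceback: walk from the bottom-right cell back to the origin.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, row_a, row_b = needleman_wunsch("GATTACA", "GATACA")
print(best)
print(row_a)
print(row_b)
```

    The full matrix needs memory proportional to the product of the sequence lengths, which is the kind of memory pressure that motivates space-efficient formulations for long sequences.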

  10. Multipliers on Generalized Mixed Norm Sequence Spaces

    Directory of Open Access Journals (Sweden)

    Oscar Blasco

    2014-01-01

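    Given $1 \le p, q \le \infty$ and sequences of integers $(n_k)_k$ and $(n_k')_k$ such that $n_k \le n_k' \le n_{k+1}$, the generalized mixed norm space $\ell^{\mathcal{I}}(p,q)$ is defined as those sequences $(a_j)_j$ such that $\big( (\sum_{j \in I_k} |a_j|^p)^{1/p} \big)_k \in \ell^q$, where $I_k = \{ j \in \mathbb{N}_0 \text{ s.t. } n_k \le j$ [...] sequence $\lambda = (\lambda_j)_j$ to belong to the space of multipliers $(\ell^{\mathcal{I}}(r,s), \ell(u,v))$, for different sequences $\mathcal{I}$ and [...] of intervals in $\mathbb{N}_0$, are determined.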

  11. Recursive sequences in first-year calculus

    Science.gov (United States)

    Krainer, Thomas

    2016-02-01

    This article provides ready-to-use supplementary material on recursive sequences for a second-semester calculus class. It equips first-year calculus students with a basic methodical procedure based on which they can conduct a rigorous convergence or divergence analysis of many simple recursive sequences on their own without the need to invoke inductive arguments as is typically required in calculus textbooks. The sequences that are accessible to this kind of analysis are predominantly (eventually) monotonic, but also certain recursive sequences that alternate around their limit point as they converge can be considered.

  12. A measurement of disorder in binary sequences

    Science.gov (United States)

    Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

    2015-03-01

    We propose a complex quantity, A_L, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we apply it to typical random and deterministic sequences. One kind of random sequence is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of A_L, denoted by |A_L|, is a (statistically) equivalent quantity to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can serve as a fruitful index to discern which sequence is more disordered. Moreover, there is one and only one value of |A_L| for the overall disorder characteristics. It requires extremely low computational cost. It can be easily realized experimentally. From all of the above, we believe that the proposed measure of disorder is a valuable complement to existing ones in symbolic sequences.

  13. Polynomial sequences generated by infinite Hessenberg matrices

    Directory of Open Access Journals (Sweden)

    Verde-Star Luis

    2017-01-01

    We show that an infinite lower Hessenberg matrix generates polynomial sequences that correspond to the rows of infinite lower triangular invertible matrices. Orthogonal polynomial sequences are obtained when the Hessenberg matrix is tridiagonal. We study properties of the polynomial sequences and their corresponding matrices which are related to recurrence relations, companion matrices, matrix similarity, construction algorithms, and generating functions. When the Hessenberg matrix is also Toeplitz the polynomial sequences turn out to be of interpolatory type and we obtain additional results. For example, we show that every nonderogative finite square matrix is similar to a unique Toeplitz-Hessenberg matrix.

  14. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  15. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression, an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression; the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  16. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Directory of Open Access Journals (Sweden)

    Marta Brozynska

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  17. Step out - Step in Sequencing Games

    NARCIS (Netherlands)

    Musegaas, M.; Borm, P.E.M.; Quant, M.

    2014-01-01

    In this paper a new class of relaxed sequencing games is introduced: the class of Step out - Step in sequencing games. In this relaxation any player within a coalition is allowed to step out from his position in the processing order and to step in at any position later in the processing order.

  18. Step out-step in sequencing games

    NARCIS (Netherlands)

    Musegaas, Marieke; Borm, Peter; Quant, Marieke

    2015-01-01

    In this paper a new class of relaxed sequencing games is introduced: the class of Step out–Step in sequencing games. In this relaxation any player within a coalition is allowed to step out from his position in the processing order and to step in at any position later in the processing order. First,

  19. Enhanced throughput for infrared automated DNA sequencing

    Science.gov (United States)

    Middendorf, Lyle R.; Gartside, Bill O.; Humphrey, Pat G.; Roemer, Stephen C.; Sorensen, David R.; Steffens, David L.; Sutter, Scott L.

    1995-04-01

    Several enhancements have been developed and applied to infrared automated DNA sequencing resulting in significantly higher throughput. A 41 cm sequencing gel (31 cm well-to-read distance) combines high resolution of DNA sequencing fragments with optimized run times yielding two runs per day of 500 bases per sample. A 66 cm sequencing gel (56 cm well-to-read distance) produces sequence read lengths of up to 1000 bases for ds and ss templates using either T7 polymerase or cycle-sequencing protocols. Using a multichannel syringe to load 64 lanes allows 16 samples (compatible with 96-well format) to be visualized for each run. The 41 cm gel configuration allows 16,000 bases per day (16 samples × 500 bases/sample × 2 ten-hour runs/day) to be sequenced with the advantages of infrared technology. Enhancements to internal labeling techniques using an infrared-labeled dATP molecule (Boehringer Mannheim GmbH, Penzberg, Germany; Sequenase (U.S. Biochemical)) have also been made. The inclusion of glycerol in the sequencing reactions yields greatly improved results for some primer and template combinations. The inclusion of α-thio-dNTPs in the labeling reaction increases signal intensity two- to three-fold.

  20. Thread extraction for polyadic instruction sequences

    NARCIS (Netherlands)

    Bergstra, J.; Middelburg, C.

    2011-01-01

    In this paper, we study the phenomenon that instruction sequences are split into fragments which somehow produce a joint behaviour. In order to bring this phenomenon better into the picture, we formalize a simple mechanism by which several instruction sequence fragments can produce a joint

  1. Genome sequence of Lactobacillus rhamnosus ATCC 8530.

    Science.gov (United States)

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R; Ziola, Barry

    2012-02-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  2. Genome Sequence of Lactobacillus rhamnosus ATCC 8530

    OpenAIRE

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R.; Ziola, Barry

    2012-01-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  3. Question-answer sequences in survey interviews

    NARCIS (Netherlands)

    Dijkstra, W.; Ongena, Y.P.

    2006-01-01

    Interaction analysis was used to analyze a total of 14,265 question-answer sequences (Q-A sequences) of 80 questions that originated from two face-to-face and three telephone surveys. The analysis was directed towards the causes and effects of particular interactional problems. Our results showed

  4. Trace maps for arbitrary substitution sequences

    International Nuclear Information System (INIS)

    Avishai, Y.

    1993-01-01

    The discovery of quasi-crystals and their 1-dimensional modeling have led to a deep mathematical study of Schroedinger operators with an arbitrary deterministic potential sequence. In this work we address this problem and find trace maps for an arbitrary substitution sequence. Our trace maps have lower dimensionality than those of Kolar and Nori, which makes them quite attractive for actual applications. (authors)

  5. Stochastic modelling of daily rainfall sequences

    NARCIS (Netherlands)

    Buishand, T.A.

    1977-01-01

    Rainfall series of different climatic regions were analysed with the aim of generating daily rainfall sequences. A survey of the data is given in I, 1. When analysing daily rainfall sequences one must be aware of the following points:
    a. Seasonality. Because of seasonal variation

  6. Learning of Sensory Sequences in Cerebellar Patients

    Science.gov (United States)

    Frings, Markus; Boenisch, Raoul; Gerwig, Marcus; Diener, Hans-Christoph; Timmann, Dagmar

    2004-01-01

    A possible role of the cerebellum in detecting and recognizing event sequences has been proposed. The present study sought to determine whether patients with cerebellar lesions are impaired in the acquisition and discrimination of sequences of sensory stimuli of different modalities. A group of 26 cerebellar patients and 26 controls matched for…

  7. On peculiar Šindel sequences

    Czech Academy of Sciences Publication Activity Database

    Křížek, Michal; Somer, L.

    2010-01-01

    Vol. 17, No. 2 (2010), pp. 129-140 ISSN 0972-5555 R&D Projects: GA AV ČR(CZ) IAA100190803 Institutional research plan: CEZ:AV0Z10190503 Keywords: quadratic residue * Chinese remainder theorem * primitive Šindel sequences * Prague clock sequence Subject RIV: BA - General Mathematics http://www.pphmj.com/abstract/5095.htm

  8. Protecting genomic sequence anonymity with generalization lattices.

    Science.gov (United States)

    Malin, B A

    2005-01-01

    Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, is obscured, removed, or encrypted. While demographic features can directly compromise an individual's identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k-anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k-1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequence privacy. The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific data-sharing scenarios.
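
    The generalization-lattice idea can be illustrated with a toy sketch. This is not the DNALA algorithm: the three-level hierarchy (base, then purine R or pyrimidine Y, then any nucleotide N, using IUPAC codes), the step-count distance, and the names `PARENT`, `generalize`, and `generalize_pair` are all assumptions made for illustration. The sketch merges two aligned sequences into one generalized sequence, which is the k = 2 case of making records indistinguishable.

```python
# Assumed toy hierarchy using IUPAC codes: R = purine (A/G),
# Y = pyrimidine (C/T), N = any nucleotide.
PARENT = {"A": "R", "G": "R", "C": "Y", "T": "Y", "R": "N", "Y": "N", "N": None}


def ancestors(symbol):
    """Return the chain from `symbol` up to the top of the hierarchy."""
    chain = []
    while symbol is not None:
        chain.append(symbol)
        symbol = PARENT[symbol]
    return chain


def generalize(a, b):
    """Return (most specific common concept, number of steps climbed in total)."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, concept in enumerate(up_a):
        if concept in up_b:
            return concept, steps_a + up_b.index(concept)
    raise ValueError(f"no common concept for {a!r} and {b!r}")


def generalize_pair(seq1, seq2):
    """Generalize two equal-length sequences position by position."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    merged, total = [], 0
    for a, b in zip(seq1, seq2):
        concept, dist = generalize(a, b)
        merged.append(concept)
        total += dist
    return "".join(merged), total


merged, cost = generalize_pair("ACGT", "GCTA")
```

    In this sketch, a lower total cost means less information is lost when two sequences are merged, so a pairing step could prefer partners with the smallest cost.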

  9. Occupational Sequences: Auto Engines 1. AT 121.

    Science.gov (United States)

    Korb, A. W.; And Others

    In an attempt to individualize an automotive course, the Vocational-Technical Division of Northern Montana College has developed Occupational Sequences for an engine rebuilding course. Occupational Sequences, a learning or teaching aid, is an analysis of numbered operations involved in engine rebuilding. Job sheets, included in the book, provide a…

  10. Sequencing Events: Exploring Art and Art Jobs.

    Science.gov (United States)

    Stephens, Pamela Geiger; Shaddix, Robin K.

    2000-01-01

    Presents an activity for upper-elementary students that correlates the actions of archaeologists, patrons, and artists with the sequencing of events in a logical order. Features ancient Egyptian art images. Discusses the preparation of materials, motivation, a pre-writing activity, and writing a story in sequence. (CMK)

  11. Wijsman Orlicz Asymptotically Ideal -Statistical Equivalent Sequences

    Directory of Open Access Journals (Sweden)

    Bipan Hazarika

    2013-01-01

    in Wijsman sense and present some definitions which are the natural combination of the definition of asymptotic equivalence, statistical equivalent, -statistical equivalent sequences in Wijsman sense. Finally, we introduce the notion of Cesaro Orlicz asymptotically -equivalent sequences in Wijsman sense and establish their relationship with other classes.

  12. Nitrogen chronology of massive main sequence stars

    NARCIS (Netherlands)

    Köhler, K.; Borzyszkowski, M.; Brott, I.; Langer, N.; de Koter, A.

    2012-01-01

    Context. Rotational mixing in massive main sequence stars is predicted to monotonically increase their surface nitrogen abundance with time. Aims. We use this effect to design a method for constraining the age and the inclination angle of massive main sequence stars, given their observed luminosity,

  13. Massively parallel sequencing of forensic STRs

    DEFF Research Database (Denmark)

    Parson, Walther; Ballard, David; Budowle, Bruce

    2016-01-01

    The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data that...

  14. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  15. Cyprinus carpio Genome sequencing and assembly

    NARCIS (Netherlands)

    Kolder, I.C.R.M.; Plas-Duivesteijn, van der Suzanne J.; Tan, G.; Wiegertjes, G.; Forlenza, M.; Guler, A.T.; Travin, D.Y.; Nakao, M.; Moritomo, T.; Irnazarow, I.; Jansen, H.J.

    2013-01-01

    Sequencing of the common carp (Cyprinus carpio carpio Linnaeus, 1758) genome, with the objective of establishing carp as a model organism to supplement the closely related zebrafish (Danio rerio). The sequenced individual is a homozygous female (by gynogenesis) of R3 x R8 carp, the heterozygous

  16. Sequence Comparison: Close and Open problems

    NARCIS (Netherlands)

    Lenzini, Gabriele; Cerrai, P.; Freguglia, P.

    Comparing sequences is a very important activity both in computer science and in many other areas as well. For example, thanks to text editors, everyone knows the particular instance of a sequence comparison problem known as the "string matching problem". It consists in searching a given work

  17. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  18. Swab-to-Sequence: Real-time Data Analysis Platform for the Biomolecule Sequencer

    Data.gov (United States)

    National Aeronautics and Space Administration — DNA was successfully sequenced on the ISS in 2016, but the DNA sequenced was prepared on the ground. With FY’16 IRAD funds, the same team developed a...

  19. From Sequence to Morphology - Long-Range Correlations in Complete Sequenced Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2004-01-01

    The largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis

  20. Genetic Evaluation of Natural Populations of the Endangered Conifer Thuja koraiensis Using Microsatellite Markers by Restriction-Associated DNA Sequencing

    Directory of Open Access Journals (Sweden)

    Lu Hou

    2018-04-01

    Thuja koraiensis Nakai is an endangered conifer of high economic and ecological value in Jilin Province, China. However, studies on its population structure and conservation genetics have been limited by the lack of genomic data. Here, 37,761 microsatellites (simple sequence repeats, SSRs) were detected based on 875,792 de novo-assembled contigs using a restriction-associated DNA (RAD) approach. Among these SSRs, 300 were randomly selected to test for polymorphisms, and 96 of the loci amplified a fragment of the expected size. Twelve polymorphic SSR markers were developed to analyze the genetic diversity and population structure of three natural populations. High genetic diversity (mean NA = 5.481, HE = 0.548) and moderate population differentiation (pairwise Fst = 0.048–0.078, Nm = 2.940–4.958) were found in this species. Molecular variance analysis suggested that most of the variation (83%) existed within populations. Combining the results of STRUCTURE, principal coordinate, and neighbor-joining analysis, the 232 individuals were divided into three genetic clusters that generally correlated with their geographical distributions. Finally, appropriate conservation strategies were proposed to protect this species. This study provides genetic information for the natural resource conservation and utilization of T. koraiensis and will facilitate further studies of the evolution and phylogeography of the species.
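
    The reported Fst and Nm ranges can be related with a short calculation. The abstract does not say how Nm was computed; the sketch below assumes the widely used island-model approximation Nm ≈ (1/Fst − 1)/4, and the function name `gene_flow_from_fst` is hypothetical.

```python
def gene_flow_from_fst(fst):
    """Estimate gene flow Nm from Fst with the island-model approximation
    Nm = (1 / Fst - 1) / 4 (an assumption; not stated in the abstract)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


# The two ends of the reported pairwise Fst range.
nm_low_fst = gene_flow_from_fst(0.048)   # about 4.958
nm_high_fst = gene_flow_from_fst(0.078)  # about 2.955
```

    Under this assumption the lower Fst bound reproduces the upper Nm bound of 4.958. The upper Fst bound gives about 2.955 rather than the reported 2.940, a gap that could come from rounding of Fst to three decimals.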

  1. De novo sequencing, assembly, and analysis of Iris lactea var. chinensis roots' transcriptome in response to salt stress.

    Science.gov (United States)

    Gu, Chunsun; Xu, Sheng; Wang, Zhiquan; Liu, Liangqin; Zhang, Yongxia; Deng, Yanming; Huang, Suzhen

    2018-04-01

    As a halophyte, Iris lactea var. chinensis (I. lactea var. chinensis) is widely distributed and has good drought and heavy metal resistance. Moreover, it is an excellent ornamental plant. I. lactea var. chinensis has extensive application prospects owing to the global impacts of salinization. To better understand its molecular mechanism involved in salt resistance, the de novo sequencing, assembly, and analysis of I. lactea var. chinensis roots' transcriptome in response to salt-stress conditions was performed. On average, 74.17% of the clean reads were mapped to unigenes. A total of 121,093 unigenes were constructed and 56,398 (46.57%) were annotated. Among these, 13,522 differentially expressed genes (DEGs) were identified between salt-treated and control samples. Compared to the transcriptional level of control, 7037 DEGs were up-regulated and 6539 down-regulated. In addition, 129 up-regulated and 1609 down-regulated genes were simultaneously detected in all three pairwise comparisons between control and salt-stressed libraries. At least 247 and 250 DEGs encoding transcription factors and transporter proteins were identified. Meanwhile, 130 DEGs regarding reactive oxygen species (ROS) scavenging system were also summarized. Based on real-time quantitative RT-PCR, we verified the changes in the expression patterns of 10 unigenes. Our study identified potential salt-responsive candidate genes and increased the understanding of halophyte responses to salinity stress. Copyright © 2018 Elsevier Masson SAS. All rights reserved.

  2. Rare variant testing across methods and thresholds using the multi-kernel sequence kernel association test (MK-SKAT).

    Science.gov (United States)

    Urrutia, Eugene; Lee, Seunggeun; Maity, Arnab; Zhao, Ni; Shen, Judong; Li, Yun; Wu, Michael C

    Analysis of rare genetic variants has focused on region-based analysis wherein a subset of the variants within a genomic region is tested for association with a complex trait. Two important practical challenges have emerged. First, it is difficult to choose which test to use. Second, it is unclear which group of variants within a region should be tested. Both depend on the unknown true state of nature. Therefore, we develop the Multi-Kernel SKAT (MK-SKAT) which tests across a range of rare variant tests and groupings. Specifically, we demonstrate that several popular rare variant tests are special cases of the sequence kernel association test which compares pair-wise similarity in trait value to similarity in the rare variant genotypes between subjects as measured through a kernel function. Choosing a particular test is equivalent to choosing a kernel. Similarly, choosing which group of variants to test also reduces to choosing a kernel. Thus, MK-SKAT uses perturbation to test across a range of kernels. Simulations and real data analyses show that our framework controls type I error while maintaining high power across settings: MK-SKAT loses power when compared to the kernel for a particular scenario but has much greater power than poor choices.

  3. Nucleotide sequence preservation of human mitochondrial DNA

    International Nuclear Information System (INIS)

    Monnat, R.J. Jr.; Loeb, L.A.

    1985-01-01

    Recombinant DNA techniques have been used to quantitate the amount of nucleotide sequence divergence in the mitochondrial DNA population of individual normal humans. Mitochondrial DNA was isolated from the peripheral blood lymphocytes of five normal humans and cloned in M13 mp11; 49 kilobases of nucleotide sequence information was obtained from 248 independently isolated clones from the five normal donors. Both between- and within-individual differences were identified. Between-individual differences were identified in approximately 1/200 nucleotides. In contrast, only one within-individual difference was identified in 49 kilobases of nucleotide sequence information. This high degree of mitochondrial nucleotide sequence homogeneity in human somatic cells is in marked contrast to the rapid evolutionary divergence of human mitochondrial DNA and suggests the existence of mechanisms for the concerted preservation of mammalian mitochondrial DNA sequences in single organisms.

  4. Snake Genome Sequencing: Results and Future Prospects.

    Science.gov (United States)

    Kerkkamp, Harald M I; Kini, R Manjunatha; Pospelov, Alexey S; Vonk, Freek J; Henkel, Christiaan V; Richardson, Michael K

    2016-12-01

    Snake genome sequencing is in its infancy, very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  5. Sequencing Cyclic Peptides by Multistage Mass Spectrometry

    Science.gov (United States)

    Mohimani, Hosein; Yang, Yu-Liang; Liu, Wei-Ting; Hsieh, Pei-Wen; Dorrestein, Pieter C.; Pevzner, Pavel A.

    2012-01-01

    Some of the most effective antibiotics (e.g., Vancomycin and Daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways. While hundreds of biomedically important cyclic peptides have been sequenced, the computational techniques for sequencing cyclic peptides are still in their infancy. Previous methods for sequencing peptide antibiotics and other cyclic peptides are based on Nuclear Magnetic Resonance spectroscopy, and require large amounts (milligrams) of purified materials that, for most compounds, are not possible to obtain. Recently, development of mass spectrometry based methods has provided some hope for accurate sequencing of cyclic peptides using picograms of materials. In this paper we develop a method for sequencing of cyclic peptides by multistage mass spectrometry, and show its advantages over single stage mass spectrometry. The method is tested on known and new cyclic peptides from Bacillus brevis, Dianthus superbus and Streptomyces griseus, as well as a new family of cyclic peptides produced by marine bacteria. PMID:21751357

  6. Snake Genome Sequencing: Results and Future Prospects

    Directory of Open Access Journals (Sweden)

    Harald M. I. Kerkkamp

    2016-12-01

    Snake genome sequencing is in its infancy, very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  7. Sequencing and comparing whole mitochondrial genomes of animals

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  8. Divide and conquer: enriching environmental sequencing data.

    Directory of Open Access Journals (Sweden)

    Anne Bergeron

    2007-09-01

    In environmental sequencing projects, a mix of DNA from a whole microbial community is fragmented and sequenced, with one of the possible goals being to reconstruct partial or complete genomes of members of the community. In communities with high diversity of species, a significant proportion of the sequences do not overlap any other fragment in the sample. This problem will arise not only in situations with a relatively even distribution of many species, but also when the community in a particular environment is routinely dominated by the same few species. In the former case, no genomes may be assembled at all, while in the latter case a few dominant species in an environment will always be sequenced at high coverage to the detriment of coverage of the greater number of sparse species. Here we show that, with the same global sequencing effort, separating the species into two or more sub-communities prior to sequencing can yield a much higher proportion of sequences that can be assembled. We first use the Lander-Waterman model to show that, if the expected percentage of singleton sequences is higher than 25%, then, under the uniform distribution hypothesis, splitting the community is always a wise choice. We then construct simulated microbial communities to show that the results hold for highly non-uniform distributions. We also show that, for the distributions considered in the experiments, it is possible to estimate quite accurately the relative diversity of the two sub-communities. Given the fact that several methods exist to split microbial communities based on physical properties such as size, density, surface biochemistry, or optical properties, we strongly suggest that groups involved in environmental sequencing, and expecting high diversity, consider splitting their communities in order to maximize the information content of their sequencing effort.
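
    The singleton percentage mentioned above can be illustrated with the classical Lander-Waterman expectation. The abstract does not give the authors' formulas, so this sketch only assumes the standard result that a read is expected to overlap no other read with probability exp(−2cσ), where c is the coverage and σ = 1 − θ accounts for the minimum detectable overlap fraction θ. The function names are hypothetical, and the split into sub-communities is not modelled here.

```python
import math


def coverage(num_reads, read_length, genome_length):
    """Coverage c = N * L / G for N reads of length L over G bases."""
    return num_reads * read_length / genome_length


def singleton_fraction(cov, min_overlap_fraction=0.0):
    """Expected fraction of reads overlapping no other read, exp(-2*c*sigma),
    from the classical Lander-Waterman model (assumed setting)."""
    sigma = 1.0 - min_overlap_fraction
    return math.exp(-2.0 * cov * sigma)


def coverage_for_singleton_fraction(fraction, min_overlap_fraction=0.0):
    """Invert the formula: the coverage at which the given fraction is expected."""
    sigma = 1.0 - min_overlap_fraction
    return -math.log(fraction) / (2.0 * sigma)


# Coverage at which a quarter of the reads are expected to be singletons.
threshold_coverage = coverage_for_singleton_fraction(0.25)  # equals ln 2
```

    Under these assumptions, an expected singleton percentage above 25% corresponds to a coverage below ln 2, roughly 0.69, when any overlap is detectable (θ = 0).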

  9. Sequence Matters but How Exactly? A Method for Evaluating Activity Sequences from Data

    Science.gov (United States)

    Doroudi, Shayan; Holstein, Kenneth; Aleven, Vincent; Brunskill, Emma

    2016-01-01

    How should a wide variety of educational activities be sequenced to maximize student learning? Although some experimental studies have addressed this question, educational data mining methods may be able to evaluate a wider range of possibilities and better handle many simultaneous sequencing constraints. We introduce Sequencing Constraint…

  10. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    DEFF Research Database (Denmark)

    de Souza, S J; Camargo, A A; Briones, M R

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central ...

  11. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    Science.gov (United States)

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  12. A comparative evaluation of sequence classification programs

    Directory of Open Access Journals (Sweden)

    Bazinet Adam L

    2012-05-01

    Background: A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for 'barcoding genes' like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis. Results: We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known. Conclusions: We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs.

  13. CATEGORIZATION OF EVENT SEQUENCES FOR LICENSE APPLICATION

    Energy Technology Data Exchange (ETDEWEB)

    G.E. Ragan; P. Mecheret; D. Dexheimer

    2005-04-14

    The purposes of this analysis are: (1) Categorize (as Category 1, Category 2, or Beyond Category 2) internal event sequences that may occur before permanent closure of the repository at Yucca Mountain. (2) Categorize external event sequences that may occur before permanent closure of the repository at Yucca Mountain. This includes examining DBGM-1 seismic classifications and upgrading to DBGM-2, if appropriate, to ensure Beyond Category 2 categorization. (3) State the design and operational requirements that are invoked to make the categorization assignments valid. (4) Indicate the amount of material put at risk by Category 1 and Category 2 event sequences. (5) Estimate frequencies of Category 1 event sequences at the maximum capacity and receipt rate of the repository. (6) Distinguish occurrences associated with normal operations from event sequences. It is beyond the scope of the analysis to propose design requirements that may be required to control radiological exposure associated with normal operations. (7) Provide a convenient compilation of the results of the analysis in tabular form. The results of this analysis are used as inputs to the consequence analyses in an iterative design process that is depicted in Figure 1. Categorization of event sequences for permanent retrieval of waste from the repository is beyond the scope of this analysis. Cleanup activities that take place after an event sequence and other responses to abnormal events are also beyond the scope of the analysis.

  14. Exploration of noncoding sequences in metagenomes.

    Directory of Open Access Journals (Sweden)

    Fabián Tobar-Tosse

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.
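
    The sequence patterns named in this abstract (G+C content and trinucleotide usage) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `gc_content` and `trinucleotide_usage` are hypothetical, the overlapping step-1 window is an assumption, and ambiguous bases such as N are simply skipped.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in VALID_BASES)
    if total == 0:
        return 0.0  # no unambiguous bases: define G+C content as 0
    return (counts["G"] + counts["C"]) / total


def trinucleotide_usage(seq: str) -> dict[str, float]:
    """Relative frequency of each overlapping trinucleotide in seq.

    Assumption: a sliding window with step 1; windows containing
    anything other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= VALID_BASES
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tri: n / total for tri, n in counts.items()}
```

    Applied separately to coding and noncoding regions, frequency tables of this kind are the sort of feature one could compare across metagenomes; a codon-usage variant would instead step through an ORF in frame, three bases at a time.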

  15. CATEGORIZATION OF EVENT SEQUENCES FOR LICENSE APPLICATION

    International Nuclear Information System (INIS)

    G.E. Ragan; P. Mecheret; D. Dexheimer

    2005-01-01

    The purposes of this analysis are: (1) Categorize (as Category 1, Category 2, or Beyond Category 2) internal event sequences that may occur before permanent closure of the repository at Yucca Mountain. (2) Categorize external event sequences that may occur before permanent closure of the repository at Yucca Mountain. This includes examining DBGM-1 seismic classifications and upgrading to DBGM-2, if appropriate, to ensure Beyond Category 2 categorization. (3) State the design and operational requirements that are invoked to make the categorization assignments valid. (4) Indicate the amount of material put at risk by Category 1 and Category 2 event sequences. (5) Estimate frequencies of Category 1 event sequences at the maximum capacity and receipt rate of the repository. (6) Distinguish occurrences associated with normal operations from event sequences. It is beyond the scope of the analysis to propose design requirements that may be required to control radiological exposure associated with normal operations. (7) Provide a convenient compilation of the results of the analysis in tabular form. The results of this analysis are used as inputs to the consequence analyses in an iterative design process that is depicted in Figure 1. Categorization of event sequences for permanent retrieval of waste from the repository is beyond the scope of this analysis. Cleanup activities that take place after an event sequence and other responses to abnormal events are also beyond the scope of the analysis.

  16. On site DNA barcoding by nanopore sequencing.

    Directory of Open Access Journals (Sweden)

    Michele Menegon

    Biodiversity research is becoming increasingly dependent on genomics, which allows the unprecedented digitization and understanding of the planet's biological heritage. The use of genetic markers, i.e., DNA barcoding, has proved to be a powerful tool in species identification. However, full exploitation of this approach is hampered by the high sequencing costs and the absence of equipped facilities in biodiversity-rich countries. In the present work, we developed a portable sequencing laboratory based on the portable DNA sequencer from Oxford Nanopore Technologies, the MinION. Complementary laboratory equipment and reagents were selected to be used in remote and tough environmental conditions. The performance of the MinION sequencer and the portable laboratory was tested for DNA barcoding in a mimicked tropical environment, as well as in a remote rainforest of Tanzania lacking electricity. Despite the relatively high sequencing error rate of the MinION, the development of a suitable pipeline for data analysis allowed the accurate identification of different species of vertebrates, including amphibians, reptiles and mammals. In situ sequencing of a wild frog allowed us to rapidly identify the species captured, thus confirming that effective DNA barcoding in the field is possible. These results open new perspectives for real-time, on-site DNA sequencing, thus potentially increasing opportunities for the understanding of biodiversity in areas lacking conventional laboratory facilities.

  17. Comparison of two Next Generation sequencing platforms for full genome sequencing of Classical Swine Fever Virus

    DEFF Research Database (Denmark)

    Fahnøe, Ulrik; Pedersen, Anders Gorm; Höper, Dirk

    2013-01-01

    Next Generation Sequencing (NGS) is increasingly adopted in viral research and will be the preferred technology in the years to come. We have recently sequenced several strains of Classical Swine Fever Virus (CSFV) by NGS on both Genome Sequencer FLX (GS FLX) and Iontorrent PGM platforms... to the consensus sequence. Additionally, we obtained an average sequence depth for the genome of 4000 for the Iontorrent PGM and 400 for the FLX platform, making the mapping suitable for single nucleotide variant (SNV) detection. The analysis revealed a single non-silent SNV A10665G leading to the amino acid change D...

  18. Arabidopsis ASYMMETRIC LEAVES2 protein required for leaf morphogenesis consistently forms speckles during mitosis of tobacco BY-2 cells via signals in its specific sequence.

    Science.gov (United States)

    Luo, Lilan; Ando, Sayuri; Sasabe, Michiko; Machida, Chiyoko; Kurihara, Daisuke; Higashiyama, Tetsuya; Machida, Yasunori

    2012-09-01

    Leaf primordia with high division and developmental competencies are generated around the periphery of stem cells at the shoot apex. Arabidopsis ASYMMETRIC-LEAVES2 (AS2) protein plays a key role in the regulation of many genes responsible for flat symmetric leaf formation. The AS2 gene, expressed in leaf primordia, encodes a plant-specific nuclear protein containing an AS2/LOB domain with cysteine repeats (C-motif). AS2 proteins are present in speckles in and around the nucleoli, and in the nucleoplasm of some leaf epidermal cells. We used the tobacco cultured cell line BY-2 expressing the AS2-fused yellow fluorescent protein to examine subnuclear localization of AS2 in dividing cells. AS2 mainly localized to speckles (designated AS2 bodies) in cells undergoing mitosis and distributed in a pairwise manner during the separation of sets of daughter chromosomes. Few interphase cells contained AS2 bodies. Deletion analyses showed that a short stretch of the AS2 amino-terminal sequence and the C-motif play negative and positive roles, respectively, in localizing AS2 to the bodies. These results suggest that AS2 bodies function to properly distribute AS2 to daughter cells during cell division in leaf primordia; and this process is controlled at least partially by signals encoded by the AS2 sequence itself.

  19. A combination of PhP typing and β-d-glucuronidase gene sequence variation analysis for differentiation of Escherichia coli from humans and animals.

    Science.gov (United States)

    Masters, N; Christie, M; Katouli, M; Stratton, H

    2015-06-01

    We investigated the usefulness of the β-d-glucuronidase gene variance in Escherichia coli as a microbial source tracking tool using a novel algorithm for comparison of sequences from a prescreened set of host-specific isolates using a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-d-glucuronidase gene. Neighbour-joining analysis of loci variations revealed 86 (76.8%) human isolates and 91.2% of animal isolates were correctly identified. Pairwise hierarchical clustering improved assignment; where 92 (82.1%) human and 204 (99%) animal strains were assigned to their respective cluster. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-d-glucuronidase gene sequence improves source identification. We also concluded that numerical profiling of the nucleotide variations can be used as a valuable approach to differentiate human from animal E. coli. This study signifies the usefulness of the β-d-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources.

  20. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2012-11-01

    Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb, and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.
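
    The coverage percentages quoted in this abstract describe the share of reference positions spanned by at least one mapped read. A minimal sketch of that calculation follows; it is an illustration only, not the authors' method. The function name `covered_fraction` is hypothetical, and read alignments are assumed to be available as 0-based, half-open `(start, end)` intervals on the reference.

```python
from typing import Iterable, Tuple


def covered_fraction(
    intervals: Iterable[Tuple[int, int]], reference_length: int
) -> float:
    """Fraction of reference positions covered by at least one interval.

    Assumption: each interval is (start, end), 0-based and half-open,
    i.e. it covers positions start .. end - 1.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    covered = 0
    current_end = 0  # right edge of the region counted so far
    for start, end in sorted(intervals):
        # Clip to the reference and skip the part already counted.
        start = max(start, current_end, 0)
        end = min(end, reference_length)
        if end > start:
            covered += end - start
            current_end = end
    return covered / reference_length
```

    For example, `covered_fraction([(0, 50), (40, 80), (90, 100)], 100)` counts the overlap between the first two intervals only once and leaves the gap at positions 80 to 89 uncovered.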