WorldWideScience

Sample records for sequencing mekhanicheskie napryazhenyiya

  1. Automatic sequences

    CERN Document Server

    Haeseler, Friedrich

    2003-01-01

    Automatic sequences are sequences produced by a finite automaton. Although they are not random, they may look random. They are complicated in the sense that they are not ultimately periodic, and they may look rather complicated in the sense that it may not be easy to name the rule by which the sequence is generated; nevertheless, a rule that generates the sequence exists. The concept of automatic sequences has applications in algebra, number theory, finite automata and formal languages, and combinatorics on words. The text deals with different aspects of automatic sequences, in particular: a general introduction to automatic sequences; the basic (combinatorial) properties of automatic sequences; the algebraic approach to automatic sequences; and geometric objects related to automatic sequences.
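
    As a minimal sketch of what "produced by a finite automaton" can mean, the example below generates the Thue-Morse sequence, a standard 2-automatic sequence that is not ultimately periodic. The choice of this sequence and the names TRANSITIONS and thue_morse are illustrative assumptions, not taken from the book.

```python
# Illustrative assumption: Thue-Morse as an example of a 2-automatic sequence.
# The automaton has two states (0 and 1). Reading the binary digit 1 flips
# the state, reading 0 keeps it. The n-th term is the state reached after
# reading all binary digits of n.
TRANSITIONS = {
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 1, (1, "1"): 0,
}


def thue_morse(n: int) -> int:
    """Return the n-th Thue-Morse term by running the automaton on bin(n)."""
    state = 0
    for digit in format(n, "b"):
        state = TRANSITIONS[(state, digit)]
    return state


first_terms = [thue_morse(n) for n in range(16)]
print(first_terms)
```

    The rule is tiny (four transitions), yet the output never settles into a repeating block, which is the sense in which such sequences look complicated without being random.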

  2. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcriptional evidence. Analysis of repetitive sequences suggests that they are underrepresented in the reference assembly, reflecting an enrichment of gene-rich regions in the current assembly. Characterization of Lotus natural variation by resequencing of L. japonicus accessions and diploid Lotus species is currently ongoing, facilitated by the MG20 reference sequence...

  3. DNA Sequencing

    Science.gov (United States)

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.

  4. Moebius sequence

    DEFF Research Database (Denmark)

    Pedersen, Line Kjeldgaard; Maimburg, Rikke Damkjær; Hertz, Jens Michael

    2017-01-01

    BACKGROUND: Moebius Sequence (MS) is a rare disorder defined by bilateral congenital paralysis of the abducens and facial nerves in combination with various odontological, craniofacial, ophthalmological and orthopaedic conditions. The aetiology is still unknown; but both genetic (de novo mutations... and photographical evaluation. Five patients maintained the diagnosis of MS according to the diagnostic criteria. RESULTS: All five patients had bilateral facial and abducens paralysis confirmed by ophthalmological examination. Three of five had normal brain MR imaging. Two had missing facial nerves and one had...

  5. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternatively spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.
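
    The idea of rebuilding a sequence from overlapping peptides can be sketched with a simple greedy overlap merge. This is a generic illustration, not the method of the SAND report; the function names and the example peptides are hypothetical, and real data would need to handle errors, ambiguity, and mixtures.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides: list[str]) -> str:
    """Greedily merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(peptides))  # drop exact duplicates
    # Drop fragments fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    if not frags:
        return ""
    while len(frags) > 1:
        best_k, best_i, best_j = 0, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j and overlap(a, b) > best_k:
                    best_k, best_i, best_j = overlap(a, b), i, j
        # With no overlap left (best_k == 0) this simply concatenates.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


# Hypothetical peptides from two different digestions of one protein.
digest_1 = ["MKTAYIAK", "QRQISFVK"]
digest_2 = ["MKTAY", "IAKQRQ", "ISFVK"]
assembled = greedy_assemble(digest_1 + digest_2)
print(assembled)
```

    The second digestion cuts at different positions, so its peptide IAKQRQ bridges the two peptides of the first digestion; that bridging is what makes reconstruction possible.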

  6. The sequence of sequencers: The history of sequencing DNA

    Science.gov (United States)

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  7. Sequence Read Archive (SRA)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome...

  8. Massively parallel signature sequencing.

    Science.gov (United States)

    Zhou, Daixing; Rao, Mahendra S; Walker, Roger; Khrebtukova, Irina; Haudenschild, Christian D; Miura, Takumi; Decola, Shannon; Vermaas, Eric; Moon, Keith; Vasicek, Thomas J

    2006-01-01

    Massively parallel signature sequencing is an ultra-high throughput sequencing technology. It can simultaneously sequence millions of sequence tags, and, therefore, is ideal for whole genome analysis. When applied to expression profiling, it reveals almost every transcript in the sample and provides its accurate expression level. This chapter describes the technology and its application in establishing stem cell transcriptome databases.

  9. Goldbach Partitions and Sequences

    Indian Academy of Sciences (India)

    IAS Admin

    as a sum of two primes (for even numbers) and three primes (for odd numbers). We call this the Goldbach sequence g(n), which may be converted into a binary sequence b(n) by mapping each even number to 0 and each odd number to 1. The resulting binary sequences may be used as pseudorandom sequences in ...
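
    A minimal sketch of the mapping described above, restricted to even numbers. Because the start of the snippet is truncated, the code assumes that g(n) counts the unordered ways of writing an even n as a sum of two primes; the function names are illustrative.

```python
def is_prime(m: int) -> bool:
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def goldbach_count(n: int) -> int:
    """Assumed g(n): unordered pairs of primes (p, q), p <= q, with p + q = n."""
    return sum(1 for p in range(2, n // 2 + 1)
               if is_prime(p) and is_prime(n - p))


def binary_goldbach(n: int) -> int:
    """b(n): 0 if g(n) is even, 1 if g(n) is odd."""
    return goldbach_count(n) % 2


evens = list(range(4, 22, 2))
g = [goldbach_count(n) for n in evens]
b = [binary_goldbach(n) for n in evens]
print(g)
print(b)
```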

  10. Nonparametric combinatorial sequence models.

    Science.gov (United States)

    Wauthier, Fabian L; Jordan, Michael I; Jojic, Nebojsa

    2011-11-01

    This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This article presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three biological sequence families which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution over sequence representations induced by the prior. By integrating out the posterior, our method compares favorably to leading binding predictors.

  11. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (≈1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  12. Anomaly Detection in Sequences

    Data.gov (United States)

    National Aeronautics and Space Administration — We present a set of novel algorithms which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that...

  13. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  14. sequenceMiner algorithm

    Data.gov (United States)

    National Aeronautics and Space Administration — Detecting and describing anomalies in large repositories of discrete symbol sequences. sequenceMiner has been open-sourced! Download the file below to try it out....

  15. Protein sequence databases.

    Science.gov (United States)

    Apweiler, Rolf; Bairoch, Amos; Wu, Cathy H

    2004-02-01

    A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several of the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium.

  16. Epigenomics: sequencing the methylome.

    Science.gov (United States)

    Hirst, Martin

    2013-01-01

    DNA methylation patterns are increasingly surveyed through methods that utilize massively parallel sequencing. Sequence-based assays developed to detect DNA methylation can be broadly divided into those that depend on affinity enrichment, chemical conversion, or enzymatic restriction. The DNA fragments resulting from these methods are uniformly subjected to library construction and massively parallel sequencing. The sequence reads are subsequently aligned to a reference genome and subjected to specialized analytical tools to extract the underlying methylation signature. This chapter will outline these emerging techniques.

  17. Next-generation sequencing

    DEFF Research Database (Denmark)

    Rieneck, Klaus; Bak, Mads; Jønson, Lars

    2013-01-01

    the feasibility of predicting the fetal KEL1 phenotype using next-generation sequencing (NGS) technology. STUDY DESIGN AND METHODS: The KEL1/2 single-nucleotide polymorphism was polymerase chain reaction (PCR) amplified with one adjoining base, and the PCR product was sequenced using a genome analyzer (GAIIx, Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence...

  18. Delayed Sequence Intubation

    DEFF Research Database (Denmark)

    Weingart, Scott D; Trueger, N Seth; Wong, Nelson

    2015-01-01

    assessed. RESULTS: A total of 62 patients were enrolled: 19 patients required delayed sequence intubation to allow nonrebreather mask, 39 patients required it to allow NIPPV, and 4 patients required it for nasogastric tube placement. Saturations increased from a mean of 89.9% before delayed sequence...

  19. Cosmetology: Scope and Sequence.

    Science.gov (United States)

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  20. Sequences for Student Investigation

    Science.gov (United States)

    Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette

    2004-01-01

    We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…

  1. Sequence History Update Tool

    Science.gov (United States)

    Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

    2008-01-01

    The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, it also saves considerable time and effort. With the Sequence History Update Tool, what previously took minutes is now done in less than 30 seconds, and the tool provides a more accurate archival record of the sequence commanding for MRO.

  2. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  3. Evolution of DNA sequencing.

    Science.gov (United States)

    Tipu, Hamid Nawaz; Shabbir, Ambreen

    2015-03-01

    Sanger and coworkers first introduced DNA sequencing in the 1970s. It principally relied on termination of the growing nucleotide chain when a dideoxythymidine triphosphate (ddTTP) was inserted in it. Detection of terminated sequences was done radiographically on Polyacrylamide Gel Electrophoresis (PAGE). Improvements to the original Sanger sequencing that have evolved over time include replacement of radiography with fluorescence, use of separate fluorescent markers for each nucleotide, use of capillary electrophoresis instead of polyacrylamide gel electrophoresis, and then introduction of capillary array electrophoresis. However, this technique suffered from a few inherent limitations, such as decreased sensitivity for low level mutant alleles, complexities in analyzing highly polymorphic regions like the Major Histocompatibility Complex (MHC), and the high DNA concentrations required. Several Next Generation Sequencing (NGS) technologies that tend to overcome the limitations of Sanger sequencing have been introduced by Roche, Illumina and other commercial manufacturers, and they are reviewed here. Introduction of NGS in clinical research and medical diagnostics is expected to change the entire diagnostic approach. Applications include study of cancer variants, detection of minimal residual disease, exome sequencing, detection of Single Nucleotide Polymorphisms (SNPs) and their disease association, epigenetic regulation of gene expression, and sequencing of microorganism genomes.

  4. The Colliding Beams Sequencer

    International Nuclear Information System (INIS)

    Johnson, D.E.; Johnson, R.P.

    1989-01-01

    The Colliding Beam Sequencer (CBS) is a computer program used to operate the pbar-p Collider by synchronizing the applications programs and simulating the activities of the accelerator operators during filling and storage. The Sequencer acts as a meta-program, running otherwise stand alone applications programs, to do the set-up, beam transfers, acceleration, low beta turn on, and diagnostics for the transfers and storage. The Sequencer and its operational performance will be described along with its special features which include a periodic scheduler and command logger. 14 refs., 3 figs

  5. Phylogenetic Trees From Sequences

    Science.gov (United States)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data. We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.
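
    As a small illustration of measuring sequence divergence for distance-based reconstruction, the sketch below computes the proportion of differing sites between two aligned sequences and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). Jukes-Cantor is one standard substitution model chosen here as an assumption; the chapter may use others, and the function names are illustrative.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length, non-empty")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; undefined for p >= 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGT", "ACGTACGA")  # one difference in eight sites
d = jukes_cantor(p)
print(p, d)
```

    The corrected distance is slightly larger than the raw proportion because it accounts for multiple substitutions at the same site.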

  6. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  7. General LTE Sequence

    OpenAIRE

    Billal, Masum

    2015-01-01

    In this paper, we have characterized sequences which maintain the same property described in Lifting the Exponent Lemma. Lifting the Exponent Lemma is a very powerful tool in olympiad number theory and recently it has become very popular. We generalize it to all sequences that maintain a property like it, i.e. if $p^{\alpha} \| a_k$ and $p^{\beta} \| n$, then $p^{\alpha+\beta} \| a_{nk}$.
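
    The stated property can be checked numerically on a classical instance. The sketch below uses the sequence a_k = 4^k - 1 with p = 3, which satisfies the hypotheses of the standard Lifting the Exponent Lemma; this choice of sequence and the helper names are assumptions for illustration, not examples from the paper.

```python
def vp(p: int, m: int) -> int:
    """Exponent of the prime p in the nonzero integer m."""
    if m == 0:
        raise ValueError("valuation of 0 is undefined")
    m = abs(m)
    e = 0
    while m % p == 0:
        m //= p
        e += 1
    return e


P = 3


def a(k: int) -> int:
    """Illustrative sequence a_k = 4^k - 1 (3 divides 4 - 1)."""
    return 4 ** k - 1


def lte_holds(k: int, n: int) -> bool:
    """Check that v_p(a_{nk}) = v_p(a_k) + v_p(n)."""
    return vp(P, a(n * k)) == vp(P, a(k)) + vp(P, n)


all_ok = all(lte_holds(k, n) for k in range(1, 30) for n in range(1, 30))
print(all_ok)
```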

  8. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  9. HIV Sequence Compendium 2010

    Energy Technology Data Exchange (ETDEWEB)

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  10. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
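
    The partitioning step described above can be sketched as follows. The abstract does not say how the splitting ratio is defined, so the code assumes it is the fraction of database sequences (by count) sent to the CPU; the function name and the sample data are hypothetical, and no alignment scoring is performed.

```python
def split_by_length(database: list[str], cpu_ratio: float) -> tuple[list[str], list[str]]:
    """Send the shortest sequences to the CPU and the rest to the GPU.

    cpu_ratio is assumed to be the fraction of sequences (by count)
    assigned to the CPU; the abstract does not define it precisely.
    """
    if not 0.0 <= cpu_ratio <= 1.0:
        raise ValueError("cpu_ratio must be between 0 and 1")
    ordered = sorted(database, key=len)
    cut = int(len(ordered) * cpu_ratio)
    return ordered[:cut], ordered[cut:]


database = ["ACGT", "AC", "ACGTACGT", "A", "ACGTAC", "ACG"]
cpu_part, gpu_part = split_by_length(database, 0.5)
print(cpu_part)
print(gpu_part)
```

    Sorting by length before cutting guarantees that every sequence in the CPU portion is no longer than any sequence in the GPU portion, matching the condition stated in the abstract.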

  11. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown that the minimum number of genes from each species that need to be compared to produce a reliable phylogeny is about 20. Yeast has also become an attractive model to study speciation in eukaryotes, especially to understand molecular mechanisms behind the establishment of reproductive isolation. Comparison... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  12. Palindromic sequence impedes sequencing-by-ligation mechanism.

    Science.gov (United States)

    Huang, Yu-Feng; Chen, Sheng-Chung; Chiang, Yih-Shien; Chen, Tzu-Han; Chiu, Kuo-Ping

    2012-01-01

    Current next-generation sequencing (NGS) platforms adopt two types of sequencing mechanisms: by synthesis or by ligation. The former is employed by the 454 and Solexa systems, the latter by the SOLiD system. Although the pros and cons of each sequencing mechanism have more or less been discussed on a number of occasions, the potential obstacle imposed by palindromic sequences has not yet been addressed. To test the effect of the palindromic region on sequencing efficacy, we clonally amplified a paired-end ditag sequence composed of a 24-bp palindromic sequence flanked by a pair of tags from the E. coli genome. We used the near homogeneous fragments produced from MmeI digestion of the amplified clone to generate a sequencing library for the SOLiD 5500xl sequencer. Results showed that traditional ABI sequencers, which adopt the sequencing-by-synthesis mechanism, were able to read through the palindromic region. However, SOLiD 5500xl was unable to do so. Instead, the palindromic region was read as miscellaneous random sequences. Moreover, the readable tag sequence turned obscure ~2 bp prior to the palindromic region. Taken together, we demonstrate that SOLiD machines, which employ the sequencing-by-ligation mechanism, are unable to read through the palindromic region. On the other hand, sequencing-by-synthesis sequencers had no difficulty in doing so.
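
    For readers unfamiliar with the term, a DNA palindrome is usually taken to mean a sequence equal to its own reverse complement. The sketch below checks that property; it assumes this is the sense intended in the abstract, and the 24-bp example is constructed for illustration rather than taken from the study.

```python
# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


# Hypothetical 24-bp palindrome: a 12-bp half followed by its reverse complement.
half = "ACGTTGCAAGCT"
palindrome_24 = half + reverse_complement(half)
print(palindrome_24, is_palindromic(palindrome_24))
```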

  13. Goldbach Partitions and Sequences

    Indian Academy of Sciences (India)

    Subhash Kak. Resonance – Journal of Science Education, Volume 19, Issue 11, November 2014, pp 1028-1037. General Article. Permanent link: http://www.ias.ac.in/article/fulltext/reso/019/11/1028-1037 ...

  14. THE RHIC SEQUENCER

    International Nuclear Information System (INIS)

    VAN ZEIJTS, J.; DOTTAVIO, T.; FRAK, B.; MICHNOFF, R.

    2001-01-01

    The Relativistic Heavy Ion Collider (RHIC) has a high level asynchronous time-line driven by a controlling program called the "Sequencer". Most high-level magnet and beam related issues are orchestrated by this system. The system also plays an important role in coordinated data acquisition and saving. We present the program, operator interface, operational impact and experience.

  15. Goldbach Partitions and Sequences

    Indian Academy of Sciences (India)

    IAS Admin

    Properties of Goldbach partitions of numbers, as sums of primes, are presented and their potential applications to cryptography are described. The sequence of the number of partitions has excellent randomness properties. Goldbach partitions can be used to create ellipses and circles on the number line and they can also ...

  16. Metric representation of DNA sequences.

    Science.gov (United States)

    Wu, Z B

    2000-07-01

    A metric representation of DNA sequences is borrowed from symbolic dynamics. In view of this method, the pattern seen in the chaos game representation of DNA sequences is explained as the suppression of certain nucleotide strings in the DNA sequences. Frequencies of short nucleotide strings and suppression of the shortest ones in the DNA sequences can be determined by using the metric representation.
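
    The chaos game representation mentioned above can be sketched in a few lines: each base pulls the current point halfway toward its corner of the unit square. The corner assignment below is a commonly used convention and is an assumption here, as are the names CORNERS and cgr_points; the paper's own metric representation is not reproduced.

```python
# Assumed corner convention: A bottom-left, C top-left, G top-right, T bottom-right.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list[tuple[float, float]]:
    """Chaos game representation: one point per base, starting from the centre.

    Raises KeyError for symbols other than A, C, G, T.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the corner
        points.append((x, y))
    return points


points = cgr_points("AGT")
print(points)
```

    Because the point after k bases lies in a sub-square of side 2^-k determined by the last k bases, a nucleotide string that never occurs leaves its sub-square empty, which is how suppressed strings show up as gaps in the picture.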

  17. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  18. Sequencing BPS spectra

    Energy Technology Data Exchange (ETDEWEB)

    Gukov, Sergei [Walter Burke Institute for Theoretical Physics, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125 (United States); Max-Planck-Institut für Mathematik, Vivatsgasse 7, D-53111 Bonn (Germany); Nawata, Satoshi [Walter Burke Institute for Theoretical Physics, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125 (United States); Centre for Quantum Geometry of Moduli Spaces, University of Aarhus, Nordre Ringgade 1, DK-8000 (Denmark); Saberi, Ingmar [Walter Burke Institute for Theoretical Physics, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125 (United States); Stošić, Marko [CAMGSD, Departamento de Matemática, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisbon (Portugal); Mathematical Institute SANU, Knez Mihajlova 36, 11000 Belgrade (Serbia); Sułkowski, Piotr [Walter Burke Institute for Theoretical Physics, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125 (United States); Faculty of Physics, University of Warsaw, ul. Pasteura 5, 02-093 Warsaw (Poland)

    2016-03-02

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel “sliding” property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  19. A vision for ubiquitous sequencing.

    Science.gov (United States)

    Erlich, Yaniv

    2015-10-01

    Genomics has recently celebrated reaching the $1000 genome milestone, making affordable DNA sequencing a reality. With this goal successfully completed, the next goal of the sequencing revolution can be sequencing sensors--miniaturized sequencing devices that are manufactured for real-time applications and deployed in large quantities at low costs. The first part of this manuscript envisions applications that will benefit from moving the sequencers to the samples in a range of domains. In the second part, the manuscript outlines the critical barriers that need to be addressed in order to reach the goal of ubiquitous sequencing sensors. © 2015 Erlich; Published by Cold Spring Harbor Laboratory Press.

  20. Psychoacoustic Properties of Fibonacci Sequences

    Directory of Open Access Journals (Sweden)

    J. Sokoll

    2008-01-01

    In 1202, Fibonacci set up one of the most interesting sequences in number theory. This sequence can be represented by so-called Fibonacci Numbers, and by a binary sequence of zeros and ones. If such a binary Fibonacci Sequence is played back as an audio file, a very dissonant sound results. This is caused by the “almost-periodic”, “self-similar” property of the binary sequence. The ratio of zeros and ones converges to the golden ratio, as do the primary and secondary spectral components in their frequencies and amplitudes. These Fibonacci Sequences will be characterized using listening tests and psychoacoustic analyses.
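
    The convergence of the zero-to-one ratio can be checked numerically. The sketch below assumes the binary sequence is the standard Fibonacci word generated by the substitution 0 -> 01, 1 -> 0; the abstract does not say which construction the author used, and the function names are illustrative only.

```python
def fibonacci_word(n_iterations):
    """Build a binary Fibonacci word by applying 0 -> 01, 1 -> 0 repeatedly."""
    word = "0"
    for _ in range(n_iterations):
        # Rewrite every symbol in parallel according to the substitution rule.
        word = "".join("01" if symbol == "0" else "0" for symbol in word)
    return word


def zero_one_ratio(word):
    """Return the number of zeros divided by the number of ones."""
    return word.count("0") / word.count("1")


GOLDEN_RATIO = (1 + 5 ** 0.5) / 2

word = fibonacci_word(20)
ratio = zero_one_ratio(word)
print(len(word), ratio, GOLDEN_RATIO)
```

    After each iteration the counts of zeros and ones are consecutive Fibonacci numbers, which is why their quotient approaches the golden ratio.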

  1. Novel expressed sequence tag- simple sequence repeats (EST ...

    African Journals Online (AJOL)

    Using different bioinformatic criteria, the SUCEST database was used to mine for simple sequence repeat (SSR) markers. Among 42,189 clusters, 1,425 expressed sequence tag- simple sequence repeats (EST-SSRs) were identified in silico. Trinucleotide repeats were the most abundant SSRs detected. Of 212 primer pairs ...

  2. Platyrrhine dental eruption sequences.

    Science.gov (United States)

    Henderson, Emily

    2007-10-01

    To determine dental eruption sequences of extant platyrrhines, 367 mandibles and maxillae of informative juvenile specimens from all 16 genera were scored for presence of permanent teeth including three intermediate eruption stages following Harvati (Am J Phys Anthropol 112 (2000) 69-85). The timing of molar eruption relative to that of the anterior dentition is variable in platyrrhines. Aotus is precocious, with all molars erupting in succession before replacement of any deciduous teeth, while Cebus is delayed in M2-3 eruption relative to I1-2. Callitrichines have a distinct tendency toward delayed canine and premolar development. Platyrrhine eruption sequences presented here show some evidence of conformity to Schultz's Rule, with relatively early replacement of deciduous dentition in "slower"-growing animals. The relationship of dental eruption sequences to degree of folivory, body mass, brain mass, and dietary quality is also examined. The early eruption of molars relative to anterior teeth in Pithecia, Chiropotes, and Cacajao, in comparison to genera such as Ateles, Lagothrix, and Alouatta, showing relatively later eruption of the molars, appears to be consistent with current phylogenetic hypotheses. Schultz (Am J Phys Anthropol 19 (1935) 489-581) postulated early relative molar eruption as the primitive dental eruption schedule for primates. The extremely early molar eruption of Aotus versus Callicebus (where both incisors erupt before M2 and M3, with M3 usually last) may lend support to the status of Aotus as a basal taxon. The early relative molar eruption of the fossil platyrrhine species Branisella boliviana is also consistent with this hypothesis (Takai et al.: Am J Phys Anthropol 111 (2000) 263-281). (c) 2007 Wiley-Liss, Inc.

  3. DNA Sequencing apparatus

    Science.gov (United States)

    Tabor, Stanley; Richardson, Charles C.

    1992-01-01

    An automated DNA sequencing apparatus having a reactor for providing at least two series of DNA products formed from a single primer and a DNA strand, each DNA product of a series differing in molecular weight and having a chain terminating agent at one end; separating means for separating the DNA products to form a series of bands, the intensity of substantially all nearby bands in a different series being different; band reading means for determining the position an... This invention was made with government support including a grant from the U.S. Public Health Service, contract number AI-06045. The U.S. government has certain rights in the invention.

  4. Infinite sequences and series

    CERN Document Server

    Knopp, Konrad

    1956-01-01

    One of the finest expositors in the field of modern mathematics, Dr. Konrad Knopp here concentrates on a topic that is of particular interest to 20th-century mathematicians and students. He develops the theory of infinite sequences and series from its beginnings to a point where the reader will be in a position to investigate more advanced stages on his own. The foundations of the theory are therefore presented with special care, while the developmental aspects are limited by the scope and purpose of the book. All definitions are clearly stated; all theorems are proved with enough detail to ma

  5. Allele Re-sequencing Technologies

    DEFF Research Database (Denmark)

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    2013-01-01

    The development of next-generation sequencing technologies has made sequencing an affordable approach for detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing still remains too high to be feasible for many plant species with large and complex genomes. Recent developments in strategies for target-enrichment, transcriptome re-sequencing, and partial genome re-sequencing allow for enrichment for regions of interest at a scale that is matched to the throughput of next-generation sequencing platforms, and have emerged as a promising alternative to whole genome re-sequencing to identify causative genetic variations in plants. One challenge, however, will be efficient bioinformatics strategies for data handling and analysis from the increasing amount of sequence information.

  6. Spaces of ideal convergent sequences.

    Science.gov (United States)

    Mursaleen, M; Sharma, Sunil K

    2014-01-01

    In the present paper, we introduce some sequence spaces using ideal convergence and Musielak-Orlicz function ℳ = (M(k)). We also examine some topological properties of the resulting sequence spaces.

  7. Sequence Handling by Sequence Analysis Toolbox v1.0

    DEFF Research Database (Denmark)

    Ingrell, Christian Ravnsborg; Matthiesen, Rune; Jensen, Ole Nørregaard

    2006-01-01

    The fact that mass spectrometry has become a high-throughput method calls for bioinformatic tools for automated sequence handling and prediction. For efficient use of bioinformatic tools, it is important that these tools are integrated or interfaced with each other. The purpose of sequence analysis toolbox v1.0 was to have a general purpose sequence analyzing tool that can import sequences obtained by high-throughput sequencing methods. The program includes algorithms for calculation or prediction of isoelectric point, hydropathicity index, transmembrane segments, and glycosylphosphatidylinositol-anchored proteins....

  8. Rapid Polymer Sequencer

    Science.gov (United States)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.
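
    The comparison step in this abstract, matching each measured current change against a database of reference signals, can be sketched as a nearest-reference lookup. The reference values, units, and function names below are hypothetical and chosen only for illustration; the record does not give the actual signal values or the matching rule.

```python
# Hypothetical reference current changes (arbitrary units), one per known component.
REFERENCE_SIGNALS = {"A": 10.0, "C": 20.0, "G": 30.0, "T": 40.0}


def identify_component(measured_change, references=REFERENCE_SIGNALS):
    """Return the reference component whose signal is closest to the measurement."""
    return min(references, key=lambda name: abs(references[name] - measured_change))


def call_sequence(measured_changes, references=REFERENCE_SIGNALS):
    """Identify every component in the order the measurements were taken."""
    return "".join(identify_component(m, references) for m in measured_changes)


called = call_sequence([11.2, 38.5, 19.1, 29.7])
print(called)
```

    A real system would likely compare whole signal traces and account for noise; a single scalar per component is the simplest stand-in for that idea.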

  9. A Massively Parallel Sequence Similarity Search for Metagenomic Sequencing Data

    Directory of Open Access Journals (Sweden)

    Masanori Kakuta

    2017-10-01

    Sequence similarity searches have been widely used in the analyses of metagenomic sequencing data. Finding homologous sequences in a reference database enables the estimation of taxonomic and functional characteristics of each query sequence. Because current metagenomic sequencing data consist of a large number of nucleotide sequences, the time required for sequence similarity searches accounts for a large proportion of the total time. This time-consuming step makes it difficult to perform large-scale analyses. To analyze large-scale metagenomic data, such as those found in the human oral microbiome, we developed GHOST-MP (Genome-wide HOmology Search Tool on Massively Parallel system), a parallel sequence similarity search tool for massively parallel computing systems. This tool uses a fast search algorithm based on suffix arrays of query and database sequences and a hierarchical parallel search to accelerate the large-scale sequence similarity search of metagenomic sequencing data. The parallel computing efficiency and the search speed of this tool were evaluated. GHOST-MP was shown to be scalable over 10,000 CPU (Central Processing Unit) cores, and achieved over 80-fold acceleration compared with mpiBLAST using the same computational resources. We applied this tool to human oral metagenomic data, and the results indicate that the oral cavity, the oral vestibule, and plaque have different characteristics based on the functional gene category.
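
    To illustrate why suffix arrays make seed lookup fast, here is a minimal generic sketch of exact-match search over a suffix array. It is not GHOST-MP's algorithm, which the abstract does not detail; the naive construction and the function names are assumptions made for the example.

```python
def build_suffix_array(text):
    """Return suffix start positions sorted by the suffixes they begin (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return all start positions of pattern in text using two binary searches."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


database = "ACGTACGTTACG"
suffix_array = build_suffix_array(database)
hits = find_occurrences(database, suffix_array, "ACG")
print(hits)
```

    Each lookup needs only a logarithmic number of string comparisons once the array is built, which is the property a large-scale search tool would exploit; production tools use far more efficient construction than the sort shown here.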

  10. Putting instruction sequences into effect

    NARCIS (Netherlands)

    Bergstra, J.A.

    2011-01-01

    An attempt is made to define the concept of execution of an instruction sequence. It is found to be a special case of directly putting into effect of an instruction sequence. Directly putting into effect of an instruction sequence comprises interpretation as well as execution. Directly putting into

  11. Sequencing Games with Repeated Players

    NARCIS (Netherlands)

    Estevez Fernandez, M.A.; Borm, P.; Calleja, P.; Hamers, H.

    2008-01-01

    Two classes of one machine sequencing situations are considered in which each job corresponds to exactly one player but a player may have more than one job to be processed, so called RP(repeated player) sequencing situations. In max-RP sequencing situations it is assumed that each player's cost

  12. Blazar Sequence in Fermi Era

    Indian Academy of Sciences (India)

    2016-01-27

    In this paper, we review the latest research results on the topic of blazar sequence. It seems that the blazar sequence is phenomenologically ruled out, while the theoretical blazar sequence still holds. We point out that black hole mass is a dominant parameter accounting for high-power-high-synchrotron-peaked ...

  13. Sequence analysis on microcomputers.

    Science.gov (United States)

    Cannon, G C

    1987-10-02

    Overall, each of the program packages performed their tasks satisfactorily. For analyses where there was a well-defined answer, such as a search for a restriction site, there were few significant differences between the program sets. However, for tasks in which a degree of flexibility is desirable, such as homology or similarity determinations and database searches, DNASTAR consistently afforded the user more options in conducting the required analysis than did the other two packages. However, for laboratories where sequence analysis is not a major effort and the expense of a full sequence analysis workstation cannot be justified, MicroGenie and IBI-Pustell offer a satisfactory alternative. MicroGenie is a polished program system. Many may find that its user interface is more "user friendly" than the standard menu-driven interfaces. Its system of filing sequences under individual passwords facilitates use by more than one person. MicroGenie uses a hardware device for software protection that occupies a card slot in the computer on which it is used. Although I am sympathetic to the problem of software piracy, I feel that a less drastic solution is in order for a program likely to be sharing limited computer space with other software packages. The IBI-Pustell package performs the required analysis functions as accurately and quickly as MicroGenie but it lacks the clearness and ease of use. The menu system seems disjointed, and new or infrequent users often find themselves at apparent "dead-end menus" where the only clear alternative is to restart the entire program package. It is suggested from published accounts that the user interface is going to be upgraded and perhaps when that version is available, use of the system will be improved. The documentation accompanying each package was relatively clear as to how to run the programs, but all three packages assumed that the user was familiar with the computational techniques employed. MicroGenie and IBI-Pustell further

  14. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    Science.gov (United States)

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  15. RIKEN integrated sequence analysis (RISA) system--384-format sequencing pipeline with 384 multicapillary sequencer.

    Science.gov (United States)

    Shibata, K; Itoh, M; Aizawa, K; Nagaoka, S; Sasaki, N; Carninci, P; Konno, H; Akiyama, J; Nishi, K; Kitsunai, T; Tashiro, H; Itoh, M; Sumi, N; Ishii, Y; Nakamura, S; Hazama, M; Nishine, T; Harada, A; Yamamoto, R; Matsumoto, H; Sakaguchi, S; Ikegami, T; Kashiwagi, K; Fujiwake, S; Inoue, K; Togawa, Y

    2000-11-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3' end and 5' end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be

  16. New MR pulse sequence

    International Nuclear Information System (INIS)

    Harms, S.E.; Flamig, D.P.; Griffey, R.H.

    1990-01-01

    This paper describes a method for fat suppression for three-dimensional MR imaging. The FATS (fat-suppressed acquisition with echo time shortened) sequence employs a pair of opposing adiabatic half-passage RF pulses tuned on fat resonance. The imaging parameters are as follows: TR, 20 msec; TE, 21.7-3.2 msec; 1,024 x 128 x 128 acquired matrix; imaging time, approximately 11 minutes. A series of 54 examinations were performed. Excellent fat suppression with water excitation is achieved in all cases. The orbital images demonstrate superior resolution of small orbital lesions. The high signal-to-noise ratio (SNR) in cranial studies demonstrates excellent petrous bone and internal auditory canal anatomy

  17. The evolution of nanopore sequencing

    Science.gov (United States)

    Wang, Yue; Yang, Qiuping; Wang, Zhimin

    2014-01-01

    The “$1000 Genome” project has been drawing increasing attention since its launch a decade ago. Nanopore sequencing, a third-generation technology, is believed to be one of the most promising sequencing technologies to reach four gold standards set for the “$1000 Genome” while the second-generation sequencing technologies are bringing about a revolution in life sciences, particularly in genome sequencing-based personalized medicine. Both protein and solid-state nanopores have been extensively investigated for a series of issues, from detection of ionic current blockage to field-effect-transistor (FET) sensors. A newly released protein nanopore sequencer has shown encouraging potential that nanopore sequencing will ultimately fulfill the gold standards. In this review, we address advances, challenges, and possible solutions of nanopore sequencing according to these standards. PMID:25610451

  18. The Evolution of Nanopore Sequencing

    Directory of Open Access Journals (Sweden)

    Yue Wang

    2015-01-01

    The $1,000 Genome project has been drawing increasing attention since its launch a decade ago. Nanopore sequencing, a third-generation technology, is believed to be one of the most promising sequencing technologies to reach four gold standards set for the $1,000 Genome while the second-generation sequencing technologies are bringing about a revolution in life sciences, particularly in genome sequencing-based personalized medicine. Both protein and solid-state nanopores have been extensively investigated for a series of issues, from detection of ionic current blockage to field-effect-transistor (FET) sensors. A newly released protein nanopore sequencer has shown encouraging potential that nanopore sequencing will ultimately fulfill the gold standards. In this review, we address advances, challenges, and possible solutions of nanopore sequencing according to these standards.

  19. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of a unique “electronic fingerprint” of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shifts depending on the pH, with the most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of the beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  20. Graphene nanodevices for DNA sequencing

    Science.gov (United States)

    Heerema, Stephanie J.; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology.

  1. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by
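
    Comparing motif abundance between a foreground set and a background set can be sketched as a ratio of k-mer frequencies. This is a toy illustration only: the pseudocount, the function names, and the plain frequency ratio are assumptions, and the study's actual statistics are not described in the abstract.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Return the relative frequency of every k-mer across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def enrichment(foreground, background, k, pseudo=1e-6):
    """Return foreground/background frequency ratios for k-mers seen in the foreground.

    The small pseudocount (an assumption of this sketch) avoids division by zero
    for k-mers that never occur in the background set.
    """
    fg = kmer_frequencies(foreground, k)
    bg = kmer_frequencies(background, k)
    return {kmer: freq / (bg.get(kmer, 0.0) + pseudo) for kmer, freq in fg.items()}


scores = enrichment(["AAAA"], ["AATT"], k=2)
print(scores)
```

    As the abstract stresses, a ratio like this depends heavily on the choice of background: the same motif can look enriched against one background set and depleted against another.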

  2. Nonlinear analysis of biological sequences

    Energy Technology Data Exchange (ETDEWEB)

    Torney, D.C.; Bruno, W.; Detours, V. [and others

    1998-11-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The main objectives of this project involved deriving new capabilities for analyzing biological sequences. The authors focused on tabulating the statistical properties exhibited by Human coding DNA sequences and on techniques of inferring the phylogenetic relationships among protein sequences related by descent.

  3. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  4. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics databases are growing exponentially in size. Processing these large amounts of data may take hours even if supercomputers are used. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known sequence alignment algorithms 'GAP' (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 51% enhancement in alignment score when it is compared with the GAP algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.
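
    For reference, the optimal baseline named in this abstract, Needleman-Wunsch global alignment, can be sketched as a short dynamic program. The scoring values (match +1, mismatch -1, gap -1) are assumptions for the example, not the parameters used in the paper, and the ABS algorithm itself is not shown because the abstract does not describe it.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "ACT"))
```

    The table has (len(a) + 1) x (len(b) + 1) cells, so the running time grows with the product of the sequence lengths, which is the cost that heuristic methods try to avoid.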

  5. SNMR pulse sequence phase cycling

    Science.gov (United States)

    Walsh, David O; Grunewald, Elliot D

    2013-11-12

    Technologies applicable to SNMR pulse sequence phase cycling are disclosed, including SNMR acquisition apparatus and methods, SNMR processing apparatus and methods, and combinations thereof. SNMR acquisition may include transmitting two or more SNMR pulse sequences and applying a phase shift to a pulse in at least one of the pulse sequences, according to any of a variety of cycling techniques. SNMR processing may include combining SNMR data from a plurality of pulse sequences comprising pulses of different phases, so that desired signals are preserved and undesired signals are canceled.
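
    The combination step can be illustrated with a two-step toy model. The sketch assumes that the desired signal inverts when the pulse phase is shifted by pi while an undesired offset does not depend on the pulse phase; this assumption, the signal shape, and the function names are illustrative and are not taken from the patent.

```python
import math


def simulate_acquisition(pulse_phase, n_samples=8, offset=0.5):
    """Toy record: a signal that follows the pulse phase plus a phase-independent offset."""
    return [
        math.cos(2 * math.pi * k / n_samples + pulse_phase) + offset
        for k in range(n_samples)
    ]


def combine_phase_cycled(record_0, record_pi):
    """Subtract the two records: the signal adds constructively, the offset cancels."""
    return [(a - b) / 2 for a, b in zip(record_0, record_pi)]


record_0 = simulate_acquisition(0.0)
record_pi = simulate_acquisition(math.pi)
combined = combine_phase_cycled(record_0, record_pi)
print(combined)
```

    Under these assumptions the combined record equals the offset-free signal, because the phase-shifted record carries the signal with opposite sign and the offset with the same sign.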

  6. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered to be a high consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 76% enhancement in alignment score when it is compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  7. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...

  8. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    Directory of Open Access Journals (Sweden)

    Kirkness Ewen

    2006-10-01

    Background: Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromosome sequence and the search for polymorphisms therein. The genome has been only partially sequenced for one male dog, disallowing mapping of the sequence into specific chromosomes. However, by comparing the male genome sequence to the complete female dog genome sequence, candidate Y-chromosome sequence may be identified by exclusion. Results: The male dog genome sequence was analysed by Blast search against the human genome to identify sequences with a best match to the human Y chromosome and to the female dog genome to identify those absent in the female genome. Candidate sequences were then tested for male specificity by PCR of five male and five female dogs. 32 sequences from the male genome, with a total length of 24 kbp, were identified as male specific, based on a match to the human Y chromosome, absence in the female dog genome and male specific PCR results. 14437 bp were then sequenced for 10 male dogs originating from Europe, Southwest Asia, Siberia, East Asia, Africa and America. Nine haplotypes were found, which were defined by 14 substitutions. The genetic distance between the haplotypes indicates that they originate from at least five wolf haplotypes. There was no obvious trend in the geographic distribution of the haplotypes. Conclusion: We have identified 24159 bp of dog Y-chromosome sequence to be used for population genetic studies. We sequenced 14437 bp in a worldwide collection of dogs, identifying 14 SNPs for future SNP analyses, and

  9. Diesel Mechanics: Scope and Sequence.

    Science.gov (United States)

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a diesel mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  10. Farey sequences and resistor networks

    Indian Academy of Sciences (India)

    © Indian Academy of Sciences. Farey sequences and resistor networks. SAMEEN AHMED KHAN. Department of Engineering, Salalah College of Technology, Post Box No. 608, ..... 2.61n, strictly fixed by the Farey sequence method. For n ≥ 7, all the three basic sets have odd number of elements since A(n) is odd for n ≥ 6.
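
    As background for this record, the Farey sequence of order n (all reduced fractions in [0, 1] with denominator at most n, in increasing order) can be generated with the standard next-term recurrence. This is a generic sketch; the function name `farey` is an assumption, and the code does not reproduce the paper's resistor-network construction.

```python
from fractions import Fraction

def farey(n):
    """Return the Farey sequence of order n as a list of Fractions."""
    a, b, c, d = 0, 1, 1, n          # first two terms: 0/1 and 1/n
    seq = [Fraction(a, b)]
    while c <= n:
        k = (n + b) // d             # next-term rule for neighbours a/b, c/d
        a, b, c, d = c, d, k * c - a, k * d - b
        seq.append(Fraction(a, b))
    return seq
```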

  11. Graphene nanodevices for DNA sequencing

    NARCIS (Netherlands)

    Heerema, S.J.; Dekker, C.

    2016-01-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with

  12. Chameleon sequences in neurodegenerative diseases

    Energy Technology Data Exchange (ETDEWEB)

    Bahramali, Golnaz [Institute of Biochemistry and Biophysics, University of Tehran, Tehran (Iran, Islamic Republic of); Goliaei, Bahram, E-mail: goliaei@ut.ac.ir [Institute of Biochemistry and Biophysics, University of Tehran, Tehran (Iran, Islamic Republic of); Minuchehr, Zarrin, E-mail: minuchehr@nigeb.ac.ir [Department of Systems Biotechnology, National Institute of Genetic Engineering and Biotechnology, (NIGEB), Tehran (Iran, Islamic Republic of); Salari, Ali [Department of Systems Biotechnology, National Institute of Genetic Engineering and Biotechnology, (NIGEB), Tehran (Iran, Islamic Republic of)

    2016-03-25

    Chameleon sequences can adopt either alpha helix sheet or a coil conformation. Defining chameleon sequences in PDB (Protein Data Bank) may yield insight into defining peptides and proteins responsible for neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on Chameleons, where we developed an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data was classified into “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in Chameleons. We also found that there are proteins such as Insulin Degrading Enzyme IDE and GTP-binding nuclear protein Ran (RAN) with the largest number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study on them can shed light on curing and preventing neurodegenerative diseases.
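
    The extraction step described in this abstract (identical peptide segments with different structures) might look roughly like the sketch below. The record does not give the authors' algorithm, so everything here is an assumption: the input format (pairs of amino-acid sequence and a per-residue secondary-structure string using H, E and C), the fixed segment length `k`, and the function name `find_chameleons`.

```python
from collections import defaultdict

def find_chameleons(proteins, k):
    """Map each length-k peptide to the set of structure strings it adopts,
    keeping only peptides seen with more than one structure."""
    seen = defaultdict(set)
    for seq, ss in proteins:
        for i in range(len(seq) - k + 1):
            seen[seq[i:i + k]].add(ss[i:i + k])
    return {seg: structs for seg, structs in seen.items() if len(structs) > 1}

# Hypothetical toy input: the same segment is helical in one chain, strand in another.
toy = [("AVGIL", "HHHHH"), ("KAVGI", "EEEEE")]
hits = find_chameleons(toy, 3)
```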

  13. DNA Sequencing Sensors: An Overview

    Directory of Open Access Journals (Sweden)

    Jose Antonio Garrido-Cardenas

    2017-03-01

    The first sequencing of a complete genome was published forty years ago by the double Nobel Prize in Chemistry winner Frederick Sanger. That corresponded to the small genome of a bacteriophage, but since then there have been many complex organisms whose DNA has been sequenced. This was possible thanks to continuous advances in the fields of biochemistry and molecular genetics, but also in other areas such as nanotechnology and computing. Nowadays, sequencing sensors based on genetic material have little to do with those used by Sanger. The emergence of mass sequencing sensors, or new generation sequencing (NGS), meant a quantitative leap both in the volume of genetic material that could be sequenced in each trial and in the time per run and its cost. One can envisage that incoming technologies, already known as fourth generation sequencing, will continue to cheapen the trials by increasing DNA reading lengths in each run. All of this would be impossible without sensors and detection systems becoming smaller and more precise. This article provides a comprehensive overview of sensors for DNA sequencing developed within the last 40 years.

  14. Chameleon sequences in neurodegenerative diseases.

    Science.gov (United States)

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-03-25

    Chameleon sequences can adopt either alpha helix sheet or a coil conformation. Defining chameleon sequences in PDB (Protein Data Bank) may yield insight into defining peptides and proteins responsible for neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on Chameleons, where we developed an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data was classified into "helix to strand (HE)", "helix to coil (HC)" and "strand to coil (CE)" alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in Chameleons. We also found that there are proteins such as Insulin Degrading Enzyme IDE and GTP-binding nuclear protein Ran (RAN) with the largest number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study on them can shed light on curing and preventing neurodegenerative diseases. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Combinatorial representations of token sequences

    NARCIS (Netherlands)

    Elzinga, C.H.

    2005-01-01

    This paper presents new representations of token sequences, with and without associated quantities, in Euclidean space. The representations are free of assumptions about the nature of the sequences or the processes that generate them. Algorithms and applications from the domains of structured

  16. A criterion for regular sequences

    Indian Academy of Sciences (India)

    Note that every sequence is a strongly regular as well as regular sequence on the zero ... following statement given in Chapter II, 6.1 of [4]. .... for financial support. The authors sincerely thank Harmut Wiebe for stimulating discussions. References. [1] Bruns W and Herzog J, Cohen–Macaulay rings (Cambridge Studies in ...

  17. Chameleon sequences in neurodegenerative diseases

    International Nuclear Information System (INIS)

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-01-01

    Chameleon sequences can adopt either alpha helix sheet or a coil conformation. Defining chameleon sequences in PDB (Protein Data Bank) may yield insight into defining peptides and proteins responsible for neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on Chameleons, where we developed an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data was classified into “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in Chameleons. We also found that there are proteins such as Insulin Degrading Enzyme IDE and GTP-binding nuclear protein Ran (RAN) with the largest number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study on them can shed light on curing and preventing neurodegenerative diseases.

  18. DNA Sequencing by Capillary Electrophoresis

    Science.gov (United States)

    Karger, Barry L.; Guttman, Andras

    2009-01-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from the labor intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization based, nanopore and single molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes were only possible with the advent of modern sequencing technologies, which resulted from step-by-step advances with contributions from academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at break-neck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this perspective, we wish to overview the role of capillary electrophoresis in DNA sequencing based in part on several of our articles in this journal. PMID:19517496

  19. Rapid Diagnostics of Onboard Sequences

    Science.gov (United States)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database, while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command

  20. Accident sequence quantification with KIRAP

    International Nuclear Information System (INIS)

    Kim, Tae Un; Han, Sang Hoon; Kim, Kil You; Yang, Jun Eon; Jeong, Won Dae; Chang, Seung Cheol; Sung, Tae Yong; Kang, Dae Il; Park, Jin Hee; Lee, Yoon Hwan; Hwang, Mi Jeong.

    1997-01-01

    The tasks of probabilistic safety assessment (PSA) consist of the identification of initiating events, the construction of an event tree for each initiating event, the construction of fault trees for event tree logics, the analysis of reliability data and, finally, the accident sequence quantification. In the PSA, the accident sequence quantification calculates the core damage frequency and performs importance analysis and uncertainty analysis. Accident sequence quantification requires an understanding of the whole PSA model because it has to combine all event tree and fault tree models, and it requires an efficient computer code because the computation takes a long time. The Advanced Research Group of the Korea Atomic Energy Research Institute (KAERI) has developed the PSA workstation KIRAP (Korea Integrated Reliability Analysis Code Package) for the PSA work. This report describes the procedures to perform accident sequence quantification, the method to use KIRAP's cut set generator, and the method to perform the accident sequence quantification with KIRAP. (author). 6 refs

  1. Poisson process approximation for sequence repeats, and sequencing by hybridization.

    Science.gov (United States)

    Arratia, R; Martin, D; Reinert, G; Waterman, M S

    1996-01-01

    Sequencing by hybridization is a tool to determine a DNA sequence from the unordered list of all l-tuples contained in this sequence; typical numbers for l are l = 8, 10, 12. For theoretical purposes we assume that the multiset of all l-tuples is known. This multiset determines the DNA sequence uniquely if none of the so-called Ukkonen transformations are possible. These transformations require repeats of (l-1)-tuples in the sequence, with these repeats occurring in certain spatial patterns. We model DNA as an i.i.d. sequence. We first prove Poisson process approximations for the process of indicators of all leftmost long repeats allowing self-overlap and for the process of indicators of all left-most long repeats without self-overlap. Using the Chen-Stein method, we get bounds on the error of these approximations. As a corollary, we approximate the distribution of longest repeats. In the second step we analyze the spatial patterns of the repeats. Finally we combine these two steps to prove an approximation for the probability that a random sequence is uniquely recoverable from its list of l-tuples. For all our results we give some numerical examples including error bounds.
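
    The two objects this abstract works with, the multiset of l-tuples and the repeated (l-1)-tuples that make ambiguity possible, can be computed directly. This is a small illustrative sketch with assumed function names. It only checks the necessary condition named in the abstract (a repeated (l-1)-tuple), not the spatial patterns that the Ukkonen transformations additionally require.

```python
from collections import Counter

def l_tuple_spectrum(seq, l):
    """Multiset of all length-l substrings (l-tuples) of seq."""
    return Counter(seq[i:i + l] for i in range(len(seq) - l + 1))

def repeated_tuples(seq, k):
    """Set of length-k substrings occurring more than once in seq."""
    return {t for t, c in l_tuple_spectrum(seq, k).items() if c > 1}

def may_be_ambiguous(seq, l):
    """True if some (l-1)-tuple repeats, so unique recovery is not guaranteed."""
    return bool(repeated_tuples(seq, l - 1))
```

    If `may_be_ambiguous` returns False, no repeat of an (l-1)-tuple exists, so by the abstract's statement none of the transformations is possible and the spectrum determines the sequence.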

  2. Modelling passive margin sequence stratigraphy

    Science.gov (United States)

    Steckler, M.S.; Reynolds, D.; Coakley, B.; Swift, B.A.; Jarrard, R.D.

    1993-01-01

    We have modelled stratigraphic sequences to aid in deciphering the sedimentary response to sea-level change. Sequence geometry is found to be most sensitive to sea level, but other factors, including subsidence rate and sediment supply, can produce similar changes. Sediment loading and compaction also play a major role in generating accommodation, a factor often neglected in sequence-stratigraphic models. All of these parameters can control whether a type 1 or type 2 sequence boundary is produced. The models indicate that variations in margin characteristics produce systematic shifts in sequence boundary timing and systems tract distribution. The timing of the sequence boundary formation and systems tracts may differ by up to one-half of a sea-level cycle. Thus correlative sequence boundaries will not be synchronous. While rates of sea-level change may exceed the rate of thermal subsidence, isostasy and compaction may amplify the rate of total subsidence to several times greater than the thermal subsidence. Thus, total subsidence does not vary uniformly across the margin since it is modified by the sediment load. The amplitude of sea-level changes cannot be determined accurately without accounting for the major processes that affect sediment accumulation. Backstripping of a seismic line on the New Jersey margin is used to reconstruct continental margin geometry. The reconstructions show that the pre-existing ramp-margin geometry, rather than sea level, controls clinoform heights and slopes and sedimentary bypass. Backstripping also reveals progressive deformation of sequences due to compaction. Further work is still needed to understand quantitatively the role of sea level and the tectonic and sedimentary processes controlling sequence formation and influencing sequence architecture.

  3. Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models.

    Science.gov (United States)

    Liu, Bowen; Ramsundar, Bharath; Kawthekar, Prasad; Shi, Jade; Gomes, Joseph; Luu Nguyen, Quang; Ho, Stephen; Sloane, Jack; Wender, Paul; Pande, Vijay

    2017-10-25

    We describe a fully data driven model that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder-decoder architecture that consists of two recurrent neural networks, which has previously shown great success in solving other sequence-to-sequence prediction tasks such as machine translation. The model is trained on 50,000 experimental reaction examples from the United States patent literature, which span 10 broad reaction types that are commonly used by medicinal chemists. We find that our model performs comparably with a rule-based expert system baseline model, and also overcomes certain limitations associated with rule-based expert systems and with any machine learning approach that contains a rule-based expert system component. Our model provides an important first step toward solving the challenging problem of computational retrosynthetic analysis.

  4. Image analysis for DNA sequencing

    International Nuclear Information System (INIS)

    Palaniappan, K.; Huang, T.S.

    1991-01-01

    This paper reports that there is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles waveform analysis is used to determine the location of each band which represents one nucleotide in the sequence. Different classification strategies including a rule-based approach are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information

  5. The Dynamics of DNA Sequencing.

    Science.gov (United States)

    Morvillo, Nancy

    1997-01-01

    Describes a paper-and-pencil activity that helps students understand DNA sequencing and expands student understanding of DNA structure, replication, and gel electrophoresis. Appropriate for advanced biology students who are familiar with the Sanger method. (DDR)

  6. Fractal nature of stratigraphic sequences

    NARCIS (Netherlands)

    Schlager, W.

    2004-01-01

    Orders of stratigraphic sequences are being used loosely and with widely varying definitions. The orders seem to be subdivisions of convenience rather than an indication of natural structure. It is proposed that, at least at time scales of 10

  7. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected by analyzing DNA in laboratory tests. This study focused on local alignment of leukemia and non-leukemia data obtained from NCBI in the form of DNA sequences, using the Smith-Waterman algorithm. The Smith-Waterman algorithm was introduced by T. F. Smith and M. S. Waterman in 1981. The algorithm tries to find the greatest possible similarity between a pair of sequences by giving a negative value to unequal base pairs (mismatch) and a positive value to equal base pairs (match). The maximum positive value then marks the end of the alignment, and the minimum value marks its start. This study uses sequences of leukemia and 3 sequences of non-leukemia.
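
    The scoring idea in this abstract can be sketched as a minimal Smith-Waterman local alignment score. The scoring values (match +2, mismatch -1, gap -1) and the function name are assumptions for illustration; the record does not state the parameters used in the study.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between strings a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a new local alignment start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)  # the maximum cell marks the alignment end
        prev = curr
    return best
```

    Recovering the aligned substrings themselves would need the full table and a traceback from the maximum cell back to the first zero, which is omitted here.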

  8. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    Andersson, K.; Pyy, P.

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally follows a discussion about the project and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  9. Nanogrid rolling circle DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Church, George M.; Porreca, Gregory J.; Shendure, Jay; Rosenbaum, Abraham Meir

    2017-04-18

    The present invention relates to methods for sequencing a polynucleotide immobilized on an array having a plurality of specific regions each having a defined diameter size, including synthesizing a concatemer of a polynucleotide by rolling circle amplification, wherein the concatemer has a cross-sectional diameter greater than the diameter of a specific region, immobilizing the concatemer to the specific region to make an immobilized concatemer, and sequencing the immobilized concatemer.

  10. Graphene Nanopores for Protein Sequencing

    Science.gov (United States)

    Wilson, James; Sloman, Leila; He, Zhiren

    2016-01-01

    An inexpensive, reliable method for protein sequencing is essential to unraveling the biological mechanisms governing cellular behavior and disease. Current protein sequencing methods suffer from limitations associated with the size of proteins that can be sequenced, the time, and the cost of the sequencing procedures. Here, we report the results of all-atom molecular dynamics simulations that investigated the feasibility of using graphene nanopores for protein sequencing. We focus our study on the biologically significant phenylalanine-glycine repeat peptides (FG-nups)—parts of the nuclear pore transport machinery. Surprisingly, we found FG-nups to behave similarly to single stranded DNA: the peptides adhere to graphene and exhibit step-wise translocation when subject to a transmembrane bias or a hydrostatic pressure gradient. Reducing the peptide’s charge density or increasing the peptide’s hydrophobicity was found to decrease the translocation speed. Yet, unidirectional and stepwise translocation driven by a transmembrane bias was observed even when the ratio of charged to hydrophobic amino acids was as low as 1:8. The nanopore transport of the peptides was found to produce stepwise modulations of the nanopore ionic current correlated with the type of amino acids present in the nanopore, suggesting that protein sequencing by measuring ionic current blockades may be possible. PMID:27746710

  11. Long-range barcode labeling-sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Feng; Zhang, Tao; Singh, Kanwar K.; Pennacchio, Len A.; Froula, Jeff L.; Eng, Kevin S.

    2016-10-18

    Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.

  12. Ossification sequence heterochrony among amphibians.

    Science.gov (United States)

    Harrington, Sean M; Harrison, Luke B; Sheil, Christopher A

    2013-01-01

    Heterochrony is an important mechanism in the evolution of amphibians. Although studies have centered on the relationship between size and shape and the rates of development, ossification sequence heterochrony also may have been important. Rigorous, phylogenetic methods for assessing sequence heterochrony are relatively new, and a comprehensive study of the relative timing of ossification of skeletal elements has not been used to identify instances of sequence heterochrony across Amphibia. In this study, a new version of the program Parsimov-based genetic inference (PGi) was used to identify shifts in ossification sequences across all extant orders of amphibians, for all major structural units of the skeleton. PGi identified a number of heterochronic sequence shifts in all analyses, the most interesting of which seem to be tied to differences in metamorphic patterns among major clades. Early ossification of the vomer, premaxilla, and dentary is retained by Apateon caducus and members of Gymnophiona and Urodela, which lack the strongly biphasic development seen in anurans. In contrast, bones associated with the jaws and face were identified as shifting late in the ancestor of Anura. The bones that do not shift late, and thereby occupy the earliest positions in the anuran cranial sequence, are those in regions of the skull that undergo the least restructuring throughout anuran metamorphosis. Additionally, within Anura, bones of the hind limb and pelvic girdle were also identified as shifting early in the sequence of ossification, which may be a result of functional constraints imposed by the drastic metamorphosis of most anurans. © 2013 Wiley Periodicals, Inc.

  13. Sequence Factorization with Multiple References.

    Science.gov (United States)

    Wandelt, Sebastian; Leser, Ulf

    2015-01-01

    The success of high-throughput sequencing has led to an increasing number of projects which sequence large populations of a species. Storage and analysis of sequence data is a key challenge in these projects, because of the sheer size of the datasets. Compression is one simple technology to deal with this challenge. Referential factorization and compression schemes, which store only the differences between input sequence and a reference sequence, gained lots of interest in this field. Highly-similar sequences, e.g., Human genomes, can be compressed with a compression ratio of 1,000:1 and more, up to two orders of magnitude better than with standard compression techniques. Recently, it was shown that the compression against multiple references from the same species can boost the compression ratio up to 4,000:1. However, a detailed analysis of using multiple references is lacking, e.g., for main memory consumption and optimality. In this paper, we describe one key technique for the referential compression against multiple references: The factorization of sequences. Based on the notion of an optimal factorization, we propose optimization heuristics and identify parameter settings which greatly influence 1) the size of the factorization, 2) the time for factorization, and 3) the required amount of main memory. We evaluate a total of 30 setups with a varying number of references on data from three different species. Our results show a wide range of factorization sizes (optimal to an overhead of up to 300%), factorization speed (0.01 MB/s to more than 600 MB/s), and main memory usage (few dozen MB to dozens of GB). Based on our evaluation, we identify the best configurations for common use cases. Our evaluation shows that multi-reference factorization is much better than single-reference factorization.
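
    The idea of referential factorization against several references can be sketched with a naive greedy scheme: at each input position, take the longest match found in any reference and store it as a (reference index, position, length) triple, falling back to a literal character otherwise. This is an illustrative assumption, not the paper's method: the greedy strategy, the `min_len` threshold, the quadratic search, and all function names are hypothetical, and the paper's optimization heuristics are not reproduced.

```python
def longest_match(seq, i, ref):
    """Longest prefix of seq[i:] found anywhere in ref, as (position, length)."""
    best_pos, best_len = -1, 0
    for start in range(len(ref)):
        length = 0
        while (i + length < len(seq) and start + length < len(ref)
               and seq[i + length] == ref[start + length]):
            length += 1
        if length > best_len:
            best_pos, best_len = start, length
    return best_pos, best_len

def factorize(seq, refs, min_len=2):
    """Greedy factorization of seq into reference triples and literal characters."""
    factors, i = [], 0
    while i < len(seq):
        best = (-1, -1, 0)  # (reference index, position, length)
        for r, ref in enumerate(refs):
            pos, length = longest_match(seq, i, ref)
            if length > best[2]:
                best = (r, pos, length)
        if best[2] >= min_len:
            factors.append(best)
            i += best[2]
        else:
            factors.append(seq[i])  # no useful match: store the raw character
            i += 1
    return factors

def reconstruct(factors, refs):
    """Inverse of factorize: rebuild the sequence from factors and references."""
    parts = []
    for f in factors:
        if isinstance(f, str):
            parts.append(f)
        else:
            r, pos, length = f
            parts.append(refs[r][pos:pos + length])
    return "".join(parts)
```

    A practical implementation would replace the brute-force search with an index over the references (for example a suffix array), which is where the trade-offs between factorization size, speed and memory mentioned in the abstract arise.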

  14. Sequence Factorization with Multiple References.

    Directory of Open Access Journals (Sweden)

    Sebastian Wandelt

    Full Text Available. The success of high-throughput sequencing has led to an increasing number of projects which sequence large populations of a species. Storage and analysis of sequence data is a key challenge in these projects, because of the sheer size of the datasets. Compression is one simple technology to deal with this challenge. Referential factorization and compression schemes, which store only the differences between an input sequence and a reference sequence, have gained a lot of interest in this field. Highly similar sequences, e.g., human genomes, can be compressed with a compression ratio of 1,000:1 and more, up to two orders of magnitude better than with standard compression techniques. Recently, it was shown that compression against multiple references from the same species can boost the compression ratio up to 4,000:1. However, a detailed analysis of using multiple references is lacking, e.g., for main memory consumption and optimality. In this paper, we describe one key technique for referential compression against multiple references: the factorization of sequences. Based on the notion of an optimal factorization, we propose optimization heuristics and identify parameter settings which greatly influence 1) the size of the factorization, 2) the time for factorization, and 3) the required amount of main memory. We evaluate a total of 30 setups with a varying number of references on data from three different species. Our results show a wide range of factorization sizes (optimal to an overhead of up to 300%), factorization speed (0.01 MB/s to more than 600 MB/s), and main memory usage (few dozen MB to dozens of GB). Based on our evaluation, we identify the best configurations for common use cases. Our evaluation shows that multi-reference factorization is much better than single-reference factorization.

  15. ARC Code TI: sequenceMiner

    Data.gov (United States)

    National Aeronautics and Space Administration — The sequenceMiner was developed to address the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. sequenceMiner works...

  16. A Demonstration of Automated DNA Sequencing.

    Science.gov (United States)

    Latourelle, Sandra; Seidel-Rogol, Bonnie

    1998-01-01

    Details a simulation that employs a paper-and-pencil model to demonstrate the principles behind automated DNA sequencing. Discusses the advantages of automated sequencing as well as the chemistry of automated DNA sequencing. (DDR)

  17. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko

    2018-02-14

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  18. Transformed composite sequences for improved qubit addressing

    Science.gov (United States)

    Merrill, J. True; Doret, S. Charles; Vittorini, Grahame; Addison, J. P.; Brown, Kenneth R.

    2014-10-01

    Selective laser addressing of a single atom or atomic ion qubit can be improved using narrow-band composite pulse sequences. We describe a Lie-algebraic technique to generalize known narrow-band sequences and introduce sequences related by dilation and rotation of sequence generators. Our method improves known narrow-band sequences by decreasing both the pulse time and the residual error. Finally, we experimentally demonstrate these composite sequences using 40Ca+ ions trapped in a surface-electrode ion trap.

  19. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    . All but one sequence mapped to the MCP gene while the last sequence mapped to the Neurofilament gene. Approx. half of the sequences contained no errors while the rest differed with 88-99 percent similarity with most having 99% similarity. One sequence, when BLASTed, showed most similarity to European...... Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...

  20. Sequences, groups, and number theory

    CERN Document Server

    Rigo, Michel

    2018-01-01

    This collaborative book presents recent trends on the study of sequences, including combinatorics on words and symbolic dynamics, and new interdisciplinary links to group theory and number theory. Other chapters branch out from those areas into subfields of theoretical computer science, such as complexity theory and theory of automata. The book is built around four general themes: number theory and sequences, word combinatorics, normal numbers, and group theory. Those topics are rounded out by investigations into automatic and regular sequences, tilings and theory of computation, discrete dynamical systems, ergodic theory, numeration systems, automaton semigroups, and amenable groups.  This volume is intended for use by graduate students or research mathematicians, as well as computer scientists who are working in automata theory and formal language theory. With its organization around unified themes, it would also be appropriate as a supplemental text for graduate level courses.

  1. Operator Ideal of Cesaro Type Sequence Spaces Involving Lacunary Sequence

    Directory of Open Access Journals (Sweden)

    Awad A. Bakery

    2014-01-01

    Full Text Available. The aim of this paper is to give sufficient conditions on the sequence space Cesθ,p defined in Lim (1977) such that the class of all bounded linear operators between arbitrary Banach spaces whose nth approximation numbers belong to Cesθ,p forms an operator ideal.

  2. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subproject 3 "integrated sequence analysis" (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term "methodology" denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed in a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, a discussion of the project follows and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  3. Probabilistic studies of accident sequences

    International Nuclear Information System (INIS)

    Villemeur, A.; Berger, J.P.

    1986-01-01

    For several years, Electricite de France has carried out probabilistic assessment of accident sequences for nuclear power plants. In the framework of this program many methods were developed. As the interest in these studies was increasing and as adapted methods were developed, Electricite de France has undertaken a probabilistic safety assessment of a nuclear power plant [fr

  4. On primes in Lucas sequences

    Czech Academy of Sciences Publication Activity Database

    Křížek, Michal; Somer, L.

    2015-01-01

    Roč. 53, č. 1 (2015), s. 2-23 ISSN 0015-0517 R&D Projects: GA ČR GA14-02067S Institutional support: RVO:67985840 Keywords : Lucas sequence * primes Subject RIV: BA - General Mathematics http://www.fq.math.ca/Abstracts/53-1/somer.pdf

  5. MRI sequences and their parameters

    International Nuclear Information System (INIS)

    Teissier, J.M.

    1993-01-01

    Listing basic sequences and their present variants makes a synthetic classification of the various acquisition modes possible. The knowledge of the advantages of each of them, as well as of their disadvantages and restraints, seems to be an essential prerequisite to an optimal utilization of each magnetic resonance imaging system. (author)

  6. Curious Consequences of Simple Sequences

    Indian Academy of Sciences (India)

    Resonance – Journal of Science Education, Volume 12, Issue 1, January 2007, pp ... General Article. Author: A K Mallik, Department of Mechanical Engineering, Indian Institute of Technology, Kanpur 208 016, India.

  7. On primes in Lucas sequences

    Czech Academy of Sciences Publication Activity Database

    Křížek, Michal; Somer, L.

    2015-01-01

    Roč. 53, č. 1 (2015), s. 2-23 ISSN 0015-0517 R&D Projects: GA ČR GA14-02067S Institutional support: RVO:67985840 Keywords : Lucas sequence * primes Subject RIV: BA - General Mathematics http://www.fq.math.ca/Abstracts/53-1/somer.pdf

  8. Crop Sequence Calculator, v. 3

    Science.gov (United States)

    Producers need to know how to sequence crops to develop sustainable dynamic cropping systems that take advantage of inherent internal resources, such as crop synergism, nutrient cycling, and soil water, and capitalize on external resources, such as weather, markets, and government programs. Version ...

  9. Sequences in language and text

    CERN Document Server

    Mikros, George K

    2015-01-01

    The aim of this volume is to present the diverse but highly interesting area of the quantitative analysis of the sequence of various linguistic structures. The collected articles present a wide spectrum of quantitative analyses of linguistic syntagmatic structures and explore novel sequential linguistic entities. This volume will be interesting to all researchers studying linguistics using quantitative methods.

  10. Exome sequencing for syndrome diagnostics

    DEFF Research Database (Denmark)

    Østergaard, Elsebet; Risom, Lotte; Ek, Jakob

    2017-01-01

    The majority of rare congenital disorders and syndromes have a genetic cause, but the diagnostic rate using standard workup is only around 50%. Whole exome and whole genome sequencing methods have improved the genetic diagnosis of syndromes during the latest few years. This article...

  11. Repdigits in k-Lucas sequences

    Indian Academy of Sciences (India)

    57(2) 2000 243-254) proved that 11 is the largest number with only one distinct digit (the so-called repdigit) in the sequence $(L_n^{(2)})_n$. In this paper, we address a similar problem in the family of k-Lucas sequences. We also show that the k-Lucas sequences have similar properties to those of k-Fibonacci sequences ...

  12. An analysis of sequence alignment: heuristic algorithms.

    Science.gov (United States)

    Bucak, I Ö; Uslan, V

    2010-01-01

    Sequence alignment becomes challenging as the size and number of sequences increase. Finding optimal or near-optimal solutions for sequence alignment is one of the most important operations in bioinformatics. This study surveys heuristics applied to the sequence alignment problem and summarizes them in a timeline.

  13. Protein sequence analysis using Hewlett-Packard biphasic sequencing cartridges in an applied biosystems 473A protein sequencer.

    Science.gov (United States)

    Tang, S; Mozdzanowski, J; Anumula, K R

    1999-01-01

    Protein sequence analysis using an adsorptive biphasic sequencing cartridge, a set of two coupled columns introduced by Hewlett-Packard for protein sequencing by Edman degradation, in an Applied Biosystems 473A protein sequencer has been demonstrated. Samples containing salts, detergents, excipients, etc. (e.g., formulated protein drugs) can be easily analyzed using the ABI sequencer. Simple modifications to the ABI sequencer to accommodate the cartridge extend its utility in the analysis of difficult samples. The ABI sequencer solvents and reagents were compatible with the HP cartridge for sequencing. Sequence information up to ten residues can be easily generated by this nonoptimized procedure, and it is sufficient for identifying proteins by database search and for preparing a DNA probe for cloning novel proteins.

  14. Nonparametric Inference for Periodic Sequences

    KAUST Repository

    Sun, Ying

    2012-02-01

    This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.

  15. Apparatus for improved DNA sequencing

    Science.gov (United States)

    Douthart, Richard J.; Crowell, Shannon L.

    1996-01-01

    This invention is a means for the rapid sequencing of DNA samples. More specifically, it consists of a new design direct blotting electrophoresis unit. The DNA sequence is deposited on a membrane attached to a rotating drum. Initial data compaction is facilitated by the use of a machined multi-channeled plate called a ribbon channel plate. Each channel is an isolated mini gel system much like a gel filled capillary. The system as a whole, however, is in a slab gel like format with the advantages of uniformity and easy reusability. The system can be used in different embodiments. The drum system is unique in that after deposition the drum rotates the deposited DNA into a large non-buffer open space where processing and detection can occur. The drum can also be removed in toto to special workstations for downstream processing, multiplexing and detection.

  16. Cassini Mission Sequence Subsystem (MSS)

    Science.gov (United States)

    Alland, Robert

    2011-01-01

    This paper describes my work with the Cassini Mission Sequence Subsystem (MSS) team during the summer of 2011. It gives some background on the motivation for this project and describes the expected benefit to the Cassini program. It then introduces the two tasks that I worked on - an automatic system auditing tool and a series of corrections to the Cassini Sequence Generator (SEQ_GEN) - and the specific objectives these tasks were to accomplish. Next, it details the approach I took to meet these objectives and the results of this approach, followed by a discussion of how the outcome of the project compares with my initial expectations. The paper concludes with a summary of my experience working on this project, lists what the next steps are, and acknowledges the help of my Cassini colleagues.

  17. Sequence correlations shape protein promiscuity

    Science.gov (United States)

    Lukatsky, David B.; Afek, Ariel; Shakhnovich, Eugene I.

    2011-08-01

    We predict analytically that diagonal correlations of amino acid positions within protein sequences statistically enhance protein propensity for nonspecific binding. We use the term "promiscuity" to describe such nonspecific binding. Diagonal correlations represent statistically significant repeats of sequence patterns where amino acids of the same type are clustered together. The predicted effect is qualitatively robust with respect to the form of the microscopic interaction potentials and the average amino acid composition. Our analytical results provide an explanation for the enhanced diagonal correlations observed in hubs of eukaryotic organismal proteomes [J. Mol. Biol. 409, 439 (2011)], 10.1016/j.jmb.2011.03.056. We suggest experiments that will allow direct testing of the predicted effect.

  18. Replacement collision sequences in metals

    International Nuclear Information System (INIS)

    Blewitt, T.H.; Kirk, M.A.; Scott, T.L.

    1975-10-01

    The concept of radiation-induced defects traveling large distances by focussed collision sequences (focusons) without thermal activation has important consequences in radiation effect studies. The focussed collision sequences are of two types: (1) "Silsbee focussing" or momentum focussing, which can cause defect pairs to form large distances from the primary knock-on, and (2) focussed replacement collisions, also called "dynamic crowdions", where mass transport causes a large separation between the vacancy and its interstitial. Direct experimental evidence for focussed collision sequences is in short supply and conflicting. The sputtering patterns associated with close packed crystalline directions from the backscattering of charged particles seemed to substantiate long-range focussed collisions until it was pointed out that collision chains need not be long to yield such patterns. More recently, transmission sputtering has been used with conflicting results. Ecker et al. found no evidence for focusons greater than 17 atom distances whereas preliminary results of Siedman et al. suggest several hundred atom distances. Keil and co-workers found evidence for replacement collision sequences of 100 atom distances by stereo electron microscopy of interstitial agglomerates interjected by low energy heavy ion bombardment. Experiments by Kirk et al. and Becker and co-workers on ordered alloys are only sensitive to dynamic crowdions. The result of Kirk and co-workers on the changes in magnetic properties of Ni3Mn induced by thermal neutron bombardment strongly supports long range focusons (greater than 30 atom distances), whereas Wollenberger found no evidence for focusons with 1 and 3 MeV electron irradiation. Theoretical treatments of Liebfried suggest a maximum length of 30 atom distances whereas Holmes' modified treatment suggests less than 10 atom distances. (10 fig, 23 references)

  19. Channel plate for DNA sequencing

    Science.gov (United States)

    Douthart, Richard J.; Crowell, Shannon L.

    1998-01-01

    This invention is a channel plate that facilitates data compaction in DNA sequencing. The channel plate has a length, a width and a thickness, and further has a plurality of channels that are parallel. Each channel has a depth partially through the thickness of the channel plate. Additionally an interface edge permits electrical communication across an interface through a buffer to a deposition membrane surface.

  20. Entropic fluctuations in DNA sequences

    Science.gov (United States)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that despite this reduction of information, LSE allows the extraction of meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
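
    A block-wise entropy of the kind this abstract describes could be sketched as follows. The abstract does not give the exact definition used in the paper, so this sketch makes an assumption: each non-overlapping block is scored by the Shannon entropy (in bits) of its single-base frequencies. The function names `block_entropy` and `local_shannon_entropy` are hypothetical.

```python
import math
from collections import Counter


def block_entropy(block):
    """Shannon entropy (bits) of the symbol frequencies in one block."""
    n = len(block)
    counts = Counter(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def local_shannon_entropy(seq, block_size):
    """Entropy of each consecutive, non-overlapping block of `seq`.

    Assumption: single-base composition per block; a trailing partial
    block is ignored.
    """
    return [block_entropy(seq[i:i + block_size])
            for i in range(0, len(seq) - block_size + 1, block_size)]


# Three 8-base blocks: a homopolymer, a uniform mix, and a two-base repeat.
seq = "AAAAAAAA" + "ACGTACGT" + "ATATATAT"
profile = local_shannon_entropy(seq, 8)
```

    Under this assumed definition, the homopolymer block scores 0 bits, the uniform four-base block scores 2 bits, and the two-base repeat scores 1 bit, so long runs of a short repeat unit would show up as stretches of nearly constant entropy.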

  1. Application of deep sequence technology in hepatology.

    Science.gov (United States)

    Ninomiya, Masashi; Ueno, Yoshiyuki; Shimosegawa, Tooru

    2014-02-01

    Deep sequencing technologies are currently cutting edge, and are opening fascinating opportunities in biomedicine, producing over 100 times more data than conventional capillary sequencers based on the Sanger method. Next-generation sequencing (NGS) is now generally defined as sequencing technology that, by employing parallel sequencing processes, produces thousands or millions of sequence reads simultaneously. Since the GS20 was released as the first NGS sequencer on the market by 454 Life Sciences, the competition in the development of new sequencers has become intense. In this review, we describe the current deep sequencing systems and discuss the application of advanced technologies in the field of hepatology. © 2013 The Japan Society of Hepatology.

  2. Memory and learning with rapid audiovisual sequences

    Science.gov (United States)

    Keller, Arielle S.; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  3. Memory and learning with rapid audiovisual sequences.

    Science.gov (United States)

    Keller, Arielle S; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed.

  4. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    OpenAIRE

    Natanaelsson, Christian; Oskarsson, Mattias CR; Angleby, Helen; Lundeberg, Joakim; Kirkness, Ewen; Savolainen, Peter

    2006-01-01

    Background: Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromo...

  5. Targeted next-generation sequencing can replace Sanger sequencing in clinical diagnostics

    NARCIS (Netherlands)

    Sikkema-Raddatz, B.; Johansson, L.F.; de Boer, E.N.; Almomani, R.; Boven, L.G.; van den Berg, M.P.; van Spaendonck-Zwarts, K.Y.; van Tintelen, J.P.; Sijmons, R.H.; Jongbloed, J.D.H.; Sinke, R.J.

    Mutation detection through exome sequencing allows simultaneous analysis of all coding sequences of genes. However, it cannot yet replace Sanger sequencing (SS) in diagnostics because of incomplete representation and coverage of exons leading to missing clinically relevant mutations. Targeted

  6. Challenges to genome sequence dissection in sweetpotato

    Science.gov (United States)

    Isobe, Sachiko; Shirasawa, Kenta; Hirakawa, Hideki

    2017-01-01

    The development of next generation sequencing (NGS) technologies has enabled the determination of whole genome sequences in many non-model plant species. However, genome sequencing in sweetpotato (Ipomoea batatas (L.) Lam) is still difficult because of the hexaploid genome structure. Previous studies suggested that a diploid wild relative, I. trifida (H.B.K.) Don., is the most likely ancestor of sweetpotato. Therefore, the genetic and genomic features of I. trifida have been studied as a potential reference for sweetpotato. Meanwhile, several research groups have begun the challenging task of directly sequencing the sweetpotato genome. In this manuscript, we review the recent results and activities of large-scale genome and transcriptome analysis related to genome sequence dissection in sweetpotato under the following sections: I. trifida genome and transcript sequencing, genome sequences of I. nil (Japanese morning glory), transcript sequences in sweetpotato, chloroplast sequences, transposable elements and transfer DNA. The recent international activities of de novo whole genome sequencing in sweetpotato are also described. The large-scale publicly available genome and transcript sequence resources and the international genome sequencing streams are expected to promote the genome sequence dissection in sweetpotato. PMID:28465666

  7. DNA Sequencing in Undergraduate Laboratory Courses.

    Science.gov (United States)

    Hamilton, Robert G.

    1997-01-01

    Discusses strategies to duplicate current research protocols using biochemical methods of analysis. Describes the use of the Silver Sequence kit that provides a technically simple and relatively inexpensive DNA sequencing exercise. (JRH)

  8. On Paranorm Zweier -Convergent Sequence Spaces

    Directory of Open Access Journals (Sweden)

    Vakeel A. Khan

    2013-01-01

    Full Text Available. In this paper, we introduce the paranorm Zweier -convergent sequence spaces , , and , a sequence of positive real numbers. We study some topological properties, prove the decomposition theorem, and study some inclusion relations on these spaces.

  9. "First generation" automated DNA sequencing technology.

    Science.gov (United States)

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  10. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies...... of the BAC clone derived genome sequence have been annotated using the Pre-Ensembl and Ensembl automated pipelines and made accessible through the Pre-Ensembl/Ensembl browsers. The current annotated genome assembly (Sscrofa9) was released with Ensembl 56 in September 2009. A revised assembly (Sscrofa10......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  11. Repdigits in k-Lucas sequences

    Indian Academy of Sciences (India)

    Math. 57(2) 2000 243–254) proved that 11 is the largest number with only one distinct digit (the so-called repdigit) in the sequence $(L_n^{(2)})_n$. In this paper, we address a similar problem in the family of k-Lucas sequences. We also show that the k-Lucas sequences have similar properties to those of k-Fibonacci sequences ...
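The repdigit claim for the k = 2 case can be checked numerically on a finite prefix. The sketch below assumes that the k = 2 sequence is the classical Lucas sequence with L_0 = 2 and L_1 = 1 (the record itself does not give the initial values), and the helper names are illustrative only. It is a sanity check over the first 300 terms, not a proof.

```python
def lucas_numbers(count):
    """Return the first `count` classical Lucas numbers: 2, 1, 3, 4, 7, 11, ..."""
    terms = []
    a, b = 2, 1  # assumed initial values L_0 = 2, L_1 = 1
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b
    return terms


def is_repdigit(n):
    """True if the decimal expansion of n uses a single distinct digit."""
    return len(set(str(n))) == 1


# Repdigits among the first 300 Lucas numbers (single-digit terms count trivially).
repdigits = [n for n in lucas_numbers(300) if is_repdigit(n)]
print(repdigits)
```

Within this prefix the only repdigit with more than one digit is 11, in line with the result quoted in the abstract.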

  12. Mappings of Type Special Space of Sequences

    Directory of Open Access Journals (Sweden)

    Awad A. Bakery

    2016-01-01

    Full Text Available We give sufficient conditions on a special space of sequences defined by Mohamed and Bakery (2013) such that the finite rank operators are dense in the complete space of operators whose approximation numbers belong to this sequence space. Hence, under a few conditions, every compact operator would be approximated by finite rank operators. We apply it on the sequence space defined by Tripathy and Mahanta (2003). Our results match those known for p-absolutely summable sequences of reals.

  13. DNA sequencing technologies: 2006-2016.

    Science.gov (United States)

    Mardis, Elaine R

    2017-02-01

    Recent advances in the field of genomics have largely been due to the ability to sequence DNA at increasing throughput and decreasing cost. DNA sequencing was first introduced in 1977, and next-generation sequencing technologies have been available only during the past decade, but the diverse experiments and corresponding analyses facilitated by these techniques have transformed biological and biomedical research. Here, I review developments in DNA sequencing technologies over the past 10 years and look to the future for further applications.

  14. Information decomposition method to analyze symbolical sequences

    International Nuclear Information System (INIS)

    Korotkov, E.V.; Korotkova, M.A.; Kudryashov, N.A.

    2003-01-01

    The information decomposition (ID) method to analyze symbolical sequences is presented. This method allows us to reveal a latent periodicity of any symbolical sequence. The ID method is shown to have advantages in comparison with application of the Fourier transformation, the wavelet transform and the dynamic programming method to look for latent periodicity. Examples of the latent periods for poetic texts, DNA sequences and amino acids are presented. Possible origin of a latent periodicity for different symbolical sequences is discussed

  15. Movement sequencing in Huntington disease.

    Science.gov (United States)

    Georgiou-Karistianis, Nellie; Long, Jeffrey D; Lourens, Spencer G; Stout, Julie C; Mills, James A; Paulsen, Jane S

    2014-08-01

    To examine longitudinal changes in movement sequencing in prodromal Huntington's disease (HD) participants (795 prodromal HD; 225 controls) from the PREDICT-HD study. Prodromal HD participants were tested over seven annual visits and were stratified into three groups (low, medium, high) based on their CAG-Age Product (CAP) score, which indicates likely increasing proximity to diagnosis. A cued movement sequence task assessed the impact of advance cueing on response initiation and execution via three levels of advance information. Compared to controls, all CAP groups showed longer initiation and movement times across all conditions at baseline, demonstrating a disease gradient for the majority of outcomes. Across all conditions, the high CAP group had the highest mean for baseline testing, but also demonstrated an increase in movement time across the study. For initiation time, the high CAP group showed the highest mean baseline time across all conditions, but also faster decreasing rates of change over time. With progress to diagnosis, participants may increasingly use compensatory strategies, as evidenced by faster initiation. However, this occurred in conjunction with slowed execution times, suggesting a decline in effectively accessing control processes required to translate movement into effective execution.

  16. Parallel sequencing lives, or what makes large sequencing projects successful.

    Science.gov (United States)

    Quilez, Javier; Vidal, Enrique; Dily, François Le; Serra, François; Cuartero, Yasmina; Stadhouders, Ralph; Graf, Thomas; Marti-Renom, Marc A; Beato, Miguel; Filion, Guillaume

    2017-11-01

    T47D_rep2 and b1913e6c1_51720e9cf were 2 Hi-C samples. They were born and processed at the same time, yet their fates were very different. The life of b1913e6c1_51720e9cf was simple and fruitful, while that of T47D_rep2 was full of accidents and sorrow. At the heart of these differences lies the fact that b1913e6c1_51720e9cf was born under a lab culture of Documentation, Automation, Traceability, and Autonomy and compliance with the FAIR Principles. Their lives are a lesson for those who wish to embark on the journey of managing high-throughput sequencing data. © The Author 2017. Published by Oxford University Press.

  17. Blazar Sequence in Fermi Era Liang Chen

    Indian Academy of Sciences (India)

    Abstract. In this paper, we review the latest research results on the topic of blazar sequence. It seems that the blazar sequence is phenomenally ruled out, while the theoretical blazar sequence still holds. We point out that black hole mass is a dominant parameter accounting for high-power-high-synchrotron-peaked and ...

  18. Perspectives in Biochemistry: Methods for DNA Sequencing.

    Science.gov (United States)

    Wood, Anne T.

    1984-01-01

    Describes two frequently used DNA sequencing methods: Sanger's enzymatic dideoxy method and Maxam and Gilbert's chemical sequencing method. Indicates that studying these methods provides students with knowledge of the chemical structure of DNA and how DNA sequence data are obtained. (JN)

  19. RNAome sequencing delineates the complete RNA landscape

    NARCIS (Netherlands)

    K.W.J. Derks (Kasper); J. Pothof (Joris)

    2015-01-01

    Standard RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species. For example, small and large RNAs from the same sample cannot be sequenced in a single sequence run. We designed RNAome sequencing, which is a

  20. Quasistationary sequences in Hilbert spaces | Muriuki | African ...

    African Journals Online (AJOL)

    In this paper the concept of covariance differences of a sequence is introduced and its relationship with the covariance function is established. The criteria of linear representability of sequences in Hilbert space are proved. The necessary and sufficient conditions for a linearly representable sequence to be quasistationary ...

  1. Sequencing nucleic acids: from chemistry to medicine.

    Science.gov (United States)

    Balasubramanian, Shankar

    2011-07-14

    Chemistry has played a vital role in making routine, affordable sequencing of human genomes a reality. This article focuses on the genesis and development of Solexa sequencing that originated in Cambridge, UK. This sequencing approach is helping transform science and offers intriguing prospects for the future of medicine.

  2. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi.

    Science.gov (United States)

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  3. Exome sequencing: what clinicians need to know

    Directory of Open Access Journals (Sweden)

    Sastre L

    2014-03-01

    Full Text Available Leandro Sastre. Instituto de Investigaciones Biomédicas, CSIC/UAM, C/Arturo Duperier 4, Madrid, Spain; Terapias Experimentales y Biomarcadores en Cáncer, IdiPaz, Madrid, Spain; CIBER de Enfermedades Raras, CIBERER, Valencia, Spain. Abstract: The recent development of high throughput methods of deoxyribonucleic acid (DNA) sequencing has made it possible to determine individual genome sequences and their specific variations. A region of particular interest is the protein-coding part of the genome, or exome, which is composed of gene exons. The principles of exome purification and sequencing will be described in this review, as well as analyses of the data generated. Results will be discussed in terms of their possible functional and clinical significance. The advantages and limitations of exome sequencing will be compared to those of other massive sequencing approaches such as whole-genome sequencing, ribonucleic acid sequencing or selected DNA sequencing. Exome sequencing has been used recently in the study of various diseases. Monogenic diseases with Mendelian inheritance are among these, but studies have also been carried out on genetic variations that represent risk factors for complex diseases. Cancer is another intensive area for exome sequencing studies. Several examples of the use of exome sequencing in the diagnosis, prognosis, and treatment of these diseases will be described. Finally, remaining challenges and some practical and ethical considerations for the clinical application of exome sequencing will be discussed. Keywords: massively parallel sequencing, RNA sequencing, whole-genome sequencing, genetic variants, molecular diagnosis, pharmacogenomics, personalized medicine, NGS, SGS, SNP, SNV

  4. Chip-based sequencing nucleic acids

    Science.gov (United States)

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  5. Permutation Entropy for Random Binary Sequences

    Directory of Open Access Journals (Sweden)

    Lingfeng Liu

    2015-12-01

    Full Text Available In this paper, we generalize the permutation entropy (PE) measure to binary sequences, which is based on Shannon’s entropy, and theoretically analyze this measure for random binary sequences. We deduce the theoretical value of PE for random binary sequences, which can be used to measure the randomness of binary sequences. We also reveal the relationship between this PE measure with other randomness measures, such as Shannon’s entropy and Lempel–Ziv complexity. The results show that PE is consistent with these two measures. Furthermore, we use PE as one of the randomness measures to evaluate the randomness of chaotic binary sequences.
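To make the notion of permutation entropy concrete, here is a minimal sketch of the classical ordinal-pattern (Bandt-Pompe) version, computed as the Shannon entropy of the observed ordinal patterns. The abstract does not state how the authors handle the ties that dominate binary data, so the stable-sort tie-breaking below is an assumption and the paper's generalized measure may differ. The function name and the `order` parameter are illustrative.

```python
from collections import Counter
from math import log2


def permutation_entropy(sequence, order=3):
    """Shannon entropy (in bits) of ordinal patterns of length `order`."""
    patterns = Counter()
    for i in range(len(sequence) - order + 1):
        window = sequence[i:i + order]
        # sorted() is stable, so equal values keep their positional order
        # (assumed tie-breaking rule, not taken from the paper).
        pattern = tuple(sorted(range(order), key=lambda j: window[j]))
        patterns[pattern] += 1
    total = sum(patterns.values())
    return -sum((c / total) * log2(c / total) for c in patterns.values())


constant = [0] * 100
alternating = [0, 1] * 50
print(permutation_entropy(constant, order=2))
print(permutation_entropy(alternating, order=2))
```

A constant sequence produces a single pattern and therefore zero entropy, while the alternating sequence splits almost evenly between two patterns and comes out close to one bit.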

  6. Integer sequence discovery from small graphs.

    Science.gov (United States)

    Hoppe, Travis; Petrone, Anna

    2016-03-11

    We have exhaustively enumerated all simple, connected graphs of a finite order and have computed a selection of invariants over this set. Integer sequences were constructed from these invariants and checked against the Online Encyclopedia of Integer Sequences (OEIS). 141 new sequences were added and six sequences were extended. From the graph database, we were able to programmatically suggest relationships among the invariants. It will be shown that we can readily visualize any sequence of graphs with a given criteria. The code has been released as an open-source framework for further analysis and the database was constructed to be extensible to invariants not considered in this work.
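The enumeration step can be illustrated at toy scale. The sketch below counts simple connected graphs up to isomorphism by brute force over all edge subsets, using a canonical form obtained by trying every vertex relabeling. This is only feasible for very small orders and is not the authors' released framework; all function names are illustrative.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """Depth-first search from vertex 0 reaches every vertex."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


def canonical_form(n, edges):
    """Lexicographically smallest sorted edge list over all vertex relabelings."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best


def count_connected_graphs(n):
    """Number of simple connected graphs on n vertices, up to isomorphism."""
    pairs = list(combinations(range(n), 2))
    classes = set()
    for mask in range(1 << len(pairs)):
        edges = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
        if is_connected(n, edges):
            classes.add(canonical_form(n, edges))
    return len(classes)


print([count_connected_graphs(n) for n in range(1, 6)])  # → [1, 1, 2, 6, 21]
```

The resulting counts are the kind of integer sequence that can then be looked up in the OEIS; any other graph invariant could be tabulated over the same set of isomorphism classes.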

  7. Maximal zero sequences for Fock spaces

    OpenAIRE

    Zhu, Kehe

    2011-01-01

    A sequence $Z$ in the complex plane $\mathbb{C}$ is called a zero sequence for the Fock space $F^p_\alpha$ if there exists a function $f\in F^p_\alpha$, not identically zero, such that $Z$ is the zero set of $f$, counting multiplicities. We show that there exist zero sequences $Z$ for $F^p_\alpha$ with the following properties: (1) For any $a\in\mathbb{C}$ the sequence $Z\cup\{a\}$ is no longer a zero sequence for $F^p_\alpha$; (2) the space $I_Z$ consisting of all functions in $F^p_\alpha$ that vanish on $Z...

  8. Integer sequence discovery from small graphs

    Science.gov (United States)

    Hoppe, Travis; Petrone, Anna

    2015-01-01

    We have exhaustively enumerated all simple, connected graphs of a finite order and have computed a selection of invariants over this set. Integer sequences were constructed from these invariants and checked against the Online Encyclopedia of Integer Sequences (OEIS). 141 new sequences were added and six sequences were extended. From the graph database, we were able to programmatically suggest relationships among the invariants. It will be shown that we can readily visualize any sequence of graphs with a given criteria. The code has been released as an open-source framework for further analysis and the database was constructed to be extensible to invariants not considered in this work. PMID:27034526

  9. A Main Sequence for Quasars

    Directory of Open Access Journals (Sweden)

    Paola Marziani

    2018-03-01

    Full Text Available The last 25 years saw a major step forward in the analysis of optical and UV spectroscopic data of large quasar samples. Multivariate statistical approaches have led to the definition of systematic trends in observational properties that are the basis of physical and dynamical modeling of quasar structure. We discuss the empirical correlates of the so-called “main sequence” associated with the quasar Eigenvector 1, its governing physical parameters and several implications on our view of the quasar structure, as well as some luminosity effects associated with the virialized component of the line emitting regions. We also briefly discuss quasars in a segment of the main sequence that includes the strongest FeII emitters. These sources show a small dispersion around a well-defined Eddington ratio value, a property which makes them potential Eddington standard candles.

  10. The 2016 Kumamoto earthquake sequence.

    Science.gov (United States)

    Kato, Aitaro; Nakamura, Kouji; Hiyama, Yohei

    2016-01-01

    Beginning in April 2016, a series of shallow, moderate to large earthquakes with associated strong aftershocks struck the Kumamoto area of Kyushu, SW Japan. An Mj 7.3 mainshock occurred on 16 April 2016, close to the epicenter of an Mj 6.5 foreshock that occurred about 28 hours earlier. The intense seismicity released the accumulated elastic energy by right-lateral strike slip, mainly along two known, active faults. The mainshock rupture propagated along multiple fault segments with different geometries. The faulting style is reasonably consistent with regional deformation observed on geologic timescales and with the stress field estimated from seismic observations. One striking feature of this sequence is intense seismic activity, including a dynamically triggered earthquake in the Oita region. Following the mainshock rupture, postseismic deformation has been observed, as well as expansion of the seismicity front toward the southwest and northwest.

  11. The 2016 Kumamoto earthquake sequence

    Science.gov (United States)

    KATO, Aitaro; NAKAMURA, Kouji; HIYAMA, Yohei

    2016-01-01

    Beginning in April 2016, a series of shallow, moderate to large earthquakes with associated strong aftershocks struck the Kumamoto area of Kyushu, SW Japan. An Mj 7.3 mainshock occurred on 16 April 2016, close to the epicenter of an Mj 6.5 foreshock that occurred about 28 hours earlier. The intense seismicity released the accumulated elastic energy by right-lateral strike slip, mainly along two known, active faults. The mainshock rupture propagated along multiple fault segments with different geometries. The faulting style is reasonably consistent with regional deformation observed on geologic timescales and with the stress field estimated from seismic observations. One striking feature of this sequence is intense seismic activity, including a dynamically triggered earthquake in the Oita region. Following the mainshock rupture, postseismic deformation has been observed, as well as expansion of the seismicity front toward the southwest and northwest. PMID:27725474

  12. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful to verify the significance of data. Genomic regularities, such as nucleotide correlations or the non-uniform distribution of motifs throughout genomic or mature mRNA sequences, exist, and their significance can be checked by means of the Monte Carlo test. The test needs good-quality random sequences in order to work; moreover, they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful to estimate the background score of an alignment, that is, a threshold below which the resulting score is merely due to chance. We have developed RANDNA, free software that produces random DNA or RNA sequences with a user-defined length and nucleotide composition. Sequences having the same nucleotide distribution as exonic, intronic or intergenic sequences can be generated. Its graphic interface makes it easy to set the parameters that characterize the sequences being produced and to save them in a text file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees good randomness, a long cycle length and high speed. We have checked the quality of sequences generated by the software by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.
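The core idea, drawing each base independently according to a target nucleotide composition, can be sketched in a few lines. This is not RANDNA's code: it uses Python's built-in Mersenne Twister generator instead of the Borland Delphi 6 generator named in the abstract, and the function name, parameter names and composition values are hypothetical.

```python
import random


def random_dna(length, composition, seed=None):
    """Draw `length` bases independently with the given relative frequencies."""
    rng = random.Random(seed)  # seeded generator for reproducible Monte Carlo runs
    bases = list(composition)
    weights = [composition[base] for base in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


# Hypothetical AT-rich composition, given as percentages.
at_rich = {"A": 30, "C": 20, "G": 20, "T": 30}
sequence = random_dna(60, at_rich, seed=1)
print(sequence)
```

For a Monte Carlo test, many such sequences would be generated with the composition of the real data, and the statistic of interest compared against its distribution over the random set.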

  13. RNAome sequencing delineates the complete RNA landscape

    Directory of Open Access Journals (Sweden)

    Kasper W.J. Derks

    2015-09-01

    Full Text Available Standard RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species. For example, small and large RNAs from the same sample cannot be sequenced in a single sequence run. We designed RNAome sequencing, which is a strand-specific method to determine the expression of small and large RNAs from ribosomal RNA-depleted total RNA in a single sequence run. RNAome sequencing quantitatively preserves all RNA classes. This characteristic allows comparisons between RNA classes, thereby facilitating relationships between different RNA classes. Here, we describe in detail the experimental procedure associated with RNAome sequencing published by Derks and colleagues in RNA Biology (2015) [1]. We also provide the R code for the developed Total Rna Analysis Pipeline (TRAP), an algorithm to analyze RNAome sequencing datasets (deposited at the Gene Expression Omnibus data repository, accession number GSE48084).

  14. Children's Recall of Script-Based Event Sequences: The Effect of Sequencing.

    Science.gov (United States)

    Catellani, Patrizia

    1991-01-01

    Preschool and first grade children's recall of script-based event sequences was studied in four different instruction conditions. Differences in sequencing ability were observed in relation to age and sequence. Findings indicate that at both ages, the effort involved in sequencing aids semantic processing of the material and enhances recall. (SH)

  15. The RNA world, automatic sequences and oncogenetics

    International Nuclear Information System (INIS)

    Tahir Shah, K.

    1993-04-01

    We construct a model of the RNA world in terms of naturally evolving nucleotide sequences assuming only Crick-Watson base pairing and self-cleaving/splicing capability. These sequences have the following properties. 1) They are recognizable by an automaton (or automata). That is, for each k-sequence, there exists a k-automaton which accepts, recognizes or generates the k-sequence. These are known as automatic sequences. Fibonacci and Morse-Thue sequences are the most natural outcome of pre-biotic chemical conditions. 2) Infinite (resp. large) sequences are self-similar (resp. nearly self-similar) under certain rewrite rules and consequently give rise to fractal (resp. fractal-like) structures. Computationally, such sequences can also be generated by their corresponding deterministic parallel rewrite system, known as a D0L system. The self-similar sequences are fixed points of their respective rewrite rules. Some of these automatic sequences have the capability that they can read or 'accept' other sequences while others can detect errors and trigger error-correcting mechanisms. They can be enlarged and have block and/or palindrome structure. Linear recurring sequences such as the Fibonacci sequence are simply feedback shift registers, a well-known model of information processing machines. We show that a mutation of any rewrite rule can cause a combinatorial explosion of errors and relate this to oncogenetic behavior. On the other hand, a mutation of sequences that are not rewrite rules leads to normal evolutionary change. Known experimental results support our hypothesis. (author). Refs
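The rewrite-rule picture can be made concrete with a small sketch of a deterministic parallel rewrite system (a D0L system), in which every symbol is replaced simultaneously at each step. The abstract does not list its rules, so the two rule sets below are the standard textbook morphisms for the Morse-Thue sequence and the Fibonacci word over the symbols 0 and 1, used here as assumptions; mapping symbols to nucleotides is left open.

```python
def iterate_d0l(axiom, rules, steps):
    """Apply the rewrite rules to every symbol in parallel, `steps` times."""
    word = axiom
    for _ in range(steps):
        word = "".join(rules[symbol] for symbol in word)
    return word


# Standard morphisms (assumed, not taken from the abstract).
THUE_MORSE = {"0": "01", "1": "10"}
FIBONACCI = {"0": "01", "1": "0"}

print(iterate_d0l("0", THUE_MORSE, 4))  # → 0110100110010110
print(iterate_d0l("0", FIBONACCI, 4))   # → 01001010
```

Each iterate is a prefix of the next one, which is the sense in which the infinite sequence is a fixed point of its rewrite rule. Changing a single rule, for example replacing "1" -> "10" with "1" -> "11", alters every later iterate, which illustrates how a mutated rule propagates errors throughout the sequence.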

  16. Targeted assembly of short sequence reads.

    Directory of Open Access Journals (Sweden)

    René L Warren

    Full Text Available As next-generation sequence (NGS) production continues to increase, analysis is becoming a significant bottleneck. However, in situations where information is required only for specific sequence variants, it is not necessary to assemble or align whole genome data sets in their entirety. Rather, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is a faster, easier, and more accurate approach. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants by only considering reads within the sequence space of input target sequences provided by the user. The NGS data set is searched for reads with an exact match to all possible short words within the target sequence, and these reads are then assembled stringently to generate a consensus of the target and flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of the variant in the data set being interrogated is revealed by a successful assembly outcome. However, TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR has utility in finding or confirming genomic mutations, polymorphisms, fusions and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest. TASR is a fast, flexible and easy to use tool for targeted assembly.
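The read-recruitment step described above can be sketched as a word (k-mer) lookup: index every short word of the target, then keep only the reads that share at least one exact word with it. This is an illustrative reading of the abstract, not TASR's implementation; the word length, the function names and the toy reads are assumptions, and reverse-complement matching and the subsequent stringent assembly are omitted.

```python
def kmers(sequence, k):
    """Set of all length-k words in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def recruit_reads(reads, target, k=15):
    """Keep reads that share at least one exact k-mer with the target."""
    target_words = kmers(target, k)
    return [read for read in reads if kmers(read, k) & target_words]


# Toy data with a short word length so the example stays readable.
target = "ACGTACGTGGA"
reads = ["TTACGTACG", "CCCCCCCC", "GTGGATT"]
print(recruit_reads(reads, target, k=5))  # → ['TTACGTACG', 'GTGGATT']
```

Because only the recruited reads go on to assembly, the cost of the later steps depends on the number of reads near the target rather than on the size of the whole data set.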

  17. Deciphering the RNA landscape by RNAome sequencing.

    Science.gov (United States)

    Derks, Kasper W J; Misovic, Branislav; van den Hout, Mirjam C G N; Kockx, Christel E M; Gomez, Cesar Payan; Brouwer, Rutger W W; Vrieling, Harry; Hoeijmakers, Jan H J; van IJcken, Wilfred F J; Pothof, Joris

    2015-01-01

    Current RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species in an unperturbed manner. We report strand-specific RNAome sequencing that determines expression of small and large RNAs from rRNA-depleted total RNA in a single sequence run. Since current analysis pipelines cannot reliably analyze small and large RNAs simultaneously, we developed TRAP, Total Rna Analysis Pipeline, a robust interface that is also compatible with existing RNA sequencing protocols. RNAome sequencing quantitatively preserved all RNA classes, allowing cross-class comparisons that facilitate the identification of relationships between different RNA classes. We demonstrate the strength of RNAome sequencing in mouse embryonic stem cells treated with cisplatin. MicroRNA and mRNA expression in RNAome sequencing significantly correlated between replicates and was in concordance with both existing RNA sequencing methods and gene expression arrays generated from the same samples. Moreover, RNAome sequencing also detected additional RNA classes such as enhancer RNAs, anti-sense RNAs, novel RNA species and numerous differentially expressed RNAs undetectable by other methods. At the level of complete RNA classes, RNAome sequencing also identified a specific global repression of the microRNA and microRNA isoform classes after cisplatin treatment whereas all other classes such as mRNAs were unchanged. These characteristics of RNAome sequencing will significantly improve expression analysis as well as studies on RNA biology not covered by existing methods.

  18. Effects of an Additional Sequence of Color Stimuli on Visuomotor Sequence Learning

    Directory of Open Access Journals (Sweden)

    Kanji Tanaka

    2017-06-01

    Full Text Available Through practice, people are able to integrate a secondary sequence (e.g., a stimulus-based sequence) into a primary sequence (e.g., a response-based sequence), but it is still controversial whether the integrated sequences lead to better learning than only the primary sequence. In the present study, we aimed to investigate the effects of a sequence that integrated space and color sequences on early and late learning phases (corresponding to effector-independent and effector-dependent learning, respectively) and how the effects differed in the integrated and primary sequences in each learning phase. In the task, the participants were required to learn a sequence of button presses using trial-and-error and to perform the sequence successfully for 20 trials (m × n task). First, in the baseline task, all participants learned a non-colored sequence, in which the response button always turned red. Then, in the learning task, the participants were assigned to two groups: a colored sequence group (i.e., space and color) or a non-colored sequence group (i.e., space). In the colored sequence, the response button turned a pre-determined color and the participants were instructed to attend to the sequences of both location and color as much as they could. The results showed that the participants who performed the colored sequence acquired the correct button presses of the sequence earlier, but showed a slower mean performance time than those who performed the non-colored sequence. Moreover, the slower performance time in the colored sequence group remained in a subsequent transfer task in which the spatial configurations of the buttons were vertically mirrored from the learning task. These results indicated that if participants explicitly attended to both the spatial response sequence and color stimulus sequence at the same time, they could develop their spatial representations of the sequence earlier (i.e., early development of the effector

  19. Pareto optimal pairwise sequence alignment.

    Science.gov (United States)

    DeRonne, Kevin W; Karypis, George

    2013-01-01

    Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
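The notion of a Pareto optimal frontier can be illustrated with a naive filter over candidate score vectors: a candidate stays on the frontier if no other candidate is at least as good on every objective and strictly better on one. The sketch below is not the authors' dynamic-programming algorithm; it assumes that higher scores are better, and the score tuples are made-up values standing in for alignments scored by two profile-profile functions.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Candidates that are not dominated by any other candidate."""
    return [c for c in candidates if not any(dominates(other, c) for other in candidates)]


# Hypothetical (score_1, score_2) pairs for five candidate alignments.
scores = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
print(pareto_front(scores))  # → [(1, 5), (2, 4), (3, 3)]
```

Every alignment on the frontier represents a different trade-off between the objectives, which is why choosing a single high-quality alignment from it remains a separate problem.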

  20. On the Origin of Sequence

    Directory of Open Access Journals (Sweden)

    Peter T. S. van der Gulik

    2015-11-01

    Full Text Available Three aspects which make planet Earth special, and which must be taken into consideration with respect to the emergence of peptides, are the mineralogical composition, the Moon which is in the same size class, and the triple environment consisting of ocean, atmosphere, and continent. GlyGly is a remarkable peptide because it stimulates peptide bond formation in the Salt-Induced Peptide Formation reaction. The role glycine and aspartic acid play in the active site of RNA polymerase is remarkable too. GlyGly might have been the original product of coded peptide synthesis because of its importance in stimulating the production of oligopeptides with a high aspartic acid content, which protected small RNA molecules by binding Mg2+ ions. The feedback loop, which is closed by having RNA molecules producing GlyGly, is proposed as the essential element fundamental to life. Having this system running, longer sequences could evolve, gradually solving the problem of error catastrophe. The basic structure of the standard genetic code (8 fourfold degenerate codon boxes and 8 split codon boxes) is an example of the way information concerning the emergence of life is frozen in the biological constitution of organisms: the structure of the code contains historical information.
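The statement about the structure of the standard genetic code can be checked directly. The sketch below groups the 64 codons into 16 boxes by their first two bases and counts how many boxes encode a single amino acid for all four third-position bases. The table string is the standard genetic code in TCAG order, with * marking stop codons; the variable and function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_boxes():
    """Map each two-base prefix to the four amino acids of its codon box."""
    boxes = {}
    for i, first in enumerate(BASES):
        for j, second in enumerate(BASES):
            start = 16 * i + 4 * j
            boxes[first + second] = AMINO_ACIDS[start:start + 4]
    return boxes


boxes = codon_boxes()
fourfold = sorted(prefix for prefix, aas in boxes.items() if len(set(aas)) == 1)
split = sorted(prefix for prefix, aas in boxes.items() if len(set(aas)) > 1)
print(len(fourfold), len(split))  # → 8 8
```

The eight fourfold degenerate boxes are those where the third base never changes the amino acid; in the remaining eight, the third base distinguishes between two amino acids or between an amino acid and a stop signal.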

  1. Hierarchically nested river landform sequences

    Science.gov (United States)

    Pasternack, G. B.; Weber, M. D.; Brown, R. A.; Baig, D.

    2017-12-01

    River corridors exhibit landforms nested within landforms repeatedly down spatial scales. In this study we developed, tested, and implemented a new way to create river classifications by mapping domains of fluvial processes with respect to the hierarchical organization of topographic complexity that drives fluvial dynamism. We tested this approach on flow convergence routing, a morphodynamic mechanism with different states depending on the structure of nondimensional topographic variability. Five nondimensional landform types with unique functionality (nozzle, wide bar, normal channel, constricted pool, and oversized) represent this process at any flow. When this typology is nested at base flow, bankfull, and floodprone scales it creates a system with up to 125 functional types. This shows how a single mechanism produces complex dynamism via nesting. Given the classification, we answered nine specific scientific questions to investigate the abundance, sequencing, and hierarchical nesting of these new landform types using a 35-km gravel/cobble river segment of the Yuba River in California. The nested structure of flow convergence routing landforms found in this study revealed that bankfull landforms are nested within specific floodprone valley landform types, and these types control bankfull morphodynamics during moderate to large floods. As a result, this study calls into question the prevailing theory that the bankfull channel of a gravel/cobble river is controlled by in-channel, bankfull, and/or small flood flows. Such flows are too small to initiate widespread sediment transport in a gravel/cobble river with topographic complexity.

  2. Water buffalo kappa-casein gene sequence

    Directory of Open Access Journals (Sweden)

    A. Mancusi

    2010-02-01

    Full Text Available The aim of the present work was to determine the nucleotide sequence of the water buffalo CSN3 gene (κ-casein). Two overlapping clones from a genomic water buffalo library were sequenced. The sequence comprises the five exons, the relative introns, 1057 nt at the 5' end of the gene and 476 nt downstream of the polyadenylation site. In order to identify polymorphisms responsible for amino acid differences, all five exons from 10 genetically unrelated water buffaloes were sequenced. The comparison of the obtained sequences confirmed the two single nucleotide polymorphisms already reported in the literature at the fourth exon: T versus C at codon 135 (IleATC versus ThrACC) and the silent mutation T versus C at codon 136. The comparison of the promoter sequences of two animals homozygous for 135Thr and 135Ile, respectively, revealed 3 single nucleotide polymorphisms that could alter the expression of the gene.

  3. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high-throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled "intractable", resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  4. Multiple tag labeling method for DNA sequencing

    Science.gov (United States)

    Mathies, Richard A.; Huang, Xiaohua C.; Quesada, Mark A.

    1995-01-01

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in said lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radio-isotope labels.

  5. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Full Text Available Abstract Background The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows the generation of sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of "fraying" of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm allows the generation of greater sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.
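    The kinds of per-sequence constraints listed in the record above (GC content, G/C at both ends, avoidance of self-complementary stretches) can be illustrated with a few small checks. This is a hedged sketch in Python, not the EGNAS C++ implementation; all function names and the k-mer based self-complementarity test are assumptions chosen for the example.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def starts_and_ends_with_gc(seq: str) -> bool:
    """True if both terminal bases are G or C (reduces end fraying)."""
    return seq[0] in "GC" and seq[-1] in "GC"


def has_self_complement(seq: str, k: int) -> bool:
    """True if some k-mer of seq also occurs as a reverse complement in seq.

    Such a pair could, in principle, base-pair within or between copies
    of the same strand; real tools apply further geometric criteria.
    """
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return any(reverse_complement(kmer) in kmers for kmer in kmers)


if __name__ == "__main__":
    candidate = "GATTACAG"  # hypothetical candidate sequence
    print(gc_content(candidate))
    print(starts_and_ends_with_gc(candidate))
    print(has_self_complement(candidate, 4))
```

    An exhaustive generator would apply filters of this kind to every candidate and additionally compare each accepted sequence against all sequences already in the set.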

  6. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    Science.gov (United States)

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan. (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  7. Repetitive sequence environment distinguishes housekeeping genes.

    Science.gov (United States)

    Eller, C Daniel; Regelson, Moira; Merriman, Barry; Nelson, Stan; Horvath, Steve; Marahrens, York

    2007-04-01

    Housekeeping genes are expressed across a wide variety of tissues. Since repetitive sequences have been reported to influence the expression of individual genes, we employed a novel approach to determine whether housekeeping genes can be distinguished from tissue-specific genes by their repetitive sequence context. We show that Alu elements are more highly concentrated around housekeeping genes while various longer (>400-bp) repetitive sequences ("repeats"), including Long Interspersed Nuclear Element-1 (LINE-1) elements, are excluded from these regions. We further show that isochore membership does not distinguish housekeeping genes from tissue-specific genes and that repetitive sequence environment distinguishes housekeeping genes from tissue-specific genes in every isochore. The distinct repetitive sequence environment, in combination with other previously published sequence properties of housekeeping genes, was used to develop a method of predicting housekeeping genes on the basis of DNA sequence alone. Using expression across tissue types as a measure of success, we demonstrate that repetitive sequence environment is by far the most important sequence feature identified to date for distinguishing housekeeping genes.

  8. Robustness analysis of chiller sequencing control

    International Nuclear Information System (INIS)

    Liao, Yundan; Sun, Yongjun; Huang, Gongsheng

    2015-01-01

    Highlights: • Uncertainties with chiller sequencing control were systematically quantified. • Robustness of chiller sequencing control was systematically analyzed. • Different sequencing control strategies were sensitive to different uncertainties. • A numerical method was developed for easy selection of chiller sequencing control. - Abstract: A multiple-chiller plant is commonly employed in heating, ventilating and air-conditioning systems to increase operational feasibility and energy efficiency under part-load conditions. In a multiple-chiller plant, chiller sequencing control plays a key role in achieving overall energy efficiency without sacrificing the cooling sufficiency needed for indoor thermal comfort. Various sequencing control strategies have been developed and implemented in practice. Based on the observations that (i) uncertainty, which cannot be avoided in chiller sequencing control, has a significant impact on the control performance and may cause the control to fail to achieve the expected control and/or energy performance; and (ii) few studies in the current literature have systematically addressed this issue, this paper presents a study on robustness analysis of chiller sequencing control in order to understand the robustness of various chiller sequencing control strategies under different types of uncertainty. Based on the robustness analysis, a simple and applicable method is developed to select the most robust control strategy for a given chiller plant in the presence of uncertainties, which will be verified using case studies

  9. Nonspatial Sequence Coding in CA1 Neurons.

    Science.gov (United States)

    Allen, Timothy A; Salz, Daniel M; McKenzie, Sam; Fortin, Norbert J

    2016-02-03

    The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as "in sequence" or "out of sequence". We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items, in this case whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20-40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4-12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. Here we addressed this issue by recording neural activity in hippocampal region CA1 while rats performed a

  10. Multiplexed microsatellite recovery using massively parallel sequencing

    Science.gov (United States)

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).
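    Screening reads for di- or trinucleotide microsatellites, as described in the record above, can be sketched with a regular expression that looks for a short motif repeated in tandem. The function name, the minimum of five repeats, and the decision to skip single-base motifs such as "AA" are assumptions for this illustration, not the criteria used in the study.

```python
import re
from typing import List, Tuple


def find_microsatellites(read: str, min_repeats: int = 5) -> List[Tuple[int, str, int]]:
    """Find tandem di- and trinucleotide repeats in an uppercase DNA read.

    Returns (start position, motif, number of repeats) for each hit.
    """
    hits = []
    for motif_len in (2, 3):
        # One captured motif followed by at least min_repeats - 1 more copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1)
        )
        for match in pattern.finditer(read):
            motif = match.group(1)
            if len(set(motif)) == 1:
                # A motif like "AA" is a homopolymer run, not a microsatellite.
                continue
            repeats = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeats))
    return hits


if __name__ == "__main__":
    # Hypothetical read containing (AC)6 and (CAG)5.
    read = "TTG" + "AC" * 6 + "GGT" + "CAG" * 5 + "T"
    for start, motif, repeats in find_microsatellites(read):
        print(start, motif, repeats)
```

    In a multiplexed run the same scan would be applied per barcode, so that hits can be assigned to the library (species) they came from.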

  11. Hardware Accelerated Sequence Alignment with Traceback

    Directory of Open Access Journals (Sweden)

    Scott Lloyd

    2009-01-01

    in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop computer is demonstrated on sequence lengths of 16000. For greater performance, the architecture is scalable to more processing elements.
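    For readers unfamiliar with the two phases named in the record above, the forward scan and the traceback, the following software sketch shows a textbook global alignment (Needleman-Wunsch) that stores the full score matrix. It is not the space-efficient hardware architecture of the paper; the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions for illustration.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned a, aligned b)."""
    n, m = len(a), len(b)

    # Forward scan: fill the (n+1) x (m+1) score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback: walk from the bottom-right corner back to the origin,
    # re-deriving which move produced each cell.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_align("GATTACA", "GCATGCG"))
```

    Storing the whole matrix costs memory proportional to the product of the sequence lengths, which is exactly the kind of limitation that space-efficient designs aim to avoid for long sequences.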

  12. Massively parallel sequencing of forensic STRs

    DEFF Research Database (Denmark)

    Parson, Walther; Ballard, David; Budowle, Bruce

    2016-01-01

    The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data...... accessible genome assembly, and in place before the uptake of MPS by the general forensic community starts to generate sequence data on a large scale. While the established nomenclature for CE-based STR analysis will remain unchanged in the future, the nomenclature of sequence-based STR genotypes will need...

  13. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  14. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Full Text Available Abstract Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression, an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression; the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  15. Static multiplicities in heterogeneous azeotropic distillation sequences

    DEFF Research Database (Denmark)

    Esbjerg, Klavs; Andersen, Torben Ravn; Jørgensen, Sten Bay

    1998-01-01

    different static behavior. The method of Petlyuk and Avet'yan (1971), Bekiaris et al. (1993), which assumes infinite reflux and infinite number of stages, is extended to and applied on heterogeneous azeotropic distillation sequences. The predictions are substantiated through simulations. The static sequence...

  16. On peculiar Šindel sequences

    Czech Academy of Sciences Publication Activity Database

    Křížek, Michal; Somer, L.

    2010-01-01

    Vol. 17, No. 2 (2010), pp. 129-140 ISSN 0972-5555 R&D Projects: GA AV ČR(CZ) IAA100190803 Institutional research plan: CEZ:AV0Z10190503 Keywords: quadratic residue * Chinese remainder theorem * primitive Šindel sequences * Prague clock sequence Subject RIV: BA - General Mathematics http://www.pphmj.com/abstract/5095.htm

  17. Comparative studies on sequence characteristics around translation ...

    Indian Academy of Sciences (India)

    Unknown

    To minimise sampling errors, the redundant sequences were excluded, as were sequences: (1) that had incorrect initiation and termination codons, and (2) in which .... In human genes, the preference for the optimal nucleotide of the mammalian translation initiation AUG context (GCCGCC(A/G)CCAUGG) was generally ...

  18. Project Report: Automatic Sequence Processor Software Analysis

    Science.gov (United States)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence, and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system composed of scripts written in Perl, C shell and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.

  19. Towards a reference pecan genome sequence

    Science.gov (United States)

    The cost of generating DNA sequence data has declined dramatically over the previous 15 years as a result of the Human Genome Project and the potential applications of genome sequencing for human medicine. This cost reduction has generated renewed interest among crop breeding scientists in applying...

  20. Learning of Sensory Sequences in Cerebellar Patients

    Science.gov (United States)

    Frings, Markus; Boenisch, Raoul; Gerwig, Marcus; Diener, Hans-Christoph; Timmann, Dagmar

    2004-01-01

    A possible role of the cerebellum in detecting and recognizing event sequences has been proposed. The present study sought to determine whether patients with cerebellar lesions are impaired in the acquisition and discrimination of sequences of sensory stimuli of different modalities. A group of 26 cerebellar patients and 26 controls matched for…

  1. Early Permian transgressive-regressive cycles: sequence ...

    Indian Academy of Sciences (India)

    4

    accumulated during further sea level rise. This led to additional accommodation space, but the rate of sediment supply by the river outpaced the rate of sea level rise, resulting in a progradational and aggradational sequence identified as tidally-influenced Lowstand ...

  2. On generalized difference Hahn sequence spaces.

    Science.gov (United States)

    Raj, Kuldip; Kiliçman, Adem

    2014-01-01

    We construct some generalized difference Hahn sequence spaces by means of a sequence of modulus functions. The topological properties and some inclusion relations of the spaces h_p(F, u, Δ^r) are investigated. We also compute the duals of these spaces, and some matrix transformations are characterized.

  3. Nanopore sequencing detects structural variants in cancer.

    Science.gov (United States)

    Norris, Alexis L; Workman, Rachael E; Fan, Yunfan; Eshleman, James R; Timp, Winston

    2016-01-01

    Despite advances in sequencing, structural variants (SVs) remain difficult to reliably detect due to the short read length (nanopore sequencing on the MinION. Nanopore sequencing relies on a similar concept to a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes base substitution and small frameshift mutation detection, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring.

  4. Sequencing Events: Exploring Art and Art Jobs.

    Science.gov (United States)

    Stephens, Pamela Geiger; Shaddix, Robin K.

    2000-01-01

    Presents an activity for upper-elementary students that correlates the actions of archaeologists, patrons, and artists with the sequencing of events in a logical order. Features ancient Egyptian art images. Discusses the preparation of materials, motivation, a pre-writing activity, and writing a story in sequence. (CMK)

  5. Thread extraction for polyadic instruction sequences

    NARCIS (Netherlands)

    Bergstra, J.; Middelburg, C.

    2011-01-01

    In this paper, we study the phenomenon that instruction sequences are split into fragments which somehow produce a joint behaviour. In order to bring this phenomenon better into the picture, we formalize a simple mechanism by which several instruction sequence fragments can produce a joint

  6. Enhanced throughput for infrared automated DNA sequencing

    Science.gov (United States)

    Middendorf, Lyle R.; Gartside, Bill O.; Humphrey, Pat G.; Roemer, Stephen C.; Sorensen, David R.; Steffens, David L.; Sutter, Scott L.

    1995-04-01

    Several enhancements have been developed and applied to infrared automated DNA sequencing, resulting in significantly higher throughput. A 41 cm sequencing gel (31 cm well-to-read distance) combines high resolution of DNA sequencing fragments with optimized run times, yielding two runs per day of 500 bases per sample. A 66 cm sequencing gel (56 cm well-to-read distance) produces sequence read lengths of up to 1000 bases for ds and ss templates using either T7 polymerase or cycle-sequencing protocols. Using a multichannel syringe to load 64 lanes allows 16 samples (compatible with 96-well format) to be visualized for each run. The 41 cm gel configuration allows 16,000 bases per day (16 samples × 500 bases/sample × 2 ten-hour runs/day) to be sequenced with the advantages of infrared technology. Enhancements to internal labeling techniques using an infrared-labeled dATP molecule (Boehringer Mannheim GmbH, Penzberg, Germany; Sequenase (U.S. Biochemical)) have also been made. The inclusion of glycerol in the sequencing reactions yields greatly improved results for some primer and template combinations. The inclusion of α-thio-dNTPs in the labeling reaction increases signal intensity two- to three-fold.

  7. Stochastic modelling of daily rainfall sequences

    NARCIS (Netherlands)

    Buishand, T.A.

    1977-01-01

    Rainfall series of different climatic regions were analysed with the aim of generating daily rainfall sequences. A survey of the data is given in I, 1. When analysing daily rainfall sequences one must be aware of the following points:
    a. Seasonality. Because of seasonal variation

  8. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  9. Novel bioinformatic developments for exome sequencing

    NARCIS (Netherlands)

    Lelieveld, S.H.; Veltman, J.A.; Gilissen, C.F.

    2016-01-01

    With the widespread adoption of next generation sequencing technologies by the genetics community and the rapid decrease in costs per base, exome sequencing has become a standard within the repertoire of genetic experiments for both research and diagnostics. Although bioinformatics now offers

  10. Sequence of PSE-2 beta-lactamase.

    OpenAIRE

    Huovinen, P; Huovinen, S; Jacoby, G A

    1988-01-01

    The nucleotide sequence of PSE-2 beta-lactamase, an enzyme that readily hydrolyzes both carbenicillin and oxacillin, has been determined. The deduced sequence of 266 amino acids contained 93 residues identical to those of OXA-2 beta-lactamase and the Ser-Thr-Phe-Lys tetrad also found in the active site of TEM-1 beta-lactamase.

  11. Wijsman Orlicz Asymptotically Ideal -Statistical Equivalent Sequences

    Directory of Open Access Journals (Sweden)

    Bipan Hazarika

    2013-01-01

    in Wijsman sense and present some definitions which are the natural combination of the definition of asymptotic equivalence, statistical equivalent, -statistical equivalent sequences in Wijsman sense. Finally, we introduce the notion of Cesaro Orlicz asymptotically -equivalent sequences in Wijsman sense and establish their relationship with other classes.

  12. Some Algebraic Aspects of Morse Code Sequences

    OpenAIRE

    Cigler, Johann

    2003-01-01

    Morse code sequences are very useful to give combinatorial interpretations of various properties of Fibonacci numbers. In this note we study some algebraic and combinatorial aspects of Morse code sequences and obtain several q-analogues of Fibonacci numbers and Fibonacci polynomials and their generalizations.
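
    As a small illustration of the combinatorial interpretation mentioned in this abstract (the example is not taken from the paper itself): if a dot is assigned length 1 and a dash length 2, a Morse code sequence of total length n ends either in a dot or in a dash, so the number of such sequences follows the Fibonacci recurrence. The sketch below uses a hypothetical function name, `morse_sequences`, and simply enumerates the sequences so the counts can be inspected.

```python
def morse_sequences(n):
    """Return all strings of '.' (length 1) and '-' (length 2) whose
    lengths add up to exactly n. Illustrative helper, not from the paper."""
    if n < 0:
        return []
    if n == 0:
        return [""]  # the empty sequence is the only one of length 0
    # A sequence of length n ends in a dot (rest has length n - 1)
    # or in a dash (rest has length n - 2).
    return ([s + "." for s in morse_sequences(n - 1)]
            + [s + "-" for s in morse_sequences(n - 2)])


if __name__ == "__main__":
    for n in range(7):
        print(n, len(morse_sequences(n)))
```

    The counts obey c(n) = c(n-1) + c(n-2) with c(0) = c(1) = 1. The plain recursion is exponential in n and is meant only for small lengths.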

  13. Sequencing Closterium moniliferum : Future prospects in nuclear ...

    African Journals Online (AJOL)

    With recent advances in sequencing, researchers are now aiming to use its power in non-conventional areas. Here we discuss the importance of sequencing the Closterium moniliferum genome, which will prove to be a future endeavour in nuclear cleanup and radioactive waste disposal.

  14. Trace maps for arbitrary substitution sequences

    International Nuclear Information System (INIS)

    Avishai, Y.

    1993-01-01

    The discovery of quasi-crystals and their 1-dimensional modeling have led to a deep mathematical study of Schroedinger operators with an arbitrary deterministic potential sequence. In this work we address this problem and find trace maps for an arbitrary substitution sequence. Our trace maps have lower dimensionality than those of Kolar and Nori, which makes them quite attractive for actual applications. (authors)

  15. (SSR) and inter simple sequence repeat (ISSR)

    African Journals Online (AJOL)

    Finally, they were washed 3 to 4 times with sterile distilled water and inoculated aseptically on hormone-free Murashige and Skoog (MS) basal medium. Single nodes from the resulting seedlings were cultured as explants. The inter simple sequence repeat (ISSR) and simple sequence repeat (SSR) primers used produced different ...

  16. Clustering metagenomic sequences with interpolated Markov models

    Directory of Open Access Journals (Sweden)

    Kelley David R

    2010-11-01

    Background: Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. Results: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and the supervised learning method Phymm, called PHYSCIMM, that performs better when evolutionarily close training genomes are available. Conclusions: SCIMM and PHYSCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm.

  17. Monitoring method call sequences using annotations

    NARCIS (Netherlands)

    B. Nobakht (Behrooz); F.S. de Boer (Frank); M.M. Bonsangue (Marcello); C.P.T. de Gouw (Stijn); M.M. Jaghouri (MohammadMahdi)

    2014-01-01

    In this paper we introduce JMSeq, a Java-based tool for monitoring sequences of method calls. JMSeq provides a simple but expressive language to specify the observables of a Java program in terms of sequences of possibly nested method calls. Similar to many monitoring-oriented

  18. Sequence Comparison: Close and Open problems

    NARCIS (Netherlands)

    Lenzini, Gabriele; Cerrai, P.; Freguglia, P.

    Comparing sequences is a very important activity both in computer science and in many other areas as well. For example, thanks to text editors, everyone knows the particular instance of a sequence comparison problem known as the "string matching problem". It consists in searching a given work

  19. Wolbachia Sequence Typing in Butterflies Using Pyrosequencing.

    Science.gov (United States)

    Choi, Sungmi; Shin, Su-Kyoung; Jeong, Gilsang; Yi, Hana

    2015-09-01

    Wolbachia is an obligate symbiotic bacterium that is ubiquitous in arthropods, with 25-70% of insect species estimated to be infected. Wolbachia species can interact with their insect hosts in a mutualistic or parasitic manner. Sequence types (ST) of Wolbachia are determined by multilocus sequence typing (MLST) of housekeeping genes. However, there are some limitations to MLST with respect to the generation of clone libraries and the Sanger sequencing method when a host is infected with multiple STs of Wolbachia. To assess the feasibility of massive parallel sequencing, also known as next-generation sequencing, we used pyrosequencing for sequence typing of Wolbachia in butterflies. We collected three species of butterflies (Eurema hecabe, Eurema laeta, and Tongeia fischeri) common to Korea and screened them for Wolbachia STs. We found that T. fischeri was infected with a single ST of Wolbachia, ST41. In contrast, E. hecabe and E. laeta were each infected with two STs of Wolbachia, ST41 and ST40. Our results clearly demonstrate that pyrosequencing-based MLST has a higher sensitivity than cloning and Sanger sequencing methods for the detection of minor alleles. Considering the high prevalence of infection with multiple Wolbachia STs, next-generation sequencing with improved analysis would assist with scaling up approaches to Wolbachia MLST.

  20. From Sequence to Morphology - Long-Range Correlations in Complete Sequenced Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2004-01-01

    The largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis

  1. Swab-to-Sequence: Real-time Data Analysis Platform for the Biomolecule Sequencer

    Data.gov (United States)

    National Aeronautics and Space Administration — DNA was successfully sequenced on the ISS in 2016, but the DNA sequenced was prepared on the ground. With FY’16 IRAD funds, the same team developed a...

  2. Snake Genome Sequencing: Results and Future Prospects

    Directory of Open Access Journals (Sweden)

    Harald M. I. Kerkkamp

    2016-12-01

    Snake genome sequencing is in its infancy, very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  3. Snake Genome Sequencing: Results and Future Prospects.

    Science.gov (United States)

    Kerkkamp, Harald M I; Kini, R Manjunatha; Pospelov, Alexey S; Vonk, Freek J; Henkel, Christiaan V; Richardson, Michael K

    2016-12-01

    Snake genome sequencing is in its infancy, very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  4. Sequencing and comparing whole mitochondrial genomes of animals

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  5. Nuclear DNA sequences from late Pleistocene megafauna.

    Science.gov (United States)

    Greenwood, A D; Capelli, C; Possnert, G; Pääbo, S

    1999-11-01

    We report the retrieval and characterization of multi- and single-copy nuclear DNA sequences from Alaskan and Siberian mammoths (Mammuthus primigenius). In addition, a nuclear copy of a mitochondrial gene was recovered. Furthermore, a 13,000-year-old ground sloth and a 33,000-year-old cave bear yielded multicopy nuclear DNA sequences. Thus, multicopy and single-copy genes can be analyzed from Pleistocene faunal remains. The results also show that under some circumstances, nucleotide sequence differences between alleles found within one individual can be distinguished from DNA sequence variation caused by postmortem DNA damage. The nuclear sequences retrieved from the mammoths suggest that mammoths were more similar to Asian elephants than to African elephants.

  6. Whole-genome sequencing of veterinary pathogens

    DEFF Research Database (Denmark)

    Ronco, Troels

    using whole-genome sequencing. The results showed that NELoc-1 and -3 and the two virulence genes netB and cnaA were significantly more associated with NE isolates from chickens compared to NE isolates from turkeys. Only NELoc-2 was associated with NE isolates from both turkeys and chickens. A putative......-electrophoresis and single-locus sequencing has been widely used to characterize such types of veterinary pathogens. However, DNA sequencing techniques have become fast and cost effective in recent years and whole-genome sequencing data provide a much higher discriminative power and reproducibility than any...... of the traditional molecular techniques. In this PhD project three important veterinary pathogens (Clostridium perfringens, Escherichia coli and Staphylococcus aureus) were investigated using whole-genome sequencing. This was done in five different scientific papers which all have been published. Paper I and II...

  7. [Sequence learning in major depressive disorder].

    Science.gov (United States)

    Borbély-Ipkovich, Emöke; Németh, Dezsö; Janacsek, Karolina; Gonda, Xénia

    2014-01-01

    Major Depressive Disorder (MDD) is one of the most common psychiatric diagnoses, accompanied by several psychological, behavioural and emotional symptoms, and in addition to the symptoms affecting the quality of life, it can lead to severe consequences, including suicide. Sequence learning plays a key role in adapting to the environment, neural plasticity, first language acquisition, social learning and skills, at the same time it defines the behaviour of the patient and also therapeutic possibilities. The aim of this paper is to review sequence learning and its consolidation in MDD. We know little about the effects of mood disorders on sequence learning; the results are contradictory, therefore, further studies are needed to test the effects of MDD on sequence learning and on the consolidation of implicitly acquired sequence knowledge.

  8. Finding Common Sequence and Structure Motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, Laurie J.; Stormo, Gary D.

    1997-01-01

    We present a computational scheme to search for the most common motif, composed of a combination of sequence and structure constraints, among a collection of RNA sequences. The method uses a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences......, and comparisons with other approaches, are provided. The solutions include finding consensus structure identical to published ones....

  9. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    DEFF Research Database (Denmark)

    de Souza, S J; Camargo, A A; Briones, M R

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central ...

  10. Comparison of metagenomic samples using sequence signatures

    Directory of Open Access Journals (Sweden)

    Jiang Bai

    2012-12-01

    Background: Sequence signatures, as defined by the frequencies of k-tuples (or k-mers, k-grams), have been used extensively to compare genomic sequences of individual organisms, to identify cis-regulatory modules, and to study the evolution of regulatory sequences. Recently many next-generation sequencing (NGS) read data sets of metagenomic samples from a variety of different environments have been generated. The assembly of these reads can be difficult and analysis methods based on mapping reads to genes or pathways are also restricted by the availability and completeness of existing databases. Sequence-signature-based methods, however, do not need the complete genomes or existing databases and thus can potentially be very useful for the comparison of metagenomic samples using NGS read data. Still, the applications of sequence signature methods for the comparison of metagenomic samples have not been well studied. Results: We studied several dissimilarity measures, including d2, d2* and d2S recently developed by our group, a measure (hereinafter noted as Hao) used in CVTree developed by Hao's group (Qi et al., 2004), measures based on relative di-, tri-, and tetra-nucleotide frequencies as in Willner et al. (2009), as well as standard l_p measures between the frequency vectors, for the comparison of metagenomic samples using sequence signatures. We compared their performance using a series of extensive simulations and three real next-generation sequencing (NGS) metagenomic datasets: 39 fecal samples from 33 mammalian host species, 56 marine samples across the world, and 13 fecal samples from human individuals. Results showed that the dissimilarity measure d2S can achieve superior performance when comparing metagenomic samples by clustering them into different groups as well as recovering environmental gradients affecting microbial samples. New insights into the environmental factors affecting microbial compositions in metagenomic samples
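
    To make the idea of a sequence signature concrete, the following is a minimal sketch, under stated assumptions, of building a k-tuple frequency vector from a set of reads and comparing two samples with a standard l_p distance between the frequency vectors. It does not implement d2, d2*, d2S or the Hao measure, whose definitions are not given in this abstract; the function names `kmer_frequencies` and `lp_distance` and the toy reads are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(reads, k):
    """Relative frequencies of all 4**k k-mers over a collection of reads.

    Windows containing characters other than A, C, G, T are skipped
    (an assumption made here for simplicity).
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    # Fixed lexicographic ordering so vectors from different samples align.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def lp_distance(f, g, p=2):
    """Standard l_p distance between two equally long frequency vectors."""
    return sum(abs(a - b) ** p for a, b in zip(f, g)) ** (1.0 / p)


if __name__ == "__main__":
    sample_a = ["ACGTACGTAC", "TTGACCGTAA"]  # toy reads, not real data
    sample_b = ["GGGCGCGCCC", "CCGCGGGCGC"]
    fa = kmer_frequencies(sample_a, k=2)
    fb = kmer_frequencies(sample_b, k=2)
    print(lp_distance(fa, fb, p=1), lp_distance(fa, fb, p=2))
```

    Because the vectors are built directly from reads, no assembly or reference database is needed, which is the property the abstract highlights for signature-based comparison.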

  11. On site DNA barcoding by nanopore sequencing.

    Directory of Open Access Journals (Sweden)

    Michele Menegon

    Biodiversity research is becoming increasingly dependent on genomics, which allows the unprecedented digitization and understanding of the planet's biological heritage. The use of genetic markers, i.e. DNA barcoding, has proved to be a powerful tool in species identification. However, full exploitation of this approach is hampered by the high sequencing costs and the absence of equipped facilities in biodiversity-rich countries. In the present work, we developed a portable sequencing laboratory based on the portable DNA sequencer from Oxford Nanopore Technologies, the MinION. Complementary laboratory equipment and reagents were selected to be used in remote and tough environmental conditions. The performance of the MinION sequencer and the portable laboratory was tested for DNA barcoding in a mimicking tropical environment, as well as in a remote rainforest of Tanzania lacking electricity. Despite the relatively high sequencing error-rate of the MinION, the development of a suitable pipeline for data analysis allowed the accurate identification of different species of vertebrates including amphibians, reptiles and mammals. In situ sequencing of a wild frog allowed us to rapidly identify the species captured, thus confirming that effective DNA barcoding in the field is possible. These results open new perspectives for real-time-on-site DNA sequencing thus potentially increasing opportunities for the understanding of biodiversity in areas lacking conventional laboratory facilities.

  12. What can exome sequencing do for you?

    Science.gov (United States)

    Majewski, Jacek; Schwartzentruber, Jeremy; Lalonde, Emilie; Montpetit, Alexandre; Jabado, Nada

    2011-09-01

    Recent advances in next-generation sequencing technologies have brought a paradigm shift in how medical researchers investigate both rare and common human disorders. The ability cost-effectively to generate genome-wide sequencing data with deep coverage in a short time frame is replacing approaches that focus on specific regions for gene discovery and clinical testing. While whole genome sequencing remains prohibitively expensive for most applications, exome sequencing--a technique which focuses on only the protein-coding portion of the genome--places many advantages of the emerging technologies into researchers' hands. Recent successes using this technology have uncovered genetic defects with a limited number of probands regardless of shared genetic heritage, and are changing our approach to Mendelian disorders where soon all causative variants, genes and their relation to phenotype will be uncovered. The expectation is that, in the very near future, this technology will enable us to identify all the variants in an individual's personal genome and, in particular, clinically relevant alleles. Beyond this, whole genome sequencing is also expected to bring a major shift in clinical practice in terms of diagnosis and understanding of diseases, ultimately enabling personalised medicine based on one's genome. This paper provides an overview of the current and future use of next generation sequencing as it relates to whole exome sequencing in human disease by focusing on the technical capabilities, limitations and ethical issues associated with this technology in the field of genetics and human disease.

  13. Exploration of noncoding sequences in metagenomes.

    Directory of Open Access Journals (Sweden)

    Fabián Tobar-Tosse

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.

  14. Analysis and prediction of baculovirus promoter sequences.

    Science.gov (United States)

    Xing, Ke; Deng, Riqiang; Wang, Jinwen; Feng, Jinghua; Huang, Mingsong; Wang, Xunzhang

    2005-10-01

    Consensus patterns of baculovirus sequences upstream from the translational initiation sites have been analyzed and a web tool, Local Alignment Promoter Predictor (LAPP), for the prediction of baculovirus promoter sequences has also been developed. Potential consensus sequences, i.e., TCATTGT, TCTTGTA, CTCGTAA, TCCATTT and TCATT plus TCGT in approximately 30 bp spacing context, have been found in baculovirus promoter regions, in addition to well-characterized late and early promoter elements G/T/ATAAG and TATAA, which is accompanied about 30-bp downstream by a transcriptional initiation sequence CAGT or CATT. Promoter prediction is performed by a dynamic programming algorithm based on maximal segment pair measure with scores above some cutoff against each sequence in a refined promoter database. The algorithm was able to discriminate between promoter and non-promoter sequences in a test set of baculovirus sequences with prediction specificity and sensitivity superior to that using five other eukaryotic promoter recognition programs available on the Internet. A web server that implements the LAPP with continually updated promoter database is freely available at http://life.zsu.edu.cn/LAPP/.
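
    The spacing relationship described in this abstract, a TATAA element followed about 30 bp downstream by a CAGT or CATT initiation sequence, can be illustrated with a simple pattern scan. This is only a sketch under stated assumptions: it is not the LAPP algorithm (which the abstract describes as dynamic programming over maximal segment pairs against a promoter database), the function name `find_tataa_initiator` is hypothetical, and the 25 to 35 bp window, measured from the end of TATAA to the start of the initiator, is an assumed reading of "about 30 bp".

```python
import re


def find_tataa_initiator(seq, min_gap=25, max_gap=35):
    """Find TATAA elements followed by CAGT or CATT within a gap window.

    Returns a list of (tataa_start, initiator_start, initiator) tuples,
    using 0-based positions. The gap bounds are illustrative assumptions.
    """
    # A lookahead lets overlapping candidates be reported; the lazy
    # quantifier picks the nearest initiator inside the window.
    pattern = re.compile(
        r"(?=(TATAA[ACGT]{%d,%d}?(CAGT|CATT)))" % (min_gap, max_gap)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        hits.append((m.start(1), m.start(2), m.group(2)))
    return hits


if __name__ == "__main__":
    # Synthetic test sequence, not a real baculovirus promoter.
    demo = "GG" + "TATAA" + "C" * 30 + "CAGT" + "GG"
    print(find_tataa_initiator(demo))
```

    A scan like this only flags candidate element pairs; it says nothing about prediction specificity or sensitivity, which the abstract reports for LAPP itself.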

  15. CATEGORIZATION OF EVENT SEQUENCES FOR LICENSE APPLICATION

    Energy Technology Data Exchange (ETDEWEB)

    G.E. Ragan; P. Mecheret; D. Dexheimer

    2005-04-14

    The purposes of this analysis are: (1) Categorize (as Category 1, Category 2, or Beyond Category 2) internal event sequences that may occur before permanent closure of the repository at Yucca Mountain. (2) Categorize external event sequences that may occur before permanent closure of the repository at Yucca Mountain. This includes examining DBGM-1 seismic classifications and upgrading to DBGM-2, if appropriate, to ensure Beyond Category 2 categorization. (3) State the design and operational requirements that are invoked to make the categorization assignments valid. (4) Indicate the amount of material put at risk by Category 1 and Category 2 event sequences. (5) Estimate frequencies of Category 1 event sequences at the maximum capacity and receipt rate of the repository. (6) Distinguish occurrences associated with normal operations from event sequences. It is beyond the scope of the analysis to propose design requirements that may be required to control radiological exposure associated with normal operations. (7) Provide a convenient compilation of the results of the analysis in tabular form. The results of this analysis are used as inputs to the consequence analyses in an iterative design process that is depicted in Figure 1. Categorization of event sequences for permanent retrieval of waste from the repository is beyond the scope of this analysis. Cleanup activities that take place after an event sequence and other responses to abnormal events are also beyond the scope of the analysis.

  16. A comparative evaluation of sequence classification programs

    Directory of Open Access Journals (Sweden)

    Bazinet Adam L

    2012-05-01

    Background: A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for 'barcoding genes' like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis. Results: We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known. Conclusions: We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs.

  17. Robot Sequencing and Visualization Program (RSVP)

    Science.gov (United States)

    Cooper, Brian K.; Maxwell, Scott A.; Hartman, Frank R.; Wright, John R.; Yen, Jeng; Toole, Nicholas T.; Gorjian, Zareh; Morrison, Jack C

    2013-01-01

    The Robot Sequencing and Visualization Program (RSVP) is being used in the Mars Science Laboratory (MSL) mission for downlink data visualization and command sequence generation. RSVP reads and writes downlink data products from the operations data server (ODS) and writes uplink data products to the ODS. The primary users of RSVP are members of the Rover Planner team (part of the Integrated Planning and Execution Team (IPE)), who use it to perform traversability/articulation analyses, take activity plan input from the Science and Mission Planning teams, and create a set of rover sequences to be sent to the rover every sol. The primary inputs to RSVP are downlink data products and activity plans in the ODS database. The primary outputs are command sequences to be placed in the ODS for further processing prior to uplink to each rover. RSVP is composed of two main subsystems. The first, called the Robot Sequence Editor (RoSE), understands the MSL activity and command dictionaries and takes care of converting incoming activity level inputs into command sequences. The Rover Planners use the RoSE component of RSVP to put together command sequences and to view and manage command level resources like time, power, temperature, etc. (via a transparent realtime connection to SEQGEN). The second component of RSVP is called HyperDrive, a set of high-fidelity computer graphics displays of the Martian surface in 3D and in stereo. The Rover Planners can explore the environment around the rover, create commands related to motion of all kinds, and see the simulated result of those commands via its underlying tight coupling with flight navigation, motor, and arm software. This software is the evolutionary replacement for the Rover Sequencing and Visualization software used to create command sequences (and visualize the Martian surface) for the Mars Exploration Rover mission.

  18. Locomotor sequence learning in visually guided walking.

    Science.gov (United States)

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-04-01

    Voluntary limb modifications must be integrated with basic walking patterns during visually guided walking. In this study we tested whether voluntary gait modifications can become more automatic with practice. We challenged walking control by presenting visual stepping targets that instructed subjects to modify step length from one trial to the next. Our sequence learning paradigm is derived from the serial reaction-time (SRT) task that has been used in upper limb studies. Both random and ordered sequences of step lengths were used to measure sequence-specific and sequence-nonspecific learning during walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 yr, n = 20) could learn a specific sequence of step lengths over 300 training steps. Younger children (age 6-10 yr, n = 8) had lower baseline performance, but their magnitude and rate of sequence learning were the same compared with those of older children (11-16 yr, n = 10) and healthy adults. In addition, learning capacity may be more limited at faster walking speeds. To our knowledge, this is the first study to demonstrate that spatial sequence learning can be integrated with a highly automatic task such as walking. These findings suggest that adults and children use implicit knowledge about the sequence to plan and execute leg movement during visually guided walking. Copyright © 2016 the American Physiological Society.

  19. Automated genome sequence analysis and annotation.

    Science.gov (United States)

    Andrade, M A; Brown, N P; Leroy, C; Hoersch, S; de Daruvar, A; Reich, C; Franchini, A; Tamames, J; Valencia, A; Ouzounis, C; Sander, C

    1999-05-01

    Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Research in bioinformatics contributes to the development of methods for the computational characterization of these sequences. However, the installation and application of these methods require experience and are time consuming. We present here an automatic system for preliminary functional annotation of protein sequences that has been applied to the analysis of sets of sequences from complete genomes, both to refine overall performance and to make new discoveries comparable to those made by human experts. The GeneQuiz system includes a Web-based browser that allows examination of the evidence leading to an automatic annotation and offers additional information, views of the results, and links to biological databases that complement the automatic analysis. System structure and operating principles concerning the use of multiple sequence databases, underlying sequence analysis tools, lexical analyses of database annotations and decision criteria for functional assignments are detailed. The system makes automatic quality assessments of results based on prior experience with the underlying sequence analysis tools; overall error rates in functional assignment are estimated at 2.5-5% for cases annotated with highest reliability ('clear' cases). Sources of over-interpretation of results are discussed with proposals for improvement. A conservative definition for reporting 'new findings' that takes account of database maturity is presented along with examples of possible kinds of discoveries (new function, family and superfamily) made by the system. System performance in relation to sequence database coverage, database dynamics and database search methods is analysed, demonstrating the inherent advantages of an integrated automatic approach using multiple databases and search methods applied in an objective and repeatable manner. 
The GeneQuiz system

  20. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2012-11-01

    Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.
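    The coverage figures quoted above are fractions of reference positions touched by at least one aligned read. A minimal sketch of that calculation follows, assuming the alignments are already available as half-open (start, end) intervals on the reference; the function name and the interval format are illustrative assumptions and are not taken from the paper.

```python
def covered_fraction(ref_length, alignments):
    """Fraction of reference positions covered by at least one aligned read.

    alignments: iterable of (start, end) half-open, 0-based intervals
    (a hypothetical input format chosen for this sketch).
    """
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(alignments):
        # Clip each interval to the reference boundaries.
        start, end = max(start, 0), min(end, ref_length)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Disjoint from the running interval: bank it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the running interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_length
```

    For example, three reads spanning (0, 50), (40, 60) and (90, 100) on a 100-base reference cover 70 positions, so the function returns 0.7.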

  1. Rapid and Accurate Sequencing of Enterovirus Genomes Using MinION Nanopore Sequencer.

    Science.gov (United States)

    Wang, Ji; Ke, Yue Hua; Zhang, Yong; Huang, Ke Qiang; Wang, Lei; Shen, Xin Xin; Dong, Xiao Ping; Xu, Wen Bo; Ma, Xue Jun

    2017-10-01

    Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically-related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination and has the potential as a fast, reliable and convenient method for routine use. Copyright © 2017 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
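    Accuracy and identity values such as those above are usually reported as the percentage of matching columns in a pairwise alignment between the consensus and a reference (here, the Sanger sequence). The sketch below assumes the two sequences are already aligned to equal length with "-" as the gap character; how the study itself counted gaps is not stated in the abstract, so treating a gap as a mismatch is an assumption.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same base.

    Both inputs must be pre-aligned strings of equal length; '-' marks a gap.
    Columns where both sequences have a gap are ignored (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

    With one mismatch in ten columns, for example "ACGTACGTAC" against "ACGTACGTAA", the function returns 90.0.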

  2. Computer simulation of replacement sequences in copper

    International Nuclear Information System (INIS)

    Schiffgens, J.O.; Schwartz, D.W.; Ariyasu, R.G.; Cascadden, S.E.

    1978-01-01

    Results of computer simulations of , , and replacement sequences in copper are presented, including displacement thresholds, focusing energies, energy losses per replacement, and replacement sequence lengths. These parameters are tabulated for six interatomic potentials and shown to vary in a systematic way with potential stiffness and range. Comparisons of results from calculations made with ADDES, a quasi-dynamical code, and COMENT, a dynamical code, show excellent agreement, demonstrating that the former can be calibrated and used satisfactorily in the analysis of low energy displacement cascades. Upper limits on , , and replacement sequences were found to be approximately 10, approximately 30, and approximately 14 replacements, respectively. (author)

  3. Method for sequencing DNA base pairs

    Science.gov (United States)

    Sessler, Andrew M.; Dawson, John

    1993-01-01

    The base pairs of a DNA structure are sequenced with the use of a scanning tunneling microscope (STM). The DNA structure is scanned by the STM probe tip, and, as it is being scanned, the DNA structure is separately subjected to a sequence of infrared radiation from four different sources, each source being selected to preferentially excite one of the four different bases in the DNA structure. Each particular base being scanned is subjected to such sequence of infrared radiation from the four different sources as that particular base is being scanned. The DNA structure as a whole is separately imaged for each subjection thereof to radiation from one only of each source.

  4. Nanopore-CMOS Interfaces for DNA Sequencing.

    Science.gov (United States)

    Magierowski, Sebastian; Huang, Yiyun; Wang, Chengjie; Ghafar-Zadeh, Ebrahim

    2016-08-06

    DNA sequencers based on nanopore sensors present an opportunity for a significant break from the template-based incumbents of the last forty years. Key advantages ushered by nanopore technology include a simplified chemistry and the ability to interface to CMOS technology. The latter opportunity offers substantial promise for improvement in sequencing speed, size and cost. This paper reviews existing and emerging means of interfacing nanopores to CMOS technology with an emphasis on massively-arrayed structures. It presents this in the context of incumbent DNA sequencing techniques, reviews and quantifies nanopore characteristics and models and presents CMOS circuit methods for the amplification of low-current nanopore signals in such interfaces.

  5. Probabilistic Motor Sequence Yields Greater Offline and Less Online Learning than Fixed Sequence.

    Science.gov (United States)

    Du, Yue; Prashad, Shikha; Schoenbrun, Ilana; Clark, Jane E

    2016-01-01

    It is well acknowledged that motor sequences can be learned quickly through online learning. Subsequently, the initial acquisition of a motor sequence is boosted or consolidated by offline learning. However, little is known whether offline learning can drive the fast learning of motor sequences (i.e., initial sequence learning in the first training session). To examine offline learning in the fast learning stage, we asked four groups of young adults to perform the serial reaction time (SRT) task with either a fixed or probabilistic sequence and with or without preliminary knowledge (PK) of the presence of a sequence. The sequence and PK were manipulated to emphasize either procedural (probabilistic sequence; no preliminary knowledge (NPK)) or declarative (fixed sequence; with PK) memory that were found to either facilitate or inhibit offline learning. In the SRT task, there were six learning blocks with a 2 min break between each consecutive block. Throughout the session, stimuli followed the same fixed or probabilistic pattern except in Block 5, in which stimuli appeared in a random order. We found that PK facilitated the learning of a fixed sequence, but not a probabilistic sequence. In addition to overall learning measured by the mean reaction time (RT), we examined the progressive changes in RT within and between blocks (i.e., online and offline learning, respectively). It was found that the two groups who performed the fixed sequence, regardless of PK, showed greater online learning than the other two groups who performed the probabilistic sequence. The groups who performed the probabilistic sequence, regardless of PK, did not display online learning, as indicated by a decline in performance within the learning blocks. However, they did demonstrate remarkably greater offline improvement in RT, which suggests that they are learning the probabilistic sequence offline. 
These results suggest that in the SRT task, the fast acquisition of a motor sequence is driven

  6. Fault location using synchronized sequence measurements

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Chun; Jia, Qing-Quan; Li, Xin-Bin; Dou, Chun-Xia [Department of Power Electrical Engineering, Yanshan University, Qinhuangdao 066004 (China)

    2008-02-15

    This paper proposes fault location formulas using synchronized sequence measurements. For earth faults, zero-sequence voltages and currents at two terminals of faulted line are applied to fault location. Negative-sequence measurements are utilized for asymmetrical faults and positive-sequence measurements are used for three-phase faults. The fault location formulas are derived from a fault location technique [Wang C, Dou C, Li X, Jia Q. A WAMS/PMU-based fault location technique. Elect Power Syst Res 2007;77(8):936-945] based on WAMS/PMU. The technique uses synchronized fault voltages measured by PMUs in power network. The formulas are simple and are easy for application. Case studies on a testing network with 500 kV transmission lines including ATP/EMTP simulations are presented. Various fault types and fault resistances are also considered. (author)
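    As a rough illustration of two-terminal fault location with synchronized phasors, the sketch below uses the textbook short-line model (series impedance only, shunt capacitance neglected). It is not the formula derived in the paper. Equating the fault-point voltage seen from both ends, V_S - d*Z*I_S = V_R - (1-d)*Z*I_R, gives d = (V_S - V_R + Z*I_R) / (Z*(I_S + I_R)), where d is the per-unit distance from terminal S. All names are hypothetical; the same expression can be evaluated with zero-, negative- or positive-sequence phasors and the matching sequence impedance.

```python
def fault_distance_fraction(v_s, i_s, v_r, i_r, z_line):
    """Per-unit distance of the fault from terminal S (short-line model).

    v_s, v_r: synchronized voltage phasors at terminals S and R (complex)
    i_s, i_r: current phasors flowing into the line at S and R (complex)
    z_line:   total series impedance of the line for the chosen sequence
    Returns a complex number; with consistent measurements its imaginary
    part is close to zero and its real part is the distance estimate.
    """
    denominator = z_line * (i_s + i_r)
    if denominator == 0:
        raise ValueError("sum of terminal currents is zero; distance undefined")
    return (v_s - v_r + z_line * i_r) / denominator


# Synthetic check: build terminal voltages for a fault at 30% of the line.
z = 2 + 20j
d_true = 0.3
v_fault = 100 + 5j
i_s, i_r = 3 - 1j, 2 - 0.5j
v_s = v_fault + d_true * z * i_s
v_r = v_fault + (1 - d_true) * z * i_r
estimate = fault_distance_fraction(v_s, i_s, v_r, i_r, z)
```

    On real lines the neglected shunt capacitance and measurement errors make the imaginary part nonzero, which can serve as a crude consistency check.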

  7. Supervised Sequence Labelling with Recurrent Neural Networks

    CERN Document Server

    Graves, Alex

    2012-01-01

    Supervised sequence labelling is a vital area of machine learning, encompassing tasks such as speech, handwriting and gesture recognition, protein secondary structure prediction and part-of-speech tagging. Recurrent neural networks are powerful sequence learning tools, robust to input noise and distortion and able to exploit long-range contextual information, that would seem ideally suited to such problems. However, their role in large-scale sequence labelling systems has so far been auxiliary. The goal of this book is a complete framework for classifying and transcribing sequential data with recurrent neural networks only. Three main innovations are introduced in order to realise this goal. Firstly, the connectionist temporal classification output layer allows the framework to be trained with unsegmented target sequences, such as phoneme-level speech transcriptions; this is in contrast to previous connectionist approaches, which were dependent on error-prone prior segmentation. Secondly, multidimensional...

  8. The International Nucleotide Sequence Database Collaboration.

    Science.gov (United States)

    Cochrane, Guy; Karsch-Mizrachi, Ilene; Nakamura, Yasukazu

    2011-01-01

    Under the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org), globally comprehensive public domain nucleotide sequence is captured, preserved and presented. The partners of this long-standing collaboration work closely together to provide data formats and conventions that enable consistent data submission to their databases and support regular data exchange around the globe. Clearly defined policy and governance in relation to free access to data and relationships with journal publishers have positioned INSDC databases as a key provider of the scientific record and a core foundation for the global bioinformatics data infrastructure. While growth in sequence data volumes comes no longer as a surprise to INSDC partners, the uptake of next-generation sequencing technology by mainstream science that we have witnessed in recent years brings a step-change to growth, necessarily making a clear mark on INSDC strategy. In this article, we introduce the INSDC, outline data growth patterns and comment on the challenges of increased growth.

  9. Galaxy LIMS for next-generation sequencing

    NARCIS (Netherlands)

    Scholtalbers, J.; Rossler, J.; Sorn, P.; Graaf, J. de; Boisguerin, V.; Castle, J.; Sahin, U.

    2013-01-01

    SUMMARY: We have developed a laboratory information management system (LIMS) for a next-generation sequencing (NGS) laboratory within the existing Galaxy platform. The system provides lab technicians standard and customizable sample information forms, barcoded submission forms, tracking of input

  10. Fluency First: Reversing the Traditional ESL Sequence.

    Science.gov (United States)

    MacGowan-Gilhooly, Adele

    1991-01-01

    Describes an ESL department's whole language approach to writing and reading, replacing its traditional grammar-based ESL instructional sequence. Reports the positive quantitative and qualitative results of the first three years of using the new approach. (KEH)

  11. Identifying driver mutations in sequenced cancer genomes

    DEFF Research Database (Denmark)

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise......, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA...... sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein...

  12. Generalized locally Toeplitz sequences theory and applications

    CERN Document Server

    Garoni, Carlo

    2017-01-01

    Based on their research experience, the authors propose a reference textbook in two volumes on the theory of generalized locally Toeplitz sequences and their applications. This first volume focuses on the univariate version of the theory and the related applications in the unidimensional setting, while the second volume, which addresses the multivariate case, is mainly devoted to concrete PDE applications. This book systematically develops the theory of generalized locally Toeplitz (GLT) sequences and presents some of its main applications, with a particular focus on the numerical discretization of differential equations (DEs). It is the first book to address the relatively new field of GLT sequences, which occur in numerous scientific applications and are especially dominant in the context of DE discretizations. Written for applied mathematicians, engineers, physicists, and scientists who (perhaps unknowingly) encounter GLT sequences in their research, it is also of interest to those working in the fields of...

  13. Characterizing leader sequences of CRISPR loci

    DEFF Research Database (Denmark)

    Alkhnbashi, Omer; Shah, Shiraz Ali; Garrett, Roger Antony

    2016-01-01

    The CRISPR-Cas system is an adaptive immune system in many archaea and bacteria, which provides resistance against invading genetic elements. The first phase of CRISPR-Cas immunity is called adaptation, in which small DNA fragments are excised from genetic elements and are inserted into a CRISPR...... array generally adjacent to its so called leader sequence at one end of the array. It has been shown that transcription initiation and adaptation signals of the CRISPR array are located within the leader. However, apart from promoters, there is very little knowledge of sequence or structural motifs...... sequences by focusing on the consensus repeat of the adjacent CRISPR array and weak upstream conservation signals. We applied our tool to the analysis of a comprehensive genomic database and identified several characteristic properties of leader sequences specific to archaea and bacteria, ranging from...

  14. Sequencing Information Management System (SIMS). Final report

    Energy Technology Data Exchange (ETDEWEB)

    Fields, C.

    1996-02-15

    A feasibility study to develop a requirements analysis and functional specification for a data management system for large-scale DNA sequencing laboratories resulted in a functional specification for a Sequencing Information Management System (SIMS). This document reports the results of this feasibility study, and includes a functional specification for a SIMS relational schema. The SIMS is an integrated information management system that supports data acquisition, management, analysis, and distribution for DNA sequencing laboratories. The SIMS provides ad hoc query access to information on the sequencing process and its results, and partially automates the transfer of data between laboratory instruments, analysis programs, technical personnel, and managers. The SIMS user interfaces are designed for use by laboratory technicians, laboratory managers, and scientists. The SIMS is designed to run in a heterogeneous, multiplatform environment in a client/server mode. The SIMS communicates with external computational and data resources via the internet.

  15. Sequence finishing and mapping of Drosophila melanogaster heterochromatin

    Energy Technology Data Exchange (ETDEWEB)

    Hoskins, Roger A.; Carlson, Joseph W.; Kennedy, Cameron; Acevedo, David; Evans-Holm, Martha; Frise, Erwin; Wan, Kenneth H.; Park, Soo; Mendez-Lago, Maria; Rossi, Fabrizio; Villasante, Alfredo; Dimitri, Patrizio; Karpen, Gary H.; Celniker, Susan E.

    2007-06-15

    Genome sequences for most metazoans are incomplete due to the presence of repeated DNA in the pericentromeric heterochromatin. The heterochromatic regions of D. melanogaster contain 20 Mb of sequence amenable to mapping, sequence assembly and finishing. Here we describe the generation of 15 Mb of finished or improved heterochromatic sequence using available clone resources and assembly and mapping methods. We also constructed a BAC-based physical map that spans approximately 13 Mb of the pericentromeric heterochromatin, and a cytogenetic map that positions approximately 11 Mb of BAC contigs and sequence scaffolds in specific chromosomal locations. The integrated sequence assembly and maps greatly improve our understanding of the structure and composition of this poorly understood fraction of a metazoan genome and provide a framework for functional analyses.

  16. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...... equally large demands in data handling, analysis and interpretation, perhaps defining the modern challenge of the computational biologist of the post-genomic era. The first part of this thesis consists of a general introduction to the history, common terms and challenges of next generation sequencing......, focusing on oft encountered problems in data processing, such as quality assurance, mapping, normalization, visualization, and interpretation. Presented in the second part are scientific endeavors representing solutions to problems of two sub-genres of next generation sequencing. For the first flavor, RNA-sequencing...

  17. Digital Recovery Sequencer - Advanced Concept Ejection Seats

    National Research Council Canada - National Science Library

    Ross, David A; Cotter, Lee; Culhane, David; Press, Matthew J

    2005-01-01

    The Advanced Concept Ejection Seat (ACES) currently uses the Analog Sequencer, designed in the 1960's and 1970's with analog technology, to control ejection event timing and ejection mode selection...

  18. Improved polynomial remainder sequences for Ore polynomials.

    Science.gov (United States)

    Jaroschek, Maximilian

    2013-11-01

    Polynomial remainder sequences contain the intermediate results of the Euclidean algorithm when applied to (non-)commutative polynomials. The running time of the algorithm is dependent on the size of the coefficients of the remainders. Different ways have been studied to make these as small as possible. The subresultant sequence of two polynomials is a polynomial remainder sequence in which the size of the coefficients is optimal in the generic case, but when taking the input from applications, the coefficients are often larger than necessary. We generalize two improvements of the subresultant sequence to Ore polynomials and derive a new bound for the minimal coefficient size. Our approach also yields a new proof for the results in the commutative case, providing a new point of view on the origin of the extraneous factors of the coefficients.
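    To make the coefficient growth mentioned above concrete, here is a minimal sketch of the plain Euclidean remainder sequence for ordinary commutative polynomials with rational coefficients. It does not implement the subresultant sequence or the Ore-polynomial generalization discussed in the paper. Polynomials are coefficient lists with the highest degree first, and the helper names are illustrative assumptions.

```python
from fractions import Fraction


def poly_rem(a, b):
    """Remainder of a divided by b (coefficient lists, highest degree first)."""
    a = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    while len(a) >= len(b):
        factor = a[0] / b[0]
        # Subtract factor * x^(deg a - deg b) * b; the leading term cancels.
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)
        # Drop any further leading zeros so a[0] is the true leading term.
        while a and a[0] == 0:
            a.pop(0)
    return a


def remainder_sequence(a, b):
    """Euclidean polynomial remainder sequence, ending at the last nonzero remainder."""
    seq = [[Fraction(c) for c in a], [Fraction(c) for c in b]]
    while seq[-1]:
        seq.append(poly_rem(seq[-2], seq[-1]))
    return seq[:-1]  # discard the trailing zero polynomial


# Knuth's classic example: x^8 + x^6 - 3x^4 - 3x^3 + 8x^2 + 2x - 5
# and 3x^6 + 5x^4 - 4x^2 - 9x + 21.
prs = remainder_sequence([1, 0, 1, 0, -3, -3, 8, 2, -5],
                         [3, 0, 5, 0, -4, -9, 21])
```

    The first remainder is -5/9 x^4 + 1/9 x^2 - 1/3, and the numerators and denominators grow quickly in later steps even though the inputs have single-digit coefficients. That growth is what subresultant-style sequences are designed to control.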

  19. Improved polynomial remainder sequences for Ore polynomials

    Science.gov (United States)

    Jaroschek, Maximilian

    2013-01-01

    Polynomial remainder sequences contain the intermediate results of the Euclidean algorithm when applied to (non-)commutative polynomials. The running time of the algorithm is dependent on the size of the coefficients of the remainders. Different ways have been studied to make these as small as possible. The subresultant sequence of two polynomials is a polynomial remainder sequence in which the size of the coefficients is optimal in the generic case, but when taking the input from applications, the coefficients are often larger than necessary. We generalize two improvements of the subresultant sequence to Ore polynomials and derive a new bound for the minimal coefficient size. Our approach also yields a new proof for the results in the commutative case, providing a new point of view on the origin of the extraneous factors of the coefficients. PMID:26523087

  20. Expressed sequence tags (ESTs) and single nucleotide ...

    African Journals Online (AJOL)

    SERVER

    2008-02-19

    stranded DNA binding dyes or fluorophore-labelled ..... Comparative sequence analysis of plant nuclear genomes: microcolinearity and its many exceptions. Plant Cell 12(7):. 1021-1029. Bertone P, Snyder M (2005). Prospects and ...

  1. Automated Testing with Targeted Event Sequence Generation

    DEFF Research Database (Denmark)

    Jensen, Casper Svenning; Prasad, Mukul R.; Møller, Anders

    2013-01-01

    Automated software testing aims to detect errors by producing test inputs that cover as much of the application source code as possible. Applications for mobile devices are typically event-driven, which raises the challenge of automatically producing event sequences that result in high coverage....... Some existing approaches use random or model-based testing that largely treats the application as a black box. Other approaches use symbolic execution, either starting from the entry points of the applications or on specific event sequences. A common limitation of the existing approaches...... is that they often fail to reach the parts of the application code that require more complex event sequences. We propose a two-phase technique for automatically finding event sequences that reach a given target line in the application code. The first phase performs concolic execution to build summaries...

  2. Ancestral sequence alignment under optimal conditions

    Directory of Open Access Journals (Sweden)

    Brown Daniel G

    2005-11-01

    Background: Multiple genome alignment is an important problem in bioinformatics. An important subproblem used by many multiple alignment approaches is that of aligning two multiple alignments. Many popular alignment algorithms for DNA use the sum-of-pairs heuristic, where the score of a multiple alignment is the sum of its induced pairwise alignment scores. However, the biological meaning of the sum-of-pairs heuristic is not obvious. Additionally, many algorithms based on the sum-of-pairs heuristic are complicated and slow, compared to pairwise alignment algorithms. An alternative approach to aligning alignments is to first infer ancestral sequences for each alignment, and then align the two ancestral sequences. In addition to being fast, this method has a clear biological basis that takes into account the evolution implied by an underlying phylogenetic tree. In this study we explore the accuracy of aligning alignments by ancestral sequence alignment. We examine the use of both maximum likelihood and parsimony to infer ancestral sequences. Additionally, we investigate the effect on accuracy of allowing ambiguity in our ancestral sequences. Results: We use synthetic sequence data that we generate by simulating evolution on a phylogenetic tree. We use two different types of phylogenetic trees: trees with a period of rapid growth followed by a period of slow growth, and trees with a period of slow growth followed by a period of rapid growth. We examine the alignment accuracy of four ancestral sequence reconstruction and alignment methods: parsimony, maximum likelihood, ambiguous parsimony, and ambiguous maximum likelihood. Additionally, we compare against the alignment accuracy of two sum-of-pairs algorithms: ClustalW and the heuristic of Ma, Zhang, and Wang. Conclusion: We find that allowing ambiguity in ancestral sequences does not lead to better multiple alignments. 
Regardless of whether we use parsimony or maximum likelihood, the
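    The sum-of-pairs score described in this record is the sum, over every pair of rows in a multiple alignment, of that pair's induced pairwise alignment score. A minimal sketch follows. The match, mismatch and gap values are arbitrary assumptions for illustration and are not the scoring scheme of ClustalW or of the study.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one alignment column for a pair of rows (illustrative values)."""
    if x == "-" and y == "-":
        return 0  # a column that is a gap in both rows is not part of the induced alignment
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(alignment):
    """Sum of induced pairwise scores over all pairs of rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        total += sum(pair_score(x, y) for x, y in zip(row_a, row_b))
    return total
```

    For the three-row alignment "AC-T", "ACGT", "A-GT" the pairwise scores are 1, -2 and 1, so the sum-of-pairs score is 0. The cost grows quadratically with the number of rows, which is one reason aligning two inferred ancestral sequences is faster.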

  3. ASAP: Amplification, sequencing & annotation of plastomes

    Directory of Open Access Journals (Sweden)

    Folta Kevin M

    2005-12-01

    Background: Availability of DNA sequence information is vital for pursuing structural, functional and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome requires sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures like chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA) is being used currently to amplify the chloroplast genome from purified chloroplast DNA and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal high-throughput, rapid PCR-based technique to amplify, sequence and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1–1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera. Results: 100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine and Equisetum. Sequence from the inverted repeat region of strawberry and peach plastome was obtained, annotated and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to tobacco plastome thus exhibiting the utility of this method for structural and

  4. DESCRIPTION OF THE RHIC SEQUENCER SYSTEM

    International Nuclear Information System (INIS)

    DOTTAVIO, T.; FRAK, B.; MORRIS, J.; SATOGATA, T.; VAN ZEIJTS, J.

    2001-01-01

    The movement of the Relativistic Heavy Ion Collider (RHIC) through its various states (e.g., injection, acceleration, storage, collisions) is controlled by an application called the Sequencer. This program orchestrates most magnet and instrumentation systems and is responsible for the coordinated acquisition and saving of data from various systems. The Sequencer system, its software infrastructure, support programs, and the language used to drive it are discussed in this paper. Initial operational experience is also described.

  5. Transforming clinical microbiology with bacterial genome sequencing

    OpenAIRE

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J.; Peto, Tim E. A.; Crook, Derrick W.

    2012-01-01

    Whole genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here we review the current status of clinical microbiology and how it has already begun to be transformed by the use of next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pa...

  6. Optimization of a sequence of reactors

    DEFF Research Database (Denmark)

    Vidal, Rene Victor Valqui

    1991-01-01

    Concerns the optimal production of sulphuric acid in a sequence of reactors. Using a suitable approximation to the objective function, this problem can easily be solved using the maximum principle. A numerical example documents the applicability of the suggested approach.

  7. Chromatid interchanges at intrachromosomal telomeric DNA sequences

    International Nuclear Information System (INIS)

    Fernandez, J.L.; Vazquez-Gundin, F.; Bilbao, A.; Gosalvez, J.; Goyanes, V.

    1997-01-01

    Chinese hamster Don cells were exposed to X-rays, mitomycin C and teniposide (VM-26) to induce chromatid exchanges (quadriradials and triradials). After fluorescence in situ hybridization (FISH) of telomere sequences it was found that interstitial telomere-like DNA sequence arrays presented around five times more breakage-rearrangements than the genome overall. This high recombinogenic capacity was independent of the clastogen, suggesting that this susceptibility is not related to the initial mechanisms of DNA damage. (author)

  8. Biases in small RNA deep sequencing data

    OpenAIRE

    Raabe, Carsten A.; Tang, Thean-Hock; Brosius, Juergen; Rozhdestvensky, Timofey S.

    2013-01-01

    High-throughput RNA sequencing (RNA-seq) is considered a powerful tool for novel gene discovery and fine-tuned transcriptional profiling. The digital nature of RNA-seq is also believed to simplify meta-analysis and to reduce background noise associated with hybridization-based approaches. The development of multiplex sequencing enables efficient and economic parallel analysis of gene expression. In addition, RNA-seq is of particular value when low RNA expression or modest changes between samp...

  9. A Method to Construct Generalized Fibonacci Sequences

    Directory of Open Access Journals (Sweden)

    Adalberto García-Máynez

    2016-01-01

    The main purpose of this paper is to study the convergence properties of Generalized Fibonacci Sequences and the series of partial sums associated with them. When the proper values of an s×s real matrix A are real and different, we give a necessary and sufficient condition for the convergence of the matrix sequence A, A^2, A^3, … to a matrix B.
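    The abstract does not state the paper's condition. As a standard linear-algebra fact, when the eigenvalues (proper values) of A are real and distinct, A is diagonalizable and the powers A, A^2, A^3, … converge exactly when every eigenvalue λ satisfies -1 < λ ≤ 1. The sketch below checks this for the 2×2 case only, using hypothetical helper names; the exact floating-point comparison against 1 is fragile and is kept only for brevity.

```python
import math


def eigenvalues_2x2(m):
    """Real, distinct eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc <= 0:
        raise ValueError("eigenvalues are not real and distinct")
    root = math.sqrt(disc)
    return ((trace - root) / 2, (trace + root) / 2)


def powers_converge(m):
    """True if A^n converges, assuming real distinct eigenvalues."""
    return all(-1 < lam <= 1 for lam in eigenvalues_2x2(m))


def mat_mul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(m, n):
    """m raised to the n-th power (n >= 1) by repeated multiplication."""
    result = m
    for _ in range(n - 1):
        result = mat_mul(result, m)
    return result


stochastic = [[0.5, 0.5], [0.25, 0.75]]   # eigenvalues 0.25 and 1
fibonacci = [[1, 1], [1, 0]]              # eigenvalues (1 ± sqrt(5)) / 2
```

    The stochastic example converges to a matrix B whose rows are both (1/3, 2/3), whereas the Fibonacci matrix has an eigenvalue of about 1.618, so its powers diverge.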

  10. Fibonacci difference sequence spaces for modulus functions

    Directory of Open Access Journals (Sweden)

    Kuldip Raj

    2015-05-01

    In the present paper we introduce Fibonacci difference sequence spaces l(F, Ƒ, p, u) and l_∞(F, Ƒ, p, u) by using a sequence of modulus functions and a new band matrix F. We also make an effort to study some inclusion relations, topological and geometric properties of these spaces. Furthermore, the alpha, beta, gamma duals and matrix transformation of the space l(F, Ƒ, p, u) are determined.

  11. Gao's conjecture on zero-sum sequences

    Indian Academy of Sciences (India)

    M. Senthilkumar (Newgen Imaging) 1461 1996 Oct 15 13:05:22

    Davenport's constant is connected with algebraic number theory as follows. Let K be a number field (i.e., a finite extension of Q) and ... maps a sequence to the sum of its elements. Let $S = \prod_{\nu=1}^{l} g_\nu \in F(G)$ be a sequence. Then S has a .... A more general application analogous to the E–G–Z theorem for a finite group had ...

  12. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., on the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate, and combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer.
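
    To make the notion of an exact-length motif query concrete, the following is a minimal brute-force sketch. It is not ACME and makes no attempt at cache-aware or parallel search; the function name `frequent_motifs` and its parameters are illustrative assumptions, and only exact occurrences are counted.

```python
from collections import Counter

def frequent_motifs(sequence, length, min_count):
    """Return every substring of the given length that occurs at least min_count times.

    Hypothetical helper for illustration only: a naive scan over all windows,
    not the ACME algorithm described in the abstract.
    """
    windows = (sequence[i:i + length] for i in range(len(sequence) - length + 1))
    counts = Counter(windows)
    return {motif: n for motif, n in counts.items() if n >= min_count}

# Toy input; real inputs in the abstract are gigabytes long.
motifs = frequent_motifs("ACGTACGTTACG", length=3, min_count=3)
```

    A scan of this kind enumerates every window once, which is exactly the serial cost that becomes prohibitive on very long sequences.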

  13. On statistical acceleration convergence of double sequences

    Directory of Open Access Journals (Sweden)

    Bipan Hazarika

    2017-04-01

    In this article the notion of statistical acceleration convergence of double sequences in Pringsheim's sense is introduced. We prove the decomposition theorems for statistical acceleration convergence of double sequences, and some theorems related to that concept are established using four-dimensional matrix transformations. We provide some examples where the results of acceleration convergence fail to hold for the statistical cases.

  14. Task sequencing for autonomous robotic vacuum cleaners

    Science.gov (United States)

    Gorbenko, Anna; Popov, Vladimir

    2017-07-01

    Various planning problems for robotic systems are of considerable interest. One such problem is task sequencing. In this paper, we consider the problem of task sequencing for autonomous vacuum floor cleaning robots. We consider a graph model for the problem and propose an efficient approach to solving it. In particular, we use an explicit reduction from the decision version of the problem to the satisfiability problem. We present the results of computational experiments for different satisfiability algorithms.

  15. Trace maps of general substitutional sequences

    International Nuclear Information System (INIS)

    Kolar, M.; Nori, F.

    1990-01-01

    It is shown that for arbitrary n, there exists a trace map for any n-letter substitutional sequence. Trace maps are explicitly obtained for the well-known circle and Rudin-Shapiro sequences which can be defined by means of substitution rules on three and four letters, respectively. The properties of the two trace maps and their consequences for various spectral properties are briefly discussed.
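
    As an illustration of a four-letter substitution rule, the sketch below generates the Rudin-Shapiro sequence. The rule A→AB, B→AC, C→DB, D→DC with the coding A, B → +1 and C, D → −1 is the standard textbook construction and is assumed here; the abstract does not state which rule the authors use, and trace maps themselves are not computed.

```python
# Standard four-letter substitution for the Rudin-Shapiro sequence (assumed,
# not taken from the paper) and the coding of letters to signs.
RULES = {"A": "AB", "B": "AC", "C": "DB", "D": "DC"}
SIGN = {"A": 1, "B": 1, "C": -1, "D": -1}

def rudin_shapiro_by_substitution(iterations):
    """Apply the substitution `iterations` times to 'A' and code letters as +1/-1."""
    word = "A"
    for _ in range(iterations):
        word = "".join(RULES[letter] for letter in word)
    return [SIGN[letter] for letter in word]

def rudin_shapiro_direct(n):
    """Return (-1)**k, where k counts (overlapping) '11' blocks in the binary form of n."""
    pairs = bin(n & (n >> 1)).count("1")
    return -1 if pairs % 2 else 1

first_terms = rudin_shapiro_by_substitution(3)
```

    Each iteration doubles the word length, so `iterations=3` yields the first eight terms, which can be checked against the direct definition.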

  16. Value of a newly sequenced bacterial genome

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Aburjaile, Flavia F; Ramos, Rommel Tj

    2014-01-01

    and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses...... the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information....

  17. Analysis of Neuronal Sequences Using Pairwise Biases

    Science.gov (United States)

    2015-08-27

    semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike). Evidence for the participation of the hippocampus in the formation of...very different from each other in many ways including duration and number of spikes. Still, these sequences share a similar trend in the general order...1 and 2 precede all other spikes in both s and s′). Many other sequences share this property with s and s′; in fact, we can completely characterize

  18. Escherichia Coli: From Genome Sequences to Consequence

    Directory of Open Access Journals (Sweden)

    Mark Pallen

    2006-01-01

    The present article summarizes a presentation given by Professor Mark Pallen of the School of Medicine at the University of Birmingham (Birmingham, United Kingdom) for the Fourth Stanier Lecture held in Regina, Saskatchewan, on November 9, 2004. Professor Pallen's lecture, entitled 'Escherichia coli: From genome sequences to consequences', provides a summary of the important discoveries of his team of research scientists in the area of genetic sequencing and variations in phenotypic expression.

  19. The DNA sequence specificity of bleomycin cleavage in a systematically altered DNA sequence.

    Science.gov (United States)

    Gautam, Shweta D; Chen, Jon K; Murray, Vincent

    2017-08-01

    Bleomycin is an anti-tumour agent that is clinically used to treat several types of cancers. Bleomycin cleaves DNA at specific DNA sequences and recent genome-wide DNA sequencing specificity data indicated that the sequence 5'-RTGT*AY (where T* is the site of bleomycin cleavage, R is G/A and Y is T/C) is preferentially cleaved by bleomycin in human cells. Based on this DNA sequence, we constructed a plasmid clone to explore this bleomycin cleavage preference. By systematic variation of single nucleotides in the 5'-RTGT*AY sequence, we were able to investigate the effect of nucleotide changes on bleomycin cleavage efficiency. We observed that the preferred consensus DNA sequence for bleomycin cleavage in the plasmid clone was 5'-YYGT*AW (where W is A/T). The most highly cleaved sequence was 5'-TCGT*AT and, in fact, the seven most highly cleaved sequences conformed to the consensus sequence 5'-YYGT*AW. A comparison with genome-wide results was also performed and while the core sequence was similar in both environments, the surrounding nucleotides were different.
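
    The consensus notation in this abstract uses degenerate bases (R is G/A, Y is T/C, W is A/T, as defined above). A minimal sketch of how such consensus strings can be turned into regular expressions follows; the helper names and the toy DNA string are assumptions for illustration, and the cleavage marker T* is dropped, so only the six-base motif is matched.

```python
import re

# Degenerate-base codes as defined in the abstract, plus the four plain bases.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]"}

def consensus_to_pattern(consensus):
    """Translate a consensus string such as 'YYGTAW' into a regex pattern string."""
    return "".join(CODES[base] for base in consensus)

def find_sites(consensus, dna):
    """Return start indices of all (possibly overlapping) matches of the consensus."""
    lookahead = "(?=" + consensus_to_pattern(consensus) + ")"
    return [m.start() for m in re.finditer(lookahead, dna)]

PLASMID = consensus_to_pattern("YYGTAW")   # preferred sequence in the plasmid clone
GENOMIC = consensus_to_pattern("RTGTAY")   # genome-wide preferred sequence

sites = find_sites("YYGTAW", "GGTCGTATCC")  # hypothetical toy sequence
```

    With these patterns the most highly cleaved sequence, TCGTAT, matches the plasmid consensus but not the genome-wide one, which mirrors the difference in surrounding nucleotides reported above.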

  20. A neurocomputational model of automatic sequence production.

    Science.gov (United States)

    Helie, Sebastien; Roeder, Jessica L; Vucovich, Lauren; Rünger, Dennis; Ashby, F Gregory

    2015-07-01

    Most behaviors unfold in time and include a sequence of submovements or cognitive activities. In addition, most behaviors are automatic and repeated daily throughout life. Yet, relatively little is known about the neurobiology of automatic sequence production. Past research suggests a gradual transfer from the associative striatum to the sensorimotor striatum, but a number of more recent studies challenge this role of the basal ganglia (BG) in automatic sequence production. In this article, we propose a new neurocomputational model of automatic sequence production in which the main role of the BG is to train cortical-cortical connections within the premotor areas that are responsible for automatic sequence production. The new model is used to simulate four different data sets from human and nonhuman animals, including (1) behavioral data (e.g., RTs), (2) electrophysiology data (e.g., single-neuron recordings), (3) macrostructure data (e.g., TMS), and (4) neurological circuit data (e.g., inactivation studies). We conclude with a comparison of the new model with existing models of automatic sequence production and discuss a possible new role for the BG in automaticity and its implication for Parkinson's disease.

  1. A Unified Theoretical Framework for Cognitive Sequencing.

    Science.gov (United States)

    Savalia, Tejas; Shukla, Anuj; Bapi, Raju S

    2016-01-01

    The capacity to sequence information is central to human performance. Sequencing ability forms the foundation stone for higher order cognition related to language and goal-directed planning. Information related to the order of items, their timing, chunking and hierarchical organization are important aspects in sequencing. Past research on sequencing has emphasized two distinct and independent dichotomies: implicit vs. explicit and goal-directed vs. habits. We propose a theoretical framework unifying these two streams. Our proposal relies on the brain's ability to implicitly extract statistical regularities from the stream of stimuli and, with attentional engagement, to organize sequences explicitly and hierarchically. Similarly, sequences that need to be assembled purposively to accomplish a goal require engagement of attentional processes. With repetition, these goal-directed plans become habits with concomitant disengagement of attention. Thus, attention and awareness play a crucial role in the implicit-to-explicit transition as well as in how goal-directed plans become automatic habits. Cortico-subcortical loops (basal ganglia-frontal cortex and hippocampus-frontal cortex loops) mediate the transition process. We show how the computational principles of model-free and model-based learning paradigms, along with a pivotal role for attention and awareness, offer a unifying framework for these two dichotomies. Based on this framework, we make testable predictions related to the potential influence of response-to-stimulus interval (RSI) on developing awareness in implicit learning tasks.

  2. Mitochondrial sequence changes in keratoconus patients.

    Science.gov (United States)

    Abu-Amero, Khaled K; Azad, Taif Anwar; Kalantan, Hatem; Sultan, Tahira; Al-Muammar, Abdulrahman M

    2014-03-20

    We investigated whether a group of patients with keratoconus (KTCN) harbor mutations in the mitochondrial genome. We sequenced the full mitochondrial genome in a group of Saudi patients with KTCN (n = 26) and 100 ethnically matched controls who had no KTCN by examination. A total of 10 KTCN patients (38.5%) had potentially pathogenic nonsynonymous mtDNA mutations. Of the nonsynonymous sequence changes detected, 4 (40%) were in Complex I, one was in the tRNA(Glutamine), one was in tRNA(Tryptophan), one was in tRNA(Asparagine), one was in tRNA(Histidine), and two were in the tRNA(Leucine2). One nonsynonymous sequence change was heteroplasmic, whereas all the remaining 9 were homoplasmic. These sequence changes were not detected in controls of similar ethnicity. Four sequence changes were novel (were not reported previously) and 5 were reported previously. Additionally, we detected 54 synonymous (does not result in an amino acid change) sequence changes with no pathologic significance. If our results are confirmed in a larger cohort and multiple ethnicities, then mtDNA mutation may be considered as a genetic risk factor contributing indirectly through the oxidative stress mechanism to the development and/or progression of KTCN.

  3. Bunches of random cross-correlated sequences

    International Nuclear Information System (INIS)

    Maystrenko, A A; Melnik, S S; Pritula, G M; Usatenko, O V

    2013-01-01

    The statistical properties of random cross-correlated sequences constructed by the convolution method (likewise referred to as the Rice or the inverse Fourier transformation) are examined. We clarify the meaning of the filtering function—the kernel of the convolution operator—and show that it is the value of the cross-correlation function which describes correlations between the initial white noise and constructed correlated sequences. The matrix generalization of this method for constructing a bunch of N cross-correlated sequences is presented. Algorithms for their generation are reduced to solving the problem of decomposition of the Fourier transform of the correlation matrix into a product of two mutually conjugate matrices. Different decompositions are considered. The limits of weak and strong correlations for the one-point probability and pair correlation functions of sequences generated by the method under consideration are studied. Special cases of heavy-tailed distributions of the generated sequences are analyzed. We show that, if the filtering function is rather smooth, the distribution function of generated variables has the Gaussian or Lévy form depending on the analytical properties of the distribution (or characteristic) functions of the initial white noise. Anisotropic properties of statistically homogeneous random sequences related to the asymmetry of a filtering function are revealed and studied. These asymmetry properties are expressed in terms of the third- or fourth-order correlation functions. Several examples of the construction of correlated chains with a predefined correlation matrix are given. (paper)
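
    The convolution method for a single sequence can be sketched as follows: white noise is filtered with a kernel F, giving x_n = Σ_k F_k ξ_{n−k}, and for unit-variance white noise the resulting pair correlation is C(r) = Σ_k F_k F_{k+r}. The function names, the Gaussian noise, and the two-point kernel below are assumptions chosen for illustration; the matrix generalization to a bunch of N sequences is not shown.

```python
import random

def correlated_sequence(kernel, length, seed=0):
    """Filter Gaussian white noise with `kernel` (the filtering function).

    Hypothetical helper: x[n] = sum_k kernel[k] * noise[n - k], with the noise
    array padded so that every output term uses a full kernel window.
    """
    rng = random.Random(seed)
    m = len(kernel)
    noise = [rng.gauss(0.0, 1.0) for _ in range(length + m - 1)]
    return [sum(kernel[k] * noise[n + m - 1 - k] for k in range(m))
            for n in range(length)]

def kernel_autocorrelation(kernel, lag):
    """Theoretical correlation C(lag) of the filtered sequence for unit-variance noise."""
    return sum(kernel[k] * kernel[k + lag] for k in range(len(kernel) - lag))

kernel = [1.0, 0.5]                      # illustrative filtering function
sample = correlated_sequence(kernel, 1000)
```

    For this kernel the theoretical values are C(0) = 1.25 and C(1) = 0.5, with no correlation beyond lag 1; sample estimates from `sample` should approach them as the length grows.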

  4. Reporting Differences Between Spacecraft Sequence Files

    Science.gov (United States)

    Khanampompan, Teerapat; Gladden, Roy E.; Fisher, Forest W.

    2010-01-01

    A suite of computer programs, called seq diff suite, reports differences between the products of other computer programs involved in the generation of sequences of commands for spacecraft. These products consist of files of several types: replacement sequence of events (RSOE), DSN keyword file [DKF (wherein DSN signifies Deep Space Network)], spacecraft activities sequence file (SASF), spacecraft sequence file (SSF), and station allocation file (SAF). These products can include line numbers, request identifications, and other pieces of information that are not relevant when generating command sequence products, though these fields can result in the appearance of many changes to the files, particularly when using the UNIX diff command to inspect file differences. The outputs of prior software tools for reporting differences between such products include differences in these non-relevant pieces of information. In contrast, seq diff suite removes the fields containing the irrelevant pieces of information before processing to extract differences, so that only relevant differences are reported. Thus, seq diff suite is especially useful for reporting changes between successive versions of the various products and in particular flagging difference in fields relevant to the sequence command generation and review process.
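
    The idea of removing irrelevant fields before differencing can be sketched with the Python standard library. The line layout below (a leading line number and a `REQ_ID=` field) is a hypothetical stand-in, not the actual RSOE, DKF, SASF, SSF, or SAF format, and the function names are illustrative rather than part of seq diff suite.

```python
import difflib
import re

# Hypothetical irrelevant fields: a leading line number and a request identifier.
IRRELEVANT = re.compile(r"^\d+\s+|\bREQ_ID=\S+\s*")

def normalize(lines):
    """Strip the irrelevant fields from every line before comparison."""
    return [IRRELEVANT.sub("", line).strip() for line in lines]

def relevant_diff(old_lines, new_lines):
    """Return only the added or removed lines after normalization."""
    delta = difflib.ndiff(normalize(old_lines), normalize(new_lines))
    return [entry for entry in delta if entry.startswith(("- ", "+ "))]

old = ["0001 REQ_ID=a1 CMD=POWER_ON", "0002 REQ_ID=a2 CMD=SLEW 10"]
new = ["0007 REQ_ID=b9 CMD=POWER_ON", "0008 REQ_ID=c3 CMD=SLEW 12"]
changes = relevant_diff(old, new)
```

    A plain line-by-line diff would flag both lines because the line numbers and request identifiers changed; after normalization only the changed command argument is reported.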

  5. A Unified Theoretical Framework for Cognitive Sequencing

    Directory of Open Access Journals (Sweden)

    Tejas Savalia

    2016-11-01

    The capacity to sequence information is central to human performance. Sequencing ability forms the foundation stone for higher order cognition related to language and goal-directed planning. Information related to the order of items, their timing, chunking and hierarchical organization are important aspects in sequencing. Past research on sequencing has emphasized two distinct and independent dichotomies: implicit versus explicit and goal-directed versus habits. We propose a theoretical framework unifying these two streams. Our proposal relies on the brain's ability to implicitly extract statistical regularities from the stream of stimuli and, with attentional engagement, to organize sequences explicitly and hierarchically. Similarly, sequences that need to be assembled purposively to accomplish a goal require engagement of attentional processes. With repetition, these goal-directed plans become habits with concomitant disengagement of attention. Thus attention and awareness play a crucial role in the implicit-to-explicit transition as well as in how goal-directed plans become automatic habits. Cortico-subcortical loops (basal ganglia-frontal cortex and hippocampus-frontal cortex loops) mediate the transition process. We show how the computational principles of model-free and model-based learning paradigms, along with a pivotal role for attention and awareness, offer a unifying framework for these two dichotomies. Based on this framework, we make testable predictions related to the potential influence of response-to-stimulus interval (RSI) on developing awareness in implicit learning tasks.

  6. Binary sequence detector uses minimum number of decision elements

    Science.gov (United States)

    Perlman, M.

    1966-01-01

    Detector of an n bit binary sequence code within a serial binary data system assigns states to memory elements of a code sequence detector by employing the same order of states for the sequence detector as that of the sequence generator when the linear recursion relationship employed by the sequence generator is given.
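
    For readers unfamiliar with linear recursion generators, the sketch below pairs a small linear-feedback generator with a generic sliding-window detector. It does not reproduce the state-assignment technique of the report; the recursion a[k] = a[k−3] XOR a[k−4], the seed, and the function names are assumptions chosen so that the generator has period 15.

```python
def lfsr_bits(seed, taps, count):
    """Generate `count` bits from the linear recursion a[k] = XOR of a[k - t] for t in taps.

    `seed` must contain at least max(taps) bits. Illustrative helper only.
    """
    bits = list(seed)
    while len(bits) < count:
        feedback = 0
        for t in taps:
            feedback ^= bits[-t]
        bits.append(feedback)
    return bits[:count]

def detect(bits, code):
    """Return the indices at which the most recent len(code) bits equal `code`."""
    width = len(code)
    window, hits = [], []
    for index, bit in enumerate(bits):
        window = (window + [bit])[-width:]   # shift register holding the last bits
        if window == list(code):
            hits.append(index)
    return hits

stream = lfsr_bits([1, 0, 0, 0], taps=(3, 4), count=33)
hits = detect(stream[:18], [1, 1, 1, 1])
```

    Because the window in `detect` holds the same bits as the generator's shift register, each nonzero 4-bit code is seen exactly once per period of the stream.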

  7. Generalized Vector-Valued Sequence Spaces Defined by Modulus Functions

    Directory of Open Access Journals (Sweden)

    Işik Mahmut

    2010-01-01

    We introduce the vector-valued sequence spaces , , and , and , using a sequence of modulus functions and the multiplier sequence of nonzero complex numbers. We give some relations related to these sequence spaces. It is also shown that if a sequence is strongly -Cesàro summable with respect to the modulus function then it is -statistically convergent.

  8. The impact of sequence length and number of sequences on promoter prediction performance.

    Science.gov (United States)

    Carvalho, Sávio G; Guerra-Sá, Renata; de C Merschmann, Luiz H

    2015-01-01

    The rapid growth in genome sequencing capacity has created a need to automate data analysis in order to speed up the genomic annotation process and reduce its cost. Because promoter identification is an important step in functional genomic annotation, several studies have proposed computational approaches to predict promoters. Different classifiers and characteristics of the promoter sequences have been used to deal with this prediction problem. However, several works in the literature have addressed the promoter prediction problem using datasets containing sequences of 250 nucleotides or more. As the sequence length defines the number of dataset attributes, even a limited number of properties used to characterize the sequences yields datasets with a high number of attributes for training classifiers. Since high-dimensional datasets can degrade the predictive performance of classifiers or even require an infeasible processing time, training classifiers on datasets with a reduced number of attributes is essential to obtain good predictive performance at low computational cost. To the best of our knowledge, no work in the literature has systematically verified the relation between sequence length and the predictive performance of classifiers. Thus, in this work, we have evaluated the impact of sequence length variation and training dataset size (number of sequences) on the predictive performance of classifiers. We have built sixteen datasets composed of different sized sequences (ranging in length from 12 to 301 nucleotides) and evaluated them using the SVM, Random Forest and k-NN classifiers. The best predictive performances reached by SVM and Random Forest remained relatively stable for datasets composed of sequences varying in length from 301 to 41 nucleotides, while k-NN achieved its best performance for the dataset composed of 101 nucleotides.
We

  9. Advances in clinical next-generation sequencing: target enrichment and sequencing technologies.

    Science.gov (United States)

    Ballester, Leomar Y; Luthra, Rajyalakshmi; Kanagal-Shamanna, Rashmi; Singh, Rajesh R

    2016-01-01

    The huge parallel sequencing capabilities of next generation sequencing technologies have made them the tools of choice to characterize genomic aberrations for research and diagnostic purposes. For clinical applications, screening the whole genome or exome is challenging owing to the large genomic area to be sequenced, associated costs, complexity of data, and lack of known clinical significance of all genes. Consequently, routine screening involves limited markers with established clinical relevance. This process, referred to as targeted genome sequencing, requires selective enrichment of the genomic areas comprising these markers via one of several primer or probe-based enrichment strategies, followed by sequencing of the enriched genomic areas. Here, the authors review current target enrichment approaches and next generation sequencing platforms, focusing on the underlying principles, capabilities, and limitations of each technology along with validation and implementation for clinical testing.

  10. Clinical evaluation of further-developed MRCP sequences in comparison with standard MRCP sequences

    International Nuclear Information System (INIS)

    Hundt, W.; Scheidler, J.; Reiser, M.; Petsch, R.

    2002-01-01

    The purpose of this study was the comparison of technically improved single-shot magnetic resonance cholangiopancreatography (MRCP) sequences with standard single-shot rapid acquisition with relaxation enhancement (RARE) and half-Fourier acquired single-shot turbo spin-echo (HASTE) sequences in evaluating the normal and abnormal biliary duct system. The bile duct system of 45 patients was prospectively investigated on a 1.5-T MRI system. The investigation was performed with RARE and HASTE MR cholangiography sequences with standard and high spatial resolutions, and with a delayed-echo half-Fourier RARE (HASTE) sequence. Findings of the improved MRCP sequences were compared with the standard MRCP sequences. The level of confidence in assessing the diagnosis was divided into five groups. The Wilcoxon signed-rank test at a level of p<0.05 was applied. In 15 patients no pathology was found. The MRCP showed stenoses of the bile duct system in 10 patients and choledocholithiasis and cholecystolithiasis in 16 patients. In 12 patients a dilatation of the bile duct system was found. Comparison of the low- and high spatial resolution sequences and the short and long TE times of the half-Fourier RARE (HASTE) sequence revealed no statistically significant differences regarding accuracy of the examination. The diagnostic confidence level in assessing normal or pathological findings for the high-resolution RARE and half-Fourier RARE (HASTE) was significantly better than for the standard sequences. For the delayed-echo half-Fourier RARE (HASTE) sequence no statistically significant difference was seen. The high-resolution RARE and half-Fourier RARE (HASTE) sequences had a higher confidence level, but there was no significant difference in diagnosis in terms of detection and assessment of pathological changes in the biliary duct system compared with standard sequences. (orig.)

  11. Clinical evaluation of further-developed MRCP sequences in comparison with standard MRCP sequences

    Energy Technology Data Exchange (ETDEWEB)

    Hundt, W.; Scheidler, J.; Reiser, M. [Department of Clinical Radiology, Klinikum Grosshadern, Ludwig-Maximilians University of Munich (Germany); Petsch, R. [Department of MRI, Siemens Medizintechnik, Erlangen (Germany)

    2002-07-01

    The purpose of this study was the comparison of technically improved single-shot magnetic resonance cholangiopancreatography (MRCP) sequences with standard single-shot rapid acquisition with relaxation enhancement (RARE) and half-Fourier acquired single-shot turbo spin-echo (HASTE) sequences in evaluating the normal and abnormal biliary duct system. The bile duct system of 45 patients was prospectively investigated on a 1.5-T MRI system. The investigation was performed with RARE and HASTE MR cholangiography sequences with standard and high spatial resolutions, and with a delayed-echo half-Fourier RARE (HASTE) sequence. Findings of the improved MRCP sequences were compared with the standard MRCP sequences. The level of confidence in assessing the diagnosis was divided into five groups. The Wilcoxon signed-rank test at a level of p<0.05 was applied. In 15 patients no pathology was found. The MRCP showed stenoses of the bile duct system in 10 patients and choledocholithiasis and cholecystolithiasis in 16 patients. In 12 patients a dilatation of the bile duct system was found. Comparison of the low- and high spatial resolution sequences and the short and long TE times of the half-Fourier RARE (HASTE) sequence revealed no statistically significant differences regarding accuracy of the examination. The diagnostic confidence level in assessing normal or pathological findings for the high-resolution RARE and half-Fourier RARE (HASTE) was significantly better than for the standard sequences. For the delayed-echo half-Fourier RARE (HASTE) sequence no statistically significant difference was seen. The high-resolution RARE and half-Fourier RARE (HASTE) sequences had a higher confidence level, but there was no significant difference in diagnosis in terms of detection and assessment of pathological changes in the biliary duct system compared with standard sequences. (orig.)

  12. Bacterial and viral identification and differentiation by amplicon sequencing on the MinION nanopore sequencer.

    Science.gov (United States)

    Kilianski, Andy; Haas, Jamie L; Corriveau, Elizabeth J; Liem, Alvin T; Willis, Kristen L; Kadavy, Dana R; Rosenzweig, C Nicole; Minot, Samuel S

    2015-01-01

    The MinION™ nanopore sequencer was recently released to a community of alpha-testers for evaluation using a variety of sequencing applications. Recent reports have tested the ability of the MinION™ to act as a whole genome sequencer and have demonstrated that nanopore sequencing has tremendous potential utility. However, the current nanopore technology still has limitations with respect to error-rate, and this is problematic when attempting to assemble whole genomes without secondary rounds of sequencing to correct errors. In this study, we tested the ability of the MinION™ nanopore sequencer to accurately identify and differentiate bacterial and viral samples via directed sequencing of characteristic genes shared broadly across a target clade. Using a 6 hour sequencing run time, sufficient data were generated to identify an E. coli sample down to the species level from 16S rDNA amplicons. Three poxviruses (cowpox, vaccinia-MVA, and vaccinia-Lister) were identified and differentiated down to the strain level, despite over 98% identity between the vaccinia strains. The ability to differentiate strains by amplicon sequencing on the MinION™ was accomplished despite an observed per-base error rate of approximately 30%. While nanopore sequencing, using the MinION™ platform from Oxford Nanopore in particular, continues to mature into a commercially available technology, practical uses are sought for the current versions of the technology. This study offers evidence of the utility of amplicon sequencing by demonstrating that the current versions of MinION™ technology can accurately identify and differentiate both viral and bacterial species present within biological samples via amplicon sequencing.

  13. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities.

    Science.gov (United States)

    Troshin, Peter V; Postis, Vincent Lg; Ashworth, Denise; Baldwin, Stephen A; McPherson, Michael J; Barton, Geoffrey J

    2011-03-07

    Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  14. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Directory of Open Access Journals (Sweden)

    Baldwin Stephen A

    2011-03-01

    Background: Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results: The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions: PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  15. Coverage statistics for sequence census methods

    Directory of Open Access Journals (Sweden)

    Evans Steven N

    2010-08-01

    Full Text Available Abstract Background We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce a coding of the shape of the coverage depth function as a tree and explain how this can be used to detect regions with anomalous coverage. This modeling perspective is especially germane to current high-throughput sequencing experiments, where both sample preparation protocols and sequencing technology particulars can affect fragment length distributions. Results Under the mild assumptions that fragment start sites are Poisson distributed and successive fragment lengths are independent and identically distributed, we observe that, regardless of fragment length distribution, the fragments produced in a sequencing experiment can be viewed as resulting from a two-dimensional spatial Poisson process. We then study the successive jumps of the coverage function, and show that they can be encoded as a random tree that is approximately a Galton-Watson tree with generation-dependent geometric offspring distributions whose parameters can be computed. Conclusions We extend standard analyses of shotgun sequencing that focus on coverage statistics at individual sites, and provide a null model for detecting deviations from random coverage in high-throughput sequence census based experiments. Our approach leads to explicit determinations of the null distributions of certain test statistics, while for others it greatly simplifies the approximation of their null distributions by simulation. Our focus on fragments also leads to a new approach to visualizing sequencing data that is of independent interest.
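
    The classic Lander-Waterman setting that this abstract extends can be illustrated with a small simulation. The sketch below is not the authors' tree-based method; it only checks the standard result that, with uniformly placed fragment starts and mean coverage c = N * E[length] / G, the expected covered fraction is 1 - exp(-c) whatever the fragment length distribution. The function names, the circular genome, and the uniform length range are assumptions made for illustration.

```python
import math
import random


def expected_covered_fraction(n_fragments, mean_length, genome_length):
    """Lander-Waterman expectation: 1 - exp(-c), with c the mean coverage depth."""
    c = n_fragments * mean_length / genome_length
    return 1.0 - math.exp(-c)


def simulate_covered_fraction(n_fragments, genome_length, min_len, max_len, seed=0):
    """Place fragments uniformly on a circular genome (an assumption that
    avoids edge effects) and return the fraction of positions covered."""
    rng = random.Random(seed)
    diff = [0] * (genome_length + 1)  # difference array of coverage changes
    for _ in range(n_fragments):
        start = rng.randrange(genome_length)
        end = start + rng.randint(min_len, max_len)
        diff[start] += 1
        if end <= genome_length:
            diff[end] -= 1
        else:
            # Fragment wraps around the origin of the circular genome.
            diff[genome_length] -= 1
            diff[0] += 1
            diff[end - genome_length] -= 1
    covered = 0
    depth = 0
    for pos in range(genome_length):
        depth += diff[pos]
        if depth > 0:
            covered += 1
    return covered / genome_length


if __name__ == "__main__":
    theory = expected_covered_fraction(20000, 100, 1_000_000)
    observed = simulate_covered_fraction(20000, 1_000_000, 50, 150, seed=1)
    print(f"theory={theory:.4f} simulated={observed:.4f}")
```

    Replacing the uniform length draw with another distribution of the same mean should leave the covered fraction essentially unchanged, which is the per-site statistic the abstract says standard analyses focus on.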

  16. Compilation of tRNA sequences.

    Science.gov (United States)

    Sprinzl, M; Grueter, F; Spelzhaus, A; Gauss, D H

    1980-01-11

    This compilation presents in a small space the tRNA sequences so far published. The numbering of tRNAPhe from yeast is used following the rules proposed by the participants of the Cold Spring Harbor Meeting on tRNA 1978 (1,2;Fig. 1). This numbering allows comparisons with the three dimensional structure of tRNAPhe. The secondary structure of tRNAs is indicated by specific underlining. In the primary structure a nucleoside followed by a nucleoside in brackets or a modification in brackets denotes that both types of nucleosides can occupy this position. Part of a sequence in brackets designates a piece of sequence not unambiguously analyzed. Rare nucleosides are named according to the IUPAC-IUB rules (for complicated rare nucleosides and their identification see Table 1); those with lengthy names are given with the prefix x and specified in the footnotes. Footnotes are numbered according to the coordinates of the corresponding nucleoside and are indicated in the sequence by an asterisk. The references are restricted to the citation of the latest publication in those cases where several papers deal with one sequence. For additional information the reader is referred either to the original literature or to other tRNA sequence compilations (3-7). Mutant tRNAs are dealt with in a compilation by J. Celis (8). The compilers would welcome any information by the readers regarding missing material or erroneous presentation. On the basis of this numbering system computer printed compilations of tRNA sequences in a linear form and in cloverleaf form are in preparation.

  17. Preliminary hazard analysis using sequence tree method

    International Nuclear Information System (INIS)

    Huang Huiwen; Shih Chunkuan; Hung Hungchih; Chen Minghuei; Yih Swu; Lin Jiinming

    2007-01-01

    A system level PHA using the sequence tree method was developed to perform Safety Related digital I and C system SSA. The conventional PHA is a brainstorming session among experts on various portions of the system to identify hazards through discussions. However, this conventional PHA is not a systematic technique: the analysis results depend strongly on the experts' subjective opinions, and the analysis quality cannot be appropriately controlled. Therefore, this research developed a system level sequence tree based PHA, which can clarify the relationship among the major digital I and C systems. Two major phases are included in this sequence tree based technique. The first phase uses a table to analyze each event in SAR Chapter 15 for a specific safety related I and C system, such as RPS. The second phase uses a sequence tree to recognize which I and C systems are involved in the event, how the safety related systems work, and how the backup systems can be activated to mitigate the consequences if the primary safety systems fail. In the sequence tree, the defense-in-depth echelons, including the Control echelon, Reactor trip echelon, ESFAS echelon, and Indication and display echelon, are arranged to construct the sequence tree structure. All the related I and C systems, including the digital systems and the analog back-up systems, are allocated to their specific echelon. With this system centric sequence tree based analysis, not only can preliminary hazards be identified systematically, but the vulnerability of the nuclear power plant can also be recognized. Therefore, an effective simplified D3 evaluation can be performed as well. (author)

  18. Blind sequence-length estimation of low-SNR cyclostationary sequences

    CSIR Research Space (South Africa)

    Vlok, JD

    2014-06-01

    Full Text Available performance bound Estimation algorithm 1 takes the index k of the maximum value of the mean-square correlation sequence ρ(k) as the estimated sequence length Nest. The sequence length will therefore be estimated correctly if the peak of ρ(k) is located at k... the estimated sequence length Nest, and technique 1 can therefore only provide the correct answer as long as k = N is considered within the range of k. The positions of segments within the intercepted signal and the value of L will also influence the performance...

  19. Detection of inter-spread repeat sequence in genomic DNA sequence.

    Science.gov (United States)

    Murakami, Hiroo; Sugaya, Nobuyoshi; Sato, Makihiko; Imaizumi, Akira; Aburatani, Sachiyo; Horimoto, Katsuhisa

    2004-01-01

    Various types of periodic patterns in nucleotide sequences are known to be very abundant in genomic DNA sequences, and to play important biological roles in processes such as gene expression, genome structural stabilization, and recombination. We present a new method, named "STEPSTONE", to find a specific periodic pattern of repeat sequence, the inter-spread repeat, in which tandem repeats of conserved and non-conserved regions appear periodically. In our method, the data on periods of short repeat sequences found in a target sequence are first stored as hash data and then selected by applying an auto-correlation test from time series analysis. Among the statistically selected sequences, the inter-spread repeats are obtained by the usual alignment procedures in two steps. To test the performance of our method, we examined the inter-spread repeats in Mycobacterium tuberculosis and Zamia paucijuga genomic sequences. Our method exactly detected the repeats in the two sequences, and is therefore useful for systematically identifying inter-spread repeats in DNA sequences.

  20. Sequence Variations in the Non-Coding Sequence of CTX Phages in Vibrio cholerae.

    Science.gov (United States)

    Kim, Eun Jin; Yu, Hyun Jin; Kim, Dong Wook

    2016-08-28

    This study focused on the variations in the non-coding sequences between ctxB and rstR of various CTX phages. The non-coding sequences of CTX-1 and CTX-cla are phage type-specific. The length of the non-coding region of CTX-1 and CTX-cla is 601 and 730 nucleotides, respectively. The non-coding sequence of CTX phage could be divided into three regions. There is a phage type-specific Variable region between two homologous Common regions (Common regions 1 and 2). The non-coding sequence of RS1 element is similar to CTX-1 except that Common region 1 is replaced by a short RS1-specific sequence. The non-coding sequences of CTX-2 and CTX-cla are homologous, indicating the non-coding sequence of CTX-2 is derived from CTX-cla. The non-coding region of CTX-O139 is similar to CTX-cla and CTX-2; however, it contains an extra phage type-specific sequence between Common region 2 and rstR. The variations in the non-coding sequences of CTX phages might be associated with the difference in the replication efficiency and the directionality in the integration into the V. cholerae chromosome.

  1. HPMV: human protein mutation viewer - relating sequence mutations to protein sequence architecture and function changes.

    Science.gov (United States)

    Sherman, Westley Arthur; Kuchibhatla, Durga Bhavani; Limviphuvadh, Vachiranee; Maurer-Stroh, Sebastian; Eisenhaber, Birgit; Eisenhaber, Frank

    2015-10-01

    Next-generation sequencing advances are rapidly expanding the number of human mutations to be analyzed for causative roles in genetic disorders. Our Human Protein Mutation Viewer (HPMV) is intended to explore the biomolecular mechanistic significance of non-synonymous human mutations in protein-coding genomic regions. The tool helps to assess whether protein mutations affect the occurrence of sequence-architectural features (globular domains, targeting signals, post-translational modification sites, etc.). As input, HPMV accepts protein mutations - as UniProt accessions with mutations (e.g. HGVS nomenclature), genome coordinates, or FASTA sequences. As output, HPMV provides an interactive cartoon showing the mutations in relation to elements of the sequence architecture. A large variety of protein sequence architectural features were selected for their particular relevance to mutation interpretation. Clicking a sequence feature in the cartoon expands a tree view of additional information including multiple sequence alignments of conserved domains and a simple 3D viewer mapping the mutation to known PDB structures, if available. The cartoon is also correlated with a multiple sequence alignment of similar sequences from other organisms. In cases where a mutation is likely to have a straightforward interpretation (e.g. a point mutation disrupting a well-understood targeting signal), this interpretation is suggested. The interactive cartoon can be downloaded as standalone viewer in Java jar format to be saved and viewed later with only a standard Java runtime environment. The HPMV website is: http://hpmv.bii.a-star.edu.sg/ .

  2. Sequencing of BAC pools by different next generation sequencing platforms and strategies

    Directory of Open Access Journals (Sweden)

    Scholz Uwe

    2011-10-01

    Full Text Available Abstract Background Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. Results Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably fewer misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Conclusion Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.

  3. A survey of sequence alignment algorithms for next-generation sequencing.

    Science.gov (United States)

    Li, Heng; Homer, Nils

    2010-09-01

    Rapidly evolving sequencing technologies produce data on an unparalleled scale. A central challenge to the analysis of this data is sequence alignment, whereby sequence reads must be compared to a reference. A wide variety of alignment algorithms and software have been subsequently developed over the past two years. In this article, we will systematically review the current development of these algorithms and introduce their practical applications on different types of experimental data. We come to the conclusion that short-read alignment is no longer the bottleneck of data analyses. We also consider future development of alignment algorithms with respect to emerging long sequence reads and the prospect of cloud computing.

  4. Fibonacci-like sequences and generalized Pascal's triangles

    Science.gov (United States)

    Vincenzi, G.; Siani, S.

    2014-05-01

    The properties pertaining to diagonals of generalized Pascal's triangles are studied. Combinatorial relationships between Fibonacci-like sequences and Fibonacci sequence itself are determined, using the sequence of diagonals of generalized Pascal's triangle.
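
    The classical special case behind this abstract is easy to check numerically: the sums along the shallow diagonals of the ordinary Pascal's triangle are the Fibonacci numbers, sum over k of C(n-k, k) = F(n+1). The sketch below covers only that classical identity, not the generalized triangles or the Fibonacci-like sequences studied in the paper; the function names are chosen here for illustration.

```python
from math import comb


def diagonal_sum(n):
    """Sum of the n-th shallow diagonal of Pascal's triangle: sum_k C(n-k, k)."""
    return sum(comb(n - k, k) for k in range(n // 2 + 1))


def fibonacci(n):
    """F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    for n in range(10):
        # Each diagonal sum equals the next Fibonacci number, F(n+1).
        print(n, diagonal_sum(n), fibonacci(n + 1))
```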

  5. Axioms for behavioural congruence of single-pass instruction sequences

    NARCIS (Netherlands)

    Bergstra, J.A.; Middelburg, C.A.

    2017-01-01

    In program algebra, an algebraic theory of single-pass instruction sequences, three congruences on instruction sequences are paid attention to: instruction sequence congruence, structural congruence, and behavioural congruence. Sound and complete axiom systems for the first two congruences were

  6. A new measurement of sequence conservation

    Directory of Open Access Journals (Sweden)

    Li Xiaoman

    2009-12-01

    Full Text Available Abstract Background Understanding sequence conservation is important for the study of sequence evolution and for the identification of functional regions of the genome. Current studies often measure sequence conservation based on every position in contiguous regions. Therefore, a large number of functional regions that contain conserved segments separated by relatively long divergent segments are ignored. Our goal in this paper is to define a new measurement of sequence conservation such that both contiguously conserved regions and discontiguously conserved regions can be detected based on this new measurement. Here and in the following, conserved regions are those regions that share similarity higher than a pre-specified similarity threshold with their homologous regions in other species. That is, conserved regions are good candidates of functional regions and may not be always functional. Moreover, conserved regions may contain long and divergent segments. Results To identify both discontiguously and contiguously conserved regions, we proposed a new measurement of sequence conservation, which measures sequence similarity based only on the conserved segments within the regions. By defining conserved segments using the local alignment tool CHAOS, under the new measurement, we analyzed the conservation of 1642 experimentally verified human functional non-coding regions in the mouse genome. We found that the conservation in at least 11% of these functional regions could be missed by the current conservation analysis methods. We also found that 72% of the mouse homologous regions identified based on the new measurement are more similar to the human functional sequences than the aligned mouse sequences from the UCSC genome browser. We further compared BLAST and discontiguous MegaBLAST with our method. We found that our method picks up many more conserved segments than BLAST and discontiguous MegaBLAST in these regions. 
Conclusions It is critical to

  7. Modeling of prepregs during automated draping sequences

    Science.gov (United States)

    Krogh, Christian; Glud, Jens A.; Jakobsen, Johnny

    2017-10-01

    The behavior of woven prepreg fabric during automated draping sequences is investigated. A drape tool under development with an arrangement of grippers facilitates the placement of a woven prepreg fabric in a mold. It is essential that the draped configuration is free from wrinkles and other defects. The present study aims at setting up a virtual draping framework capable of modeling the draping process from the initial flat fabric to the final double curved shape and aims at assisting the development of an automated drape tool. The virtual draping framework consists of a kinematic mapping algorithm used to generate target points on the mold which are used as input to a draping sequence planner. The draping sequence planner prescribes the displacement history for each gripper in the drape tool and these displacements are then applied to each gripper in a transient model of the draping sequence. The model is based on a transient finite element analysis with the material's constitutive behavior currently being approximated as linear elastic orthotropic. In-plane tensile and bias-extension tests as well as bending tests are conducted and used as input for the model. The virtual draping framework shows good potential for obtaining a better understanding of the drape process and for guiding the development of the drape tool. However, results obtained from using the framework on a simple test case indicate that the generation of draping sequences is non-trivial.

  8. Getting started in mapping-by-sequencing.

    Science.gov (United States)

    Candela, Héctor; Casanova-Sáez, Rubén; Micol, José Luis

    2015-07-01

    Next-generation sequencing (NGS) technologies allow the cost-effective sequencing of whole genomes and have expanded the scope of genomics to novel applications, such as the genome-wide characterization of intraspecific polymorphisms and the rapid mapping and identification of point mutations. Next-generation sequencing platforms, such as the Illumina HiSeq2000 platform, are now commercially available at affordable prices and routinely produce an enormous amount of sequence data, but their wide use is often hindered by a lack of knowledge on how to manipulate and process the information produced. In this review, we focus on the strategies that are available to geneticists who wish to incorporate these novel approaches into their research but who are not familiar with the necessary bioinformatic concepts and computational tools. In particular, we comprehensively summarize case studies where the use of NGS technologies has led to the identification of point mutations, a strategy that has been dubbed "mapping-by-sequencing", and review examples from plants and other model species such as Caenorhabditis elegans, Saccharomyces cerevisiae, and Drosophila melanogaster. As these technologies are becoming cheaper and more powerful, their use is also expanding to allow mutation identification in species with larger genomes, such as many crop plants. © 2014 Institute of Botany, Chinese Academy of Sciences.

  9. Refined Pichia pastoris reference genome sequence

    Science.gov (United States)

    Sturmberger, Lukas; Chappell, Thomas; Geier, Martina; Krainer, Florian; Day, Kasey J.; Vide, Ursa; Trstenjak, Sara; Schiefer, Anja; Richardson, Toby; Soriaga, Leah; Darnhofer, Barbara; Birner-Gruenberger, Ruth; Glick, Benjamin S.; Tolstorukov, Ilya; Cregg, James; Madden, Knut; Glieder, Anton

    2016-01-01

    Strains of the species Komagataella phaffii are the most frequently used “Pichia pastoris” strains employed for recombinant protein production as well as studies on peroxisome biogenesis, autophagy and secretory pathway analyses. Genome sequencing of several different P. pastoris strains has provided the foundation for understanding these cellular functions in recent genomics, transcriptomics and proteomics experiments. This experimentation has identified mistakes, gaps and incorrectly annotated open reading frames in the previously published draft genome sequences. Here, a refined reference genome is presented, generated with genome and transcriptome sequencing data from multiple P. pastoris strains. Twelve major sequence gaps from 20 to 6000 base pairs were closed and 5111 out of 5256 putative open reading frames were manually curated and confirmed by RNA-seq and published LC-MS/MS data, including the addition of new open reading frames (ORFs) and a reduction in the number of spliced genes from 797 to 571. One chromosomal fragment of 76 kbp between two previous gaps on chromosome 1 and another 134 kbp fragment at the end of chromosome 4, as well as several shorter fragments needed re-orientation. In total more than 500 positions in the genome have been corrected. This reference genome is presented with new chromosomal numbering, positioning ribosomal repeats at the distal ends of the four chromosomes, and includes predicted chromosomal centromeres as well as the sequence of two linear cytoplasmic plasmids of 13.1 and 9.5 kbp found in some strains of P. pastoris. PMID:27084056

  10. Harnessing Whole Genome Sequencing in Medical Mycology.

    Science.gov (United States)

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  11. Strobe sequence design for haplotype assembly

    Science.gov (United States)

    2011-01-01

    Background Humans are diploid, carrying two copies of each chromosome, one from each parent. Separating the paternal and maternal chromosomes is an important component of genetic analyses such as determining genetic association, inferring evolutionary scenarios, computing recombination rates, and detecting cis-regulatory events. As the pair of chromosomes are mostly identical to each other, linking together of alleles at heterozygous sites is sufficient to phase, or separate, the two chromosomes. In Haplotype Assembly, the linking is done by sequenced fragments that overlap two heterozygous sites. While there has been a lot of research on correcting errors to achieve accurate haplotypes via assembly, relatively little work has been done on designing sequencing experiments to get long haplotypes. Here, we describe the different design parameters that can be adjusted with next generation and upcoming sequencing technologies, and study the impact of design choice on the length of the haplotype. Results We show that a number of parameters influence haplotype length, with the most significant one being the advance length (distance between two fragments of a clone). Given technologies like strobe sequencing that allow for large variations in advance lengths, we design and implement a simulated annealing algorithm to sample a large space of distributions over advance lengths. Extensive simulations on individual genomic sequences suggest that a non-trivial distribution over advance lengths results in a 1-2 order of magnitude improvement in median haplotype length. Conclusions Our results suggest that haplotyping of large, biologically important genomic regions is feasible with current technologies. PMID:21342554

  12. Extended sequence diagram for human system interaction

    International Nuclear Information System (INIS)

    Hwang, Jong Rok; Choi, Sun Woo; Ko, Hee Ran; Kim, Jong Hyun

    2012-01-01

    Unified Modeling Language (UML) is a modeling language in the field of object oriented software engineering. The sequence diagram is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a message sequence chart. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. This paper proposes the Extended Sequence Diagram (ESD), which is capable of depicting human system interaction for nuclear power plants and of supporting analysis of operators' cognitive processes. The conventional sequence diagram is limited to identifying the activities in human and system interactions. The ESD is extended to describe operators' cognitive processes in more detail. The ESD is expected to be used as a task analysis method for describing human system interaction. The ESD can also present key steps causing abnormal operations or failures, as well as diverse human errors based on cognitive condition.

  13. Predicting DNA hybridization kinetics from sequence

    Science.gov (United States)

    Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu

    2018-01-01

    Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on similarity reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
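
    The abstract does not give the WNV features or weighting scheme, so the following is only a generic illustration of neighbour-weighted prediction, not the published model. It predicts the log10 rate constant of a query as a similarity-weighted average over reactions with known rate constants. The feature vectors, the exponential weighting, and the `bandwidth` parameter are all hypothetical choices made for this sketch.

```python
import math


def predict_log_rate(query, known, bandwidth=1.0):
    """Similarity-weighted average of known log10 rate constants.

    query     -- feature vector of the new sequence (hypothetical features)
    known     -- list of (feature_vector, log10_rate_constant) pairs
    bandwidth -- hypothetical scale controlling how fast weights decay
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for features, log_k in known:
        distance = math.dist(query, features)     # Euclidean distance
        weight = math.exp(-distance / bandwidth)  # closer neighbours count more
        weighted_sum += weight * log_k
        weight_total += weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Toy data: one made-up feature per reaction, log10(k) values are invented.
    known = [((0.0,), 5.0), ((2.0,), 7.0)]
    print(predict_log_rate((1.0,), known))
```

    Averaging in log space matches the way the abstract reports accuracy, as agreement "to within a factor of 3", but that pairing is a design choice of this sketch.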

  14. Entropy estimation of very short symbolic sequences

    Science.gov (United States)

    Lesne, Annick; Blanc, Jean-Luc; Pezard, Laurent

    2009-04-01

    While entropy per unit time is a meaningful index to quantify the dynamic features of experimental time series, its estimation is often hampered in practice by the finite length of the data. We here investigate the performance of entropy estimation procedures, relying either on block entropies or Lempel-Ziv complexity, when only very short symbolic sequences are available. Heuristic analytical arguments point at the influence of temporal correlations on the bias and statistical fluctuations, and put forward a reduced effective sequence length suitable for error estimation. Numerical studies are conducted using, as benchmarks, the wealth of different dynamic regimes generated by the family of logistic maps and stochastic evolutions generated by a Markov chain of tunable correlation time. Practical guidelines and validity criteria are proposed. For instance, block entropy leads to a dramatic overestimation for sequences of low entropy, whereas it outperforms Lempel-Ziv complexity at high entropy. As a general result, the quality of entropy estimation is sensitive to the sequence temporal correlation hence self-consistently depends on the entropy value itself, thus promoting a two-step procedure. Lempel-Ziv complexity is to be preferred in the first step and remains the best estimator for highly correlated sequences.
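
    The two estimator families named in this abstract can be sketched in a few lines. The code below is a minimal illustration, assuming the plug-in (frequency count) estimate for block entropies and the 1976 Lempel-Ziv phrase-counting definition of complexity; it does not reproduce the bias corrections or the effective sequence length proposed by the authors, and the function names are chosen here.

```python
import math
from collections import Counter


def block_entropy(symbols, n):
    """Plug-in Shannon entropy (bits) of the overlapping length-n blocks."""
    blocks = [symbols[i:i + n] for i in range(len(symbols) - n + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lz76_complexity(symbols):
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing.

    Each phrase is the shortest substring starting at the current position
    that has not occurred earlier (overlap with the phrase itself allowed).
    """
    i, phrases, n = 0, 0, len(symbols)
    while i < n:
        length = 1
        while i + length <= n and symbols[i:i + length] in symbols[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


if __name__ == "__main__":
    s = "0001101001000101"
    print(block_entropy(s, 1), block_entropy(s, 2) - block_entropy(s, 1))
    print(lz76_complexity(s))
```

    The difference between consecutive block entropies gives a crude entropy-rate estimate. For very short sequences the number of distinct blocks quickly approaches the number of samples, which is the finite-length problem the abstract addresses.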

  15. Sequence analysis by iterated maps, a review.

    Science.gov (United States)

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
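
    The iterated map behind the Chaos Game Representation mentioned above can be sketched briefly: starting from the centre of the unit square, each nucleotide moves the current point halfway toward the corner assigned to that nucleotide. The corner assignment and the handling of ambiguous symbols below are conventions assumed for this illustration, not details taken from the review.

```python
# Corner assignment is a convention assumed here; other layouts are in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the Chaos Game Representation coordinates of a DNA sequence."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N (an assumption)
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # move halfway to the corner
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGT"):
        print(point)
```

    Because each step halves the distance to a corner, the last k symbols determine which sub-square of side 2^-k the point falls in, which is one way to see the scale-free property the abstract refers to.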

  16. Progressive multiple sequence alignments from triplets

    Directory of Open Access Journals (Sweden)

    Stadler Peter F

    2007-07-01

    Full Text Available Abstract Background The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by stepwise replacing triples by pairs instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments in general exceeds that of clustalw or other multiple alignment tools even though our software does not include heuristics for context dependent (mis)match scores.

  17. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences: a 16S rRNA gene fragment having a length >150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline, which could serve as a good starting point for experimental design and make the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  18. Human Genome Sequencing in Health and Disease

    Science.gov (United States)

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  19. Metal resistance sequences and transgenic plants

    Science.gov (United States)

    Meagher, Richard Brian; Summers, Anne O.; Rugh, Clayton L.

    1999-10-12

    The present invention provides nucleic acid sequences encoding a metal ion resistance protein, which are expressible in plant cells. The metal resistance protein provides for the enzymatic reduction of metal ions including but not limited to divalent Cu, divalent mercury, trivalent gold, divalent cadmium, lead ions and monovalent silver ions. Transgenic plants which express these coding sequences exhibit increased resistance to metal ions in the environment as compared with plants which have not been so genetically modified. Transgenic plants with improved resistance to organometals including alkylmercury compounds, among others, are provided by the further inclusion of plant-expressible organometal lyase coding sequences, as specifically exemplified by the plant-expressible merB coding sequence. Furthermore, these transgenic plants which have been genetically modified to express the metal resistance coding sequences of the present invention can participate in the bioremediation of metal contamination via the enzymatic reduction of metal ions. Transgenic plants resistant to organometals can further mediate remediation of organic metal compounds, for example, alkylmetal compounds including but not limited to methyl mercury, methyl lead compounds, methyl cadmium and methyl arsenic compounds, in the environment by causing the freeing of mercuric or other metal ions and the reduction of the ionic mercury or other metal ions to the less toxic elemental mercury or other metals.

  20. End Sequencing and Finger Printing of Human & Mouse BAC Libraries

    Energy Technology Data Exchange (ETDEWEB)

    Fraser, C

    2005-09-27

    This project provided for continued end sequencing of existing and new BAC libraries constructed to support human sequencing as well as to initiate BAC end sequencing from the mouse BAC libraries constructed to support mouse sequencing. The clones, the sequences, and the fingerprints are now an available resource for the community at large. Research and development of new methodologies for BAC end sequencing have reduced costs and increased throughput.

  1. The first genome sequences of human bocaviruses from Vietnam.

    Science.gov (United States)

    Thanh, Tran Tan; Van, Hoang Minh Tu; Hong, Nguyen Thi Thu; Nhu, Le Nguyen Truc; Anh, Nguyen To; Tuan, Ha Manh; Hien, Ho Van; Tuong, Nguyen Manh; Kien, Trinh Trung; Khanh, Truong Huu; Nhan, Le Nguyen Thanh; Hung, Nguyen Thanh; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H Rogier; Tan, Le Van

    2016-01-01

    As part of an ongoing effort to generate complete genome sequences of hand, foot and mouth disease-causing enteroviruses directly from clinical specimens, two complete coding sequences and two partial genomic sequences of human bocavirus 1 (n=3) and 2 (n=1) were co-amplified and sequenced, representing the first genome sequences of human bocaviruses from Vietnam. The sequences may aid future study aiming at understanding the evolution of the pathogen.

  2. Robust inference of population structure from next-generation sequencing data with systematic differences in sequencing.

    Science.gov (United States)

    Liao, Peizhou; Satten, Glen A; Hu, Yi-Juan

    2018-04-01

    Inferring population structure is important for both population genetics and genetic epidemiology. Principal components analysis (PCA) has been effective in ascertaining population structure with array genotype data but can be difficult to use with sequencing data, especially when low depth leads to uncertainty in called genotypes. Because PCA is sensitive to differences in variability, PCA using sequencing data can result in components that correspond to differences in sequencing quality (read depth and error rate), rather than differences in population structure. We demonstrate that even existing methods for PCA specifically designed for sequencing data can still yield biased conclusions when used with data having sequencing properties that are systematically different across different groups of samples (i.e. sequencing groups). This situation can arise in population genetics when combining sequencing data from different studies, or in genetic epidemiology when using historical controls such as samples from the 1000 Genomes Project. To allow inference on population structure using PCA in these situations, we provide an approach that is based on using sequencing reads directly without calling genotypes. Our approach is to adjust the data from different sequencing groups to have the same read depth and error rate so that PCA does not generate spurious components representing sequencing quality. To accomplish this, we have developed a subsampling procedure to match the depth distributions in different sequencing groups, and a read-flipping procedure to match the error rates. We average over subsamples and read flips to minimize loss of information. We demonstrate the utility of our approach using two datasets from 1000 Genomes, and further evaluate it using simulation studies. TASER-PC software is publicly available at http://web1.sph.emory.edu/users/yhu30/software.html. yijuan.hu@emory.edu. Supplementary data are available at Bioinformatics online.

  3. The international nucleotide sequence database collaboration.

    Science.gov (United States)

    Karsch-Mizrachi, Ilene; Takagi, Toshihisa; Cochrane, Guy

    2018-01-04

    For more than 30 years, the International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been committed to capturing, preserving and providing access to comprehensive public domain nucleotide sequence and associated metadata which enables discovery in biomedicine, biodiversity and biological sciences. Since 1987, the DNA Data Bank of Japan (DDBJ) at the National Institute for Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health in Bethesda, Maryland, USA have worked collaboratively to enable access to nucleotide sequence data in standardized formats for the worldwide scientific community. In this article, we reiterate the principles of the INSDC collaboration and briefly summarize the trends of the archival content. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.

  4. The double main sequence of Omega Centauri

    Science.gov (United States)

    Bedin, L. R.; Piotto, G.; Anderson, J.; King, I. R.; Cassisi, S.; Momany, Y.

    Recent, high precision photometry of Omega Centauri, the biggest Galactic globular cluster, has been obtained with Hubble Space Telescope (HST). The color magnitude diagram reveals an unexpected bifurcation of colors in the main sequence (MS). The newly found double MS, the multiple turnoffs and subgiant branches, and other sequences discovered in the past along the red giant branch of this cluster add up to a fascinating but frustrating puzzle. Among the possible explanations for the blue main sequence an anomalous overabundance of helium is suggested. The hypothesis will be tested with a set of FLAMES@VLT data we have recently obtained (ESO DDT program), and with forthcoming ACS@HST images. Based on observations with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA contract NAS 5-26555.

  5. The DNA sequence of equine herpesvirus-1.

    Science.gov (United States)

    Telford, E A; Watson, M S; McBride, K; Davison, A J

    1992-07-01

    The complete DNA sequence was determined of a pathogenic British isolate of equine herpesvirus-1, a respiratory virus which can cause abortion and neurological disease. The genome is 150,223 bp in size, has a base composition of 56.7% G + C, and contains 80 open reading frames likely to encode protein. Since four open reading frames are duplicated in the major inverted repeat, two are probably expressed as a spliced mRNA, and one may contain an internal transcriptional promoter, the genome is considered to contain 76 distinct genes. The genes are arranged collinearly with those in the genomes of the two previously sequenced alphaherpesviruses, varicella-zoster virus, and herpes simplex virus type-1, and comparisons of predicted amino acid sequences allowed the functions of many equine herpesvirus 1 proteins to be assigned.

  6. Sequence Classification Using Third-Order Moments

    DEFF Research Database (Denmark)

    Troelsgaard, Rasmus; Hansen, Lars Kai

    2017-01-01

    Model-based classification of sequence data using a set of hidden Markov models is a well-known technique. The involved score function, which is often based on the class-conditional likelihood, can, however, be computationally demanding, especially for long data sequences. Inspired by recent theoretical advances in spectral learning of hidden Markov models, we propose a score function based on third-order moments. In particular, we propose to use the Kullback-Leibler divergence between theoretical and empirical third-order moments for classification of sequence data with discrete observations. The proposed method provides lower computational complexity at classification time than the usual likelihood-based methods. In order to demonstrate the properties of the proposed method, we perform classification of both simulated data and empirical data from a human activity recognition study.
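
    A rough sketch of scoring by third-order moments follows. It makes several assumptions not stated in the record: the empirical moments are taken to be relative frequencies of consecutive symbol triples, each class is represented by a precomputed table of triple probabilities (standing in for the theoretical moments of its hidden Markov model), and the divergence is evaluated in the direction KL(empirical || theoretical). All function names are hypothetical.

```python
import math
from collections import Counter


def empirical_third_moments(seq):
    """Relative frequencies of consecutive triples (x_t, x_t+1, x_t+2)."""
    triples = Counter(zip(seq, seq[1:], seq[2:]))
    total = sum(triples.values())
    return {t: c / total for t, c in triples.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over the support of p; eps guards against zeros in q."""
    return sum(pv * math.log(pv / max(q.get(t, 0.0), eps)) for t, pv in p.items())


def classify(seq, class_moments):
    """Pick the class whose moment table is closest to the empirical one."""
    emp = empirical_third_moments(seq)
    return min(class_moments, key=lambda c: kl_divergence(emp, class_moments[c]))


if __name__ == "__main__":
    # Stand-in "theoretical" moments, estimated here from reference strings
    # purely for illustration.
    models = {
        "A": empirical_third_moments("ababababab"),
        "B": empirical_third_moments("aabbaabbaabb"),
    }
    print(classify("abababab", models))
```

    The cost at classification time is one pass over the sequence to count triples plus one divergence per class, independent of the number of hidden states, which is consistent with the lower complexity the abstract claims.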

  7. Protein Sequencing with Tandem Mass Spectrometry

    Science.gov (United States)

    Ziady, Assem G.; Kinter, Michael

    The recent introduction of electrospray ionization techniques that are suitable for peptides and whole proteins has allowed for the design of mass spectrometric protocols that provide accurate sequence information for proteins. The advantages gained by these approaches over traditional Edman Degradation sequencing include faster analysis and femtomole, sometimes attomole, sensitivity. The ability to efficiently identify proteins has allowed investigators to conduct studies on their differential expression or modification in response to various treatments or disease states. In this chapter, we discuss the use of electrospray tandem mass spectrometry, a technique whereby protein-derived peptides are subjected to fragmentation in the gas phase, revealing sequence information for the protein. This powerful technique has been instrumental for the study of proteins and markers associated with various disorders, including heart disease, cancer, and cystic fibrosis. We use the study of protein expression in cystic fibrosis as an example.

  8. 10KP: A phylodiverse genome sequencing plan

    Science.gov (United States)

    Cheng, Shifeng; Melkonian, Michael; Brockington, Samuel; Archibald, John M; Delaux, Pierre-Marc; Melkonian, Barbara; Mavrodiev, Evgeny V; Sun, Wenjing; Fu, Yuan; Yang, Huanming; Soltis, Douglas E; Graham, Sean W; Soltis, Pamela S; Liu, Xin; Xu, Xun

    2018-01-01

    Abstract Understanding plant evolution and diversity in a phylogenomic context is an enormous challenge due, in part, to limited availability of genome-scale data across phylodiverse species. The 10KP (10,000 Plants) Genome Sequencing Project will sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists (excluding fungi) within the next 5 years. By implementing and continuously improving leading-edge sequencing technologies and bioinformatics tools, 10KP will catalogue the genome content of plant and protist diversity and make these data freely available as an enduring foundation for future scientific discoveries and applications. 10KP is structured as an international consortium, open to the global community, including botanical gardens, plant research institutes, universities, and private industry. Our immediate goal is to establish a policy framework for this endeavor, the principles of which are outlined here. PMID:29618049

  9. Transforming clinical microbiology with bacterial genome sequencing.

    Science.gov (United States)

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  10. Automated constraint checking of spacecraft command sequences

    Science.gov (United States)

    Horvath, Joan C.; Alkalaj, Leon J.; Schneider, Karl M.; Spitale, Joseph M.; Le, Dang

    1995-01-01

    Robotic spacecraft are controlled by onboard sets of commands called "sequences." Determining that sequences will have the desired effect on the spacecraft can be expensive in terms of both labor and computer coding time, with different particular costs for different types of spacecraft. Specification languages and appropriate user interface to the languages can be used to make the most effective use of engineering validation time. This paper describes one specification and verification environment ("SAVE") designed for validating that command sequences have not violated any flight rules. This SAVE system was subsequently adapted for flight use on the TOPEX/Poseidon spacecraft. The relationship of this work to rule-based artificial intelligence and to other specification techniques is discussed, as well as the issues that arise in the transfer of technology from a research prototype to a full flight system.

  11. Chronodes: Interactive Multifocus Exploration of Event Sequences.

    Science.gov (United States)

    Polack, Peter J; Chen, Shang-Tse; Kahng, Minsuk; DE Barbaro, Kaya; Basole, Rahul; Sharmin, Moushumi; Chau, Duen Horng

    2018-02-01

    The advent of mobile health (mHealth) technologies challenges the capabilities of current visualizations, interactive tools, and algorithms. We present Chronodes, an interactive system that unifies data mining and human-centric visualization techniques to support explorative analysis of longitudinal mHealth data. Chronodes extracts and visualizes frequent event sequences that reveal chronological patterns across multiple participant timelines of mHealth data. It then combines novel interaction and visualization techniques to enable multifocus event sequence analysis, which allows health researchers to interactively define, explore, and compare groups of participant behaviors using event sequence combinations. Through summarizing insights gained from a pilot study with 20 behavioral and biomedical health experts, we discuss Chronodes's efficacy and potential impact in the mHealth domain. Ultimately, we outline important open challenges in mHealth, and offer recommendations and design guidelines for future research.

  12. Bias of purine stretches in sequenced chromosomes

    DEFF Research Database (Denmark)

    Ussery, David; Soumpasis, Dikeos Mario; Brunak, Søren

    2002-01-01

    We examined more than 700 DNA sequences (full length chromosomes and plasmids) for stretches of purines (R) or pyrimidines (Y) and alternating YR stretches; such regions will likely adopt structures which are different from the canonical B-form. Since one turn of the DNA helix is roughly 10 bp, we measured the fraction of each genome which contains purine (or pyrimidine) tracts of lengths of 10 bp or longer (hereafter referred to as 'purine tracts'), as well as stretches of alternating pyrimidines/purine ('pyr/pur tracts') of the same length. Using these criteria, a random sequence would be expected to contain 1.0% of purine tracts and also 1.0% of the alternating pyr/pur tracts. In the vast majority of cases, there are more purine tracts than would be expected from a random sequence, with an average of 3.5%, significantly larger than the expectation value. The fraction of the chromosomes containing pyr...
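
    The tract measurement described above can be sketched as a run-length scan. This is a minimal illustration under stated assumptions: the fraction is taken to be the share of bases that lie in purine-only or pyrimidine-only runs of at least `min_len` on the given strand, ambiguous symbols break a run, and alternating pyr/pur tracts are not handled. The function name `tract_fraction` is hypothetical.

```python
from itertools import groupby

PURINES = set("AG")
PYRIMIDINES = set("CT")


def _kind(base):
    """Map a base to R (purine), Y (pyrimidine) or N (anything else)."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    return "N"


def tract_fraction(seq, min_len=10):
    """Fraction of bases in purine-only or pyrimidine-only runs >= min_len."""
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = 0
    # groupby yields maximal runs of consecutive bases of the same kind.
    for kind, run in groupby(seq, key=_kind):
        length = sum(1 for _ in run)
        if kind != "N" and length >= min_len:
            covered += length
    return covered / len(seq)


if __name__ == "__main__":
    example = "AGGAAGAGGA" + "CACA"  # a 10-base purine run, then short runs
    print(tract_fraction(example))
```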

  13. Bias of purine stretches in sequenced chromosomes

    DEFF Research Database (Denmark)

    Ussery, David; Soumpasis, Dikeos Mario; Brunak, Søren

    2002-01-01

    We examined more than 700 DNA sequences (full length chromosomes and plasmids) for stretches of purines (R) or pyrimidines (Y) and alternating YR stretches; such regions will likely adopt structures which are different from the canonical B-form. Since one turn of the DNA helix is roughly 10 bp, we...... to contain 1.0% of purine tracts and also 1.0% of the alternating pyr/pur tracts. In the vast majority of cases, there are more purine tracts than would be expected from a random sequence, with an average of 3.5%, significantly larger than the expectation value. The fraction of the chromosomes containing pyr......, in eukaryotes there is an abundance of long stretches of purines or alternating purine/pyrimidine tracts, which cannot be explained in this way; these sequences are likely to play an important role in eukaryotic chromosome organisation....

  14. Protein sequence database for pathogenic arenaviruses

    Science.gov (United States)

    Bui, Huynh-Hoa; Botten, Jason; Fusseder, Nicolas; Pasquetto, Valerie; Mothe, Bianca; Buchmeier, Michael J; Sette, Alessandro

    2007-01-01

    Background Arenaviruses are a family of rodent-borne viruses that cause several hemorrhagic fevers. These diseases can be devastating and are often lethal. Herein, to aid in the design and development of diagnostics, treatments and vaccines for arenavirus infections, we have developed a database containing protein sequences from the seven pathogenic arenaviruses (Junin, Guanarito, Sabia, Machupo, Whitewater Arroyo, Lassa and LCMV). Results The database currently contains a non-redundant set of 333 protein sequences which were manually annotated. All entries were linked to NCBI and cited PubMed references. The database has a convenient query interface including BLAST search. Sequence variability analyses were also performed and the results are hosted in the database. Conclusion The database is available at and can be used to aid in studies that require proteomic information from pathogenic arenaviruses. PMID:17288609

  15. Fetal Kidney Anomalies: Next Generation Sequencing

    DEFF Research Database (Denmark)

    Rasmussen, Maria; Sunde, Lone; Nielsen, Marlene Louise

    Aim and Introduction Identification of abnormal kidneys in the fetus may lead to termination of the pregnancy and raises questions about the underlying cause and recurrence risk in future pregnancies. In this study, we investigate the effectiveness of targeted next generation sequencing in fetuses...... postmortem examination. The approximately 110 genes included in the targeted panel were chosen on the basis of their potential involvement in embryonic kidney development, cystic kidney disease, or the renin-angiotensin system. DNA was extracted from fetal tissue samples or cultured chorion villus cells...... no mutations were identified, have been selected for exome sequencing in order to uncover novel genes associated to fetal kidney anomalies.

  16. 10KP: A phylodiverse genome sequencing plan.

    Science.gov (United States)

    Cheng, Shifeng; Melkonian, Michael; Smith, Stephen A; Brockington, Samuel; Archibald, John M; Delaux, Pierre-Marc; Li, Fay-Wei; Melkonian, Barbara; Mavrodiev, Evgeny V; Sun, Wenjing; Fu, Yuan; Yang, Huanming; Soltis, Douglas E; Graham, Sean W; Soltis, Pamela S; Liu, Xin; Xu, Xun; Wong, Gane Ka-Shu

    2018-03-01

    Understanding plant evolution and diversity in a phylogenomic context is an enormous challenge due, in part, to limited availability of genome-scale data across phylodiverse species. The 10KP (10,000 Plants) Genome Sequencing Project will sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists (excluding fungi) within the next 5 years. By implementing and continuously improving leading-edge sequencing technologies and bioinformatics tools, 10KP will catalogue the genome content of plant and protist diversity and make these data freely available as an enduring foundation for future scientific discoveries and applications. 10KP is structured as an international consortium, open to the global community, including botanical gardens, plant research institutes, universities, and private industry. Our immediate goal is to establish a policy framework for this endeavor, the principles of which are outlined here.

  17. Chronodes: Interactive Multifocus Exploration of Event Sequences

    Science.gov (United States)

    POLACK, PETER J.; CHEN, SHANG-TSE; KAHNG, MINSUK; DE BARBARO, KAYA; BASOLE, RAHUL; SHARMIN, MOUSHUMI; CHAU, DUEN HORNG

    2018-01-01

    The advent of mobile health (mHealth) technologies challenges the capabilities of current visualizations, interactive tools, and algorithms. We present Chronodes, an interactive system that unifies data mining and human-centric visualization techniques to support explorative analysis of longitudinal mHealth data. Chronodes extracts and visualizes frequent event sequences that reveal chronological patterns across multiple participant timelines of mHealth data. It then combines novel interaction and visualization techniques to enable multifocus event sequence analysis, which allows health researchers to interactively define, explore, and compare groups of participant behaviors using event sequence combinations. Through summarizing insights gained from a pilot study with 20 behavioral and biomedical health experts, we discuss Chronodes’s efficacy and potential impact in the mHealth domain. Ultimately, we outline important open challenges in mHealth, and offer recommendations and design guidelines for future research. PMID:29515937

  18. Analyses of expressed sequence tags from apple.

    Science.gov (United States)

    Newcomb, Richard D; Crowhurst, Ross N; Gleave, Andrew P; Rikkerink, Erik H A; Allan, Andrew C; Beuning, Lesley L; Bowen, Judith H; Gera, Emma; Jamieson, Kim R; Janssen, Bart J; Laing, William A; McArtney, Steve; Nain, Bhawana; Ross, Gavin S; Snowden, Kimberley C; Souleyre, Edwige J F; Walton, Eric F; Yauk, Yar-Khing

    2006-05-01

    The domestic apple (Malus domestica; also known as Malus pumila Mill.) has become a model fruit crop in which to study commercial traits such as disease and pest resistance, grafting, and flavor and health compound biosynthesis. To speed the discovery of genes involved in these traits, develop markers to map genes, and breed new cultivars, we have produced a substantial expressed sequence tag collection from various tissues of apple, focusing on fruit tissues of the cultivar Royal Gala. Over 150,000 expressed sequence tags have been collected from 43 different cDNA libraries representing 34 different tissues and treatments. Clustering of these sequences results in a set of 42,938 nonredundant sequences comprising 17,460 tentative contigs and 25,478 singletons, together representing what we predict are approximately one-half the expressed genes from apple. Many potential molecular markers are abundant in the apple transcripts. Dinucleotide repeats are found in 4,018 nonredundant sequences, mainly in the 5'-untranslated region of the gene, with a bias toward one repeat type (containing AG, 88%) and against another (repeats containing CG, 0.1%). Trinucleotide repeats are most common in the predicted coding regions and do not show a similar degree of sequence bias in their representation. Bi-allelic single-nucleotide polymorphisms are highly abundant with one found, on average, every 706 bp of transcribed DNA. Predictions of the numbers of representatives from protein families indicate the presence of many genes involved in disease resistance and the biosynthesis of flavor and health-associated compounds. Comparisons of some of these gene families with Arabidopsis (Arabidopsis thaliana) suggest instances where there have been duplications in the lineages leading to apple of biosynthetic and regulatory genes that are expressed in fruit. This resource paves the way for a concerted functional genomics effort in this important temperate fruit crop.

  19. Exome Sequencing in Suspected Monogenic Dyslipidemias

    Science.gov (United States)

    Stitziel, Nathan O.; Peloso, Gina M.; Abifadel, Marianne; Cefalu, Angelo B.; Fouchier, Sigrid; Motazacker, M. Mahdi; Tada, Hayato; Larach, Daniel B.; Awan, Zuhier; Haller, Jorge F.; Pullinger, Clive R.; Varret, Mathilde; Rabès, Jean-Pierre; Noto, Davide; Tarugi, Patrizia; Kawashiri, Masa-aki; Nohara, Atsushi; Yamagishi, Masakazu; Risman, Marjorie; Deo, Rahul; Ruel, Isabelle; Shendure, Jay; Nickerson, Deborah A.; Wilson, James G.; Rich, Stephen S.; Gupta, Namrata; Farlow, Deborah N.; Neale, Benjamin M.; Daly, Mark J.; Kane, John P.; Freeman, Mason W.; Genest, Jacques; Rader, Daniel J.; Mabuchi, Hiroshi; Kastelein, John J.P.; Hovingh, G. Kees; Averna, Maurizio R.; Gabriel, Stacey; Boileau, Catherine; Kathiresan, Sekar

    2015-01-01

    Background Exome sequencing is a promising tool for gene mapping in Mendelian disorders. We utilized this technique in an attempt to identify novel genes underlying monogenic dyslipidemias. Methods and Results We performed exome sequencing on 213 selected family members from 41 kindreds with suspected Mendelian inheritance of extreme levels of low-density lipoprotein (LDL) cholesterol (after candidate gene sequencing excluded known genetic causes for high LDL cholesterol families) or high-density lipoprotein (HDL) cholesterol. We used standard analytic approaches to identify candidate variants and also assigned a polygenic score to each individual in order to account for their burden of common genetic variants known to influence lipid levels. In nine families, we identified likely pathogenic variants in known lipid genes (ABCA1, APOB, APOE, LDLR, LIPA, and PCSK9); however, we were unable to identify obvious genetic etiologies in the remaining 32 families despite follow-up analyses. We identified three factors that limited novel gene discovery: (1) imperfect sequencing coverage across the exome hid potentially causal variants; (2) large numbers of shared rare alleles within families obfuscated causal variant identification; and (3) individuals from 15% of families carried a significant burden of common lipid-related alleles, suggesting complex inheritance can masquerade as monogenic disease. Conclusions We identified the genetic basis of disease in nine of 41 families; however, none of these represented novel gene discoveries. Our results highlight the promise and limitations of exome sequencing as a discovery technique in suspected monogenic dyslipidemias. Considering the confounders identified may inform the design of future exome sequencing studies. PMID:25632026

  20. Heuristics for multiobjective multiple sequence alignment.

    Science.gov (United States)

    Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B

    2016-07-15

    Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameters setting, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may provide an undesirable bias in further steps of the analysis and give too simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives to the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. 
Experimental results show
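
The trade-off between the two objectives described in this abstract can be illustrated with a minimal Python sketch of non-dominated filtering. The function name `pareto_front` and the score vectors are hypothetical and are not taken from the paper; the sketch only shows what "representative of the trade-off" means for pairs of (substitution score, number of gaps).

```python
def pareto_front(alignments):
    """Keep (substitution_score, gap_count) pairs that no other pair dominates.

    A pair dominates another if its score is at least as high and its gap
    count at least as low, with a strict improvement in one of the two.
    """
    front = []
    for score, gaps in alignments:
        dominated = any(
            (s >= score and g <= gaps) and (s > score or g < gaps)
            for s, g in alignments
        )
        if not dominated:
            front.append((score, gaps))
    return front


# Hypothetical score vectors of five candidate alignments.
candidates = [(10, 5), (8, 2), (7, 3), (10, 6), (12, 9)]
print(pareto_front(candidates))
```

The quadratic scan is adequate for small candidate sets; it is a baseline for illustration, not the local search procedure of the paper.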

  1. Sequence-Based Identification of Aerobic Actinomycetes

    Science.gov (United States)

    Patel, Jean Baldus; Wallace, Richard J.; Brown-Elliott, Barbara A.; Taylor, Tony; Imperatrice, Carol; Leonard, Deborah G. B.; Wilson, Rebecca W.; Mann, Linda; Jost, Kenneth C.; Nachamkin, Irving

    2004-01-01

    We investigated the utility of 500-bp 16S rRNA gene sequencing for identifying clinically significant species of aerobic actinomycetes. A total of 28 reference strains and 71 clinical isolates that included members of the genera Streptomyces, Gordonia, and Tsukamurella and 10 taxa of Nocardia were studied. Methods of nonsequencing analyses included growth and biochemical analysis, PCR-restriction enzyme analysis of the 439-bp Telenti fragment of the 65 hsp gene, susceptibility testing, and, for selected isolates, high-performance liquid chromatography. Many of the isolates were included in prior taxonomic studies. Sequencing of Nocardia species revealed that members of the group were generally most closely related to the American Type Culture Collection (ATCC) type strains. However, the sequences of Nocardia transvalensis, N. otitidiscaviarum, and N. nova isolates were highly variable; and it is likely that each of these species contains multiple species. We propose that these three species be designated complexes until they are more taxonomically defined. The sequences of several taxa did not match any recognized species. Among other aerobic actinomycetes, each group most closely resembled the associated reference strain, but with some divergence. The study demonstrates the ability of partial 16S rRNA gene sequencing to identify members of the aerobic actinomycetes, but the study also shows that a high degree of sequence divergence exists within many species and that many taxa within the Nocardia spp. are unnamed at present. A major unresolved issue is the type strain of N. asteroides, as the present one (ATCC 19247), chosen before the availability of molecular analysis, does not represent any of the common taxa associated with clinical nocardiosis. PMID:15184431

  2. Geometric aspects of biological sequence comparison.

    Science.gov (United States)

    Stojmirović, Aleksandar; Yu, Yi-Kuo

    2009-04-01

    We introduce a geometric framework suitable for studying the relationships among biological sequences. In contrast to previous works, our formulation allows asymmetric distances (quasi-metrics), originating from uneven weighting of strings, which may induce non-trivial partial orders on sets of biosequences. The distances considered are more general than traditional generalized string edit distances. In particular, our framework enables non-trivial conversion between sequence similarities, both local and global, and distances. Our constructions apply to a wide class of scoring schemes and require much less restrictive gap penalties than the ones regularly used. Numerous examples are provided to illustrate the concepts introduced and their potential applications.

  3. Final sonorant sequences in the Celje dialect

    Directory of Open Access Journals (Sweden)

    Alja Ferme

    2006-12-01

    In this paper I will analyse final sonorant sequences in the Celje variety of Slovene. In §2 various definitions of a consonant cluster will be discussed and the definition needed for further development of the article will be provided. In §3 I will present pretheoretical arguments against treating all final sonorant sequences as consonant clusters. In addition, a seemingly special behaviour of a small group of sequences will be pointed out. The government phonology framework will be introduced in §4. In §5 the hin the given theoretical framework.

  4. The computational linguistics of biological sequences

    Energy Technology Data Exchange (ETDEWEB)

    Searls, D. [Univ. of Pennsylvania, Philadelphia, PA (United States)]

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Protein sequences are analogous in many respects, particularly their folding behavior. Proteins have a much richer variety of interactions, but in theory the same linguistic principles could come to bear in describing dependencies between distant residues that arise by virtue of three-dimensional structure. This tutorial will concentrate on nucleic acid sequences.

  5. 1994 Accident sequence precursor program results

    International Nuclear Information System (INIS)

    Belles, R.J.; Cletcher, J.W.; Copinger, D.A.

    1996-01-01

    The Accident Sequence Precursor (ASP) Program involves the systematic review and evaluation of operational events that have occurred at light-water reactors to identify and categorize precursors to potential severe core damage accident sequences. The results of the ASP Program are published in an annual report. The most recent report, which contains the analyses of the precursors for 1994, is NUREG/CR-4674, Vols. 21 and 22, Precursors to Potential Severe Core Damage Accidents: 1994, A Status Report, published in December 1995. This article provides an overview of the ASP review and evaluation process and a summary of the results for 1994. 12 refs., 2 figs., 4 tabs

  6. Scintillating optical fiber detectors for DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Bendali, M.; Mastrippolito, R.; Charon, Y.; Leblanc, M.; Tricoire, H.; Valentin, L. (Inst. de Physique Nucleaire, 91 - Orsay (France) Lab. de Physique Nucleaire, Univ. Paris 7, 75 (France)); Martin, B. (Lab. de Neurobiologie Cellulaire et Moleculaire, 91 - Gif-sur-Yvette (France))

    1991-12-01

    We have developed a two-dimensional detector (SOFI) for ³²P-emitting molecules used in molecular biology by combining scintillating optical fibers (SOFs) and a multianode photomultiplier (MAPM). A good efficiency (15%) was obtained by suppressing the internal cross talk of the MAPM with a new electronic device. Using this improvement we are developing two new detectors using SOFs for DNA sequencing. We shall present the basic principle of these detectors and the results in efficiency and position accuracy obtained with the first prototypes. The advantage of these detectors over currently available DNA sequencers will be discussed. (orig.).

  7. The evolutionary sequence: origin and emergences.

    Science.gov (United States)

    Fox, S W

    1986-03-01

    The evolutionary sequence is being reexamined experimentally from a "Big Bang" origin to the protocell and from the emergence of protocell and variety of species to Darwin's mental power (mind) and society (The Descent of Man). A most fundamentally revisionary consequence of experiments is an emphasis on endogenous ordering. This principle, seen vividly in ordered copolymerization of amino acids, has had new impact on the theory of Darwinian evolution and has been found to apply to the entire sequence. Herein, I will discuss some problems of dealing with teaching controversial subjects.

  8. A Bioluminometric Method of DNA Sequencing

    Science.gov (United States)

    Ronaghi, Mostafa; Pourmand, Nader; Stolc, Viktor; Arnold, Jim (Technical Monitor)

    2001-01-01

    Pyrosequencing is a bioluminometric single-tube DNA sequencing method that takes advantage of co-operativity between four enzymes to monitor DNA synthesis. In this sequencing-by-synthesis method, a cascade of enzymatic reactions yields detectable light, which is proportional to incorporated nucleotides. Pyrosequencing has the advantages of accuracy, flexibility and parallel processing. It can be easily automated. Furthermore, the technique dispenses with the need for labeled primers, labeled nucleotides and gel-electrophoresis. In this chapter, the use of this technique for different applications is discussed.
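
The statement that the detected light is proportional to the number of incorporated nucleotides can be made concrete with a small Python sketch. The function name `decode_pyrogram`, the dispensation order and the normalized peak heights are hypothetical, chosen only for illustration; real signals need calibration and noise handling that are not modelled here.

```python
def decode_pyrogram(dispensation, peaks, unit=1.0):
    """Turn per-dispensation light intensities into a base sequence.

    Assumes (for illustration only) that each peak height is roughly
    `unit` times the number of identical nucleotides incorporated.
    """
    if len(dispensation) != len(peaks):
        raise ValueError("one peak is needed per dispensed nucleotide")
    bases = []
    for nucleotide, height in zip(dispensation, peaks):
        count = int(round(height / unit))  # estimated homopolymer length
        bases.append(nucleotide * max(count, 0))
    return "".join(bases)


# Hypothetical normalized peak heights for the dispensation order ACGTACGT.
read = decode_pyrogram("ACGTACGT", [1.1, 0.0, 2.05, 0.1, 0.9, 1.0, 0.0, 2.9])
print(read)  # AGGACTTT
```

Rounding to the nearest integer is the simplest possible decision rule; it becomes unreliable for long homopolymers, where neighbouring peak heights are hard to tell apart.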

  9. Digital image sequence processing, compression, and analysis

    CERN Document Server

    Reed, Todd R

    2004-01-01

    Introduction, by Todd R. Reed; Content-Based Image Sequence Representation, by Pedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, and Charnchai Pluempitiwiriyawej; The Computation of Motion, by Christoph Stiller, Sören Kammel, Jan Horn, and Thao Dang; Motion Analysis and Displacement Estimation in the Frequency Domain, by Luca Lucchese and Guido Maria Cortelazzo; Quality of Service Assessment in New Generation Wireless Video Communications, by Gaetano Giunta; Error Concealment in Digital Video, by Francesco G.B. De Natale; Image Sequence Restoration: A Wider Perspective, by Anil Kokaram; Video Summarization, by Cuneyt M. Taskiran and Edward

  10. Bioinformatic analysis of whole genome sequencing data

    OpenAIRE

    Maqbool, Khurram

    2014-01-01

    Evolution has shaped life forms for billions of years. Domestication is an accelerated process that can be used as a model for evolutionary changes. The aim of this thesis project has been to carry out extensive bioinformatic analyses of whole genome sequencing data to reveal SNPs, InDels and selective sweeps in the chicken, pig and dog genomes. Pig genome sequencing revealed loci under selection for elongation of back and increased number of vertebrae, associated with the NR6A1, PLAG1,...

  11. Bordism, stable homotopy and adams spectral sequences

    CERN Document Server

    Kochman, Stanley O

    1996-01-01

    This book is a compilation of lecture notes that were prepared for the graduate course "Adams Spectral Sequences and Stable Homotopy Theory" given at The Fields Institute during the fall of 1995. The aim of this volume is to prepare students with a knowledge of elementary algebraic topology to study recent developments in stable homotopy theory, such as the nilpotence and periodicity theorems. Suitable as a text for an intermediate course in algebraic topology, this book provides a direct exposition of the basic concepts of bordism, characteristic classes, Adams spectral sequences, Brown-Peter

  12. On finding frequent patterns in event sequences

    DEFF Research Database (Denmark)

    Campagna, Andrea; Pagh, Rasmus

    2010-01-01

    Given a directed acyclic graph with labeled vertices, we consider the problem of finding the most common label sequences ("traces") among all paths in the graph (of some maximum length $m$). Since the number of paths can be huge, we propose novel algorithms whose time complexity depends only... concerning finding frequent patterns in event sequences. Our motivation comes from working with a data set of 2 million RFID readings from baggage trolleys at Copenhagen Airport. The question of finding frequent passenger movement patterns is mapped to the above problem. We report on experimental findings...
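
The problem statement in this abstract (most common label sequences among all paths of bounded length in a labeled directed acyclic graph) can be sketched in Python by brute force. This is explicitly not the authors' algorithm, whose point is to avoid enumerating paths; the function name `count_traces`, the toy graph, and the choice to measure path length in vertices are assumptions made for illustration.

```python
from collections import Counter


def count_traces(edges, labels, max_len):
    """Count label sequences over all paths with at most `max_len` vertices.

    Brute-force enumeration: exponential in the worst case, so only a
    baseline for small graphs.
    """
    counts = Counter()

    def extend(vertex, trace):
        trace = trace + (labels[vertex],)
        counts[trace] += 1  # every path prefix is itself a path
        if len(trace) < max_len:
            for successor in edges.get(vertex, ()):
                extend(successor, trace)

    for vertex in labels:
        extend(vertex, ())
    return counts


# Hypothetical toy graph: vertices 0..3 labeled with zone names.
labels = {0: "gate", 1: "shop", 2: "shop", 3: "exit"}
edges = {0: [1, 2], 1: [3], 2: [3]}
traces = count_traces(edges, labels, max_len=3)
print(traces[("gate", "shop", "exit")])  # 2
```

Two distinct paths (through vertex 1 and through vertex 2) carry the same trace, which is why traces rather than paths are counted.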

  13. The evolutionary sequence: origin and emergences

    Science.gov (United States)

    Fox, S. W.

    1986-01-01

    The evolutionary sequence is being reexamined experimentally from a "Big Bang" origin to the protocell and from the emergence of protocell and variety of species to Darwin's mental power (mind) and society (The Descent of Man). A most fundamentally revisionary consequence of experiments is an emphasis on endogenous ordering. This principle, seen vividly in ordered copolymerization of amino acids, has had new impact on the theory of Darwinian evolution and has been found to apply to the entire sequence. Herein, I will discuss some problems of dealing with teaching controversial subjects.

  14. Lacunary ideal convergence of multiple sequences

    Directory of Open Access Journals (Sweden)

    Bipan Hazarika

    2016-01-01

    An ideal I is a family of subsets of N×N which is closed under taking finite unions and subsets of its elements. In this article, the concept of lacunary ideal convergence of double sequences is introduced, and the relation between lacunary ideal convergent and lacunary Cauchy double sequences is established. Furthermore, the notions of lacunary ideal limit point and lacunary ideal cluster point are introduced and the relation between these two notions is found. Finally, we study properties such as solidity and monotonicity.

  15. Ancestral sequence reconstruction with Maximum Parsimony

    OpenAIRE

    Herbst, Lina; Fischer, Mareike

    2017-01-01

    One of the main aims in phylogenetics is the estimation of ancestral sequences based on present-day data like, for instance, DNA alignments. One way to estimate the data of the last common ancestor of a given set of species is to first reconstruct a phylogenetic tree with some tree inference method and then to use some method of ancestral state inference based on that tree. One of the best-known methods both for tree inference as well as for ancestral sequence inference is Maximum Parsimony (...
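
As an illustration of ancestral state inference under Maximum Parsimony, the following Python sketch implements the bottom-up pass of the Fitch algorithm for a single alignment column on a rooted binary tree. The truncated abstract does not name a specific algorithm, so the use of Fitch here is an assumption; the function name `fitch_sets`, the nested-tuple tree encoding and the example data are hypothetical.

```python
def fitch_sets(tree, leaf_states):
    """Bottom-up Fitch pass: return (state set at the root, parsimony score).

    `tree` is a leaf name (str) or a pair of subtrees,
    e.g. (("a", "b"), ("c", "d")).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_sets(left, leaf_states)
    right_set, right_cost = fitch_sets(right, leaf_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: keep all candidates and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical single alignment column for four present-day species.
tree = (("a", "b"), ("c", "d"))
column = {"a": "A", "b": "A", "c": "G", "d": "A"}
root_states, changes = fitch_sets(tree, column)
print(root_states, changes)  # {'A'} 1
```

A root set with more than one element means the parsimony estimate of the ancestral state is ambiguous for that column.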

  16. Simulating efficiently the evolution of DNA sequences.

    Science.gov (United States)

    Schöniger, M; von Haeseler, A

    1995-02-01

    Two menu-driven FORTRAN programs are described that simulate the evolution of DNA sequences in accordance with a user-specified model. This general stochastic model allows for an arbitrary stationary nucleotide composition and any transition-transversion bias during the process of base substitution. In addition, the user may define any hypothetical model tree according to which a family of sequences evolves. The programs suggest the computationally most inexpensive approach to generate nucleotide substitutions. Either reproducible or non-repeatable simulations, depending on the method of initializing the pseudo-random number generator, can be performed. The corresponding options are offered by the interface menu.

  17. Bilateral maculopathy associated with Pierre Robin sequence.

    Science.gov (United States)

    Witmer, Matthew T; Vasan, Ryan; Levy, Richard; Davis, Jessica; Chan, R V Paul

    2012-08-01

    Pierre Robin sequence has been associated with a number of ocular complications, including myopia, strabismus, Möbius syndrome, nasolacrimal duct obstruction, glaucoma, cataract, microphthalmos, coloboma of choroid, and retinal detachment. We report a 10-day-old boy who presented with micrognathia, glossoptosis, and cleft palate as well as multiple congenital anomalies. Ophthalmic examination was notable for bilateral maculopathy, with focal areas of retinal and retinal pigment epithelial atrophy. The association of Pierre Robin sequence and maculopathy has been reported only twice previously. Copyright © 2012 American Association for Pediatric Ophthalmology and Strabismus. Published by Mosby, Inc. All rights reserved.

  18. 7th International Workshop on the Identification of Transcribed Sequences. Beyond the Identification of Transcribed Sequences

    Energy Technology Data Exchange (ETDEWEB)

    Gardner, Kathleen

    1997-11-19

    The Seventh Annual Human Genome Conference: Beyond the Identification of Transcribed Sequences (BITS) was held November 16-19, 1997 at the Asilomar Conference Center in Monterey, California. The format for the meeting was a combination of oral presentations, group discussions and poster sessions. The original workshop was held to discuss methodologies for the identification of transcribed sequences in mammalian genomes. Over the years, the focus of the workshops has gradually shifted towards functional analysis, with the most dramatic change in emphasis at this meeting, as reflected in the modest change in the workshop title. Topics presented and discussed included: (1) large scale expression and mutational analysis in yeast, C. elegans, Drosophila and zebrafish; (2) comparative mapping of zebrafish, chicken and Fugu; (3) functional analysis in mouse using promoter traps, mutational analysis of biochemical pathways, and Cre/lox constructs; (4) construction of 5′ end and complete cDNA libraries; (5) expression analysis in mammalian organisms by array screening and differential display; (6) genome organization as determined by detailed transcriptional mapping and genomic sequence analysis; (7) analysis of genomic sequence, including gene and regulatory sequence predictions, annotation of genomic sequence, development of expression databases and verification of sequence analysis predictions; and (8) structural/functional relationships as determined by RNA secondary structure analysis and evolutionary conservation of non-coding sequences.

  19. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified...

  20. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Miri Michaeli

    2012-12-01

    High throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion – Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  1. Genome Sequence of Australian Indigenous Wine Yeast Torulaspora delbrueckii COFT1 Using Nanopore Sequencing.

    Science.gov (United States)

    Tondini, Federico; Jiranek, Vladimir; Grbin, Paul R; Onetto, Cristobal A

    2018-04-26

    Here, we report the first sequenced genome of an indigenous Australian wine isolate of Torulaspora delbrueckii using the Oxford Nanopore MinION and Illumina HiSeq sequencing platforms. The genome size is 9.4 Mb and contains 4,831 genes. Copyright © 2018 Tondini et al.

  2. The sequence of spacers between the consensus sequences modulates the strength of procaryotic promoters

    DEFF Research Database (Denmark)

    Jensen, Peter Ruhdal; Hammer, Karin

    1998-01-01

    A library of synthetic promoters for Lactococcus lactis was constructed, in which the known consensus sequences were kept constant while the sequences of the separating spacers were randomized. The library consists of 38 promoters which differ in strength from 0.3 relative units, and up to more t......-reactors and cell factories....

  3. Polyadenylated Sequencing Primers Enable Complete Readability of PCR Amplicons Analyzed by Dideoxynucleotide Sequencing

    Directory of Open Access Journals (Sweden)

    Martin Beránek

    2012-01-01

    Dideoxynucleotide DNA sequencing is one of the principal procedures in molecular biology. Loss of an initial part of nucleotides behind the 3' end of the sequencing primer limits the readability of sequenced amplicons. We present a method which extends the readability by using sequencing primers modified by polyadenylated tails attached to their 5' ends. Performing a polymerase chain reaction, we amplified eight amplicons of six human genes (AMELX, APOE, HFE, MBL2, SERPINA1 and TGFB1) ranging from 106 bp to 680 bp. Polyadenylation of the sequencing primers minimized the loss of bases in all amplicons. Complete sequences of shorter products (AMELX 106 bp, SERPINA1 121 bp, HFE 208 bp, APOE 244 bp, MBL2 317 bp) were obtained. In addition, in the case of TGFB1 products (366 bp, 432 bp, and 680 bp, respectively), the lengths of sequencing readings were significantly longer if adenylated primers were used. Thus, single strand dideoxynucleotide sequencing with adenylated primers enables complete or near complete readability of short PCR amplicons.

  4. Sequence embedding for fast construction of guide trees for multiple sequence alignment

    LENUS (Irish Health Repository)

    Blackshields, Gordon

    2010-05-14

    Background: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results: In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions: We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
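
The idea of embedding sequences so that not all pairwise distances are needed can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the method distributed at the URL above: seeds are chosen at random, and a k-mer Jaccard distance stands in for whatever distance the real software uses. The names `kmer_distance` and `embed` are hypothetical.

```python
import random


def kmer_distance(a, b, k=3):
    """Fraction of k-mers not shared by two sequences (a cheap stand-in
    for a real pairwise sequence distance)."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = kmers_a | kmers_b
    if not union:
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


def embed(sequences, n_seeds, seed=0):
    """Represent each sequence by its distances to a few seed sequences.

    Needs N * n_seeds distance computations instead of N * (N - 1) / 2.
    Seeds are drawn at random here, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    seeds = rng.sample(sequences, n_seeds)
    return [[kmer_distance(s, ref) for ref in seeds] for s in sequences]


seqs = ["ACGTACGTAC", "ACGTACGTTC", "TTGGCCAATT", "TTGGCCAATA"]
vectors = embed(seqs, n_seeds=2)
print(vectors)
```

The resulting fixed-length vectors can be clustered with any standard vector method, which is what makes guide-tree construction feasible for large N.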

  5. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    Science.gov (United States)

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.

  6. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

    Science.gov (United States)

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D; Adir, Noam

    2016-06-28

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel.
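
The comparison of observed and expected triplet frequencies can be sketched in Python as follows. The expectation used here assumes independent residues (the product of single-residue frequencies), which is an assumption of this sketch; the null model used in the paper may differ. The function name `triplet_ratios` and the toy input are hypothetical.

```python
from collections import Counter
from itertools import product


def triplet_ratios(proteins):
    """Observed / expected count for every amino acid triplet.

    Only triplets built from residues that occur in the input are listed.
    """
    residues = Counter()
    triplets = Counter()
    for protein in proteins:
        residues.update(protein)
        triplets.update(protein[i:i + 3] for i in range(len(protein) - 2))
    total_residues = sum(residues.values())
    total_triplets = sum(triplets.values())
    ratios = {}
    for combo in product(sorted(residues), repeat=3):
        expected = total_triplets
        for residue in combo:
            expected *= residues[residue] / total_residues
        ratios["".join(combo)] = triplets["".join(combo)] / expected
    return ratios


# Toy input; a real analysis would read a whole proteome.
ratios = triplet_ratios(["ABAB"])
print(ratios["ABA"], ratios["AAA"])  # 4.0 0.0
```

Ratios far below 1 flag candidate underrepresented triplets; on real data a significance test would be needed before drawing conclusions.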

  7. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences

    Science.gov (United States)

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D.; Adir, Noam

    2016-01-01

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel. PMID:27307442

  8. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N×N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.
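
The operator notation of this abstract can be written out as below. The first two lines restate what the abstract says; the third line, where the censorship matrix takes the place of the unity matrix, and the entries $c_i$ with their range are assumptions of this sketch, not equations quoted from the paper.

```latex
% Library as a frequency vector of copy numbers
\mathbf{n} = (n_1, n_2, \dots, n_N)^{\mathsf{T}}

% Unbiased, error-free sequencing: sampling applied to the unity matrix
\mathrm{Seq} = \mathrm{Sa}\, I_N, \qquad \mathbf{n}_{\text{observed}} = \mathrm{Seq}\,\mathbf{n}

% Assumed form with censorship: the unity matrix is replaced by a diagonal matrix
\mathrm{Seq} = \mathrm{Sa}\,\mathrm{CEN}, \qquad
\mathrm{CEN} = \operatorname{diag}(c_1, \dots, c_N), \quad 0 \le c_i \le 1
```

Under this reading, $c_i = 0$ corresponds to complete elimination of the ith sequence and $0 < c_i < 1$ to downsampling.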

  9. Artificial Intelligence Controls Tape-Recording Sequence

    Science.gov (United States)

    Schwuttke, Ursula M.; Otamura, Roy M.; Zottarelli, Lawrence J.

    1989-01-01

    Developmental expert-system computer program intended to schedule recording of large amounts of data on limited amount of magnetic tape. Schedules recording using two sets of rules. First set incorporates knowledge of locations for recording of new data. Second set incorporates knowledge about issuing commands to recorder. Designed primarily for use on Voyager Spacecraft, also applicable to planning and sequencing in industry.

  10. Output-Sensitive Pattern Extraction in Sequences

    DEFF Research Database (Denmark)

    Grossi, Roberto; Menconi, Giulia; Pisanti, Nadia

    2014-01-01

    Genomic Analysis, Plagiarism Detection, Data Mining, Intrusion Detection, Spam Fighting and Time Series Analysis are just some examples of applications where extraction of recurring patterns in sequences of objects is one of the main computational challenges. Several notions of patterns exist...

  11. Mitochondrial DNA sequence variation in Drosophilid species ...

    Indian Academy of Sciences (India)

    Here, we assessed genetic variations in three mitochondrial genes, namely, 16S rRNA, cytochrome c oxidase subunit I and cytochrome c oxidase subunit II (COI and COII) in 26 drosophilid species collected along altitudinal transect from 550 to 2700 m above mean sea level. In the present study, overall 543 sequences ...

  12. Sequencing interval situations and related games

    NARCIS (Netherlands)

    Alparslan-Gok, S.Z.; Brânzei, R.; Fragnelli, V.; Tijs, S.H.

    2013-01-01

    Uncertainty accompanies almost every situation in real world and it influences our decisions. In sequencing situations it may affect parameters used to determine an optimal order in the queue, and consequently the decision of whether (or not) to rearrange the queue by sharing the realized cost

  13. Function-Based Algorithms for Biological Sequences

    Science.gov (United States)

    Mohanty, Pragyan Sheela P.

    2015-01-01

    Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…

  14. Local Renyi entropic profiles of DNA sequences.

    Science.gov (United States)

    Vinga, Susana; Almeida, Jonas S

    2007-10-16

    In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
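
For orientation, the quantities named in this abstract can be written out as follows. This is a sketch based on the standard definitions of Rényi entropy and Parzen window estimation, not a transcription of the paper's equations; the symbols $a_i$ (sequence coordinates in the CGR/USM map), $N$, and the Gaussian kernel $g$ with variance $\sigma^2$ are notational assumptions.

```latex
% Renyi entropy of order \alpha for a density f (standard definition)
H_\alpha(f) = \frac{1}{1-\alpha}\,\ln \int f^{\alpha}(x)\,dx,
\qquad \alpha \ge 0,\ \alpha \ne 1

% Parzen window estimate of f from N points a_1, \dots, a_N
\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} g\!\left(x - a_i;\ \sigma^2 I\right)

% For \alpha = 2 and Gaussian kernels the integral has a closed form,
% because the product of two Gaussians integrates to a Gaussian
% with twice the variance:
H_2(\hat{f}) = -\ln\!\left( \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N}
  g\!\left(a_i - a_j;\ 2\sigma^2 I\right) \right)
```

The kernel width $\sigma$ plays the role of the scale parameter: varying it is what the abstract refers to as spanning the parameters of the kernel function.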

  15. Local Renyi entropic profiles of DNA sequences

    Directory of Open Access Journals (Sweden)

    Vinga Susana

    2007-10-01

    Background: In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results: The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. Conclusion: The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.

  16. Time sequence photography of Roosters Comb

    Science.gov (United States)

    Understanding natural landscape changes is key to properly determining rangeland ecology. Time sequence photography documents a snapshot of a landscape and makes it possible to compare natural changes over time. Photographs of Roosters Comb were taken from the same vantag...

  17. Responsibility: A Thematic Sequence of English Units.

    Science.gov (United States)

    Institute for Services to Education, Inc., Washington, DC.

    One of a series of volumes containing units on specific themes designed for use in college freshman English courses, this particular volume considers people and their responsibilities, through the use of recordings, cartoons, satire, modern and ancient drama, modern fiction, and contemporary essays. The sequence is divided into four sections.…

  18. The Ultramafites and layered Gabbro Sequences

    NARCIS (Netherlands)

    Oosterom, M.G.

    1963-01-01

    On Stjernöy, Seiland and the neighbouring peninsulas of Öksfjord and Bergsfjord ultramafic bodies of peridotite and pyroxenite with associated layered gabbro sequences occur within a complex of highly metamorphic gabbro gneisses, rocks akin to pyroxene-granulites and mafic charnockites. As is shown

  19. A new program for DNA sequence mining

    Indian Academy of Sciences (India)

    Unknown

    activity of proteins by altering their structure (Klintschar and Wiegand 2003). Expressed Sequence Tags ..... among the organisms (for instance; animal versus plant, trees versus annual crops), among the organs (for instance; ... Int. 3rd Balkan Symposium on vegetables and potatoes. Bursa, Turkey, Acta Horticulturae (in ...

  20. Mitochondrial DNA sequence variation in Hippopotamus amphibius ...

    African Journals Online (AJOL)

    Mitochondrial DNA sequence variation in Hippopotamus amphibius from Kruger National Park, Republic of South Africa. ... A test of the hypothesis that calves are more likely to share a mtDNA haplotype with an adult female in the same herd than an adult female from a different herd was not significant. Keywords: ...

  1. Genetic sequences derived from suppression subtractive ...

    African Journals Online (AJOL)

    Leaf scald disease (LSD) is caused by the Gram-negative bacterium, Xanthomonas albilineans. Genomic DNA from X. albilineans and Xanthomonas hyacinthi were analyzed by suppression subtractive hybridization (SSH) using X. albilineans as the tester from which unique sequences were sought and X. hyacinthi as the ...

  2. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come...

  3. Biases in small RNA deep sequencing data.

    Science.gov (United States)

    Raabe, Carsten A; Tang, Thean-Hock; Brosius, Juergen; Rozhdestvensky, Timofey S

    2014-02-01

    High-throughput RNA sequencing (RNA-seq) is considered a powerful tool for novel gene discovery and fine-tuned transcriptional profiling. The digital nature of RNA-seq is also believed to simplify meta-analysis and to reduce background noise associated with hybridization-based approaches. The development of multiplex sequencing enables efficient and economic parallel analysis of gene expression. In addition, RNA-seq is of particular value when low RNA expression or modest changes between samples are monitored. However, recent data uncovered severe bias in the sequencing of small non-protein coding RNA (small RNA-seq or sRNA-seq), such that the expression levels of some RNAs appeared to be artificially enhanced and others diminished or even undetectable. The use of different adapters and barcodes during ligation as well as complex RNA structures and modifications drastically influence cDNA synthesis efficacies and exemplify sources of bias in deep sequencing. In addition, variable specific RNA G/C-content is associated with unequal polymerase chain reaction amplification efficiencies. Given the central importance of RNA-seq to molecular biology and personalized medicine, we review recent findings that challenge small non-protein coding RNA-seq data and suggest approaches and precautions to overcome or minimize bias.

  4. Sequence crystallization during isotherm evaporation of southern ...

    African Journals Online (AJOL)

    Natural brine from southern Algeria, sampled from chott Baghdad, may be a source of mineral salts with a high economic value. These salts are recoverable by simple solar evaporation. Indeed, during isothermal solar evaporation, it is possible to recover mineral salts and to determine the precipitation sequences of different ...

  5. Multiple echo multi-shot diffusion sequence.

    Science.gov (United States)

    Chabert, Steren; Galindo, César; Tejos, Cristian; Uribe, Sergio A

    2014-04-01

    To measure both transversal relaxation time (T2) and diffusion coefficients within a single scan using a multi-shot approach. Both measurements have drawn interest in many applications, especially in skeletal muscle studies, which have short T2 values. Multiple echo single-shot schemes have been proposed to obtain those variables simultaneously within a single scan, resulting in a reduction of the scanning time. However, one problem with those approaches is the associated long echo read-out. Consequently, the minimum achievable echo time tends to be long, limiting the application of these sequences to tissues with relatively long T2. To address this problem, we propose to extend the multi-echo sequences using a multi-shot approach, so as to allow shorter echo times. A multi-shot dual-echo EPI sequence with diffusion gradients and echo navigators was modified to include independent diffusion gradients in any of the two echoes. The multi-shot approach allows us to drastically reduce echo times. Results showed a good agreement for the T2 and mean diffusivity measurements with gold standard sequences in phantoms and in vivo data of calf muscles from healthy volunteers. A fast and accurate method is proposed to measure T2 and diffusion coefficients simultaneously, tested in vitro and in healthy volunteers. Copyright © 2013 Wiley Periodicals, Inc.

  6. Improving sequence segmentation learning by predicting trigrams

    NARCIS (Netherlands)

    van den Bosch, A.; Daelemans, W.; Dagan, I.; Gildea, D.

    2005-01-01

    Symbolic machine-learning classifiers are known to suffer from near-sightedness when performing sequence segmentation (chunking) tasks in natural language processing: without special architectural additions they are oblivious of the decisions they made earlier when making new ones. We introduce a

  7. Realise : reconstruction of reality from image sequences

    NARCIS (Netherlands)

    Leymarie, F.; de la Fortelle, A.; Koenderink, Jan J.; Kappers, A. M L; Stavridi, M.; van Ginneken, B.; Muller, S.; Krake, S.; Faugeras, O.; Robert, L.; Gauclin, C.; Laveau, S.; Zeller, C.; Anon,

    1996-01-01

    The principal goals of REALISE are to extract, from sequences of images acquired with a moving camera, the information necessary for determining the 3D (CAD-like) structure of a real-life scene, together with information about the radiometric signatures of surfaces bounding the extracted 3D objects (e.g.

  8. Abundance, composition and distribution of simple sequence ...

    Indian Academy of Sciences (India)

    numbers AF369029, AF332093, AF440570) were downloaded from GenBank. The sizes of these genomes are 292,967 bp ... determined. Keywords. shrimp; white spot syndrome virus (WSSV); simple sequence repeats (SSRs); compositional bias; genetic distance. Journal of Genetics, Vol. 86, No. 1, April 2007. 69 ...

  9. A zoo of computable binary normal sequences.

    Science.gov (United States)

    Pincus, Steve; Singer, Burton H

    2012-11-20

    Historically there has been a virtual absence of constructive methods to produce broad classes of "certifiably random" infinite sequences, despite considerable interest in this endeavor. Previously, we proved a theorem that yielded explicit algorithms to produce diverse sets of normal numbers, reasonable candidates for random sequences, given their limiting equidistribution of subblocks of all lengths. Herein, we develop this algorithmic approach much further, systematizing the normal number generation process in several ways. We construct delineated, distinct sets of normal numbers (classified by the extent to which initial segments deviate from maximal irregularity), with virtually any allowable specified rate of convergence to 0 of this deviation, encompassing arbitrarily fast and slow rates, and accommodating asymmetric behavior above or below a centered median. As a corollary, we provide an explicit construction of a normal number that satisfies the Law of the Iterated Logarithm. We also produce distinct families of "biased" normal numbers, with virtually any specified rate of convergence of the bias (to 0). This latter theory is in part motivated by the remarkable observation that the binary version of Champernowne's number, which is also normal, is biased: any initial segment has more 1s than 0s. Finally, we construct an interesting normal sequence with arbitrarily fast convergence to equidistribution of singleton blocks, yet arbitrarily slow convergence of pairs, which has profound implications both for probability theory, and for metrics to evaluate the "near-randomness" of sequences.
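
    The bias of the binary Champernowne sequence noted in the record above can be checked numerically for a finite prefix. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the sequence is the concatenation of the binary expansions of 1, 2, 3, ... It tracks the running excess of 1s over 0s and reports the smallest value seen.

```python
from itertools import count, islice


def binary_champernowne_bits():
    """Yield the bits of 1, 10, 11, 100, 101, ... concatenated."""
    for n in count(1):
        for ch in format(n, "b"):
            yield int(ch)


def min_excess_of_ones(num_bits):
    """Smallest value of (#1s - #0s) over all prefixes of the first num_bits bits."""
    excess = 0
    smallest = None
    for bit in islice(binary_champernowne_bits(), num_bits):
        excess += 1 if bit else -1
        if smallest is None or excess < smallest:
            smallest = excess
    return smallest
```

    A positive return value means that every prefix examined contains more 1s than 0s. This is a finite check consistent with the observation in the abstract, not a proof of it.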

  10. Abundance, composition and distribution of simple sequence ...

    Indian Academy of Sciences (India)

    δ∗(W-29, W-70) = 1.25; δ∗(W-93, W-70) = 0.75) even though they originate from different geographical regions. We can, therefore, infer that the WSSV sequences are closely related by ancestry. Table 3. Dinucleotide relative abundance in the ...

  11. n sequences in two pipefish species (Gasteroisteiformes ...

    Indian Academy of Sciences (India)

    incides with an interstitial telomeric sequence in Armenian hamster. Cytogenet. Cell Genet. 62, 169–171. Caputo V., Sorice M., Vitturi R., Magistrelli R. and Olmo E. 1998 Cytogenetic studies in some species of Scorpaeniformes. (Teleostei: Percomorpha). Chromosome Res. 6, 255–262. Carcupino M., Baldacci A., Mazzini ...

  12. Deciphering the RNA landscape by RNAome sequencing

    NARCIS (Netherlands)

    K.W.J. Derks (Kasper); B. Misovic (Branislav); M.C.G.N. van den hout (Mirjam); C. Kockx (Christel); C.P. Gomez (Cesar Payan); R.W.W. Brouwer (Rutger); H. Vrieling (Harry); J.H.J. Hoeijmakers (Jan); W.F.J. van IJcken (Wilfred); J. Pothof (Joris)

    2015-01-01

    Current RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species in an unperturbed manner. We report strand-specific RNAome sequencing that determines expression of small and large RNAs from rRNA-depleted total RNA in a

  13. Statistical Convergence of Double Sequences of Order

    Directory of Open Access Journals (Sweden)

    R. Çolak

    2013-01-01

    We intend to make a new approach and introduce the concepts of statistical convergence of order and strongly -Cesàro summability of order for double sequences of complex or real numbers. Also, some relations between the statistical convergence of order and strong -Cesàro summability of order are given.

  14. The amino acid sequence of hypertensin. II.

    Science.gov (United States)

    SKEGGS, L T; LENTZ, K E; KAHN, J R; SHUMWAY, N P; WOODS, K R

    1956-08-01

    The amino acid sequence of horse hypertensin II has been determined by the use of chymotrypsin, the fluorodinitrobenzene method, and stepwise phenylisothiocyanate degradation. The results indicate that the amino acids of hypertensin II are arranged in the following order: asp-arg-val-tyr-iso-hist-pro-phe.

  15. Complete sequence of the mitochondrial genome of ...

    Indian Academy of Sciences (India)

    Supplementary data: Complete sequence of the mitochondrial genome of Odontamblyopus rubicundus (Perciformes: Gobiidae): genome characterization and phylogenetic analysis. Tianxing Liu, Xiaoxiao Jin, Rixin Wang and Tianjun Xu. J. Genet. 92, 423–432. Figure 1. Gene map of O. rubicundus mitochondrial genome.

  16. Mycobacterium tuberculosis H37Ra genome sequencing

    Indian Academy of Sciences (India)

    2007-02-09

    Feb 9, 2007 ... Journal of Biosciences; Volume 32; Issue 2. Commentary: The value of comparative genomics in understanding mycobacterial virulence: Mycobacterium tuberculosis H37Ra genome sequencing – a worthwhile endeavour. Deepak Sharma, Jaya Sivaswami Tyagi. Volume 32 Issue 2 March ...

  17. Expression and sequence characterization of growth hormone ...

    African Journals Online (AJOL)

    ... growth hormone (bGH) which demonstrated active conformation of BbGHBP. These results demonstrate high expression and sequence characterization of BbGHBP in Nili-Ravi buffaloes and provide the basis for the assessment of BbGHBP in other breeds of buffalo. Keywords: Liver, Nili-Ravi buffalo, GHBP, MALDI-TOF ...

  18. Complex Sequencing Problems and Local Search Heuristics

    NARCIS (Netherlands)

    Brucker, P.; Hurink, Johann L.; Osman, I.H.; Kelly, J.P.

    1996-01-01

    Many problems can be formulated as complex sequencing problems. We will present problems in flexible manufacturing that have such a formulation and apply local search methods like iterative improvement, simulated annealing and tabu search to solve these problems. Computational results are reported.

  19. Hurdles in Acquiring the Number Word Sequence

    Science.gov (United States)

    Gould, Peter

    2016-01-01

    Learning the sequence of number words in English up to 30 is not a simple process. In NSW government schools taking part in "Early Action for Success," over 800 students in each of the first 3 years of school were assessed every 5 weeks over the school year to determine the highest correct oral count they could produce. Rather than…

  20. Genome shotgun sequencing and development of microsatellite ...

    African Journals Online (AJOL)

    Analysis of the gerbera genome DNA ('Raon') general library showed that sequences of (AT), (AG), (AAG) and (AAT) repeats appeared most often, whereas (AC), (AAC) and (ACC) were the least frequent. Primer pairs were designed for 80 loci. Only eight primer pairs produced reproducible polymorphic bands in the 28 ...
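
    Counting microsatellite motifs of the kind reported in the record above can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline used in the study. The function name, the minimum repeat count of 4, and the restriction to di- and trinucleotide units are assumptions, and rotations of the same motif (for example AT and TA) are not merged.

```python
import re
from collections import Counter


def count_microsatellites(seq, min_repeats=4, max_unit=3):
    """Count tandem repeats of di- to max_unit-nucleotide units.

    Returns a Counter mapping each repeat unit (as first matched, scanning
    left to right) to the number of repeat tracts found for it.
    """
    seq = seq.upper()
    counts = Counter()
    for unit_len in range(2, max_unit + 1):
        # A unit followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            counts[unit] += 1
    return counts
```

    For example, under these assumptions the sequence "GGATATATATCCAAGAAGAAGAAGTT" contains one (AT) tract and one (AAG) tract.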

  1. Hidden Markov models for labeled sequences

    DEFF Research Database (Denmark)

    Krogh, Anders Stærmose

    1994-01-01

    A hidden Markov model for labeled observations, called a class HMM, is introduced and a maximum likelihood method is developed for estimating the parameters of the model. Instead of training it to model the statistics of the training sequences it is trained to optimize recognition. It resembles MMI...

  2. Supplementary data: Comparative studies on sequence ...

    Indian Academy of Sciences (India)

    Unknown

    Supplementary data: Comparative studies on sequence characteristics around translation initiation codon in four eukaryotes. Qingpo Liu and Qingzhong Xue. J. Genet. 84, 317–322. Table 1. Spearman's rank correlation coefficients of 39 base positions around the AUG codon in the four eukaryotic species studied. – 30.

  3. Simple sequence repeats in mycobacterial genomes

    Indian Academy of Sciences (India)

    Prakash

    J. Biosci. 32(1), January 2007. The list of microsatellite rich as well as poor regions in the five mycobacterial genomes. Local GC%. Repeat rich(+)/. Repeat poor(-). Total ORFs. Number of ... Simple sequence repeats in mycobacterial genomes. VATTIPALLY .... heat shock protein (grpE) (15839737), heat shock protein (dnaJ) ...

  4. Sleep Does Not Enhance Motor Sequence Learning

    Science.gov (United States)

    Rickard, Timothy C.; Cai, Denise J.; Rieth, Cory A.; Jones, Jason; Ard, M. Colin

    2008-01-01

    Improvements in motor sequence performance have been observed after a delay involving sleep. This finding has been taken as evidence for an active sleep consolidation process that enhances subsequent performance. In a review of this literature, however, the authors observed 4 aspects of data analyses and experimental design that could lead to…

  5. RESEARCH ARTICLE Full length sequencing and novel ...

    Indian Academy of Sciences (India)

    Navya

    2016-12-16

    Dec 16, 2016 ... Before attempting association analyses between this gene and/or enzyme and phenotypic traits, a study on the genetic variability within this locus is required. The aim of this work was to sequence the entire coding region of ACACA gene in Valle del Belice sheep breed in order to identify polymorphic sites.

  6. Sequence of the Sugar Pine Megagenome

    Science.gov (United States)

    Kristian A. Stevens; Jill L. Wegrzyn; Aleksey Zimin; Daniela Puiu; Marc Crepeau; Charis Cardeno; Robin Paul; Daniel Gonzalez-Ibeas; Maxim Koriabine; Ann E. Holtz-Morris; Pedro J. Martínez-García; Uzay U. Sezen; Guillaume Marçais; Kathie Jermstad; Patrick E. McGuire; Carol A. Loopstra; John M. Davis; Andrew Eckert; Pieter de Jong; James A. Yorke; Steven L. Salzberg; David B. Neale; Charles H. Langley

    2016-01-01

    Until very recently, complete characterization of the megagenomes of conifers has remained elusive. Sugar pine (Pinus lambertiana Dougl.) has a highly repetitive diploid genome of 31 billion bp. It is the largest genome sequenced and assembled to date, and the first from the subgenus Strobus, or white pines, a group...

  7. Mitochondrial DNA sequence evolution in shorebird populations

    NARCIS (Netherlands)

    Wenink, P.W.

    1994-01-01

    This thesis describes the global molecular population structure of two shorebird species, in particular of the dunlin, Calidris alpina, by means of comparative sequence analysis of the most variable part of the mitochondrial DNA (mtDNA) genome. There are several reasons

  8. Expressed sequence tags (ESTs) and single nucleotide ...

    African Journals Online (AJOL)

    Expressed Sequence Tags (ESTs) and Single Nucleotide Polymorphisms (SNPs) are providing in-depth knowledge in plant biology, breeding and biotechnology. The emergence of many novel molecular marker techniques is changing and accelerating the process of producing mutations in plant molecular biology ...

  9. Genome shotgun sequencing and development of microsatellite ...

    African Journals Online (AJOL)

    ADP

    2012-04-10

    Apr 10, 2012 ... 22,527,019 bp, which consisted of 3,085 contigs and 28,249 singletons. We assembled 61,958 reads into 3,085 contigs, of which 114 (3.70%) contained microsatellite repeats. The average G+C content was 39.3%. Functional annotation to known sequences yielded 14.7% unigenes in the 'Raon' cultivar.

  10. SEQUENCE ANALYSIS OF MATURASE K (MATK): A ...

    African Journals Online (AJOL)

    Global Journal

    The application and utilization of sequence data has been found very informative in the characterization and phylogenetic relationship of different crop species. This study aimed to use bioinformatics tools to characterize the matK gene in some selected legumes with special reference to pigeon pea [Cajanus cajan ...

  11. Retrieval and Representation of Nucleotide Sequence of ...

    African Journals Online (AJOL)

    Nigerian Journal of Basic and Applied Science (March, 2013), 21(1): 27-32. DOI: http://dx.doi.org/10.4314/njbas.v21i1.4. ISSN 0794-5698. Retrieval and Representation of Nucleotide Sequence of Saccharomyces cerevisiae Cystathionine Gamma-Lyase (CYS3) Gene in Five Formats. *R. A. Umar, H. Abdullahi and N. Lawal.

  12. Genome sequencing for obstetricians & gynaecologists | Kent ...

    African Journals Online (AJOL)

    The medical profession has been waiting for a decade to be invigorated by the sequencing of the human genome, arguably the greatest scientific project ever. The technology has been spectacular but the results of the project have yielded more unexpected results than definitive answers – many about the very nature of our ...

  13. Early Permian transgressive–regressive cycles: Sequence ...

    Indian Academy of Sciences (India)

    Biplab Bhattacharya

    2018-03-08

    Mar 8, 2018 ... sequence stratigraphic architecture to understand the exact paleogeographic setup of the Raniganj ... regressive cycles in the light of tectonic/basinal changes, fluctuating sea level conditions and pro- ...... allowing incursion of marine water within the basin. (Bhattacharya et al. 2016). As a result, the estu-.

  14. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    Science.gov (United States)

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR…

  15. Regulatory sequence of cupin family gene

    Science.gov (United States)

    Hood, Elizabeth; Teoh, Thomas

    2017-07-25

    This invention is in the field of plant biology and agriculture and relates to novel seed specific promoter regions. The present invention further provides methods of producing proteins and other products of interest and methods of controlling expression of nucleic acid sequences of interest using the seed specific promoter regions.

  16. Sequencing Closterium moniliferum: Future prospects in nuclear ...

    African Journals Online (AJOL)

    Akanksha Pandey

    2012-10-06

    Oct 6, 2012 ... disorders. A remarkable feat has been achieved by the recent advancements in sequencing technologies that have revolutionized the conceptual foundations in a wide range of scientific fields including archaeology, anthropology, genetics, molecular biology, evolutionary genomics and forensic sciences.

  17. Depositional architecture and sequence stratigraphy of Pleistocene ...

    Indian Academy of Sciences (India)

    Within the deltaic sequence, transgressive and highstand systems tracts were recognized. The coarsening/shallowing upward trend observed within the sections suggests that the delta prograded rapidly in the landward portion of the canyon adjacent to the paleo-river outlet. The upper boundary of U1 is represented by a ...

  18. On finding frequent patterns in event sequences

    DEFF Research Database (Denmark)

    Campagna, Andrea; Pagh, Rasmus

    2010-01-01

    concerning finding frequent patterns in event sequences. Our motivation comes from working with a data set of 2 million RFID readings from baggage trolleys at Copenhagen Airport. The question of finding frequent passenger movement patterns is mapped to the above problem. We report on experimental findings...

  19. Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments

    Directory of Open Access Journals (Sweden)

    Bass Ellen J

    2010-03-01

    Background: While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate. Results: We compared near-optimal protein sequence alignments produced by the Zuker algorithm and a set of probabilistic alignments produced by the probA program with structural alignments produced by four different structure alignment algorithms. There is significant overlap between the solution spaces of structural alignments and both the near-optimal sequence alignments produced by commonly used scoring parameters for sequences that share significant sequence similarity (E-values < 10^-5) and the ensemble of probA alignments. We constructed a logistic regression model incorporating three input variables derived from sets of near-optimal alignments: robustness, edge frequency, and maximum bits-per-position. A ROC analysis shows that this model more accurately classifies amino acid pairs (edges in the alignment path graph) according to the likelihood of appearance in structural alignments than the robustness score alone. We investigated various trimming protocols for removing incorrect edges from the optimal sequence alignment; the most effective protocol is to remove matches from the semi-global optimal alignment that are outside the boundaries of the local alignment, although trimming according to the model-generated probabilities achieves a similar level of improvement. The

  20. ϕ-statistically quasi Cauchy sequences

    Directory of Open Access Journals (Sweden)

    Bipan Hazarika

    2016-04-01

    Let $P$ denote the space whose elements are finite sets of distinct positive integers. Given any element $\sigma$ of $P$, we denote by $p(\sigma)$ the sequence $\{p_n(\sigma)\}$ such that $p_n(\sigma)=1$ for $n\in\sigma$ and $p_n(\sigma)=0$ otherwise. Further $P_s=\{\sigma\in P:\sum_{n=1}^{\infty}p_n(\sigma)\le s\}$, i.e. $P_s$ is the set of those $\sigma$ whose support has cardinality at most $s$. Let $(\phi_n)$ be a non-decreasing sequence of positive integers such that $n\phi_{n+1}\le(n+1)\phi_n$ for all $n\in\mathbb{N}$, and let the class of all such sequences $(\phi_n)$ be denoted by $\Phi$. Let $E\subseteq\mathbb{N}$. The number $\delta_\phi(E)=\lim_{s\to\infty}\frac{1}{\phi_s}\left|\{k\in\sigma,\ \sigma\in P_s: k\in E\}\right|$ is said to be the $\phi$-density of $E$. A sequence $(x_n)$ of points in $\mathbb{R}$ is $\phi$-statistically convergent (or $S_\phi$-convergent) to a real number $\ell$ if, for every $\varepsilon>0$, the set $\{n\in\mathbb{N}:|x_n-\ell|\ge\varepsilon\}$ has $\phi$-density zero. We introduce $\phi$-statistically ward continuity of a real function. A real function is $\phi$-statistically ward continuous if it preserves $\phi$-statistically quasi Cauchy sequences, where a sequence $(x_n)$ is called $\phi$-statistically quasi Cauchy (or $S_\phi$-quasi Cauchy) when $(\Delta x_n)=(x_{n+1}-x_n)$ is $\phi$-statistically convergent to 0, i.e. a sequence $(x_n)$ of points in $\mathbb{R}$ is called $\phi$-statistically quasi Cauchy (or $S_\phi$-quasi Cauchy) if, for every $\varepsilon>0$, the set $\{n\in\mathbb{N}:|x_{n+1}-x_n|\ge\varepsilon\}$ has $\phi$-density zero. Also we introduce the concept of $\phi$-statistically ward compactness and obtain results related to $\phi$-statistically ward continuity, $\phi$-statistically ward compactness, statistically ward continuity, ward continuity, ward compactness, ordinary compactness, uniform continuity, ordinary continuity, $\delta$-ward continuity, and slowly oscillating continuity.

  1. Neural mechanisms of sequence generation in songbirds

    Science.gov (United States)

    Langford, Bruce

    Animal models in research are useful for studying more complex behavior. For example, motor sequence generation of actions requiring good muscle coordination such as writing with a pen, playing an instrument, or speaking, may involve the interaction of many areas in the brain, each a complex system in itself; thus it can be difficult to determine causal relationships between neural behavior and the behavior being studied. Birdsong, however, provides an excellent model behavior for motor sequence learning, memory, and generation. The song consists of learned sequences of notes that are spectrographically stereotyped over multiple renditions of the song, similar to syllables in human speech. The main areas of the songbird brain involved in singing are known; however, the mechanisms by which these systems store and produce song are not well understood. We used a custom-built, head-mounted, miniature motorized microdrive to chronically record the neural firing patterns of identified neurons in HVC, a pre-motor cortical nucleus which has been shown to be important in song timing. These recordings were made in Bengalese finches, which generate a song made up of stereotyped notes but variable note sequences. We observed song-related bursting in neurons projecting to Area X, a homologue of the basal ganglia, and tonic firing in HVC interneurons. Interneurons had firing rate patterns that were consistent over multiple renditions of the same note sequence. We also designed and built a light-weight, low-powered wireless programmable neural stimulator using Bluetooth Low Energy Protocol. It was able to generate perturbations in the song when current pulses were administered to RA, which projects to the brainstem nucleus responsible for syringeal muscle control.

  2. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

    Science.gov (United States)

    Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

    2015-03-31

    With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. An automated annotation tool for genomic DNA sequences using ...

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  4. The nucleotide sequences of two leghemoglobin genes from soybean

    DEFF Research Database (Denmark)

    Wiborg, O; Hyldig-Nielsen, J J; Jensen, E O

    1982-01-01

    We present the complete nucleotide sequences of two leghemoglobin genes isolated from soybean DNA. Both genes contain three intervening sequences in identical positions. Comparison of the coding sequences with known amino-acid sequences of soybean leghemoglobins suggests that the two genes...

  5. Nonlinear deterministic structures and the randomness of protein sequences

    CERN Document Server

    Huang Yan Zhao

    2003-01-01

    To clarify the randomness of protein sequences, we make a detailed analysis of a set of typical protein sequences representing each structural class by using a nonlinear prediction method. No deterministic structures are found in these protein sequences, which implies that they behave as random sequences. We also give an explanation for the controversial results obtained in previous investigations.

  6. The On-Line Encyclopedia of Integer Sequences

    OpenAIRE

    Sloane, N. J. A.

    2003-01-01

    This article gives a brief introduction to the On-Line Encyclopedia of Integer Sequences (or OEIS). The OEIS is a database of nearly 90,000 sequences of integers, arranged lexicographically. The entry for a sequence lists the initial terms (50 to 100, if available), a description, formulae, programs to generate the sequence, references, links to relevant web pages, and other information.

  7. Sequence comparison and phylogenetic analysis of core gene of ...

    African Journals Online (AJOL)

    2010-07-19

    Jul 19, 2010 ... sequences from Japan are grouped into same cluster in the phylogenetic tree. Sequence comparison and phylogenetic ..... Tree was generated by Neighbor joining algorithm. Boot strap values are shown ... Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence ...

  8. The Polytopic-k-Step Fibonacci Sequences in Finite Groups

    Directory of Open Access Journals (Sweden)

    Ömür Deveci

    2011-01-01

    We study the polytopic-k-step Fibonacci sequences, the polytopic-k-step Fibonacci sequences modulo m, and the polytopic-k-step Fibonacci sequences in finite groups. Also, we examine the periods of the polytopic-k-step Fibonacci sequences in the semidihedral group SD_{2^m}.
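
    The notion of a period modulo m can be illustrated with a small sketch. It uses the ordinary k-step Fibonacci recurrence (each term is the sum of the previous k terms), not the polytopic variant studied in the record above, whose definition is not given here. The function name and the seed (k-1 zeros followed by a one) are assumptions made for illustration.

```python
def k_step_fibonacci_period(k: int, m: int) -> int:
    """Period of the ordinary k-step Fibonacci sequence reduced modulo m.

    Assumed seed: k-1 zeros followed by a single one.
    """
    if k < 2 or m < 2:
        raise ValueError("need k >= 2 and m >= 2")
    start = (0,) * (k - 1) + (1,)
    state = start
    steps = 0
    while True:
        # Slide the window: drop the oldest term, append the sum of the window mod m.
        state = state[1:] + (sum(state) % m,)
        steps += 1
        # The recurrence is invertible modulo m, so the seed window always recurs.
        if state == start:
            return steps


if __name__ == "__main__":
    for k, m in [(2, 2), (2, 10), (3, 2)]:
        print(f"k={k}, m={m}: period {k_step_fibonacci_period(k, m)}")
```

    For k = 2 this reproduces the classical Pisano periods of the Fibonacci numbers.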

  9. An automated annotation tool for genomic DNA sequences using

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  10. Full-length sequencing and identification of novel polymorphisms in ...

    Indian Academy of Sciences (India)

    The aim of this work was to sequence the entire coding region of ACACA gene in Valle del Belice sheep breed to identify polymorphic sites. A total of 51 coding exons of ACACA gene were sequenced in 32 individuals of Valle del Belice sheep breed. Sequencing analysis and alignment of obtained sequences showed the ...

  11. Noncoding sequence classification based on wavelet transform analysis: part I

    Science.gov (United States)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to their much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among cells. However, noncoding sequences do not exhibit periodicities that correlate with their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  12. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified......, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other...

  13. DNA sequencing using polymerase substrate-binding kinetics.

    Science.gov (United States)

    Previte, Michael John Robert; Zhou, Chunhong; Kellinger, Matthew; Pantoja, Rigo; Chen, Cheng-Yao; Shi, Jin; Wang, BeiBei; Kia, Amirali; Etchin, Sergey; Vieceli, John; Nikoomanzar, Ali; Bomati, Erin; Gloeckner, Christian; Ronaghi, Mostafa; He, Molly Min

    2015-01-23

    Next-generation sequencing (NGS) has transformed genomic research by decreasing the cost of sequencing. However, whole-genome sequencing is still costly and complex for diagnostics purposes. In the clinical space, targeted sequencing has the advantage of allowing researchers to focus on specific genes of interest. Routine clinical use of targeted NGS mandates inexpensive instruments, fast turnaround time and an integrated and robust workflow. Here we demonstrate a version of the Sequencing by Synthesis (SBS) chemistry that potentially can become a preferred targeted sequencing method in the clinical space. This sequencing chemistry uses natural nucleotides and is based on real-time recording of the differential polymerase/DNA-binding kinetics in the presence of correct or mismatch nucleotides. This ensemble SBS chemistry has been implemented on an existing Illumina sequencing platform with integrated cluster amplification. We discuss the advantages of this sequencing chemistry for targeted sequencing as well as its limitations for other applications.

  14. Parallel progressive multiple sequence alignment on reconfigurable meshes

    OpenAIRE

    Nguyen, Ken D; Pan, Yi; Nong, Ge

    2011-01-01

    Background: One of the most fundamental and challenging tasks in bioinformatics is to identify related sequences and their hidden biological significance. The most popular and proven best-practice method to accomplish this task is aligning multiple sequences together. However, multiple sequence alignment is a computationally intensive task. In addition, the advancement in DNA/RNA and protein sequencing techniques has created a vast amount of sequences to be analyzed that exceeding the capa...

  15. Efficient computational methods for sequence analysis of small RNAs

    OpenAIRE

    Cozen, Gozde

    2007-01-01

    With the discovery of small regulatory RNAs, there has been a tremendous increase in the number of RNA sequencing projects. Meanwhile, novel high-throughput sequencing technologies, which can sequence as much as 500000 small RNA sequences in one run, have emerged. The challenge of processing this rapidly growing data can be addressed by optimizing current analysis approaches for small RNA sequences. We present fast register-level methods for small RNA pairwise alignment and small RNA to genom...

  16. Nanopore Technology: A Simple, Inexpensive, Futuristic Technology for DNA Sequencing.

    Science.gov (United States)

    Gupta, P D

    2016-10-01

    In health care, the importance of DNA sequencing has been fully established. Sanger's capillary electrophoresis DNA sequencing methodology is time consuming and cumbersome, and hence more expensive. Lately, because of its versatility, DNA sequencing has become a household name, and there is therefore an urgent need for a simple, fast, inexpensive DNA sequencing technology. At the beginning of this century efforts were made, and nanopore DNA sequencing technology was developed; it is still in its infancy, but it is nevertheless the technology of the future.

  17. The SWISS-PROT protein sequence data bank: current status.

    OpenAIRE

    Bairoch, A; Boeckmann, B

    1994-01-01

    SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively, since 1988, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library. The SWISS-PROT protein sequence data bank consists of sequence entries. Sequence entries are composed of different line types, each with their own format. For standardization purposes the format of SWISS-PROT follows as closely as possible that of the EMBL Nucleotide Sequence Databa...

  18. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing

    DEFF Research Database (Denmark)

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P

    2007-01-01

    the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5'tag-analysis. CONCLUSIONS: We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences...

  19. Sequencing skippy: the genome sequence of an Australian kangaroo, Macropus eugenii

    Science.gov (United States)

    2011-01-01

    Sequencing of the tammar wallaby (Macropus eugenii) reveals insights into genome evolution, and mammalian reproduction and development. See research article: http://genomebiology.com/2011/12/8/R81 PMID:21861852

  20. Targeted Gene Sequencing and Whole-Exome Sequencing in Autopsied Fetuses with Prenatally Diagnosed Kidney Anomalies

    DEFF Research Database (Denmark)

    Rasmussen, M; Sunde, L; Nielsen, M L

    2018-01-01

    Identification of fetal kidney anomalies invites questions about underlying causes and recurrence risk in future pregnancies. We therefore investigated the diagnostic yield of next-generation sequencing in fetuses with bilateral kidney anomalies and the correlation between disrupted genes and fetal...... phenotypes. Fetuses with bilateral kidney anomalies were screened using an in-house-designed kidney-gene panel. In families where candidate variants were not identified, whole-exome sequencing was performed. Genes uncovered by this analysis were added to our kidney-panel. We identified likely deleterious...... of nephronophthisis. Exome sequencing identified ROBO1 variants in one family and a GREB1L variant in another family. GREB1L and ROBO1 were added to our kidney-gene panel and additional variants were identified. Next-generation sequencing substantially contributes to identifying causes of fetal kidney anomalies...

  1. A next-generation sequencing method for genotyping-by-sequencing of highly heterozygous autotetraploid potato.

    Directory of Open Access Journals (Sweden)

    Jan G A M L Uitdewilligen

    Assessment of genomic DNA sequence variation and genotype calling in autotetraploids implies the ability to distinguish among five possible alternative allele copy number states. This study demonstrates the accuracy of genotyping-by-sequencing (GBS) of a large collection of autotetraploid potato cultivars using next-generation sequencing. It is still costly to reach sufficient read depths on a genome-wide scale, across the cultivated gene pool. Therefore, we enriched cultivar-specific DNA sequencing libraries using an in-solution hybridisation method (SureSelect). This complexity reduction allowed us to confine our study to 807 target genes distributed across the genomes of 83 tetraploid cultivars and one reference (DM 1-3 511). Indexed sequencing libraries were paired-end sequenced in 7 pools of 12 samples using Illumina HiSeq2000. After filtering and processing the raw sequence data, 12.4 Gigabases of high-quality sequence data was obtained, which mapped to 2.1 Mb of the potato reference genome, with a median average read depth of 63× per cultivar. We detected 129,156 sequence variants and genotyped the allele copy number of each variant for every cultivar. In this cultivar panel a variant density of 1 SNP/24 bp in exons and 1 SNP/15 bp in introns was obtained. The average minor allele frequency (MAF) of a variant was 0.14. Potato germplasm displayed a large number of relatively rare variants and/or haplotypes, with 61% of the variants having a MAF below 0.05. A very high average nucleotide diversity (π = 0.0107) was observed. Nucleotide diversity varied among potato chromosomes. Several genes under selection were identified. Genotyping-by-sequencing results, with allele copy number estimates, were validated with a KASP genotyping assay. This validation showed that read depths of ∼60-80× can be used as a lower boundary for reliable assessment of allele copy number of sequence variants in autotetraploids. Genotypic data were associated with
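
    The five allele copy number states mentioned above (0, 1, 2, 3 or 4 copies of the alternative allele in a tetraploid) can be illustrated with a minimal sketch. It simply picks the dosage whose expected read fraction is closest to the observed one; this nearest-fraction rule and the function name are assumptions for illustration, not the genotype caller used in the study.

```python
def call_tetraploid_dosage(ref_reads: int, alt_reads: int, ploidy: int = 4) -> int:
    """Return the alternative-allele copy number (0..ploidy) nearest to the read ratio.

    Hypothetical helper: a dosage d is expected to give an alt read fraction of d/ploidy.
    """
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this variant")
    alt_fraction = alt_reads / depth
    # Compare the observed fraction with the expected fractions 0, 1/4, 2/4, 3/4, 1.
    return min(range(ploidy + 1), key=lambda d: abs(alt_fraction - d / ploidy))


if __name__ == "__main__":
    for ref, alt in [(60, 0), (45, 15), (30, 30), (16, 47), (0, 60)]:
        print(ref, alt, call_tetraploid_dosage(ref, alt))
```

    With only a handful of reads the observed fraction is too noisy to separate adjacent states, which is consistent with the abstract's point that read depths of roughly 60-80× are needed for reliable copy number assessment.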

  2. SAMBA: hardware accelerator for biological sequence comparison.

    Science.gov (United States)

    Guerdoux-Jamet, P; Lavenier, D

    1997-12-01

    SAMBA (Systolic Accelerator for Molecular Biological Applications) is a 128 processor hardware accelerator for speeding up the sequence comparison process. The short-term objective is to provide a low-cost board to boost PC or workstation performance on this class of applications. This paper places SAMBA amongst other existing systems and highlights the original features. Real performance obtained from the prototype is demonstrated. For example, a sequence of 300 amino acids is scanned against SWISS-PROT-34 (21 210 389 residues) in 30 s using the Smith and Waterman algorithm. More time-consuming applications, like the bank-to-bank comparison, are computed in a few hours instead of days on standard workstations. Technology allows the prototype to fit onto a single PCI board for plugging into any PC or workstation. SAMBA can be tested on the WEB server at URL http://www.irisa.fr/SAMBA/.
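
    For readers unfamiliar with the Smith and Waterman algorithm that SAMBA accelerates, a minimal software sketch of the local alignment score follows. The scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration; protein searches such as the one described above would normally use a substitution matrix, and nothing here reflects the systolic hardware design.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for ca in a:
        curr = [0]  # first column is always zero
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

    Each cell depends only on its left, upper and upper-left neighbours, so the cells on one anti-diagonal can be computed at the same time; this is the property that parallel implementations of the algorithm exploit.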

  3. The characterization of twenty sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Kimberly Pelak

    2010-09-01

    We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.

  4. Estimation of visual motion in image sequences

    DEFF Research Database (Denmark)

    Larsen, Rasmus

    1994-01-01

    The problem of estimation of visual motion from sequences of images has been considered within a framework consisting of three stages of processing. First the extraction of motion invariants, secondly a local measurement of visual motion, and third integration of local measurements in conjunction...... with a priori knowledge. We have surveyed a series of attempts to extract motion invariants. Specifically we have illustrated the use of local Fourier phase. The Fourier phase is shown to define the local shape of the signal, thus accurately localizing an event. Different strategies for local measurement...... are given. In particular we have investigated the use of smoothness of the second order derivatives, and the use of edge model and prior distributions for the field that favor discontinuities to characterize the motion field. A successful implementation of a temporal interpolation in a sequence of weather...

  5. Elucidating population histories using genomic DNA sequences.

    Science.gov (United States)

    Vigilant, Linda

    2009-04-01

    In 1993, Cliff Jolly suggested that rather than debating species definitions and classifications, energy would be better spent investigating multidimensional patterns of variation and gene flow among populations. Until now, however, genetic studies of wild primate populations have been limited to very small portions of the genome. Access to complete genome sequences of humans, chimpanzees, macaques, and other primates makes it possible to design studies surveying substantial amounts of DNA sequence variation at multiple genetic loci in representatives of closely related but distinct wild primate populations. Such data can be analyzed with new approaches that estimate not only when populations diverged but also the relative amounts and directions of subsequent gene flow. These analyses will reemphasize the difficulty of achieving consistent species and subspecies definitions by revealing the extent of variation in the amount and duration of gene flow accompanying population divergences.

  6. Tritium pellet injection sequences for TFTR

    International Nuclear Information System (INIS)

    Houlberg, W.A.; Milora, S.L.; Attenberger, S.E.; Singer, C.E.; Schmidt, G.L.

    1983-01-01

    Tritium pellet injection into neutral deuterium, beam heated deuterium plasmas in the Tokamak Fusion Test Reactor (TFTR) is shown to be an attractive means of (1) minimizing tritium use per tritium discharge and over a sequence of tritium discharges; (2) greatly reducing the tritium load in the walls, limiters, getters, and cryopanels; (3) maintaining or improving instantaneous neutron production (Q); (4) reducing or eliminating deuterium-tritium (D-T) neutron production in non-optimized discharges; and (5) generally adding flexibility to the experimental sequences leading to optimal Q operation. Transport analyses of both compression and full-bore TFTR plasmas are used to support the above observations and to provide the basis for a proposed eight-pellet gas gun injector for the 1986 tritium experiments

  7. Protein structure determination using metagenome sequence data.

    Science.gov (United States)

    Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A; Kim, David E; Kamisetty, Hetunandan; Kyrpides, Nikos C; Baker, David

    2017-01-20

    Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost. Copyright © 2017, American Association for the Advancement of Science.

  8. Sequence Matching Analysis for Curriculum Development

    Directory of Open Access Journals (Sweden)

    Liem Yenny Bendatu

    2015-06-01

    Many organizations apply information technologies to support their business processes. Using these technologies, actual events are recorded and checked for conformance with a predefined model. Conformance checking is an approach to measure the fitness and appropriateness between a process model and actual events. However, when there are multiple events with the same timestamp, the traditional approach is unable to produce such measures. This study attempts to develop a sequence matching analysis. Taking conformance checking as its basis, the proposed approach utilizes the current control flow technique in the process mining domain. A case study in the field of educational processes has been conducted. This study also proposes a curriculum analysis framework to test the proposed approach. By considering the learning sequence of students, it yields some measurements for curriculum development. Finally, the result of the proposed approach has been verified by relevant instructors for further development.

  9. Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing

    Directory of Open Access Journals (Sweden)

    Thierry-Mieg Danielle

    2009-06-01

    Background: Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. Results: We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10^-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion: Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.

  10. SEXCMD: Development and validation of sex marker sequences for whole-exome/genome and RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Seongmun Jeong

    Over the last decade, a large number of nucleotide sequences have been generated by next-generation sequencing technologies and deposited to public databases. However, most of these datasets do not specify the sex of individuals sampled because researchers typically ignore or hide this information. Male and female genomes in many species have distinctive sex chromosomes, XX/XY and ZW/ZZ, and expression levels of many sex-related genes differ between the sexes. Herein, we describe how to develop sex marker sequences from syntenic regions of sex chromosomes and use them to quickly identify the sex of individuals being analyzed. Array-based technologies routinely use either known sex markers or the B-allele frequency of X or Z chromosomes to deduce the sex of an individual. The same strategy has been used with whole-exome/genome sequence data; however, all reads must be aligned onto a reference genome to determine the B-allele frequency of the X or Z chromosomes. SEXCMD is a pipeline that can extract sex marker sequences from reference sex chromosomes and rapidly identify the sex of individuals from whole-exome/genome and RNA sequencing after training with a known dataset through a simple machine learning approach. The pipeline counts total numbers of hits from sex-specific marker sequences and identifies the sex of the individuals sampled based on the fact that XX/ZZ samples do not have Y or W chromosome hits. We have successfully validated our pipeline with mammalian (Homo sapiens; XY) and avian (Gallus gallus; ZW) genomes. Typical calculation time when applying SEXCMD to human whole-exome or RNA sequencing datasets is a few minutes, and analyzing human whole-genome datasets takes about 10 minutes. Another important application of SEXCMD is as a quality control measure to avoid mixing samples before bioinformatics analysis. SEXCMD comprises simple Python and R scripts and is freely available at https://github.com/lovemun/SEXCMD.

  11. SEXCMD: Development and validation of sex marker sequences for whole-exome/genome and RNA sequencing.

    Science.gov (United States)

    Jeong, Seongmun; Kim, Jiwoong; Park, Won; Jeon, Hongmin; Kim, Namshin

    2017-01-01

    Over the last decade, a large number of nucleotide sequences have been generated by next-generation sequencing technologies and deposited to public databases. However, most of these datasets do not specify the sex of individuals sampled because researchers typically ignore or hide this information. Male and female genomes in many species have distinctive sex chromosomes, XX/XY and ZW/ZZ, and expression levels of many sex-related genes differ between the sexes. Herein, we describe how to develop sex marker sequences from syntenic regions of sex chromosomes and use them to quickly identify the sex of individuals being analyzed. Array-based technologies routinely use either known sex markers or the B-allele frequency of X or Z chromosomes to deduce the sex of an individual. The same strategy has been used with whole-exome/genome sequence data; however, all reads must be aligned onto a reference genome to determine the B-allele frequency of the X or Z chromosomes. SEXCMD is a pipeline that can extract sex marker sequences from reference sex chromosomes and rapidly identify the sex of individuals from whole-exome/genome and RNA sequencing after training with a known dataset through a simple machine learning approach. The pipeline counts total numbers of hits from sex-specific marker sequences and identifies the sex of the individuals sampled based on the fact that XX/ZZ samples do not have Y or W chromosome hits. We have successfully validated our pipeline with mammalian (Homo sapiens; XY) and avian (Gallus gallus; ZW) genomes. Typical calculation time when applying SEXCMD to human whole-exome or RNA sequencing datasets is a few minutes, and analyzing human whole-genome datasets takes about 10 minutes. Another important application of SEXCMD is as a quality control measure to avoid mixing samples before bioinformatics analysis. SEXCMD comprises simple Python and R scripts and is freely available at https://github.com/lovemun/SEXCMD.
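
    The hit-counting idea described above (XX samples should produce no Y-marker hits) can be sketched in a few lines. This is not the SEXCMD code: the function names, the exact-substring matching, the toy marker sequences and the 5% threshold are all assumptions made for illustration, whereas the real pipeline derives its markers from reference sex chromosomes and is trained on a known dataset.

```python
def count_marker_hits(reads, markers):
    """Count reads containing at least one marker as an exact substring."""
    return sum(1 for read in reads if any(marker in read for marker in markers))


def infer_sex_xy(reads, x_markers, y_markers, min_y_fraction=0.05):
    """Classify an XX/XY sample from marker hit counts (hypothetical threshold)."""
    x_hits = count_marker_hits(reads, x_markers)
    y_hits = count_marker_hits(reads, y_markers)
    total = x_hits + y_hits
    if total == 0:
        return "undetermined"
    # XX samples are expected to yield (almost) no Y-specific hits.
    return "male (XY)" if y_hits / total >= min_y_fraction else "female (XX)"


if __name__ == "__main__":
    x_markers = ["ACGTAC"]  # toy sequences, not real markers
    y_markers = ["TTGGCC"]
    reads = ["GGACGTACGG", "AACGTACTT", "CCTTGGCCAA"]
    print(infer_sex_xy(reads, x_markers, y_markers))
```

    Because only reads that contain a marker are counted, no alignment of all reads to a reference genome is needed, which is the source of the speed advantage the abstract describes.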

  12. Targeted exome sequencing of suspected mitochondrial disorders.

    Science.gov (United States)

    Lieber, Daniel S; Calvo, Sarah E; Shanahan, Kristy; Slate, Nancy G; Liu, Shangtao; Hershman, Steven G; Gold, Nina B; Chapman, Brad A; Thorburn, David R; Berry, Gerard T; Schmahmann, Jeremy D; Borowsky, Mark L; Mueller, David M; Sims, Katherine B; Mootha, Vamsi K

    2013-05-07

    To evaluate the utility of targeted exome sequencing for the molecular diagnosis of mitochondrial disorders, which exhibit marked phenotypic and genetic heterogeneity. We considered a diverse set of 102 patients with suspected mitochondrial disorders based on clinical, biochemical, and/or molecular findings, and whose disease ranged from mild to severe, with varying age at onset. We sequenced the mitochondrial genome (mtDNA) and the exons of 1,598 nuclear-encoded genes implicated in mitochondrial biology, mitochondrial disease, or monogenic disorders with phenotypic overlap. We prioritized variants likely to underlie disease and established molecular diagnoses in accordance with current clinical genetic guidelines. Targeted exome sequencing yielded molecular diagnoses in established disease loci in 22% of cases, including 17 of 18 (94%) with prior molecular diagnoses and 5 of 84 (6%) without. The 5 new diagnoses implicated 2 genes associated with canonical mitochondrial disorders (NDUFV1, POLG2), and 3 genes known to underlie other neurologic disorders (DPYD, KARS, WFS1), underscoring the phenotypic and biochemical overlap with other inborn errors. We prioritized variants in an additional 26 patients, including recessive, X-linked, and mtDNA variants that were enriched 2-fold over background and await further support of pathogenicity. In one case, we modeled patient mutations in yeast to provide evidence that recessive mutations in ATP5A1 can underlie combined respiratory chain deficiency. The results demonstrate that targeted exome sequencing is an effective alternative to the sequential testing of mtDNA and individual nuclear genes as part of the investigation of mitochondrial disease. Our study underscores the ongoing challenge of variant interpretation in the clinical setting.

  13. Identification and chromosomal localization of repeat sequences ...

    Indian Academy of Sciences (India)

    Unknown

    generate linkage maps of human and cattle as well as for other mammalian ... through BAC end sequencing and identification in silico. (Larkin et al. ... LINEs. 2,344. 653,391 bp. 19.80. LTR elements. 500. 124,761 bp. 3.78. DNA elements. 267. 46,569 bp. 1.14. Total interspersed repeats. 1,105,666 bp. 33.51. Small RNA. 26.

  14. Sequence trajectory generation for garment handling systems

    OpenAIRE

    Liu, Honghai; Lin, Hua

    2008-01-01

    This paper presents a novel generic approach to the planning strategy of garment handling systems. An assumption is proposed to separate the components of such systems into a component for intelligent gripper techniques and a component for handling planning strategies. Researchers can concentrate on one of the two components first, then merge the two problems together. An algorithm is addressed to generate the trajectory position and a clothes handling sequence of clothes partitions, which ar...

  15. Enzymatic sequencing of partially acetylated chitosan oligomers.

    Science.gov (United States)

    Hamer, Stefanie Nicole; Moerschbacher, Bruno Maria; Kolkenbrock, Stephan

    2014-06-17

    Chitosan oligosaccharides have diverse biological activities with potentially valuable applications, for example, in the fields of medicine and agriculture. These functionalities are thought to depend on their degree of polymerization and acetylation, and possibly on specific patterns of acetylation. Chitosan oligomers with fully defined architecture are difficult to produce, and their complete analysis is demanding. Analysis is typically done using MS or NMR, requiring access to expensive infrastructure, and yielding unequivocal results only in the case of rather small oligomers. We here describe a simple and cost-efficient method for the sequencing of μg amounts of chitosan oligosaccharides which is based on the sequential action of two recombinant glycosidases, namely an exo-β-N-acetylhexosaminidase (GlcNAcase) from Bacillus subtilis 168 and an exo-β-d-glucosaminidase (GlcNase) from Thermococcus kodakarensis KOD1. Starting from the non-reducing end, GlcNAcase and GlcNase specifically remove N-acetyl glucosamine (A) and glucosamine (D) units, respectively. By the sequential addition and removal of these enzymes in an alternating way followed by analysis of the products using high-performance thin-layer chromatography, the sequence of chitosan oligosaccharides can be revealed. Importantly, both enzymes work under identical conditions so that no buffer exchange is required between steps, and the enzyme can be removed conveniently using simple ultra-filtration devices. As proof-of-principle, the method was used to sequence the product of enzymatic deacetylation of chitin pentamer using a recombinant chitin deacetylase from Vibrio cholerae which specifically removes the acetyl group from the second unit next to the non-reducing end of the substrate, yielding mono-deacetylated pentamer with the sequence ADAAA. Copyright © 2014 Elsevier Ltd. All rights reserved.
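
    The alternating digestion described above can be modelled as a toy simulation. The sketch assumes an idealized case in which each exo-enzyme removes every consecutive unit of its own type from the non-reducing end (written leftmost) and then stops; the function names are hypothetical, and the chromatographic readout is reduced to the number of units lost per step.

```python
def digest_steps(oligomer: str):
    """Simulate alternating GlcNAcase/GlcNase treatment of an A/D oligomer.

    Returns (enzyme, units_removed) per step, starting with GlcNAcase.
    """
    if set(oligomer) - {"A", "D"}:
        raise ValueError("oligomer may contain only 'A' and 'D' units")
    enzymes = (("GlcNAcase", "A"), ("GlcNase", "D"))
    steps = []
    remaining = oligomer
    step = 0
    while remaining:
        name, unit = enzymes[step % 2]
        trimmed = remaining.lstrip(unit)  # enzyme stops at the first non-matching unit
        steps.append((name, len(remaining) - len(trimmed)))
        remaining = trimmed
        step += 1
    return steps


def sequence_from_steps(steps):
    """Rebuild the sequence from the number of units removed at each step."""
    unit_for = {"GlcNAcase": "A", "GlcNase": "D"}
    return "".join(unit_for[name] * count for name, count in steps)


if __name__ == "__main__":
    steps = digest_steps("ADAAA")
    print(steps)
    print(sequence_from_steps(steps))
```

    For the mono-deacetylated pentamer ADAAA from the abstract, the simulated readout is one unit lost to GlcNAcase, one to GlcNase, and three to GlcNAcase.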

  16. Protein contact order prediction from primary sequences

    Directory of Open Access Journals (Sweden)

    Wishart David S

    2008-05-01

    Background: Contact order is a topological descriptor that has been shown to be correlated with several interesting protein properties such as protein folding rates and protein transition state placements. Contact order has also been used to select for viable protein folds from ab initio protein structure prediction programs. For proteins of known three-dimensional structure, contact order can be calculated directly. However, for proteins with unknown three-dimensional structure, there is no effective prediction method currently available. Results: In this paper, we propose several simple yet very effective methods to predict contact order from the amino acid sequence only. One set of methods is based on a weighted linear combination of predicted secondary structure content and amino acid composition. Depending on the number of components used in these equations, it is possible to achieve a correlation coefficient of 0.857–0.870 between the observed and predicted contact order. A second method, based on sequence similarity to known three-dimensional structures, is able to achieve a correlation coefficient of 0.977. We have also developed a much more robust implementation for calculating contact order directly from PDB coordinates that works for > 99% of PDB files. All of these contact order predictors and calculators have been implemented as a web server (see Availability and requirements section for URL). Conclusion: Protein contact order can be effectively predicted from the primary sequence, in the absence of three-dimensional structure. Three factors, percentage of residues in alpha helices, percentage of residues in beta strands, and sequence length, appear to be strongly correlated with the absolute contact order.
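
    For readers unfamiliar with the quantity, the direct calculation from a known structure can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one representative coordinate per residue and a hypothetical distance cutoff of 8 Å, whereas published definitions often use all heavy atoms and a different cutoff. Absolute contact order is taken as the mean sequence separation of contacting residue pairs, and relative contact order as that value divided by chain length.

```python
import math


def contact_order(coords, cutoff=8.0, min_separation=1):
    """Return (absolute, relative) contact order for per-residue coordinates.

    coords: list of (x, y, z) tuples, one per residue (an assumption of
    this sketch). Two residues are "in contact" if they lie within `cutoff`.
    """
    n = len(coords)
    total_separation = 0
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i  # sequence separation of the pair
                contacts += 1
    if contacts == 0:
        return 0.0, 0.0
    absolute = total_separation / contacts
    return absolute, absolute / n


if __name__ == "__main__":
    # Hypothetical straight chain with 3.8 Å spacing between residues.
    chain = [(3.8 * k, 0.0, 0.0) for k in range(4)]
    print(contact_order(chain))
```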

  17. Expression and sequence characterization of growth hormone ...

    African Journals Online (AJOL)

    Qadeer

    2012-01-03

    Jan 3, 2012 ... Bhd., Malaysia. DTCS Quick start kit (Beckman Coulter) was used for sequencing of the gene. The vectors used for cloning and expression of BbGHBP include pTZ57R/T (Fermentas Inc. USA) and pET22b(+) (Novagen EMD Biosciences, Germany). E. coli host strains used in this study were DH5α and BL21 ...

  18. Cranial modularity and sequence heterochrony in mammals.

    Science.gov (United States)

    Goswami, Anjali

    2007-01-01

    Heterochrony, the temporal shifting of developmental events relative to each other, requires a degree of autonomy among those processes or structures. Modularity, the division of larger structures or processes into autonomous sets of internally integrated units, is often discussed in relation to the concept of heterochrony. However, the relationship between the developmental modules derived from studies of heterochrony and evolutionary modules, which should be of adaptive importance and relate to the genotype-phenotype map, has not been explicitly studied. I analyzed a series of sectioned and whole cleared-and-stained embryological and neonatal specimens, supplemented with published ontogenetic data, to test the hypothesis that bones within the same phenotypic modules, as determined by morphometric analysis, are developmentally integrated and will display coordinated heterochronic shifts across taxa. Modularity was analyzed in cranial bone ossification sequences of 12 therian mammals. A dataset of 12-18 developmental events was used to assess if modularity in developmental sequences corresponds to six phenotypic modules, derived from a recent morphometric analysis of cranial modularity in mammals. Kendall's tau was used to measure rank correlations, with randomization tests for significance. If modularity in developmental sequences corresponds to observed phenotypic modules, bones within a single phenotypic module should show integration of developmental timing, maintaining the same timing of ossification relative to each other, despite differences in overall ossification sequences across taxa. Analyses did not find any significant conservation of developmental timing within the six phenotypic modules, meaning that bones that are highly integrated in adult morphology are not significantly integrated in developmental timing.
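
    The rank-correlation step can be illustrated with a short sketch. It computes Kendall's tau in its simplest form (tau-a, which does not correct for ties); the ossification ranks below are hypothetical and the study's own tie handling and randomization procedure are not reproduced here.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a between two equally long rank lists (no tie correction)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Hypothetical ossification ranks of four bones in two taxa.
    taxon_a = [1, 2, 3, 4]
    taxon_b = [1, 3, 2, 4]
    print(kendall_tau_a(taxon_a, taxon_b))
```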

  19. Multistage Stochastic Programming via Autoregressive Sequences

    Czech Academy of Sciences Publication Activity Database

    Kaňková, Vlasta

    2007-01-01

    Vol. 15, No. 4 (2007), pp. 99-110 ISSN 0572-3043 R&D Projects: GA ČR GA402/07/1113; GA ČR(CZ) GA402/06/0990; GA ČR GD402/03/H057 Institutional research plan: CEZ:AV0Z10750506 Keywords: Economic processes * Multistage stochastic programming * autoregressive sequences * individual probability constraints Subject RIV: BB - Applied Statistics, Operational Research

  20. Clinical applications of sequencing take center stage

    OpenAIRE

    Glusman, Gustavo

    2013-01-01

    A report on the Advances in Genome Biology and Technology (AGBT) meeting, Marco Island, Florida, USA, February 20-23, 2013. This year's Advances in Genome Biology and Technology (AGBT) meeting reflected the current state of 'next generation' sequencing (NGS) technologies: significantly reduced competition and innovation, and a strong focus on standardization and application. Announcements of technological breakthroughs - a hallmark of previous AGBT meetings - were markedly absent, but existin...

  1. Hierarchical morphological segmentation for image sequence coding

    OpenAIRE

    Salembier Clairon, Philippe Jean; Pardàs Feliu, Montse

    1994-01-01

    This paper deals with a hierarchical morphological segmentation algorithm for image sequence coding. Mathematical morphology is very attractive for this purpose because it efficiently deals with geometrical features such as size, shape, contrast, or connectivity that can be considered as segmentation-oriented features. The algorithm follows a top-down procedure. It first takes into account the global information and produces a coarse segmentation, that is, with a small number of regions. Then...

  2. Hierarchical morphological segmentation for image sequence coding.

    Science.gov (United States)

    Salembier, P; Pardas, M

    1994-01-01

    This paper deals with a hierarchical morphological segmentation algorithm for image sequence coding. Mathematical morphology is very attractive for this purpose because it efficiently deals with geometrical features such as size, shape, contrast, or connectivity that can be considered as segmentation-oriented features. The algorithm follows a top-down procedure. It first takes into account the global information and produces a coarse segmentation, that is, with a small number of regions. Then, the segmentation quality is improved by introducing regions corresponding to more local information. The algorithm, considering sequences as being functions on a 3-D space, directly segments 3-D regions. A 3-D approach is used to get a segmentation that is stable in time and to directly solve the region correspondence problem. Each segmentation stage relies on four basic steps: simplification, marker extraction, decision, and quality estimation. The simplification removes information from the sequence to make it easier to segment. Morphological filters based on partial reconstruction are proven to be very efficient for this purpose, especially in the case of sequences. The marker extraction identifies the presence of homogeneous 3-D regions. It is based on constrained flat region labeling and morphological contrast extraction. The goal of the decision is to precisely locate the contours of regions detected by the marker extraction. This decision is performed by a modified watershed algorithm. Finally, the quality estimation concentrates on the coding residue, all the information about the 3-D regions that have not been properly segmented and therefore coded. The procedure allows the introduction of the texture and contour coding schemes within the segmentation algorithm. The coding residue is transmitted to the next segmentation stage to improve the segmentation and coding quality. Finally, segmentation and coding examples are presented to show the validity and interest of

  3. mufasa: the assembly of the red sequence

    Science.gov (United States)

    Davé, Romeel; Rafieferantsoa, Mika H.; Thompson, Robert J.

    2017-10-01

    We examine the growth and evolution of quenched galaxies in the mufasa cosmological hydrodynamic simulations that include an evolving halo mass-based quenching prescription, with galaxy colours computed accounting for line-of-sight extinction to individual star particles. mufasa reproduces the observed present-day red sequence reasonably well, including its slope, amplitude and scatter. In mufasa, the red sequence slope is driven entirely by the steep stellar mass-stellar metallicity relation, which independently agrees with observations. High-mass star-forming galaxies blend smoothly on to the red sequence, indicating the lack of a well-defined green valley at M* ≳ 10^10.5 M⊙. The most massive galaxies quench the earliest and then grow very little in mass via dry merging; they attain their high masses at earlier epochs when cold inflows more effectively penetrate hot haloes. To higher redshifts, the red sequence becomes increasingly contaminated with massive dusty star-forming (SF) galaxies; UVJ selection subtly but effectively separates these populations. We then examine the evolution of the mass functions of central and satellite galaxies split into passive and star-forming via UVJ. Massive quenched systems show good agreement with observations out to z ∼ 2, despite not including a rapid early quenching mode associated with mergers. However, low-mass quenched galaxies are far too numerous at z ≲ 1 in mufasa, indicating that mufasa strongly overquenches satellites. A challenge for hydrodynamic simulations is to devise a quenching model that produces enough early massive quenched galaxies and keeps them quenched to z = 0, while not being so strong as to overquench satellites; mufasa's current scheme fails at the latter.

  4. Irreducible Tests for Space Mission Sequencing Software

    Science.gov (United States)

    Ferguson, Lisa

    2012-01-01

    As missions extend further into space, the modeling and simulation of their every action and instruction becomes critical. The greater the distance between Earth and the spacecraft, the smaller the window for communication becomes. Therefore, through modeling and simulating the planned operations, the most efficient sequence of commands can be sent to the spacecraft. The Space Mission Sequencing Software is being developed as the next generation of sequencing software to ensure the most efficient communication to interplanetary and deep space mission spacecraft. Aside from efficiency, the software also checks to make sure that communication during a specified time is even possible, meaning that there is not a planet or moon preventing reception of a signal from Earth or that two opposing commands are being given simultaneously. In this way, the software not only models the proposed instructions to the spacecraft, but also validates the commands. To ensure that all spacecraft communications are sequenced properly, a timeline is used to structure the data. The created timelines are immutable and once data is assigned to a timeline, it shall never be deleted nor renamed. This is to prevent the need for storing and filing the timelines for use by other programs. Several types of timelines can be created to accommodate different types of communications (activities, measurements, commands, states, events). Each of these timeline types requires specific parameters and all have options for additional parameters if needed. With so many combinations of parameters available, the robustness and stability of the software are a necessity. Therefore a baseline must be established to ensure the full functionality of the software and it is here where the irreducible tests come into use.

  5. Strong-Q-sequences and small d

    Czech Academy of Sciences Publication Activity Database

    Chodounský, David

    2012-01-01

    Vol. 159, No. 3 (2012), pp. 2942-2946 ISSN 0166-8641. [Prague Symposium on General Topology and its Relations to Modern Analysis and Algebra /11./. Prague, 07.08.2011-12.08.2011] Institutional support: RVO:67985840 Keywords: Katowice problem * strong-Q-sequence * dominating number Subject RIV: BA - General Mathematics Impact factor: 0.562, year: 2012 http://www.sciencedirect.com/science/article/pii/S0166864112002222

  6. Next-Generation Sequencing in Intellectual Disability

    OpenAIRE

    Carvill, Gemma L.; Mefford, Heather C.

    2015-01-01

    Next-generation sequencing technologies have revolutionized gene discovery in patients with intellectual disability (ID) and led to an unprecedented expansion in the number of genes implicated in this disorder. We discuss the strategies that have been used to identify these novel genes for both syndromic and nonsyndromic ID and highlight the phenotypic and genetic heterogeneity that underpin this condition. Finally, we discuss the future of defining the genetic etiology of ID, including the r...

  7. Protein sequence database for pathogenic arenaviruses

    OpenAIRE

    Bui, HH; Botten, J; Fusseder, N; Pasquetto, V; Mothe, B; Buchmeier, MJ; Sette, A

    2007-01-01

    Background: Arenaviruses are a family of rodent-borne viruses that cause several hemorrhagic fevers. These diseases can be devastating and are often lethal. Herein, to aid in the design and development of diagnostics, treatments and vaccines for arenavirus infections, we have developed a database containing protein sequences from the seven pathogenic arenaviruses (Junin, Guanarito, Sabia, Machupo, Whitewater Arroyo, Lassa and LCMV). Results: The database currently contains a non-redundant set...

  8. Unsupervised statistical clustering of environmental shotgun sequences

    Directory of Open Access Journals (Sweden)

    Bhatnagar Srijak

    2009-10-01

    Background: The development of effective environmental shotgun sequence binning methods remains an ongoing challenge in algorithmic analysis of metagenomic data. While previous methods have focused primarily on supervised learning involving extrinsic data, a first-principles statistical model combined with a self-training fitting method has not yet been developed. Results: We derive an unsupervised, maximum-likelihood formalism for clustering short sequences by their taxonomic origin on the basis of their k-mer distributions. The formalism is implemented using a Markov Chain Monte Carlo approach in a k-mer feature space. We introduce a space transformation that reduces the dimensionality of the feature space and a genomic fragment divergence measure that strongly correlates with the method's performance. Pairwise analysis of over 1000 completely sequenced genomes reveals that the vast majority of genomes have sufficient genomic fragment divergence to be amenable for binning using the present formalism. Using a high-performance implementation, the binner is able to classify fragments as short as 400 nt with accuracy over 90% in simulations of low-complexity communities of 2 to 10 species, given sufficient genomic fragment divergence. The method is available as an open source package called LikelyBin. Conclusion: An unsupervised binning method based on statistical signatures of short environmental sequences is a viable stand-alone binning method for low complexity samples. For medium and high complexity samples, we discuss the possibility of combining the current method with other methods as part of an iterative process to enhance the resolving power of sorting reads into taxonomic and/or functional bins.
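
    The k-mer feature space mentioned above can be illustrated with a short sketch that turns a read into a normalized k-mer frequency vector. This is not LikelyBin code: the function name, the choice k = 4 and the handling of ambiguous bases are assumptions made for illustration, and the maximum-likelihood and MCMC steps are not shown.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Map a DNA sequence to frequencies over all 4**k possible k-mers.

    Windows containing characters other than A, C, G, T are ignored
    (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTACGT", k=2)
    print(freqs["AC"], freqs["TA"])
```

    Fragments from the same genome tend to give similar vectors, which is what makes clustering in this space possible.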

  9. Sequencing and phasing cancer mutations in lung cancers using a long-read portable sequencer.

    Science.gov (United States)

    Suzuki, Ayako; Suzuki, Mizuto; Mizushima-Sugano, Junko; Frith, Martin C; Makalowski, Wojciech; Kohno, Takashi; Sugano, Sumio; Tsuchihara, Katsuya; Suzuki, Yutaka

    2017-12-01

    Here, we employed cDNA amplicon sequencing using a long-read portable sequencer, MinION, to characterize various types of mutations in cancer-related genes, namely, EGFR, KRAS, NRAS and NF1. For homozygous SNVs, the precision and recall rates were 87.5% and 91.3%, respectively. For previously reported hotspot mutations, the precision and recall rates reached 100%. The precise junctions of EML4-ALK, CCDC6-RET and five other gene fusions were also detected. Taking advantage of long-read sequencing, we conducted phasing of EGFR mutations and elucidated the mutational allelic backgrounds of anti-tumor drug-sensitive and resistant mutations, which could provide useful information for selecting therapeutic approaches. In the H1975 cells, 72% of the reads harbored both L858R and T790M mutations, and 22% of the reads harbored neither mutation. To ensure that the clinical requirements can be met in potentially low cancer cell populations, we further conducted a serial dilution analysis of the template for EGFR mutations. Several percent of the mutant alleles could be detected depending on the yield and quality of the sequencing data. Finally, we characterized the mutation genotypes in eight clinical samples. This method could be a convenient long-read sequencing-based analytical approach and thus may change the current approaches used for cancer genome sequencing. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
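
    As a reminder of how the precision and recall figures above are defined, here is a minimal sketch that compares a set of called variants with a reference set. The variant labels other than L858R and T790M are hypothetical placeholders, and the toy numbers are unrelated to the study's results.

```python
def precision_recall(called, truth):
    """Precision = TP / calls made; recall = TP / true variants."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    called = {("EGFR", "L858R"), ("EGFR", "T790M"), ("KRAS", "hypothetical_call")}
    truth = {("EGFR", "L858R"), ("EGFR", "T790M"), ("NRAS", "hypothetical_miss")}
    print(precision_recall(called, truth))
```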

  10. PN Sequence Preestimator Scheme for DS-SS Signal Acquisition Using Block Sequence Estimation

    Science.gov (United States)

    Hyun, Kwangmin; Yoon, Dongweon; Park, Sang Kyu

    2005-12-01

    An m-sequence (PN sequence) preestimator scheme for direct-sequence spread spectrum (DS-SS) signal acquisition by using block sequence estimation (BSE) is proposed and analyzed. The proposed scheme consists of an estimator and a verifier which work according to the PN sequence chip clock, and provides not only the enhanced chip estimates with a threshold decision logic and one-chip error correction among the first m received chips, but also the reliability check of the estimates with additional decision logic. The probabilities of the estimator and verifier operations are calculated. With these results, the detection, the false alarm, and the missing probabilities of the proposed scheme are derived. In addition, using a signal flow graph, the average acquisition time is calculated. The proposed scheme can be used as a preestimator and easily implemented by changing the internal signal path of a generally used digital matched filter (DMF) correlator or any other correlator that has a lot of sampling data memories for sampled PN sequence. The numerical results show rapid acquisition performance in a relatively good CNR.

  11. Quantifying Next Generation Sequencing Sample Pre-Processing Bias in HIV-1 Complete Genome Sequencing.

    Science.gov (United States)

    Vrancken, Bram; Trovão, Nídia Sequeira; Baele, Guy; van Wijngaerden, Eric; Vandamme, Anne-Mieke; van Laethem, Kristel; Lemey, Philippe

    2016-01-07

    Genetic analyses play a central role in infectious disease research. Massively parallelized "mechanical cloning" and sequencing technologies were quickly adopted by HIV researchers in order to broaden the understanding of the clinical importance of minor drug-resistant variants. These efforts have, however, remained largely limited to small genomic regions. The growing need to monitor multiple genome regions for drug resistance testing, as well as the obvious benefit for studying evolutionary and epidemic processes makes complete genome sequencing an important goal in viral research. In addition, a major drawback for NGS applications to RNA viruses is the need for large quantities of input DNA. Here, we use a generic overlapping amplicon-based near full-genome amplification protocol to compare low-input enzymatic fragmentation (Nextera™) with conventional mechanical shearing for Roche 454 sequencing. We find that the fragmentation method has only a modest impact on the characterization of the population composition and that for reliable results, the variation introduced at all steps of the procedure--from nucleic acid extraction to sequencing--should be taken into account, a finding that is also relevant for NGS technologies that are now more commonly used. Furthermore, by applying our protocol to deep sequence a number of pre-therapy plasma and PBMC samples, we illustrate the potential benefits of a near complete genome sequencing approach in routine genotyping.

  12. PN Sequence Preestimator Scheme for DS-SS Signal Acquisition Using Block Sequence Estimation

    Directory of Open Access Journals (Sweden)

    Sang Kyu Park

    2005-03-01

    An m-sequence (PN sequence) preestimator scheme for direct-sequence spread spectrum (DS-SS) signal acquisition by using block sequence estimation (BSE) is proposed and analyzed. The proposed scheme consists of an estimator and a verifier which work according to the PN sequence chip clock, and provides not only the enhanced chip estimates with a threshold decision logic and one-chip error correction among the first m received chips, but also the reliability check of the estimates with additional decision logic. The probabilities of the estimator and verifier operations are calculated. With these results, the detection, the false alarm, and the missing probabilities of the proposed scheme are derived. In addition, using a signal flow graph, the average acquisition time is calculated. The proposed scheme can be used as a preestimator and easily implemented by changing the internal signal path of a generally used digital matched filter (DMF) correlator or any other correlator that has a lot of sampling data memories for sampled PN sequence. The numerical results show rapid acquisition performance in a relatively good CNR.
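
    Why the first m chips matter can be seen from how a PN sequence is generated: a linear feedback shift register of length m is fully determined by any m consecutive error-free chips. The sketch below is a generic illustration, not the paper's estimator; it assumes m = 4 and the recurrence s[n+4] = s[n] XOR s[n+1], which corresponds to the primitive polynomial x^4 + x + 1.

```python
def lfsr_sequence(init, taps, length):
    """Generate `length` bits with s[n+m] = XOR of s[n+t] for t in taps.

    init: the first m bits (the register state); taps: offsets in 0..m-1.
    """
    bits = list(init)
    m = len(init)
    while len(bits) < length:
        n = len(bits) - m
        bits.append(sum(bits[n + t] for t in taps) % 2)
    return bits[:length]


def periodic_autocorrelation(bits, shift):
    """Autocorrelation of one period after mapping 0 -> +1 and 1 -> -1."""
    chips = [1 - 2 * b for b in bits]
    n = len(chips)
    return sum(chips[i] * chips[(i + shift) % n] for i in range(n))


if __name__ == "__main__":
    seq = lfsr_sequence([1, 0, 0, 0], taps=(0, 1), length=30)
    period = seq[:15]
    print(period)
    print([periodic_autocorrelation(period, s) for s in range(15)])
```

    The sharp autocorrelation peak at zero shift is what a matched-filter correlator exploits during acquisition.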

  13. DNA sequencing using fluorescence background electroblotting membrane

    Science.gov (United States)

    Caldwell, K.D.; Chu, T.J.; Pitt, W.G.

    1992-05-12

    A method for the multiplex sequencing of DNA is disclosed which comprises the electroblotting of specific base terminated DNA fragments, which have been resolved by gel electrophoresis, onto the surface of a neutral non-aromatic polymeric microporous membrane exhibiting low background fluorescence which has been surface modified to contain amino groups. Polypropylene membranes are preferred and the introduction of amino groups is accomplished by subjecting the membrane to radio or microwave frequency plasma discharge in the presence of an aminating agent, preferably ammonia. The membrane, containing physically adsorbed DNA fragments on its surface after the electroblotting, is then treated with crosslinking means such as UV radiation or a glutaraldehyde spray to chemically bind the DNA fragments to the membrane through amino groups contained on the surface. The DNA fragments chemically bound to the membrane are subjected to hybridization probing with a tagged probe specific to the sequence of the DNA fragments. The tagging may be by either fluorophores or radioisotopes. The tagged probes hybridized to the target DNA fragments are detected and read by laser induced fluorescence detection or autoradiograms. The use of aminated low fluorescent background membranes allows the use of fluorescent detection and reading even when the available amount of DNA to be sequenced is small. The DNA bound to the membranes may be reprobed numerous times.

  14. Constrained Optimization of MIMO Training Sequences

    Directory of Open Access Journals (Sweden)

    Coon Justin P

    2007-01-01

    Multiple-input multiple-output (MIMO) systems have shown a huge potential for increased spectral efficiency and throughput. With an increasing number of transmitting antennas comes the burden of providing training for channel estimation for coherent detection. In some special cases optimal, in the sense of mean-squared error (MSE), training sequences have been designed. However, in many practical systems it is not feasible to analytically find optimal solutions and numerical techniques must be used. In this paper, two systems (unique word (UW) single carrier and OFDM with nulled subcarriers) are considered and a method of designing near-optimal training sequences using nonlinear optimization techniques is proposed. In particular, interior-point (IP) algorithms such as the barrier method are discussed. Although the two systems seem unrelated, the cost function, which is the MSE of the channel estimate, is shown to be effectively the same for each scenario. Also, additional constraints, such as peak-to-average power ratio (PAPR), are considered and shown to be easily included in the optimization process. Numerical examples illustrate the effectiveness of the designed training sequences, both in terms of MSE and bit-error rate (BER).
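
    The PAPR constraint mentioned above is simple to state in code. The following is a minimal sketch of the usual definition (peak instantaneous power divided by mean power) applied to a list of complex baseband samples; the sample values are hypothetical and the optimization itself is not shown.

```python
def papr(samples):
    """Peak-to-average power ratio of a sequence of complex samples."""
    if not samples:
        raise ValueError("need at least one sample")
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    if mean_power == 0:
        raise ValueError("all-zero sequence has undefined PAPR")
    return max(powers) / mean_power


if __name__ == "__main__":
    constant_modulus = [1, 1j, -1, -1j]  # every sample has unit power
    impulse = [2, 0, 0, 0]               # all energy in one sample
    print(papr(constant_modulus), papr(impulse))
```

    A constant-modulus sequence attains the minimum value of 1, which is why a PAPR bound restricts how far an optimizer may concentrate energy in a few samples.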

  15. An approach to sequence DNA without tagging

    Science.gov (United States)

    Niu, Sanjun; Saraf, Ravi F.

    2002-10-01

    Microarray technology is playing an increasingly important role in biology and medicine and its application to genomics for gene expression analysis has already reached the market with a variety of commercially available instruments. In these combinatorial analysis methods, known probe single-strand DNA (ssDNA) 'primers' are attached in clusters of typically 100 µm × 100 µm pixels. Each pixel of the array has a slightly different sequence. On exposure to 'unknown' target ssDNA, the pixels with the right complementary probe ssDNA sequence convert to double-stranded DNA (dsDNA) by a hybridization reaction. To transduce the conversion of the pixel to dsDNA, the target ssDNA is labelled with a photoluminescent tag during the polymerase chain reaction (PCR) amplification process. Due to the statistical distribution of the tags in the target ssDNA, it becomes significantly difficult to implement these methods as a diagnostic tool in a pathology laboratory. A method to sequence DNA without tagging the molecule is developed. The fabrication process is compatible with current microelectronics and (emerging) soft-material fabrication technologies, allowing the method to be integrable with micro-electromechanical systems (MEMS) and lab-on-a-chip devices. An estimated sensitivity of 10^-12 g on a 1 cm^2 device area is obtained.

  16. FastMotif: spectral sequence motif discovery.

    Science.gov (United States)

    Colombo, Nicoló; Vlassis, Nikos

    2015-08-15

    Sequence discovery tools play a central role in several fields of computational biology. In the framework of Transcription Factor binding studies, most of the existing motif finding algorithms are computationally demanding, and they may not be able to support the increasingly large datasets produced by modern high-throughput sequencing technologies. We present FastMotif, a new motif discovery algorithm that is built on a recent machine learning technique referred to as Method of Moments. Based on spectral decompositions, our method is robust to model misspecifications and is not prone to locally optimal solutions. We obtain an algorithm that is extremely fast and designed for the analysis of big sequencing data. On HT-Selex data, FastMotif extracts motif profiles that match those computed by various state-of-the-art algorithms, but one order of magnitude faster. We provide a theoretical and numerical analysis of the algorithm's robustness and discuss its sensitivity with respect to the free parameters. The Matlab code of FastMotif is available from http://lcsb-portal.uni.lu/bioinformatics. Contact: vlassis@adobe.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.

  17. Protein sequence comparison and protein evolution

    Energy Technology Data Exchange (ETDEWEB)

    Pearson, W.R. [Univ. of Virginia, Charlottesville, VA (United States). Dept. of Biochemistry

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. This tutorial examines how the information conserved during the evolution of a protein molecule can be used to reliably infer homology, and thus a shared protein fold and possibly a shared active site or function. The authors start by reviewing a geological/evolutionary time scale. Next they look at the evolution of several protein families. During the tutorial, these families will be used to demonstrate that homologous protein ancestry can be inferred with confidence. They also examine different modes of protein evolution and consider some hypotheses that have been presented to explain the very earliest events in protein evolution. The next part of the tutorial will examine the technical aspects of protein sequence comparison. Both optimal and heuristic algorithms and their associated parameters that are used to characterize protein sequence similarities are discussed. Perhaps more importantly, they survey the statistics of local similarity scores, and how these statistics can be used both to improve the selectivity of a search and to evaluate the significance of a match. They then examine distantly related members of three protein families, the serine proteases, the glutathione transferases, and the G-protein-coupled receptors (GCRs). Finally, they discuss how sequence similarity can be used to examine internal repeated or mosaic structures in proteins.
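
    A local similarity score of the kind discussed above can be computed with the Smith-Waterman dynamic programme, the standard optimal local alignment algorithm. The sketch below is deliberately simplified and its scoring parameters are assumptions: it uses a flat match/mismatch score and a linear gap penalty, whereas practical protein comparison uses a substitution matrix and usually affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Keeps only two rows of the dynamic-programming table, so it returns
    the score but not the alignment itself.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACDEKK", "TTACDEWW"))
```

    Judging whether such a score is significant requires the score statistics the tutorial surveys; the raw number alone does not establish homology.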

  18. Collaborative Filtering Recommendation on Users' Interest Sequences.

    Directory of Open Access Journals (Sweden)

    Weijie Cheng

    As an important factor for improving recommendations, time information has been introduced to model users' dynamic preferences in many papers. However, the sequence of users' behaviour is rarely studied in recommender systems. Due to the users' unique behavior evolution patterns and personalized interest transitions among items, users' similarity in sequential dimension should be introduced to further distinguish users' preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users' interest sequences (IS) that rank users' ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users' longest common sub-IS (LCSIS) and the count of users' total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users' IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users' preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.
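
    The longest-common-sub-IS idea can be illustrated with the classic longest common subsequence recurrence. This is a sketch under an assumption: it treats a "sub-IS" as a not necessarily contiguous subsequence of item identifiers, which the abstract does not spell out, and it does not reproduce the paper's similarity formula. The item names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two item sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # Items ordered by the time each user rated them (hypothetical data).
    user_u = ["movie1", "movie2", "movie3", "movie4"]
    user_v = ["movie2", "movie1", "movie3", "movie4"]
    print(lcs_length(user_u, user_v))
```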

  19. DNA sequencing using fluorescence background electroblotting membrane

    Science.gov (United States)

    Caldwell, Karin D.; Chu, Tun-Jen; Pitt, William G.

    1992-01-01

    A method for the multiplex sequencing of DNA is disclosed which comprises the electroblotting of specific base terminated DNA fragments, which have been resolved by gel electrophoresis, onto the surface of a neutral non-aromatic polymeric microporous membrane exhibiting low background fluorescence which has been surface modified to contain amino groups. Polypropylene membranes are preferred and the introduction of amino groups is accomplished by subjecting the membrane to radio or microwave frequency plasma discharge in the presence of an aminating agent, preferably ammonia. The membrane, containing physically adsorbed DNA fragments on its surface after the electroblotting, is then treated with crosslinking means such as UV radiation or a glutaraldehyde spray to chemically bind the DNA fragments to the membrane through said amino groups contained on the surface thereof. The DNA fragments chemically bound to the membrane are subjected to hybridization probing with a tagged probe specific to the sequence of the DNA fragments. The tagging may be by either fluorophores or radioisotopes. The tagged probes hybridized to said target DNA fragments are detected and read by laser induced fluorescence detection or autoradiograms. The use of aminated low fluorescent background membranes allows the use of fluorescent detection and reading even when the available amount of DNA to be sequenced is small. The DNA bound to the membranes may be reprobed numerous times.

  20. Exploring Nitrilase Sequence Space for Enantioselective Catalysis

    Science.gov (United States)

    Robertson, Dan E.; Chaplin, Jennifer A.; DeSantis, Grace; Podar, Mircea; Madden, Mark; Chi, Ellen; Richardson, Toby; Milan, Aileen; Miller, Mark; Weiner, David P.; Wong, Kelvin; McQuaid, Jeff; Farwell, Bob; Preston, Lori A.; Tan, Xuqiu; Snead, Marjory A.; Keller, Martin; Mathur, Eric; Kretz, Patricia L.; Burk, Mark J.; Short, Jay M.

    2004-01-01

    Nitrilases are important in the biosphere as participants in synthesis and degradation pathways for naturally occurring, as well as xenobiotically derived, nitriles. Because of their inherent enantioselectivity, nitrilases are also attractive as mild, selective catalysts for setting chiral centers in fine chemical synthesis. Unfortunately, few nitrilases have been reported in the scientific and patent literature, and because of stability or specificity shortcomings, their utility has been largely unrealized. In this study, 137 unique nitrilases, discovered from screening of >600 biotope-specific environmental DNA (eDNA) libraries, were characterized. Using culture-independent means, phylogenetically diverse genomes were captured from entire biotopes, and their genes were expressed heterologously in a common cloning host. Nitrilase genes were targeted in a selection-based expression assay of clonal populations numbering 10^6 to 10^10 members per eDNA library. A phylogenetic analysis of the novel sequences discovered revealed the presence of at least five major sequence clades within the nitrilase subfamily. Using three nitrile substrates targeted for their potential in chiral pharmaceutical synthesis, the enzymes were characterized for substrate specificity and stereospecificity. A number of important correlations were found between sequence clades and the selective properties of these nitrilases. These enzymes, discovered using a high-throughput, culture-independent method, provide a catalytic toolbox for enantiospecific synthesis of a variety of carboxylic acid derivatives, as well as an intriguing library for evolutionary and structural analyses. PMID:15066841

  1. Sequence of tenses: playing with possibilities

    Directory of Open Access Journals (Sweden)

    Bohdan Ulašin

    2012-12-01

    Full Text Available The aim of this article is to analyse the sequence of tenses in Spanish, a phenomenon typical of the Romance languages. Our purpose is to systematize and formulate the rules of use by means of examples with graphical representations. We focus on such cases of use of the sequence of tenses where it is possible to apply a “double-access interpretation”, i.e. which allows the possibility of choosing between reference points: with the actual time (the moment of speech) or with the main clause. This double interpretation is to be found in all three types of time relationships: simultaneousness, priority and posteriority. Some pairs differ semantically; others are used according to the preferences of the speaker with no change of meaning. Most examples are subordinate noun clauses; nonetheless, we included examples of other types of clauses as well (e.g. relative, reason clauses). We also analyze the problem of the sequence of tenses where the main verb is in the conditional tense.

  2. Optimization of short amino acid sequences classifier

    Science.gov (United States)

    Barcz, Aleksy; Szymański, Zbigniew

    This article describes processing methods used for short amino acid sequences classification. The data processed are 9-symbol string representations of amino acid sequences, divided into 49 data sets, each one containing samples labeled as reacting or not with a given enzyme. The goal of the classification is to determine, for a single enzyme, whether an amino acid sequence would react with it or not. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The method used for feature selection consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data is used for training LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting classifiers with the best performance measure values.

  3. Motor sequence learning and movement disorders.

    Science.gov (United States)

    Doyon, Julien

    2008-08-01

    New insights into the psychophysiological determinants of performance changes and brain plasticity associated with motor sequence learning have recently been gained through behavioral and imaging studies in healthy individuals. In addition, using a variety of motor sequential paradigms in groups of patients affected by a movement disorder, major advances have been achieved in our understanding of the pathophysiological mechanisms underlying Parkinson's and Huntington's diseases, as well as primary forms of dystonia. This review begins by describing the latest findings in normal participants with regards to the dynamic alterations in neural networks observed across the different phases of motor sequence learning. It then focuses on the hotly debated issue of motor memory consolidation, highlighting the results of novel studies that investigated the role of both day and night sleep, the neural substrates and the developmental evolution mediating this process. Finally, this paper addresses current work looking at motor sequence learning in movement disorders that helps to better comprehend the functional contribution of basal ganglia structures to this type of memory, to assess the impact of such diseases on related patterns of brain activation, as well as to identify the neuronal compensatory mechanisms educed by these basal ganglia disorders. Such advances have major implications, not only for optimizing ways to learn new skilled behaviors in real-life situations, but also for guiding therapeutic approaches in patients with movement disorders.

  4. Information theory applications for biological sequence analysis.

    Science.gov (United States)

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
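
    To make the entropy concept concrete, here is a minimal sketch of the Shannon entropy of a sequence's k-mer frequency distribution, one of the simplest alignment-free descriptors. The function name and the overlapping-window choice are illustrative assumptions and are not taken from the review.

```python
import math
from collections import Counter


def kmer_entropy(sequence, k=1):
    """Shannon entropy, in bits, of the overlapping k-mer distribution.

    Assumption: k-mers are counted in overlapping windows of step 1.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

    For a four-letter alphabet with k=1 the value ranges from 0 bits (a single repeated letter) to 2 bits (all four letters equally frequent).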

  5. BS Seeker: precise mapping for bisulfite sequencing

    Directory of Open Access Journals (Sweden)

    Pellegrini Matteo

    2010-04-01

    Full Text Available Abstract Background Bisulfite sequencing using next generation sequencers yields genome-wide measurements of DNA methylation at single nucleotide resolution. Traditional aligners are not designed for mapping bisulfite-treated reads, where the unmethylated Cs are converted to Ts. We have developed BS Seeker, an approach that converts the genome to a three-letter alphabet and uses Bowtie to align bisulfite-treated reads to a reference genome. It uses sequence tags to reduce mapping ambiguity. Post-processing of the alignments removes non-unique and low-quality mappings. Results We tested our aligner on synthetic data, a bisulfite-converted Arabidopsis library, and human libraries generated from two different experimental protocols. We evaluated the performance of our approach and compared it to other bisulfite aligners. The results demonstrate that among the aligners tested, BS Seeker is more versatile and faster. When mapping to the human genome, BS Seeker generates alignments significantly faster than RMAP and BSMAP. Furthermore, BS Seeker is the only alignment tool that can explicitly account for tags which are generated by certain library construction protocols. Conclusions BS Seeker provides fast and accurate mapping of bisulfite-converted reads. It can work with BS reads generated from the two different experimental protocols, and is able to efficiently map reads to large mammalian genomes. The Python program is freely available at http://pellegrini.mcdb.ucla.edu/BS_Seeker/BS_Seeker.html.
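
    The three-letter-alphabet idea can be sketched as an in-silico C-to-T conversion applied to the reference before alignment. This is a simplified illustration, not BS Seeker's code: the function names are hypothetical, and building a second converted reference from the reverse complement is an assumption about how both strands might be covered.

```python
def bisulfite_convert(seq):
    """Replace every C with T, modelling fully unmethylated, converted DNA."""
    return seq.upper().replace("C", "T")


def reverse_complement(seq):
    """Reverse complement of a DNA string; non-ACGT symbols pass through."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def three_letter_references(genome):
    """Return converted forward and reverse-strand references (A, G, T only).

    Assumption: each strand is converted separately so that reads from
    either strand can be matched against a C-free reference.
    """
    forward = bisulfite_convert(genome)
    reverse = bisulfite_convert(reverse_complement(genome))
    return forward, reverse
```

    Reads would be converted the same way before being passed to a short-read aligner, and methylation would then be called by comparing the original read bases with the original reference at the mapped positions.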

  6. Stress triggering and the Canterbury earthquake sequence

    Science.gov (United States)

    Steacy, Sandy; Jiménez, Abigail; Holden, Caroline

    2014-01-01

    The Canterbury earthquake sequence, which includes the devastating Christchurch event of 2011 February, has to date led to losses of around 40 billion NZ dollars. The location and severity of the earthquakes was a surprise to most inhabitants as the seismic hazard model was dominated by an expected Mw > 8 earthquake on the Alpine fault and an Mw 7.5 earthquake on the Porters Pass fault, 150 and 80 km to the west of Christchurch. The sequence to date has included an Mw = 7.1 earthquake and 3 Mw ≥ 5.9 events which migrated from west to east. Here we investigate whether the later events are consistent with stress triggering and whether a simple stress map produced shortly after the first earthquake would have accurately indicated the regions where the subsequent activity occurred. We find that 100 per cent of M > 5.5 earthquakes occurred in positive stress areas computed using a slip model for the first event that was available within 10 d of its occurrence. We further find that the stress changes at the starting points of major slip patches of post-Darfield main events are consistent with triggering although this is not always true at the hypocentral locations. Our results suggest that Coulomb stress changes contributed to the evolution of the Canterbury sequence and we note additional areas of increased stress in the Christchurch region and on the Porters Pass fault.

  7. Genome sequence and analysis of Lactobacillus helveticus

    Directory of Open Access Journals (Sweden)

    Paola eCremonesi

    2013-01-01

    Full Text Available The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of L. helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones.

  8. Multiple sequence alignment accuracy and phylogenetic inference.

    Science.gov (United States)

    Ogden, T Heath; Rosenberg, Michael S

    2006-04-01

    Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.

  9. Genome Sequence of the Palaeopolyploid soybean

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  10. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.
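
    A minimal sketch of how such a matrix might be built follows. It assumes the sequence is cut into consecutive non-overlapping words, that only observed words form the node set, and that the usual damping factor alpha = 0.85 is used; none of these choices, nor the function names, are stated in the abstract.

```python
def google_matrix(sequence, word_len=2, alpha=0.85):
    """Column-stochastic Google matrix of word-to-next-word transitions.

    Assumptions: non-overlapping words, node set limited to observed words,
    alpha = 0.85, and a non-empty sequence of at least word_len letters.
    """
    words = [sequence[i:i + word_len]
             for i in range(0, len(sequence) - word_len + 1, word_len)]
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    counts = [[0.0] * n for _ in range(n)]  # counts[to][from]
    for src, dst in zip(words, words[1:]):
        counts[index[dst]][index[src]] += 1.0
    matrix = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(counts[i][j] for i in range(n))
        for i in range(n):
            # Words with no outgoing transition get a uniform column.
            s_ij = counts[i][j] / col_sum if col_sum else 1.0 / n
            matrix[i][j] = alpha * s_ij + (1.0 - alpha) / n
    return vocab, matrix


def pagerank(matrix, iterations=100):
    """PageRank vector obtained by power iteration on the Google matrix."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(matrix[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p
```

    Sorting the words by decreasing PageRank probability gives the ranking whose decay the abstract discusses; a dense list-of-lists matrix is only practical for short words, since the number of possible words grows as 4 to the power of the word length.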

  11. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    Full Text Available For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  12. Method for priming and DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Mugasimangalam, R.C.; Ulanovsky, L.E.

    1997-12-01

    A method is presented for improving the priming specificity of an oligonucleotide primer that is non-unique in a nucleic acid template which includes selecting a continuous stretch of several nucleotides in the template DNA where one of the four bases does not occur in the stretch. This also includes bringing the template DNA in contact with a non-unique primer partially or fully complementary to the sequence immediately upstream of the selected sequence stretch. This results in polymerase-mediated differential extension of the primer in the presence of a subset of deoxyribonucleotide triphosphates that does not contain the base complementary to the base absent in the selected sequence stretch. These reactions occur at a temperature sufficiently low for allowing the extension of the non-unique primer. The method causes polymerase-mediated extension reactions in the presence of all four natural deoxyribonucleotide triphosphates or modifications. At this high temperature discrimination occurs against priming sites of the non-unique primer where the differential extension has not made the primer sufficiently stable to prime. However, the primer extended at the selected stretch is sufficiently stable to prime.

  13. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)

    Directory of Open Access Journals (Sweden)

    Alves-Ferreira Marcelo

    2008-09-01

    Full Text Available Abstract Background Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. Results We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads of Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. Conclusion The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties

  14. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    Directory of Open Access Journals (Sweden)

    Stephan Pabinger

    Full Text Available Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage

  15. Development of expressed sequence tag and expressed sequence tag-simple sequence repeat marker resources for Musa acuminata.

    Science.gov (United States)

    Passos, Marco A N; de Oliveira Cruz, Viviane; Emediato, Flavia L; de Camargo Teixeira, Cristiane; Souza, Manoel T; Matsumoto, Takashi; Rennó Azevedo, Vânia C; Ferreira, Claudia F; Amorim, Edson P; de Alencar Figueiredo, Lucio Flavio; Martins, Natalia F; de Jesus Barbosa Cavalcante, Maria; Baurens, Franc-Christophe; da Silva, Orzenil Bonfim; Pappas, Georgios J; Pignolet, Luc; Abadie, Catherine; Ciampi, Ana Y; Piffanelli, Pietro; Miller, Robert N G

    2012-01-01

    Banana (Musa acuminata) is a crop contributing to global food security. Many varieties lack resistance to biotic stresses, due to sterility and narrow genetic background. The objective of this study was to develop an expressed sequence tag (EST) database of transcripts expressed during compatible and incompatible banana-Mycosphaerella fijiensis (Mf) interactions. Black leaf streak disease (BLSD), caused by Mf, is a destructive disease of banana. Microsatellite markers were developed as a resource for crop improvement. cDNA libraries were constructed from in vitro-infected leaves from BLSD-resistant M. acuminata ssp. burmannicoides Calcutta 4 (MAC4) and susceptible M. acuminata cv. Cavendish Grande Naine (MACV). Clones were 5'-end Sanger sequenced, ESTs assembled with TGICL and unigenes annotated using BLAST, Blast2GO and InterProScan. Mreps was used to screen for simple sequence repeats (SSRs), with markers evaluated for polymorphism using 20 diploid (AA) M. acuminata accessions contrasting in resistance to Mycosphaerella leaf spot diseases. A total of 9333 high-quality ESTs were obtained for MAC4 and 3964 for MACV, which assembled into 3995 unigenes. Of these, 2592 displayed homology to genes encoding proteins with known or putative function, and 266 to genes encoding proteins with unknown function. Gene ontology (GO) classification identified 543 GO terms, 2300 unigenes were assigned to EuKaryotic orthologous group categories and 312 mapped to Kyoto Encyclopedia of Genes and Genomes pathways. A total of 624 SSR loci were identified, with trinucleotide repeat motifs the most abundant in MAC4 (54.1 %) and MACV (57.6 %). Polymorphism across M. acuminata accessions was observed with 75 markers. Alleles per polymorphic locus ranged from 2 to 8, totalling 289. The polymorphism information content ranged from 0.08 to 0.81. This EST collection offers a resource for studying functional genes, including transcripts expressed in banana-Mf interactions. Markers are

  16. Characterization of simple sequence repeats (SSRs) from Phlebotomus papatasi (Diptera: Psychodidae) expressed sequence tags (ESTs)

    Directory of Open Access Journals (Sweden)

    Hamarsheh Omar

    2011-09-01

    Full Text Available Abstract Background Phlebotomus papatasi is a natural vector of Leishmania major, which causes cutaneous leishmaniasis in many countries. Simple sequence repeats (SSRs), or microsatellites, are common in eukaryotic genomes and are short, repeated nucleotide sequence elements arrayed in tandem and flanked by non-repetitive regions. The enrichment methods used previously for finding new microsatellite loci in sand flies remain laborious and time consuming; in silico mining, which includes retrieval and screening of microsatellites from large amounts of sequence data from sequence databases using microsatellite search tools, can yield many new candidate markers. Results Simple sequence repeats (SSRs) were characterized in P. papatasi expressed sequence tags (ESTs) derived from a public database, National Center for Biotechnology Information (NCBI). A total of 42,784 sequences were mined, and 1,499 SSRs were identified with a frequency of 3.5% and an average density of 15.55 kb per SSR. Dinucleotide motifs were the most common SSRs, accounting for 67%, followed by tri-, tetra-, and penta-nucleotide repeats, accounting for 31.1%, 1.5%, and 0.1%, respectively. The length of microsatellites varied from 5 to 16 repeats. The dinucleotide types AG and CT have the highest frequency. Dinucleotide SSR-ESTs are relatively biased toward an excess of (AX)n repeats and a low GC base content. Forty primer pairs were designed based on motif lengths for further experimental validation. Conclusion The first large-scale survey of SSRs derived from P. papatasi is presented; dinucleotide SSRs identified are more frequent than other types. EST data mining is an effective strategy to identify functional microsatellites in P. papatasi.
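
    In silico SSR mining of the kind described here can be approximated with a regular expression that looks for perfect tandem repeats. The sketch below is an illustration only and is not the search tool used in the study: the function name, the minimum of five repeats, and the restriction to perfect di- to penta-nucleotide repeats are assumptions loosely based on the figures in the abstract.

```python
import re


def find_ssrs(seq, min_repeats=5, max_motif_len=5):
    """Return sorted (start, motif, repeat_count) tuples for perfect repeats.

    Assumptions: motifs of 2..max_motif_len bases, at least min_repeats
    tandem copies, no mismatches allowed.
    """
    seq = seq.upper()
    hits = []
    for motif_len in range(2, max_motif_len + 1):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are copies of a shorter unit, e.g. "AA" or "AGAG".
            if any(motif == motif[:d] * (motif_len // d)
                   for d in range(1, motif_len) if motif_len % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)
```

    Dividing the number of hits by the number of sequences scanned gives an SSR frequency comparable in spirit to the 3.5% reported above, although real tools also handle imperfect and compound repeats.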

  17. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing

    Directory of Open Access Journals (Sweden)

    Li Kelvin

    2012-11-01

    Full Text Available Abstract Background In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. Results We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute’s (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Conclusions Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus
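
    The role of IUPAC ambiguity codes in a degenerate primer can be shown with a short sketch that counts and enumerates the concrete oligonucleotides a degenerate sequence stands for. The code table follows the standard IUPAC nucleotide codes; the function names are hypothetical and the sketch is unrelated to the JCVI pipeline's actual implementation.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]
```

    A primer-selection step might, for instance, reject candidates whose degeneracy exceeds some threshold; any such threshold would be a design parameter that the abstract does not state.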

  18. Polymorphism Sequence - JSNP | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available List Contact us JSNP Polymorphism Sequence Data detail Data name Polymorphism Sequence DOI 10.18908/lsdba.nb...dc00114-001 Description of data contents Information on polymorphisms (SNPs and insertions/deletions) and th...se Name database name JSNP_SNP: single nucleotide polymorphism JSNP_InsDel_IND: insertion/deletion JSNP_InsD...ved allele observed 3' Flanking Sequence 3' flanking sequence Offset in Flanking Sequence position of the polymorphism...uence Accession No. accession No. of the sequence for polymorphism screening Offset in Record position of the polymorphism

  19. Selective learning enabled by intention to learn in sequence learning.

    Science.gov (United States)

    Miyawaki, Kaori

    2012-01-01

    This study investigated whether a target sequence that people intend to learn is learned selectively when it is interleaved with another (non-target) sequence. Three experiments used a serial reaction time task in which different spatial and color stimuli occurred alternately. Each of the two interleaved sequences had structural regularity. Participants in an intentional learning group were instructed to learn the target (spatial) sequence whereas those in an incidental learning group were not. In Experiments 1 and 2 spatial and color sequences were correlated. Results showed that the intentional group learned the spatial sequence better than the incidental group and learned it independently of the color sequence, whereas the incidental group learned the two sequences as a combined sequence. In Experiment 3 the sequences were uncorrelated. Results showed that the intentional group was no longer superior in learning the spatial sequence. Findings indicate that the intention to learn a target sequence enables selective learning of it only when it is correlated with a non-target sequence.

  20. Methods for the detection and assembly of novel sequence in high-throughput sequencing data.

    Science.gov (United States)

    Holtgrewe, Manuel; Kuchenbecker, Leon; Reinert, Knut

    2015-06-15

    Large insertions of novel sequence are an important type of structural variants. Previous studies used traditional de novo assemblers for assembling non-mapping high-throughput sequencing (HTS) or capillary reads and then tried to anchor them in the reference using paired read information. We present approaches for detecting insertion breakpoints and targeted assembly of large insertions from HTS paired data: BASIL and ANISE. On near identity repeats that are hard for assemblers, ANISE employs a repeat resolution step. This results in far better reconstructions than obtained by the compared methods. On simulated data, we found our insert assembler to be competitive with the de novo assemblers ABYSS and SGA while yielding already anchored inserted sequence as opposed to unanchored contigs as from ABYSS/SGA. On real-world data, we detected novel sequence in a human individual and thoroughly validated the assembled sequence. ANISE was found to be superior to the competing tool MindTheGap on both simulated and real-world data. ANISE and BASIL are available for download at http://www.seqan.de/projects/herbarium under a permissive open source license. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  1. Quantifying Next Generation Sequencing Sample Pre-Processing Bias in HIV-1 Complete Genome Sequencing

    Directory of Open Access Journals (Sweden)

    Bram Vrancken

    2016-01-01

    Full Text Available Genetic analyses play a central role in infectious disease research. Massively parallelized “mechanical cloning” and sequencing technologies were quickly adopted by HIV researchers in order to broaden the understanding of the clinical importance of minor drug-resistant variants. These efforts have, however, remained largely limited to small genomic regions. The growing need to monitor multiple genome regions for drug resistance testing, as well as the obvious benefit for studying evolutionary and epidemic processes, makes complete genome sequencing an important goal in viral research. In addition, a major drawback for NGS applications to RNA viruses is the need for large quantities of input DNA. Here, we use a generic overlapping amplicon-based near full-genome amplification protocol to compare low-input enzymatic fragmentation (Nextera™) with conventional mechanical shearing for Roche 454 sequencing. We find that the fragmentation method has only a modest impact on the characterization of the population composition and that for reliable results, the variation introduced at all steps of the procedure, from nucleic acid extraction to sequencing, should be taken into account, a finding that is also relevant for NGS technologies that are now more commonly used. Furthermore, by applying our protocol to deep sequence a number of pre-therapy plasma and PBMC samples, we illustrate the potential benefits of a near complete genome sequencing approach in routine genotyping.

  2. Targeted sequencing of cancer-related genes in colorectal cancer using next-generation sequencing.

    Directory of Open Access Journals (Sweden)

    Sae-Won Han

    Full Text Available Recent advances in sequencing technology have enabled comprehensive profiling of genetic alterations in cancer. We have established a targeted sequencing platform using next-generation sequencing (NGS) technology for clinical use, which can provide mutation and copy number variation data. NGS was performed with a paired-end library enriched with exons of 183 cancer-related genes. Normal and tumor tissue pairs of 60 colorectal adenocarcinomas were used to test feasibility. Somatic mutation and copy number alteration were analyzed. A total of 526 somatic non-synonymous sequence variations were found in 113 genes. Among these, 278 single nucleotide variations were 232 different somatic point mutations. 216 SNV were 79 known single nucleotide polymorphisms in the dbSNP. 32 indels were 28 different indel mutations. Median number of mutated genes per tumor was 4 (range 0-23). Copy number gain (>X2 fold was found in 65 genes in 40 patients, whereas copy number loss (sequencing platform using NGS technology is feasible for clinical use and provides comprehensive genetic alteration data.

  3. Comparison of next generation sequencing technologies for transcriptome characterization

    Directory of Open Access Journals (Sweden)

    Soltis Douglas E

    2009-08-01

    Full Text Available Abstract Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance

  4. NEW COOLING SEQUENCES FOR OLD WHITE DWARFS

    International Nuclear Information System (INIS)

    Renedo, I.; Althaus, L. G.; García-Berro, E.; Miller Bertolami, M. M.; Romero, A. D.; Corsico, A. H.; Rohrmann, R. D.

    2010-01-01

    We present full evolutionary calculations appropriate for the study of hydrogen-rich DA white dwarfs. This is done by evolving white dwarf progenitors from the zero-age main sequence, through the core hydrogen-burning phase, the helium-burning phase, and the thermally pulsing asymptotic giant branch phase to the white dwarf stage. Complete evolutionary sequences are computed for a wide range of stellar masses and for two different metallicities, Z = 0.01, which is representative of the solar neighborhood, and Z = 0.001, which is appropriate for the study of old stellar systems, like globular clusters. During the white dwarf cooling stage, we self-consistently compute the phase in which nuclear reactions are still important, the diffusive evolution of the elements in the outer layers and, finally, we also take into account all the relevant energy sources in the deep interior of the white dwarf, such as the release of latent heat and the release of gravitational energy due to carbon-oxygen phase separation upon crystallization. We also provide colors and magnitudes for these sequences, based on a new set of improved non-gray white dwarf model atmospheres, which include the most up-to-date physical inputs like the Lyα quasi-molecular opacity. The calculations are extended down to an effective temperature of 2500 K. Our calculations provide a homogeneous set of evolutionary cooling tracks appropriate for mass and age determinations of old DA white dwarfs and for white dwarf cosmochronology of the different Galactic populations.

  5. Stress Drops for Potentially Induced Earthquake Sequences

    Science.gov (United States)

    Huang, Y.; Beroza, G. C.; Ellsworth, W. L.

    2015-12-01

    Stress drop, the difference between shear stress acting across a fault before and after an earthquake, is a fundamental parameter of the earthquake source process and the generation of strong ground motions. Higher stress drops usually lead to more high-frequency ground motions. Hough [2014 and 2015] observed low intensities in "Did You Feel It?" data for injection-induced earthquakes, and interpreted them to be a result of low stress drops. It is also possible that the low recorded intensities could be a result of propagation effects. Atkinson et al. [2015] show that the shallow depth of injection-induced earthquakes can lead to a lack of high-frequency ground motion as well. We apply the spectral ratio method of Imanishi and Ellsworth [2006] to analyze stress drops of injection-induced earthquakes, using smaller earthquakes with similar waveforms as empirical Green's functions (eGfs). Both the effects of path and linear site response should be cancelled out through the spectral ratio analysis. We apply this technique to the Guy-Greenbrier earthquake sequence in central Arkansas. The earthquakes migrated along the Guy-Greenbrier Fault while nearby injection wells were operating in 2010-2011. Huang and Beroza [GRL, 2015] improved the magnitude of completeness to about -1 using template matching and found that the earthquakes deviated from Gutenberg-Richter statistics during the operation of nearby injection wells. We identify 49 clusters of highly similar events in the Huang and Beroza [2015] catalog and calculate stress drops using the source model described in Imanishi and Ellsworth [2006]. Our results suggest that stress drops of the Guy-Greenbrier sequence are similar to tectonic earthquakes at Parkfield, California (the attached figure). We will also present stress drop analysis of other suspected induced earthquake sequences using the same method.

  6. Mobius sequence--a Swedish multidiscipline study.

    Science.gov (United States)

    Strömland, Kerstin; Sjögreen, Lotta; Miller, Marilyn; Gillberg, Christopher; Wentz, Elisabet; Johansson, Maria; Nylén, Olle; Danielsson, Aina; Jacobsson, Catharina; Andersson, Jan; Fernell, Elisabeth

    2002-01-01

    Mobius sequence/syndrome is a rare disorder characterized by congenital palsy of the 6th and 7th cranial nerves. Other cranial nerves may be affected, skeletal and orofacial anomalies and mental retardation occur. The aims were to determine the frequency of associated clinical characteristics and to identify any pregnancy or environmental factors in patients with Mobius sequence. A prospective study of 25 Swedes with apparent involvement of the 6th and 7th cranial nerves was performed and 25 patients, 1 month to 55 years old, were examined. Obvious associated systemic anomalies observed included: limb malformations (10), Poland anomaly (2), hypodontia (7), microglossia (6), cleft palate (4), hearing impairment (5) and external ear malformation (1). Pronounced functional abnormalities were observed involving facial expression (16), speech (13), eating and swallowing (12) and difficulty in sucking in infancy (11). Six patients had an autistic syndrome, one an autistic-like condition, and mental retardation was found in all these patients. No common aetiological cause was found but their mothers' pregnancy histories revealed a history of benzodiazepines (1), bleeding during pregnancy (8), spontaneous abortion (7) and chorion villus sampling in the second month of pregnancy (1). In conclusion, many patients had multiple problems with eating and communication resulting from facial palsy, cleft palate and tongue anomalies. Autism and mental retardation was diagnosed in one-third of the patients. Awareness of the wide spectrum of manifestations in Mobius sequence will assist in identification of the associated malformations and functional problems that are often seen and result in better care of the children.

  7. Flexible taxonomic assignment of ambiguous sequencing reads

    Directory of Open Access Journals (Sweden)

    Jansson Jesper

    2011-01-01

    Full Text Available Abstract Background To characterize the diversity of bacterial populations in metagenomic studies, sequencing reads need to be accurately assigned to taxonomic units in a given reference taxonomy. Reads that cannot be reliably assigned to a unique leaf in the taxonomy (ambiguous reads) are typically assigned to the lowest common ancestor of the set of species that match it. This introduces a potentially severe error in the estimation of bacteria present in the sample due to false positives, since all species in the subtree rooted at the ancestor are implicitly assigned to the read even though many of them may not match it. Results We present a method that maps each read to a node in the taxonomy that minimizes a penalty score while balancing the relevance of precision and recall in the assignment through a parameter q. This mapping can be obtained in time linear in the number of matching sequences, because LCA queries to the reference taxonomy take constant time. When applied to six different metagenomic datasets, our algorithm produces different taxonomic distributions depending on whether coverage or precision is maximized. Including information on the quality of the reads reduces the number of unassigned reads but increases the number of ambiguous reads, stressing the relevance of our method. Finally, two measures of performance are described and results with a set of artificially generated datasets are discussed. Conclusions The assignment strategy of sequencing reads introduced in this paper is a versatile and quick method to study bacterial communities. The bacterial composition of the analyzed samples can vary significantly depending on how ambiguous reads are assigned, which is controlled by the value of the q parameter. Validation of our results in an artificial dataset confirms that a combination of values of q produces the most accurate results.
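
    The baseline that this abstract criticizes, assigning an ambiguous read to the lowest common ancestor (LCA) of its matching species, can be sketched as follows. This is an illustrative sketch only: it uses a simple path-intersection LCA rather than the constant-time LCA queries mentioned above, it does not reproduce the penalty score or the parameter q (their definitions are not given here), and the toy taxonomy and function names are hypothetical. A single-rooted tree stored as a child-to-parent mapping is assumed.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(nodes, parent):
    """Deepest node that is an ancestor of (or equal to) every node given."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("need at least one node")
    # Candidates are ordered from the first node upwards to the root.
    candidates = path_to_root(nodes[0], parent)
    for other in nodes[1:]:
        ancestors = set(path_to_root(other, parent))
        candidates = [n for n in candidates if n in ancestors]
    return candidates[0]


def leaves_under(node, parent):
    """All leaves of the taxonomy that lie in the subtree rooted at `node`."""
    internal = set(parent.values())
    return {n for n in parent
            if n not in internal and node in path_to_root(n, parent)}


# Hypothetical toy taxonomy: child -> parent.
TAXONOMY = {
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "S. enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
}

if __name__ == "__main__":
    matches = {"E. coli", "S. enterica"}  # species matched by one read
    lca = lowest_common_ancestor(matches, TAXONOMY)
    implied = leaves_under(lca, TAXONOMY)
    print(lca)                # node the read is assigned to
    print(implied - matches)  # species implied by the LCA but not matched
```

    In this toy case the read matches two species, yet the LCA assignment also implies a third species that the read does not match, which is the false-positive effect described in the abstract.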

  8. Bayesian Correlation Analysis for Sequence Count Data.

    Directory of Open Access Journals (Sweden)

    Daniel Sánchez-Taltavull

    Full Text Available Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.

  9. Duplex scanning using sparse data sequences

    DEFF Research Database (Denmark)

    Møllenbach, S. K.; Jensen, Jørgen Arendt

    2008-01-01

    The velocity distribution in vessels can be displayed using duplex scanning where B-mode acquisitions are interspaced with the velocity data. This gives an image for orientation, but lowers the maximum detectable velocity by a factor of two. Other pulse sequences either omit the B-mode image...... is scaled by the factor A/T. The approach has been investigated using in vivo RF data from the Hepatic vein, Carotid artery and Aorta from a 33 year old healthy male. A B-K Medical 3535 ultrasound scanner has been used in Duplex mode with a BK 8556, 3.2 MHz linear array probe. The sampling frequency...

  10. Implicitly Defined Neural Networks for Sequence Labeling

    Science.gov (United States)

    2017-07-31

    Wall Street Journal corpus (Marcus et al., 1993), blocks 0-18, validated on 19-21, and tested on 22-24, and compared it to the results of the off-the...network are coupled together, in order to improve performance on complex, long-range dependencies in either direction of a sequence. We contrast our...architecture with a bidirectional RNN, and show that our proposed architecture the bidirectional network matches its performance on one task, while

  11. Motif discovery in ranked lists of sequences

    DEFF Research Database (Denmark)

    Nielsen, Morten Muhlig; Tataru, Paula; Madsen, Tobias

    2016-01-01

    a growing need for motif analysis methods that can exploit this coupled data structure and be tailored for specific biological questions. Here, we present an exploratory motif analysis tool, Regmex (REGular expression Motif EXplorer), which offers several methods to evaluate the correlation of motifs....... These features make Regmex well suited for a range of biological sequence analysis problems related to motif discovery, exemplified by microRNA seed enrichment, but also including enrichment problems involving complex motifs and combinations of motifs. We demonstrate a number of usage scenarios that take...

  12. Mitochondrial DNA sequence of Onychostoma rara.

    Science.gov (United States)

    Zeng, Chun-Fang; Li, Xiao-Ling; Li, Chuan-Wu; Huang, Xiang-Rong; Wan, Yi-Wen

    2015-01-01

    The complete mitochondrial genome sequence of Onychostoma rara was determined to be 16,590 bp in length and contains 13 protein-coding genes (PCGs), 22 tRNA genes, large (rrnL) and small (rrnS) rRNA and the non-coding control region. Its total A + T content is 55.65%. We also analyzed the structure of the control region; 6 CSBs (CSB-1, CSB-2, CSB-3, CSB-D, CSB-E and CSB-F) and a 2 bp tandem repeat were detected.
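
    A figure such as the total A + T content is a simple base-composition statistic. The sketch below shows one way to compute it; the function name is hypothetical, the example strings are not the Onychostoma rara genome, and the choice to ignore non-ACGT symbols (such as N) is an assumption rather than something stated in the record.

```python
def at_content(sequence: str) -> float:
    """Percentage of A and T among the unambiguous bases of a DNA sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at = sum(1 for b in bases if b in "AT")
    return 100.0 * at / len(bases)


if __name__ == "__main__":
    toy = "ATGCATTA"  # toy sequence, for illustration only
    print(f"{at_content(toy):.2f}%")
```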

  13. Sequence learning in differentially activated dendrites

    DEFF Research Database (Denmark)

    Nielsen, Bjørn Gilbert

    2003-01-01

    to participate in multiple sequences, which can be learned without suffering from the 'wash-out' of synaptic efficacy associated with superimposition of training patterns. This is a biologically plausible solution to the stability-plasticity dilemma of learning in neural networks........ It is proposed that the neural machinery required in such a learning/retrieval mechanism could involve the NMDA receptor, in conjunction with the ability of dendrites to maintain differentially activated regions. In particular, it is suggested that such a parcellation of the dendrite allows the neuron...

  14. Novel overlapping coding sequences in Chlamydia trachomatis

    DEFF Research Database (Denmark)

    Jensen, Klaus Thorleif; Petersen, Lise; Falk, Søren

    2006-01-01

    Chlamydia trachomatis is the aetiological agent of trachoma and sexually transmitted infections. The C. trachomatis genome sequence revealed an organism adapted to the intracellular habitat with a high coding ratio and a small genome consisting of 1,042 kilobases (kb) with 895 annotated protein...... of the novel genes in C. trachomatis Serovar A and Chlamydia muridarum. Several of the genes have typical gene-like and protein-like features. Furthermore, we confirm transcriptional activity from 10 of the putative genes. The combined evidence suggests that at least seven of the 15 are protein coding genes...

  15. The 2016 Central Italy "reverse" seismic sequence

    Science.gov (United States)

    Chiaraluce, Lauro; Di Stefano, Raffaele; Tinti, Elisa; Scognamiglio, Laura; Michele, Maddalena; Cattaneo, Marco; De Gori, Pasquale; Chiarabba, Claudio; Monachesi, Giancarlo; Lombardi, Annamaria; Valoroso, Luisa; Latorre, Diana; Marzorati, Simone

    2017-04-01

    The 2016 seismic sequence consists so far of a series of moderate to large earthquakes that within three months' time activated a 60 km long segmented normal fault system located in Central Italy and almost contiguous to the 1997 Colfiorito and 2009 L'Aquila normal fault systems. The first mainshock of the sequence occurred with MW6.0 on the 24th of August at 01:36 UTC close to the Accumoli and Amatrice villages, producing evidence for centimetre-scale surface ruptures along the Mt. Vettore normal fault outcrop. Two months later on the 26th of October at 19:18 UTC another mainshock with MW5.9 occurred 25 km to the north, activating another normal fault segment approximately on the along strike continuation of the first structure. Then, four days later on the 30th of October at 06:40 UTC, the largest shock of the sequence, with MW6.5, occurred close to Norcia, in the middle part of the fault system activated two months before. We reconstruct the first order anatomy of the activated normal faults system, by analysing the spatial and temporal distribution of 25,354 aftershocks with 0.1foot-wall of the main planes. The entire fault system is constrained at depth by a 2-3 km thick layer where small magnitude events plus a series of large aftershocks (up to M 4) occur. This basal layer is almost flat between 8-10 km at the two edges of the fault system, while in the central portion it starts at about 6-7 km of depth to the west, reaching almost 12 km to the east, thus showing a gentle dip to the east. The variability observed all along the fault system in the anatomy of such a basal layer located in between the upper and lower crust suggests thick-skinned tectonics as a structural style for the area. Observing the spatial relationship between the seismicity distribution and the mapped compressional structures, we detect a complex interaction. The thrusts inherited from the previous tectonic phase seem in fact to modulate in space and time the seismicity pattern evolution including the

  16. Repdigits in k-Lucas sequences

    Indian Academy of Sciences (India)

    −k) and 2 (see [14]). To simplify the notation, in general, we omit the dependence on k of α. We now consider for an integer s ≥ 2, the function $f_s(x) = \frac{x - 1}{2 + (s + 1)(x - 2)}$ for $x > 2(1 - 2^{-s})$. ..... where we have used the well-known facts that $h(xy) \le h(x) + h(y)$ and $h(x) = h(x^{-1})$. In p.

  17. Comparison of 61 Sequenced Escherichia coli Genomes

    DEFF Research Database (Denmark)

    Lukjancenko, Oksana; Wassenaar, T. M.; Ussery, David

    2010-01-01

    Escherichia coli is an important component of the biosphere and is an ideal model for studies of processes involved in bacterial genome evolution. Sixty-one publicly available E. coli and Shigella spp. sequenced genomes are compared, using basic methods to produce phylogenetic and proteomics......% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggests a continuum rather than sharp species borders in this group...

  18. Rapid resistome mapping using nanopore sequencing

    DEFF Research Database (Denmark)

    van der Helm, Eric; Imamovic, Lejla; Ellabaan, Mostafa M Hashim

    2017-01-01

    of bacterial infections. Yet, rapid workflows for resistome characterization are lacking. To address this challenge we developed the poreFUME workflow that deploys functional metagenomic selections and nanopore sequencing to resistome mapping. We demonstrate the approach by functionally characterizing the gut...... resistome of an ICU (intensive care unit) patient. The accuracy of the poreFUME pipeline, at >97%, is sufficient for the annotation of antibiotic resistance genes. The poreFUME pipeline provides a promising approach for efficient resistome profiling that could inform antibiotic treatment decisions...

  19. Small oscillations, Sturm sequences, and orthogonal polynomials

    International Nuclear Information System (INIS)

    Baake, M.

    1986-07-01

    The relation between small oscillations of one-dimensional mechanical systems and the theory of orthogonal polynomials is investigated. It is shown how the polynomials provide a natural tool to determine the eigenfrequencies and eigencoordinates completely, where the existence of a certain two-termed recurrence formula is essential. Physical and mathematical statements are formulated in terms of the recursion coefficients which can directly be obtained from the corresponding secular equation. Several known as well as new results on Sturm sequences and orthogonal polynomials are presented with respect to the treatment of small oscillations. (orig.)

  20. Evolutionary insights from suffix array-based genome sequence ...

    Indian Academy of Sciences (India)

    2007-08-06

    Aug 6, 2007 ... Keywords. Biological language modelling toolkit (BLMT); genome sequence analysis; n-grams; pattern matching; suffix arrays; suffix trees; short peptide sequences genetic code bias ...