WorldWideScience

Sample records for abrf edman sequencing

  1. Broad coverage identification of multiple proteolytic cleavage site sequences in complex high molecular weight proteins using quantitative proteomics as a complement to Edman sequencing.

    Doucet, Alain; Overall, Christopher M

    2011-05-01

    Proteolytic processing modifies the pleiotropic functions of many large, complex, and modular proteins and can generate cleavage products with new biological activity. The identification of exact proteolytic cleavage sites in the extracellular matrix laminins, fibronectin, and other extracellular matrix proteins is not only important for understanding protein turnover but is also needed for the identification of new bioactive cleavage products. Several such products have recently been recognized and are suggested to play important cellular regulatory roles in processes such as angiogenesis. However, identifying multiple cleavage sites in extracellular matrix proteins and other large proteins is challenging, because N-terminal Edman sequencing of multiple and often closely spaced cleavage fragments on SDS-PAGE gels is difficult, which limits throughput and coverage. We developed a new liquid chromatography-mass spectrometry approach, which we call amino-terminal oriented mass spectrometry of substrates (ATOMS), for the N-terminal identification of protein cleavage fragments in solution. ATOMS uses efficient and low-cost dimethylation isotopic labeling of the original and the proteolytically generated N termini of protein cleavage fragments, followed by quantitative tandem mass spectrometry analysis. Being a peptide-centric approach, ATOMS is not dependent on the SDS-PAGE resolution limits for protein fragments of similar mass. We demonstrate that ATOMS reliably identifies multiple proteolytic sites per reaction in complex proteins. Fifty-five neutrophil elastase cleavage sites were identified in laminin-1 and fibronectin-1, with 34 more identified by matrix metalloproteinase cleavage. Hence, our degradomics approach offers a complementary alternative to Edman sequencing with broad applicability in identifying N termini, such as cleavage sites in complex high molecular weight extracellular matrix proteins after in vitro cleavage assays. ATOMS can therefore be useful in...

  2. Computer-aided design of a catalyst for Edman degradation utilizing substrate-assisted catalysis

    Borgo, Benjamin; Havranek, James J.

    2014-01-01

    Molecular biology has been revolutionized by the miniaturization and parallelization of DNA sequencing assays previously performed on bulk samples. Many of these technologies rely on biomolecular reagents to facilitate detection, synthesis, or labeling of samples. To aid in the construction of analogous experimental approaches for proteins and peptides, we have used computer-aided design to engineer an enzyme capable of catalyzing the cleavage step of the Edman degradation. We exploit the sim...

  3. A photothermally responsive nanoprobe for bioimaging based on Edman degradation

    Liu, Yi; Wang, Zhantong; Zhang, Huimin; Lang, Lixin; Ma, Ying; He, Qianjun; Lu, Nan; Huang, Peng; Liu, Yijing; Song, Jibin; Liu, Zhibo; Gao, Shi; Ma, Qingjie; Kiesewetter, Dale O.; Chen, Xiaoyuan

    2016-05-01

    A new type of photothermally responsive nanoprobe based on Edman degradation has been synthesized and characterized. Under irradiation by an 808 nm laser, the heat generated by the gold nanorod core breaks the thiocarbamide structure and releases the fluorescent dye Cy5.5 with increased near-infrared (NIR) fluorescence under mild acidic conditions. This RGD-modified nanoprobe is capable of fluorescence imaging of αvβ3-overexpressing U87MG cells in vitro and in vivo. This Edman degradation-based nanoprobe provides a novel strategy to design activatable probes for biomedical imaging and drug/gene delivery. Electronic supplementary information (ESI) available: HPLC, MS and 1H NMR spectrum. See DOI: 10.1039/c6nr01400c

  4. Can Edman degradation be used for quantification? Isotope-dilution liquid chromatography-electrospray ionization tandem mass spectrometry and the long-term stability of 20 phenylthiohydantoin-amino acids.

    Satoh, Ryo; Goto, Takaaki; Lee, Seon Hwa; Oe, Tomoyuki

    2013-10-01

    Edman degradation is a well-known method for obtaining amino acid (AA) sequences from a peptide by means of sequential reactions that release the N-terminal AAs from the peptide as phenylthiohydantoin (PTH) derivatives. Because of unexpected losses during the reaction and handling, there are few reports of its use for quantification. This manuscript describes the development of isotope-dilution liquid chromatography-electrospray ionization tandem mass spectrometry for 20 PTH-AA derivatives, and long-term stability testing of PTH-AAs to ensure quantitative quality in the reaction. The 20 corresponding [¹³C₆]-PTH-AAs were prepared in a one-pot reaction involving a mixture of [¹³C₆]-Edman reagent and 20 AAs. Good linearity was observed for standard curves for the PTH-AAs, using the corresponding [¹³C₆]-PTH-AAs as internal standards (1-100 pmol per injection, r² = 0.989-1.000). Serum albumin (human), pepsin (porcine stomach mucosa), α-casein (bovine milk), ribonuclease A (bovine), lysozyme (chicken egg white), and insulin (bovine) subjected to Edman degradation were examined as model proteins and peptides for N-terminal AA analysis. The results of the impurity test were satisfactory. The yield of the entire reaction with human serum albumin was estimated to be at least 75%, indicating great potential for absolute quantification of proteins without protein standards. PMID:23545858
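The isotope-dilution principle in this abstract can be sketched numerically: each PTH-amino acid is quantified from the ratio of its peak area to that of the co-injected ¹³C₆-labeled internal standard, read off a linear standard curve. This is a minimal sketch, not the authors' procedure; the calibration amounts, area ratios and peak areas below are hypothetical, and the function names are illustrative only.

```python
def fit_standard_curve(amounts_pmol, area_ratios):
    """Ordinary least-squares fit: area_ratio = slope * amount + intercept."""
    n = len(amounts_pmol)
    mean_x = sum(amounts_pmol) / n
    mean_y = sum(area_ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts_pmol)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts_pmol, area_ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination r^2 of the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(amounts_pmol, area_ratios))
    ss_tot = sum((y - mean_y) ** 2 for y in area_ratios)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def quantify(area_light, area_heavy, slope, intercept):
    """Amount (pmol) of unlabeled PTH-AA from its light/heavy peak-area ratio."""
    return (area_light / area_heavy - intercept) / slope


# Hypothetical calibration: 1-100 pmol analyte, constant amount of labeled standard.
amounts = [1.0, 10.0, 50.0, 100.0]
ratios = [0.02, 0.20, 1.00, 2.00]
slope, intercept, r2 = fit_standard_curve(amounts, ratios)
found = quantify(area_light=3000.0, area_heavy=2000.0, slope=slope, intercept=intercept)
print(f"r^2 = {r2:.3f}, amount = {found:.1f} pmol")
```

Dividing the amount recovered after degradation by the amount of protein loaded gives an overall reaction yield of the kind reported above for human serum albumin.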

  5. Broad Coverage Identification of Multiple Proteolytic Cleavage Site Sequences in Complex High Molecular Weight Proteins Using Quantitative Proteomics as a Complement to Edman Sequencing

    Doucet, Alain; Overall, Christopher M

    2010-01-01

    Proteolytic processing modifies the pleiotropic functions of many large, complex, and modular proteins and can generate cleavage products with new biological activity. The identification of exact proteolytic cleavage sites in the extracellular matrix laminins, fibronectin, and other extracellular matrix proteins is not only important for understanding protein turnover but is needed for the identification of new bioactive cleavage products. Several such products have recently been recognized t...

  6. Homology of amino acid sequences of rat liver cathepsins B and H with that of papain.

    Takio, K; Towatari, T; Katunuma, N.; Teller, D C; Titani, K

    1983-01-01

    The amino acid sequences of rat liver lysosomal thiol endopeptidases, cathepsins B and H, are presented and compared with that of the plant thiol protease papain. The 252-residue sequence of cathepsin B and the 220-residue sequence of cathepsin H were determined largely by automated Edman degradation of their intact polypeptide chains and of the two chains of each enzyme generated by limited proteolysis. Subfragments of the chains were produced by enzymatic digestion and by chemical cleavage ...

  7. Isoforms of a cuticular protein from larvae of the meal beetle, Tenebrio molitor, studied by mass spectrometry in combination with Edman degradation and two-dimensional polyacrylamide gel electrophoresis

    Haebel, Sophie; Jensen, Charlotte; Andersen, Svend Olav; Roepstorff, Peter

    Allelic variants, electroelution, gel electrophoresis, insect cuticle proteins, mass spectrometry, protein sequencing.

  8. Amino acid sequence of the cold-active alkaline phosphatase from Atlantic cod (Gadus morhua)

    Asgeirsson, Bjarni; Nielsen, Berit Noesgaard; Højrup, Peter

    2003-01-01

    Atlantic cod is a marine fish that lives at low temperatures of 0-10 degrees C and contains a cold-adapted alkaline phosphatase (AP). Preparations of AP from either the lower part of the intestines or the pyloric caeca area were subjected to proteolytic digestion, mass spectrometry and amino acid sequencing by Edman degradation. The primary structure exhibits greatest similarity to human tissue non-specific AP (80%), and approximately 30% similarity to AP from Escherichia coli. The key residues required for catalysis are conserved in the cod AP, except for the third metal binding site, where cod AP...

  9. Complete amino acid sequence of the myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani.

    Jones, B N; Wang, C C; Dwulet, F E; Lehman, L D; Meuth, J L; Bogardt, R A; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani, was determined by the automated Edman degradation of several large peptides obtained by specific cleavage of the protein. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. By subjecting four of these peptides and the apomyoglobin to automated Edman degradation, over 80% of the primary structure of the protein was obtained. The remainder of the covalent structure was determined by the sequence analysis of peptides that resulted from further digestion of the central cyanogen bromide fragment. This fragment was cleaved at its glutamyl residues with staphylococcal protease and its lysyl residues with trypsin. The action of trypsin was restricted to the lysyl residues by chemical modification of the single arginyl residue of the fragment with 1,2-cyclohexanedione. The primary structure of this myoglobin proved to be identical with that from the Atlantic bottlenosed dolphin and Pacific common dolphin but differs from the myoglobins of the killer whale and pilot whale at two positions. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea. PMID:454657
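The cleavage strategy above follows simple residue-specific rules: cyanogen bromide cuts on the C-terminal side of Met, and trypsin cuts after Lys or Arg, so blocking the lysines (here by acetimidation) leaves only Arg sites. A minimal in silico sketch under those simplified rules, assuming the common "no cut before Pro" rule for trypsin; the sequence is a hypothetical toy, not the dolphin myoglobin, and `cleave_after` is an illustrative name.

```python
def cleave_after(sequence, residues, blocked_by_proline=False):
    """Split a one-letter protein sequence C-terminal to any residue in `residues`.

    If `blocked_by_proline` is True, no cut is made when the next residue is Pro
    (the usual simplified rule for trypsin). CNBr also converts the Met to
    homoserine/homoserine lactone; this string model ignores that change.
    """
    fragments = []
    start = 0
    for i, aa in enumerate(sequence[:-1]):  # a cut after the last residue is meaningless
        if aa not in residues:
            continue
        if blocked_by_proline and sequence[i + 1] == "P":
            continue
        fragments.append(sequence[start:i + 1])
        start = i + 1
    fragments.append(sequence[start:])
    return fragments


toy = "AKMRGPRSMLK"  # hypothetical sequence
# Cyanogen bromide: cut after Met.
print(cleave_after(toy, {"M"}))                           # ['AKM', 'RGPRSM', 'LK']
# Lysines blocked, so trypsin effectively cuts only after Arg.
print(cleave_after(toy, {"R"}, blocked_by_proline=True))  # ['AKMR', 'GPR', 'SMLK']
```

The reverse restriction described for the central fragment (Arg modified with 1,2-cyclohexanedione, so trypsin cuts only at Lys) corresponds to `cleave_after(seq, {"K"}, blocked_by_proline=True)` in this model.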

  10. cDNA and deduced amino acid sequence of human pulmonary surfactant-associated proteolipid SPL(Phe).

    Glasser, S W; Korfhagen, T R; Weaver, T.; Pilot-Matias, T; Fox, J L; Whitsett, J A

    1987-01-01

    Hydrophobic surfactant-associated protein of Mr 6000-14,000 was isolated from ether/ethanol or chloroform/methanol extracts of mammalian pulmonary surfactant. Automated Edman degradation in a gas-phase sequencer showed the major N-terminus of the human low molecular weight protein to be Phe-Pro-Ile-Pro-Leu-Pro-Tyr-Cys-Trp-Leu-Cys-Arg-Ala-Leu-. Because of the N-terminal phenylalanine, the surfactant protein was designated SPL(Phe). Antiserum generated against hydrophobic surfactant protein(s) ...
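Automated Edman degradation reads one N-terminal residue per cycle, and the amount of peptide still available for sequencing shrinks a little with every cycle. A common simplified way to reason about this is a constant repetitive yield per cycle. The sketch below assumes that model (lag, preview and background are ignored) with hypothetical starting amount and yield; it uses the N-terminal sequence reported above and is not the sequencer software used in the study.

```python
def expected_pth_yields(sequence, initial_pmol, repetitive_yield):
    """Expected PTH-amino acid recovered at each Edman cycle.

    Simplified model: the amount of sequenceable peptide is multiplied by the
    repetitive yield after every cycle; lag and background are ignored.
    """
    return [
        (cycle, residue, initial_pmol * repetitive_yield ** (cycle - 1))
        for cycle, residue in enumerate(sequence, start=1)
    ]


# N-terminal sequence from the abstract; 100 pmol and 92% are hypothetical values.
for cycle, residue, pmol in expected_pth_yields("FPIPLPYCWLCRAL", 100.0, 0.92):
    print(f"cycle {cycle:2d}  {residue}  {pmol:6.1f} pmol")
```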

  11. Sequence homology between the subunits of two immunologically and functionally distinct types of fimbriae of Actinomyces spp.

    Yeung, M K; Cisar, J O

    1990-01-01

    Nucleotide sequencing of the type 1 fimbrial subunit gene of Actinomyces viscosus T14V revealed a consensus ribosome-binding site followed by an open reading frame of 1,599 nucleotides. The encoded protein of 533 amino acids (Mr = 56,899) was predominantly hydrophilic except for an amino-terminal signal peptide and a carboxy-terminal region identified as a potential membrane-spanning segment. Edman degradation of the cloned protein expressed in Escherichia coli and the type 1 fimbriae of A. v...

  12. Nucleotide sequence analysis of the gene encoding the Deinococcus radiodurans surface protein, derived amino acid sequence, and complementary protein chemical studies

    Peters, J.; Peters, M.; Lottspeich, F.; Schaefer, W.; Baumeister, W.

    1987-11-01

    The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate [HPI])-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and Mr estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%) of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.

  14. Protein chemotaxonomy. XIII. Amino acid sequence of ferredoxin from Panax ginseng.

    Mino, Yoshiki

    2006-08-01

    The complete amino acid sequence of [2Fe-2S] ferredoxin from Panax ginseng (Araliaceae) has been determined by automated Edman degradation of the entire S-carboxymethylcysteinyl protein and of the peptides obtained by enzymatic digestion. This ferredoxin has a unique amino acid sequence, which includes an insertion of Tyr at the 3rd position from the amino-terminus and a deletion of two amino acid residues at the carboxyl terminus. This ferredoxin had 18 differences in its amino acid sequence compared to that of Petroselinum sativum (Umbelliferae). In contrast, 23-33 differences were observed compared to other dicotyledonous plants. This suggests that Panax ginseng is related taxonomically to umbelliferous plants. PMID:16880642

  15. Two distinct ferredoxins from Rhodobacter capsulatus: complete amino acid sequences and molecular evolution.

    Saeki, K; Suetsugu, Y; Yao, Y; Horio, T; Marrs, B L; Matsubara, H

    1990-09-01

    Two distinct ferredoxins were purified from Rhodobacter capsulatus SB1003. Their complete amino acid sequences were determined by a combination of protease digestion, BrCN cleavage and Edman degradation. Ferredoxins I and II were composed of 64 and 111 amino acids, respectively, with molecular weights of 6,728 and 12,549 excluding iron and sulfur atoms. Both contained two Cys clusters in their amino acid sequences. The first cluster of ferredoxin I and the second cluster of ferredoxin II had a sequence, CxxCxxCxxxCP, in common with the ferredoxins found in Clostridia. The second cluster of ferredoxin I had a sequence, CxxCxxxxxxxxCxxxCM, with extra amino acids between the second and third Cys, which has been reported for other photosynthetic bacterial ferredoxins and putative ferredoxins (nif-gene products) from nitrogen-fixing bacteria, and with a unique occurrence of Met. The first cluster of ferredoxin II had a CxxCxxxxCxxxCP sequence, with two additional amino acids between the second and third Cys, a characteristic feature of the Azotobacter-type [3Fe-4S][4Fe-4S] ferredoxin. Ferredoxin II was also similar to Azotobacter-type ferredoxins in having an extended carboxyl (C-) terminal sequence compared to the common Clostridium type. The evolutionary relationship of these two ferredoxins and a putative third one recently found to be encoded in the nifENXQ region of this bacterium [Moreno-Vivian et al. (1989) J. Bacteriol. 171, 2591-2598] is discussed. PMID:2277040
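The Cys-cluster notation used above (for example CxxCxxCxxxCP, where x stands for any residue) maps directly onto a regular expression, which makes it easy to scan a sequence for these motifs. A minimal sketch; the test sequence is hypothetical, and `motif_to_regex` and `find_motif` are illustrative names, not part of any published tool.

```python
import re


def motif_to_regex(motif):
    """Translate a pattern such as 'CxxCxxCxxxCP' into a regex ('x' = any residue)."""
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in motif))


def find_motif(sequence, motif):
    """Return (1-based start, matched text) for every occurrence, overlaps included."""
    pattern = motif_to_regex(motif)
    hits = []
    for start in range(len(sequence)):
        m = pattern.match(sequence, start)  # anchored match at this position
        if m:
            hits.append((start + 1, m.group()))
    return hits


toy = "AACAACAACAAACPGG"  # hypothetical sequence
for motif in ("CxxCxxCxxxCP", "CxxCxxxxCxxxCP", "CxxCxxxxxxxxCxxxCM"):
    print(motif, find_motif(toy, motif))
# CxxCxxCxxxCP [(3, 'CAACAACAAACP')]
# CxxCxxxxCxxxCP []
# CxxCxxxxxxxxCxxxCM []
```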

  16. Automatic sequences

    Haeseler, Friedrich

    2003-01-01

    Automatic sequences are sequences produced by a finite automaton. Although they are not random, they may look random. They are complicated in the sense of not being ultimately periodic, and they may look rather complicated in the sense that it may not be easy to name the rule by which the sequence is generated; however, such a rule exists. The concept of automatic sequences has special applications in algebra, number theory, finite automata and formal languages, and combinatorics on words. The text deals with different aspects of automatic sequences, in particular:
    · a general introduction to automatic sequences
    · the basic (combinatorial) properties of automatic sequences
    · the algebraic approach to automatic sequences
    · geometric objects related to automatic sequences.
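To make "produced by a finite automaton" concrete: a k-automatic sequence gives its n-th term by feeding the base-k digits of n into a finite automaton with output and reading the output of the final state. The sketch below uses the Thue-Morse sequence, a standard 2-automatic example that the abstract itself does not name; the table and function names are illustrative.

```python
# Deterministic finite automaton with output (DFAO) over binary digits.
# States 0 and 1; reading a 1 toggles the state, reading a 0 leaves it unchanged.
TRANSITIONS = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
OUTPUT = {0: 0, 1: 1}


def automatic_term(n, transitions=TRANSITIONS, output=OUTPUT, start_state=0):
    """n-th term of the sequence generated by the automaton fed the base-2 digits of n."""
    state = start_state
    for digit in format(n, "b"):  # most significant digit first
        state = transitions[(state, digit)]
    return output[state]


print([automatic_term(n) for n in range(16)])
# [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
```

The rule is finite and easy to state, yet the resulting sequence is not ultimately periodic, which is exactly the tension the abstract describes.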

  17. CSTX-9, a toxic peptide from the spider Cupiennius salei: amino acid sequence, disulphide bridge pattern and comparison with other spider toxins containing the cystine knot structure.

    Schalle, J; Kämpfer, U; Schürch, S; Kuhn-Nentwig, L; Haeberli, S; Nentwig, W

    2001-09-01

    CSTX-9 (68 residues, 7530.9 Da) is one of the most abundant toxic polypeptides in the venom of the wandering spider Cupiennius salei. The amino acid sequence was determined by Edman degradation using reduced and alkylated CSTX-9 and peptides generated by cleavages with endoproteinase Asp-N and trypsin, respectively. Sequence comparison with CSTX-1, the most abundant and the most toxic polypeptide in the crude spider venom, revealed a high degree of similarity (53% identity). By means of limited proteolysis with immobilised trypsin and RP-HPLC, the cystine-containing peptides of CSTX-9 were isolated and the disulphide bridges were assigned by amino acid analysis, Edman degradation and nanospray tandem mass spectrometry. The four disulphide bonds present in CSTX-9 are arranged in the following pattern: 1-4, 2-5, 3-8 and 6-7 (Cys6-Cys21, Cys13-Cys30, Cys20-Cys48, Cys32-Cys46). Sequence comparison of CSTX-1 with CSTX-9 clearly indicates the same disulphide bridge pattern, which is also found in other spider polypeptide toxins, e.g. agatoxins (omega-AGA-IVA, omega-AGA-IVB, mu-AGA-I and mu-AGA-VI) from Agelenopsis aperta, SNX-325 from Segestria florentina and curtatoxins (CT-I, CT-II and CT-III) from Hololena curta. CSTX-1/CSTX-9 belong to the family of ion channel toxins containing the inhibitor cystine knot structural motif. CSTX-9, lacking the lysine-rich C-terminal tail of CSTX-1, exhibits a ninefold lower toxicity to Drosophila melanogaster than CSTX-1. This is in accordance with previous observations of CSTX-2a and CSTX-2b, two truncated forms of CSTX-1 which, like CSTX-9, also lack the C-terminal lysine-rich tail. PMID:11693532
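The ordinal notation 1-4, 2-5, 3-8, 6-7 numbers the cysteines from the N terminus, so converting it into residue pairs only requires sorting the cysteine positions. The sketch below reproduces the pairs stated in the abstract from the positions it lists; the function names are illustrative.

```python
def cysteine_positions(sequence):
    """1-based residue numbers of every Cys in a one-letter sequence."""
    return [i for i, aa in enumerate(sequence, start=1) if aa == "C"]


def map_disulfide_pattern(cys_positions, ordinal_pairs):
    """Convert an ordinal pattern such as 1-4, 2-5 into residue-number pairs.

    cys_positions: residue numbers of all cysteines in the chain.
    ordinal_pairs: (i, j) pairs counting cysteines from the N terminus, 1-based.
    """
    ordered = sorted(cys_positions)
    return [(ordered[i - 1], ordered[j - 1]) for i, j in ordinal_pairs]


# Cysteine positions and pattern as reported for CSTX-9.
cys = [6, 13, 20, 21, 30, 32, 46, 48]
pattern = [(1, 4), (2, 5), (3, 8), (6, 7)]
for a, b in map_disulfide_pattern(cys, pattern):
    print(f"Cys{a}-Cys{b}")
# Cys6-Cys21, Cys13-Cys30, Cys20-Cys48, Cys32-Cys46 (one per line)
```

With a full sequence at hand, `cysteine_positions(sequence)` supplies the first argument directly.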

  18. Complete amino acid sequence of branched-chain amino acid aminotransferase (transaminase B) of Salmonella typhimurium, identification of the coenzyme-binding site and sequence comparison analysis

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase of Salmonella typhimurium was determined by automated Edman degradation of peptide fragments generated by chemical and enzymatic digestion of S-carboxymethylated and S-pyridylethylated transaminase B. Peptide fragments of transaminase B were generated by treatment of the enzyme with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. Protocols were developed for separation of the peptide fragments by reverse-phase high performance liquid chromatography (HPLC), ion-exchange HPLC, and SDS-urea gel electrophoresis. The enzyme subunit contains 308 amino acid residues and has a molecular weight of 33,920 daltons. The coenzyme-binding site was determined by treatment of the enzyme, containing bound pyridoxal 5'-phosphate, with tritiated sodium borohydride prior to trypsin digestion. Monitoring of radioactivity incorporation and peptide-map comparison with an apoenzyme tryptic digest allowed identification of the pyridoxylated peptide, which was isolated by reverse-phase HPLC and sequenced. The coenzyme-binding site is a lysyl residue at position 159. Some peptides were further characterized by fast atom bombardment mass spectrometry.

  19. Sequence assembly

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remained unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies, and plays an important role in processing the information generated by these methods. Here, we provide a comprehensive overview of the current publicly available sequence assembly programs. We describe the basic principles of computational assembly along with the main concerns, such as repetitive sequences in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html.
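One basic principle of computational assembly is overlap-based merging: find reads whose suffix matches another read's prefix and join them. The sketch below implements the textbook greedy variant (repeatedly merge the pair with the largest overlap); it is an illustration under that assumption, not the algorithm of any program surveyed in the review, and the reads are hypothetical.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of `a` matching a prefix of `b`.

    Returns 0 if no such overlap is at least `min_length` long.
    """
    start = 0
    while True:
        start = a.find(b[:min_length], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads contained in another read; they add no new sequence.
    reads = [r for r in reads if not any(r != other and r in other for other in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_length)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps: return the contigs as they are
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads sampled from ATTAGACCTGCCGGAATAC.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is also where the repetitive sequences mentioned above cause trouble: a repeat longer than the overlap can make the largest overlap the wrong one and collapse or misjoin regions.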

  20. Genome Sequencing

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...

  1. De novo sequencing of two novel peptides homologous to calcitonin-like peptides, from skin secretion of the Chinese Frog, Odorrana schmackeri

    Geisa P.C. Evaristo

    2015-09-01

    An MS/MS-based analytical strategy was followed to solve the complete sequences of two new peptides from the skin secretion of the frog Odorrana schmackeri. This involved reduction and alkylation with two different alkylating agents, followed by high-resolution tandem mass spectrometry. De novo sequencing was achieved by complementary CID and ETD fragmentation of full-length peptides and of selected tryptic fragments. Heavy and light isotope dimethyl labeling assisted with annotation of sequence ion series. The identified primary structures are GCD[I/L]STCATHN[I/L]VNE[I/L]NKFDKSKPSSGGVGPESP-NH2 and SCNLSTCATHNLVNELNKFDKSKPSSGGVGPESF-NH2, i.e. two carboxyamidated 34-residue peptides with an amino-terminal intramolecular ring structure formed by a disulfide bridge between Cys2 and Cys7. Edman degradation analysis of the second peptide confirmed the exact sequence, resolving I/L discriminations. Both peptide sequences are novel and share homology with calcitonin, calcitonin gene-related peptide (CGRP) and adrenomedullin from other vertebrates. Detailed sequence analysis, as well as the 34-residue length of both O. schmackeri peptides, suggests that they do not fully qualify as either calcitonins (32 residues) or CGRPs (37 amino acids) and may justify their classification in a novel peptide family within the calcitonin gene-related peptide superfamily. Smooth muscle contractility assays with synthetic replicas of the S–S linked peptides on rat tail artery, uterus, bladder and ileum did not reveal myotropic activity.
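Why MS/MS alone left the I/L positions open, while Edman degradation resolved them: isoleucine and leucine have identical residue masses, so every fragment ion containing one or the other has the same m/z. A minimal sketch computing singly charged b-ion ladders from standard monoisotopic residue masses; it assumes unmodified cysteine (the alkylation used in the study would add a fixed shift per Cys), and the function name is illustrative.

```python
# Monoisotopic residue masses (Da); unmodified cysteine assumed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def b_ion_ladder(peptide):
    """Singly charged b-ion m/z values b1..b(n-1) for a peptide."""
    ladder, running = [], 0.0
    for residue in peptide[:-1]:
        running += RESIDUE_MASS[residue]
        ladder.append(round(running + PROTON, 5))
    return ladder


# N-terminal stretch of the first peptide with either isomer at position 4.
print(b_ion_ladder("GCDISTC") == b_ion_ladder("GCDLSTC"))  # True
```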

  2. Insertion sequences.

    Mahillon, Jacques; Chandler, M.

    1998-01-01

    Insertion sequences (ISs) constitute an important component of most bacterial genomes. Over 500 individual ISs have been described in the literature to date, and many more are being discovered in the ongoing prokaryotic and eukaryotic genome-sequencing projects. The last 10 years have also seen some striking advances in our understanding of the transposition process itself. Not least of these has been the development of various in vitro transposition systems for both prokaryotic and eukaryoti...

  3. Main: Sequences [KOME]

    Sequences Nucleotide Sequence Nucleotide sequence of full length cDNA (trimmed sequence) kome_ine_full_seque...nce_db.fasta.zip kome_ine_full_sequence_db.zip kome_ine_full_sequence_db ...

  4. [Sequencing babies?].

    Jordan, Bertrand

    2015-10-01

    An extension of newborn screening to genome sequencing is now feasible but raises a number of scientific, organisational and ethical issues. This is being explored in discussions and in several funded trials, in order to maximize benefits and avoid some identified risks. As some companies are already offering such a service, this is quite an urgent matter. PMID:26481033

  5. Operational Sequencing

    Nielsen, Thomas Rosendal; Hustvedt, Kjersti

    2016-01-01

    ...Bakhtinian theory, Brian Edmiston developed a solution to this in the 1990s: the principle of ‘dialogic sequencing’. Aiming to escape the conflict between relativism and absolutism, we present an alternative to Edmiston’s approach, based on Niklas Luhmann’s theory of ‘operational closure’: operational sequencing. The principle is presented in the context of the previous debate between Edmiston and Joe Winston, and its application is demonstrated and assessed in our prototype process drama, Fertility Miracles.

  6. Novel proline-hydroxyproline glycopeptides from the dandelion (Taraxacum officinale Wigg.) flowers: de novo sequencing and biological activity.

    Astafieva, Alexandra A; Enyenihi, Atim A; Rogozhin, Eugene A; Kozlov, Sergey A; Grishin, Eugene V; Odintsova, Tatyana I; Zubarev, Roman A; Egorov, Tsezi A

    2015-09-01

    Two novel homologous peptides named ToHyp1 and ToHyp2 that show no similarity to any known proteins were isolated from Taraxacum officinale Wigg. flowers by multidimensional liquid chromatography. Amino acid and mass spectrometry analyses demonstrated that the peptides have an unusual structure: they are cysteine-free, proline-hydroxyproline-rich and post-translationally glycosylated by pentoses, with 5 carbohydrates in ToHyp2 and 10 in ToHyp1. The ToHyp2 peptide, with a monoisotopic molecular mass of 4350.3 Da, was completely sequenced by a combination of Edman degradation and de novo sequencing via top-down multistage collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD) tandem mass spectrometry (MS(n)). ToHyp2 consists of 35 amino acids and contains 18 proline residues, of which 8 are hydroxylated. The peptide displays antifungal activity and inhibits the growth of Gram-positive and Gram-negative bacteria. We further showed that the carbohydrate moieties have no significant impact on the peptide structure but are important, although not absolutely necessary, for antifungal activity. The deglycosylated ToHyp2 peptide was less active against the susceptible fungus Bipolaris sorokiniana than the native peptide. Unique structural features of the ToHyp2 peptide place it into a new family of plant defense peptides. The discovery of ToHyp peptides in T. officinale flowers expands the repertoire of molecules of plant origin with practical applications. PMID:26259198

  7. The α and β chains of human platelet glycoprotein Ib are both transmembrane proteins containing a leucine-rich amino acid sequence

    The primary structure of the β chain of human glycoprotein Ib (GPIb), the platelet receptor for von Willebrand factor, has been established by a combination of cDNA cloning and amino acid sequence analysis. A λ phage cDNA expression library prepared from human erythroleukemia cells (HEL cells) was screened with a radiolabeled affinity-purified rabbit polyclonal antibody to the β chain of GPIb. Eighteen positive clones were isolated and plaque-purified, and the nucleotide sequences of three were determined. The composite sequence spanned 968 nucleotides and included a 5' untranslated region of 22 nucleotides, an open reading frame of 618 nucleotides encoding a signal peptide of 28 amino acids and a mature protein of 181 amino acids, a stop codon, and a 3' noncoding region of 307 nucleotides. The 3' noncoding sequence also contained a polyadenylylation signal (AATAAA) 14 nucleotides upstream from the poly(A) tail of 18 nucleotides. Edman degradation of the intact β chain and of peptides produced by chemical cleavage yielded amino acid sequences spanning 76 residues that were identical to those predicted from the cDNA. The amino-terminal region of the β chain contains a leucine-rich sequence of 24 amino acids that is similar to a sequence that occurs as seven tandem repeats in the α chain of GPIb and nine tandem repeats in leucine-rich α2-glycoprotein. The amino-terminal region of the β chain of GPIb is followed by a transmembrane segment of 25 amino acids and an intracellular segment of 34 amino acids at the carboxyl terminus of the protein.

  8. Main: Sequences [KOME]

    Sequences Amino Acid Sequence Amino Acid sequence of full length cDNA (Longest ORF) kome_ine_full_sequence..._amino_db.fasta.zip kome_ine_full_sequence_amino_db.zip kome_ine_full_sequence_amino_db ...

  9. Sequence to Sequence Learning with Neural Networks

    Sutskever, Ilya; Vinyals, Oriol; Le, Quoc V.

    2014-01-01

    Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensi...

  10. Purification and amino acid sequence of a bacteriocin produced by Lactobacillus salivarius K7 isolated from chicken intestine

    Kenji Sonomoto

    2006-03-01

    A bacteriocin-producing strain, Lactobacillus K7, was isolated from a chicken intestine. The inhibitory activity was determined by the spot-on-lawn technique. The strain was identified on a morphological, biochemical (API 50 CH kit) and molecular genetic (16S rDNA) basis. The bacteriocin was purified by Amberlite adsorption, cation exchange and reverse-phase high-performance liquid chromatography. N-terminal amino acid sequencing was performed by Edman degradation, and the molecular mass was determined by electrospray ionization (ESI) mass spectrometry (MS). Lactobacillus K7 showed inhibitory activity against Lactobacillus sakei subsp. sakei JCM 1157T, Leuconostoc mesenteroides subsp. mesenteroides JCM 6124T and Bacillus coagulans JCM 2257T. This strain was identified as Lb. salivarius. The antimicrobial substance was destroyed by proteolytic enzymes, indicating its proteinaceous nature, and it was therefore designated a bacteriocin. Purification of the bacteriocin by Amberlite adsorption, cation exchange and reverse-phase chromatography yielded a single active peak, designated FK22. The molecular weight of this fraction was 4331.70 Da. Its amino acid sequence was homologous to Abp 118 beta produced by Lb. salivarius UCC118. In addition, Lb. salivarius UCC118 produces a two-peptide bacteriocin consisting of Abp 118 alpha and beta. Based on the partial amino acid sequence of Abp 118 beta, specific primers were designed from nucleotide sequences in GenBank. The results showed that the deduced peptide was highly homologous to the two-peptide bacteriocin Abp 118 alpha and beta.

  11. Amino acid sequence and posttranslational modifications of human factor VIIa from plasma and transfected baby hamster kidney cells

    Blood coagulation factor VII is a vitamin K-dependent glycoprotein which in its activated form, factor VIIa, participates in the coagulation process by activating factor X and/or factor IX in the presence of Ca2+ and tissue factor. Three types of potential posttranslational modifications exist in the human factor VIIa molecule, namely, 10 γ-carboxylated, N-terminally located glutamic acid residues, 1 β-hydroxylated aspartic acid residue, and 2 N-glycosylated asparagine residues. In the present study, the amino acid sequence and posttranslational modifications of recombinant factor VIIa as purified from the culture medium of a transfected baby hamster kidney cell line have been compared to human plasma factor VIIa. By use of HPLC, amino acid analysis, peptide mapping, and automated Edman degradation, the protein backbone of recombinant factor VIIa was found to be identical with human factor VIIa. Asparagine residues 145 and 322 were found to be fully N-glycosylated in human plasma factor VIIa. In the recombinant factor VIIa, asparagine residue 322 was fully glycosylated whereas asparagine residue 145 was only partially (approximately 66%) glycosylated. Besides minor differences in the sialic acid and fucose contents, the overall carbohydrate compositions were nearly identical in recombinant factor VIIa and human plasma factor VIIa. These results show that factor VIIa as produced in the transfected baby hamster kidney cells is very similar to human plasma factor VIIa and that this cell line thus might represent an alternative source for human factor VIIa.

  12. High performance liquid chromatography purification and amino acid sequence of toxins from the muscarinic fraction of Tityus discrepans scorpion venom.

    D'Suze, G; Corona, F; Possani, L D; Sevcik, C

    1996-05-01

    Tityus discrepans venom was fractionated by gel filtration on a Sephadex G-50 column. The peptides in fraction II from Sephadex were further purified by high performance liquid chromatography, through a C4 reverse-phase column. Lethality of purified peptides was determined by injection into mice and crabs, and their effects were verified electrophysiologically on frog (Hyla crepitans) sartorius neuromuscular junction. Toxins having retention times between 39.6 and 40.7 min depolarized the muscle membrane and caused acetylcholine release at the endplate. The toxin eluted at 42.67 min increased the frequency of miniature endplate potentials without depolarizing muscle fibres. The four most active toxins were reduced, carboxymethylated and sequenced by automatic Edman degradation and named TdII-1 to TdII-4. Toxin gamma from Tityus serrulatus venom and the toxins from T. discrepans venom were found to be structurally distinct. TdII-1 to TdII-4 lack the pancreatic effects of T. serrulatus toxin gamma, yet all five toxins act on Na+ channels. PMID:8783453

  13. Classifying Genomic Sequences by Sequence Feature Analysis

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that the functional regions of the genome can be classified on the basis of sequence features and discriminant analysis.
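
    Of the features listed, dinucleotide relative abundance is commonly defined (following Karlin and coworkers) as $\rho_{XY} = f_{XY}/(f_X f_Y)$, the observed dinucleotide frequency divided by the frequency expected from base composition alone. The sketch below assumes that single-strand definition; the study may have used a symmetrized or otherwise different variant, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def dinucleotide_relative_abundance(seq: str) -> dict:
    """Return rho_XY = f_XY / (f_X * f_Y) for dinucleotides XY over A, C, G, T.

    Assumption: single-strand frequencies, no reverse-complement symmetrization.
    Pairs involving a base that never occurs are skipped (the ratio is undefined).
    """
    seq = seq.upper()
    if len(seq) < 2:
        return {}
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    rho = {}
    for x, y in product("ACGT", repeat=2):
        f_x, f_y = mono[x] / n_mono, mono[y] / n_mono
        if f_x > 0 and f_y > 0:
            # observed dinucleotide frequency / frequency expected from composition
            rho[x + y] = (di[x + y] / n_di) / (f_x * f_y)
    return rho


print(dinucleotide_relative_abundance("ACAC"))
```

    Values above 1 indicate over-represented dinucleotides and values below 1 under-represented ones; a feature vector built from these 16 ratios is the kind of input a principal component or discriminant analysis would then consume.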

  14. Multi-task Sequence to Sequence Learning

    Luong, Minh-Thang; Le, Quoc V.; Sutskever, Ilya; Vinyals, Oriol; Kaiser, Lukasz

    2015-01-01

    Sequence to sequence learning has recently emerged as a new paradigm in supervised learning. To date, most of its applications have focused on only one task, and little work has explored this framework for multiple tasks. This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting - where the encoder is shared between several tasks such as machine translation and syntactic parsing, (b) the many-to-one setting - useful when only the decoder ...

  15. Shotgun protein sequencing.

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.
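
    The abstract does not spell out the reconstruction algorithm. As a minimal sketch of the shotgun idea, the code below merges overlapping peptide strings with a greedy largest-overlap heuristic of the kind used in early DNA shotgun assembly; it is an assumed stand-in, not the method of the SAND report, and it ignores mass-spectrometry ambiguities such as Leu/Ile. Function names are hypothetical.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`; 0 if below min_len (min_len >= 1)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides: list, min_overlap: int = 3) -> list:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns one or more contigs. Exact duplicates and fragments contained
    in another fragment are dropped before merging.
    """
    frags = list(dict.fromkeys(peptides))  # remove exact duplicates, keep order
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = suffix_prefix_overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no sufficient overlap left: report separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


print(greedy_assemble(["MKTAY", "TAYIA", "YIAKQ"]))
```

    Digesting with several enzymes is what makes this work: different cleavage specificities produce peptides whose ends fall at different positions, so overlaps exist to be merged.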

  16. Whole Genome Sequencing

    ... What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  17. Sequence Read Archive (SRA)

    U.S. Department of Health & Human Services — The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome...

  18. Science sequence design

    Koskela, P. E.; Bollman, W. E.; Freeman, J. E.; Helton, M. R.; Reichert, R. J.; Travers, E. S.; Zawacki, S. J.

    1973-01-01

    The activities of the following members of the Navigation Team are recorded: the Science Sequence Design Group, responsible for preparing the final science sequence designs; the Advanced Sequence Planning Group, responsible for sequence planning; and the Science Recommendation Team (SRT) representatives, responsible for conducting the necessary sequence design interfaces with the teams during the mission. The interface task included science support in both advance planning and daily operations. Science sequences designed during the mission are also discussed.

  19. Consensus Sequence Zen

    Schneider, Thomas D.

    2002-01-01

    Consensus sequences are widely used in molecular biology but they have many flaws. As a result, binding sites of proteins and other molecules are missed during studies of genetic sequences and important biological effects cannot be seen. Information theory provides a mathematically robust way to avoid consensus sequences. Instead of using consensus sequences, sequence conservation can be quantitatively presented in bits of information by using sequence logo graphics to repre...
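
    As a minimal sketch of the information-theoretic alternative, the code below computes per-position conservation in bits, $R = \log_2 4 - H$, from a set of aligned DNA binding sites; this is the stack height drawn at each position of a sequence logo. It assumes equiprobable background bases and omits the small-sample correction used in Schneider's published method. Function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet: str = "ACGT") -> float:
    """Information content R = log2(|alphabet|) - H, in bits, for one aligned column.

    No small-sample correction is applied.
    """
    counts = Counter(b for b in column.upper() if b in alphabet)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("column contains no symbols from the alphabet")
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())  # Shannon entropy
    return math.log2(len(alphabet)) - h


def logo_heights(sites: list) -> list:
    """Per-position information (bits) for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    return [column_information("".join(s[i] for s in sites)) for i in range(length)]


print(logo_heights(["AC", "AG", "AT", "AA"]))
```

    A fully conserved position scores 2 bits and a position with all four bases equally frequent scores 0, so partially conserved positions, which a consensus sequence reduces to a single letter, keep a graded value.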

  20. The generalized quaternion sequence

    Deveci, Ömür

    2016-04-01

    In this work, we define a recurrence sequence by using the relation matrix of the generalized quaternion group, and then we obtain miscellaneous properties of this sequence. We also obtain the cyclic groups and the semigroups that are produced by the generating matrix of the sequence when read modulo m. Furthermore, we study this sequence modulo m and derive the relationship between the orders of the cyclic groups obtained and the periods of the defined sequence.

  1. Polynomially Bounded Sequences and Polynomial Sequences

    Okazaki Hiroyuki

    2015-09-01

    Full Text Available In this article, we formalize polynomially bounded sequences, which play an important role in computational complexity theory. Class P is a fundamental computational complexity class that contains all polynomial-time decision problems [11], [12]. Solving a polynomial-time decision problem on a deterministic Turing machine takes a polynomially bounded amount of computation time. Moreover, we formalize polynomial sequences [5].
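
    For orientation, the usual textbook definition of a polynomially bounded sequence $f:\mathbb{N}\to\mathbb{R}$ is written out below; the formalized definitions in the article may phrase the quantifiers differently.

    $$\exists k \in \mathbb{N}\ \exists c > 0\ \exists N \in \mathbb{N}\ \forall n \ge N:\ |f(n)| \le c \cdot n^{k}$$

    For example, $f(n) = 3n^2 + 5$ is polynomially bounded (take $k = 2$, $c = 8$, $N = 1$, since $5 \le 5n^2$ for $n \ge 1$), whereas $f(n) = 2^n$ is not.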

  2. Identification of the phosphorylation sites in early region 1A proteins of adenovirus type 5 by amino acid sequencing of peptide fragments

    Tremblay, M.L.; McGlade, C.J.; Gerber, G.E.; Branton, P.E.

    1988-05-05

    The authors have mapped the positions of three of the phosphorylation sites on the 289 and 243 residue (289R and 243R) early region 1A (E1A) proteins of human adenovirus type 5 (Ad5). These proteins, which play roles in both transcriptional control and oncogenic transformation, have identical sequences except for the presence in 289R of 46 additional internal amino acids. Phosphorylation was detected exclusively at serine residues. E1A proteins purified from [35S]methionine- or [32P]orthophosphate-labeled Ad5-infected cells were digested with trypsin, and two phosphopeptides were isolated by reverse-phase chromatography and subjected to automated Edman degradation. The major species was shown to contain a single phosphorylation site at Ser-219. The second phosphopeptide was shown to contain at least one phosphorylation site at Ser-231. A third phosphorylated tryptic peptide could not be eluted from the column but was isolated using an E1A-specific rat monoclonal antibody. Following subcleavage by Staphylococcus aureus V-8 protease, this peptide was shown to contain at least one phosphorylation site at Ser-89. The present data indicate that both the 289R and 243R E1A proteins are phosphorylated at the same sites, at least one in the amino terminal half of the molecule, and at least two toward the carboxyl terminus.

  4. Genome Sequence Databases (Overview): Sequencing and Assembly

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has generated interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different lengths (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in the analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Although automatically performed assembly (draft assembly) can cover up to 98% of the genome, in most cases it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1 error per 2000 bp. A finished genome represents a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error per 10,000 bp), validated through a number of computer and laboratory experiments.
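
    As a rough illustration of what the two quoted error rates mean, assume a hypothetical 5 Mb bacterial genome (a size chosen only for the arithmetic):

    $$5\times10^{6}\ \text{bp}\times\frac{1\ \text{error}}{2000\ \text{bp}} = 2500\ \text{errors (draft)},\qquad 5\times10^{6}\ \text{bp}\times\frac{1\ \text{error}}{10\,000\ \text{bp}} = 500\ \text{errors (finished)}$$

    In this example, finishing lowers the expected number of consensus errors about fivefold.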

  5. Roles of repetitive sequences

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological functions, but also because they provide information on genome organization, evolution, and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences: minisatellite sequences and telomere sequences as examples of tandem repeats without and with known function, respectively, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  6. DNA sequencing conference, 2

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)]

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  7. Anomaly Detection in Sequences

    National Aeronautics and Space Administration — We present a set of novel algorithms which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that...

  8. sequenceMiner algorithm

    National Aeronautics and Space Administration — Detecting and describing anomalies in large repositories of discrete symbol sequences. sequenceMiner has been open-sourced! Download the file below to try it out....

  9. Sequence information signal processor

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapiller, Timothy J.

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
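
    The per-pair rule described here (the best-scoring segment ending at the two current elements) appears to be a local-alignment recurrence of the Smith-Waterman kind. The sketch below simulates the linear array in software, one element of the second sequence per time step; the scoring values (match +2, mismatch -1, linear gap -1) and the function name are assumptions for illustration, not taken from the patent.

```python
def systolic_local_alignment(s: str, t: str, match: int = 2,
                             mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score of s against t, simulated as a linear processor array.

    Processor i stores element s[i-1]; the elements of t stream past one per step.
    Each processor keeps its own previous score H[i][j-1] and receives H[i-1][j]
    and H[i-1][j-1] from its left neighbour (index 0 is a zero boundary).
    """
    m = len(s)
    prev_col = [0] * (m + 1)  # scores from the previous time step, H[.][j-1]
    best = 0
    for c in t:
        col = [0] * (m + 1)
        for i in range(1, m + 1):
            sub = match if s[i - 1] == c else mismatch
            col[i] = max(
                0,                        # start a new segment here
                prev_col[i - 1] + sub,    # extend the segment ending at (i-1, j-1)
                col[i - 1] + gap,         # value passed on by the left neighbour this step
                prev_col[i] + gap,        # processor's own stored score from the last step
            )
            best = max(best, col[i])      # track the highest scoring parameter seen
        prev_col = col
    return best


print(systolic_local_alignment("GGACGTCC", "TTACGTAA"))
```

    In hardware, every processor updates in parallel along a diagonal wavefront, so the comparison takes time proportional to the sequence lengths rather than their product; the inner loop above serializes that work only for readability.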

  11. The LHC Sequencer

    Alemany-Fernandez, Reyes; Gorbonosov, Roman; Khasbulatov, Denis; Lamont, Mike; Le Roux, Pascal; Roderick, Chris

    2011-01-01

    The Large Hadron Collider (LHC) at CERN is a highly complex system made of many different sub-systems whose operation implies the execution of many tasks with stringent constraints on the order and duration of the execution. To be able to operate such a system in the most efficient and reliable way, the operators in the CERN control room use a high level control system: the LHC Sequencer. The LHC Sequencer system is composed of several components, including an Oracle database where operational sequences are configured, a core server that orchestrates the execution of the sequences, and two graphical user interfaces: one for sequence edition, and another for sequence execution. This paper describes the architecture of the LHC Sequencer system, and how the sequences are prepared and used for LHC operation.

  12. A symplectic Gysin sequence

    Perutz, Timothy

    2008-01-01

    We use the theory of pseudo-holomorphic quilts to establish a counterpart, in symplectic Floer homology, to the Gysin sequence for the homology of a sphere-bundle. In a motivating class of examples, this "symplectic Gysin sequence" is precisely analogous to an exact sequence describing the behaviour of Seiberg-Witten monopole Floer homology for 3-manifolds under connected sum.
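
    For reference, the classical homology Gysin sequence that serves as the model is reproduced below for an oriented sphere bundle $S^k \to E \xrightarrow{\pi} B$ with Euler class $e \in H^{k+1}(B)$. It is stated up to sign conventions, and the unlabeled arrow is the transfer (umkehr) map.

    $$\cdots \longrightarrow H_n(E) \xrightarrow{\ \pi_*\ } H_n(B) \xrightarrow{\ \cap\, e\ } H_{n-k-1}(B) \longrightarrow H_{n-1}(E) \xrightarrow{\ \pi_*\ } H_{n-1}(B) \longrightarrow \cdots$$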

  13. Repdigits in k-Lucas Sequences

    Jhon J J Bravo; Florian Luca

    2014-05-01

    For an integer $k \ge 2$, let $(L_n^{(k)})_n$ be the $k$-Lucas sequence, which starts with $0,\ldots,0,2,1$ ($k$ terms) and in which each term afterwards is the sum of the $k$ preceding terms. In 2000, Luca (Port. Math. 57(2) 2000 243-254) proved that 11 is the largest number with only one distinct digit (the so-called repdigit) in the sequence $(L_n^{(2)})_n$. In this paper, we address a similar problem in the family of $k$-Lucas sequences. We also show that the $k$-Lucas sequences have similar properties to those of $k$-Fibonacci sequences and occur in formulae simultaneously with the latter.
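
    The definition can be made concrete with a short generator. The sketch below builds the first terms of the $k$-Lucas sequence from its seed $0,\ldots,0,2,1$ and filters for repdigits with at least two digits; function names are hypothetical.

```python
def k_lucas(k: int, count: int) -> list:
    """First `count` terms of the k-Lucas sequence, starting from its k seed terms 0, ..., 0, 2, 1."""
    if k < 2:
        raise ValueError("k must be at least 2")
    seq = [0] * (k - 2) + [2, 1]
    while len(seq) < count:
        seq.append(sum(seq[-k:]))  # each new term is the sum of the k preceding terms
    return seq[:count]


def is_repdigit(n: int) -> bool:
    """True if the decimal expansion of n uses a single distinct digit."""
    return len(set(str(n))) == 1


def repdigits_in_prefix(k: int, count: int, min_digits: int = 2) -> list:
    """Distinct repdigits with at least `min_digits` digits among the first `count` terms."""
    lower = 10 ** (min_digits - 1)
    return sorted({n for n in k_lucas(k, count) if n >= lower and is_repdigit(n)})


print(k_lucas(2, 8), k_lucas(3, 8))
print(repdigits_in_prefix(2, 60))
```

    A brute-force scan like this only shows that 11 occurs among the generated Lucas numbers; ruling out larger repdigits requires an analytic argument such as Luca's.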

  14. Low autocorrelation binary sequences

    Packebusch, Tom; Mertens, Stephan

    2016-04-01

    Binary sequences with minimal autocorrelations have applications in communication engineering, mathematics and computer science. In statistical physics they appear as ground states of the Bernasconi model. Finding these sequences is a notoriously hard problem that so far can be solved only by exhaustive search. We review recent algorithms and present a new algorithm that finds optimal sequences of length $N$ in time $O(N \cdot 1.73^N)$. We computed all optimal sequences for $N \le 66$ and all optimal skew-symmetric sequences for $N \le 119$.
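
    The quantity being minimized can be made concrete. The sketch below uses the standard definitions of the aperiodic autocorrelations $C_k = \sum_i s_i s_{i+k}$, the energy $E = \sum_{k \ge 1} C_k^2$ (the Bernasconi energy, up to normalization), and the merit factor $F = N^2/(2E)$. It finds the optimum by naive enumeration of all $2^{N-1}$ sequences, which is far slower than the authors' algorithm and only meant for very small $N$; function names are hypothetical.

```python
from itertools import product


def autocorrelations(seq) -> list:
    """Aperiodic autocorrelations C_k = sum_i s_i * s_{i+k} for k = 1..N-1, with s_i in {+1, -1}."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(1, n)]


def energy(seq) -> int:
    """Energy E = sum over k of C_k^2."""
    return sum(c * c for c in autocorrelations(seq))


def merit_factor(seq) -> float:
    """Golay merit factor F = N^2 / (2E)."""
    return len(seq) ** 2 / (2 * energy(seq))


def exhaustive_min_energy(n: int) -> int:
    """Minimum energy over all 2^(n-1) sequences with s_0 = +1 (negating a sequence leaves E unchanged)."""
    best = None
    for tail in product((1, -1), repeat=n - 1):
        e = energy((1,) + tail)
        if best is None or e < best:
            best = e
    return best


barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
print(energy(barker13), merit_factor(barker13), exhaustive_min_energy(8))
```

    The length-13 Barker sequence has energy 6 and merit factor 169/12, roughly 14.08. Fixing $s_0 = +1$ exploits only the negation symmetry; practical searches also use reversal and alternating-sign symmetries and branch-and-bound pruning.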

  16. cDNA and deduced amino acid sequence of human pulmonary surfactant-associated proteolipid SPL(Phe)

    Glasser, S.W.; Korfhagen, T.R.; Weaver, T.; Pilot-Matias, T.; Fox, J.L.; Whitsett, J.A.

    1987-06-01

    Hydrophobic surfactant-associated protein of Mr 6000-14,000 was isolated from ether/ethanol or chloroform/methanol extracts of mammalian pulmonary surfactant. Automated Edman degradation in a gas-phase sequencer showed the major N-terminus of the human low molecular weight protein to be Phe-Pro-Ile-Pro-Leu-Pro-Tyr-Cys-Trp-Leu-Cys-Arg-Ala-Leu-. Because of the N-terminal phenylalanine, the surfactant protein was designated SPL(Phe). Antiserum generated against hydrophobic surfactant protein(s) from bovine pulmonary surfactant recognized protein of Mr 6000-14,000 in immunoblot analysis and was used to screen a λgt11 expression library constructed from adult human lung poly(A)+ RNA. This resulted in identification of a 1.4-kilobase cDNA clone that was shown to encode the N-terminus of the surfactant polypeptide SPL(Phe) (Phe-Pro-Ile-Pro-Leu-Pro-) within an open reading frame for a larger protein. Expression of a fused β-galactosidase-SPL(Phe) gene in Escherichia coli yielded an immunoreactive Mr 34,000 fusion peptide. Hybrid-arrested translation with the cDNA and immunoprecipitation of [35S]methionine-labeled in vitro translation products of human poly(A)+ RNA with a surfactant polyclonal antibody resulted in identification of an Mr 40,000 precursor protein. Blot hybridization analysis of electrophoretically fractionated RNA from human lung detected a 2.0-kilobase RNA that was more abundant in adult lung than in fetal lung. These proteins, and specifically SPL(Phe), may therefore be useful for synthesis of replacement surfactants for treatment of hyaline membrane disease in newborn infants or of other surfactant-deficient states.

  17. Successive Spectral Sequences

    Matschke, Benjamin

    2013-01-01

    If a chain complex is filtered over a poset I, then for every chain in I we obtain a spectral sequence. In this paper we define a spectral system that contains all these spectral sequences and relates their pages via differentials, extensions, and natural isomorphisms. We also study an analog of exact couples that provides a more general construction method for these spectral systems. This turns out to be a good framework for unifying several spectral sequences that one would usually apply on...

  18. Sequences of commutator operations

    Aichinger, Erhard

    2012-01-01

    Given the congruence lattice L of a finite algebra A with a Mal'cev term, we look for those sequences of operations on L that are sequences of higher commutator operations of expansions of A. The properties of higher commutators proved so far delimit the number of such sequences: the number is always at most countably infinite; if it is infinite, then L is the union of two proper subintervals with nonempty intersection.

  19. Efficient multivariate sequence classification

    Kuksa, Pavel P.

    2014-01-01

    Kernel-based approaches for sequence classification have been successfully applied to a variety of domains, including text categorization, image classification, speech analysis, biological sequence analysis, time series and music classification, where they show some of the most accurate results. Typical kernel functions for sequences in these domains (e.g., bag-of-words, mismatch, or subsequence kernels) are restricted to discrete univariate (i.e. one-dimensional) string data, such ...
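
    To illustrate the discrete univariate string setting that such kernels assume, the sketch below implements the k-spectrum kernel, the zero-mismatch special case of the mismatch kernel, as an inner product of k-mer count vectors. It is not the multivariate method proposed in the paper; function names are hypothetical.

```python
from collections import Counter


def kmer_counts(s: str, k: int) -> Counter:
    """Counts of every contiguous length-k substring of s (empty if k > len(s))."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """k-spectrum kernel: inner product of the k-mer count vectors of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    if len(cx) > len(cy):
        cx, cy = cy, cx  # iterate over the smaller feature map
    # Counter returns 0 for absent k-mers, so only shared k-mers contribute
    return sum(n * cy[kmer] for kmer, n in cx.items())


print(spectrum_kernel("ABAB", "BABA", k=2))
```

    Each symbol must come from a finite alphabet for the k-mers to be countable, which is exactly the restriction that multivariate, real-valued sequences violate.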

  20. Hardware bitstream sequence recognizer

    Karpin, Oleksandr; Sokil, Volodymyr

    2009-01-01

    This paper describes how to implement in hardware a bitstream sequence recognizer using the PSoC Pseudo Random Sequence Generator (PRS) User Module. The PRS can be used in digital communication systems with a serial data interface for automatic preamble detection and extraction, control word selection, etc.
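
    The core of such a recognizer is a shift register that is compared against a fixed pattern after every incoming bit. The sketch below is a software analog written for illustration; it does not use or model the PSoC PRS User Module API, and the function name and parameters are hypothetical.

```python
def detect_pattern(bits, pattern: int, width: int) -> list:
    """Return the indices of the bits at which the last `width` bits equal `pattern`.

    The register shifts left by one position per incoming bit, so the earliest
    bit of a match is the pattern's most significant bit. Overlapping matches
    are reported.
    """
    if not 0 <= pattern < (1 << width):
        raise ValueError("pattern must fit in `width` bits")
    mask = (1 << width) - 1
    register = 0
    hits = []
    for i, bit in enumerate(bits):
        register = ((register << 1) | (bit & 1)) & mask  # shift in the new bit, drop the oldest
        if i >= width - 1 and register == pattern:        # compare only once the register is full
            hits.append(i)
    return hits


print(detect_pattern([1, 0, 1, 1, 0, 1, 1], pattern=0b1011, width=4))
```

    In a preamble detector, the first reported index marks where the preamble ends, so the payload can be extracted starting at the next bit.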

  1. Cosmetology: Scope and Sequence.

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  2. Nucleosome dynamics: Sequence matters.

    Eslami-Mossallam, Behrouz; Schiessel, Helmut; van Noort, John

    2016-06-01

    About three quarters of all eukaryotic DNA is wrapped around protein cylinders, forming nucleosomes. Even though the histone proteins that make up the core of nucleosomes are highly conserved in evolution, nucleosomes can be very different from each other due to posttranslational modifications of the histones. Another crucial factor in making nucleosomes unique has so far been underappreciated: the sequence of their DNA. This review provides an overview of the experimental and theoretical progress that increasingly points to the importance of the nucleosomal base pair sequence. Specifically, we discuss the role of the underlying base pair sequence in nucleosome positioning, sliding, breathing, force-induced unwrapping, dissociation and partial assembly and also how the sequence can influence higher-order structures. A new view emerges: the physical properties of nucleosomes, especially their dynamical properties, are determined to a large extent by the mechanical properties of their DNA, which in turn depends on DNA sequence. PMID:26896338

  3. Protein sequence databases.

    Apweiler, Rolf; Bairoch, Amos; Wu, Cathy H

    2004-02-01

    A variety of protein sequence databases exist, ranging from simple sequence repositories, which store data with little or no manual intervention in the creation of the records, to expertly curated universal databases that cover all species and in which the original sequence data are enhanced by the manual addition of further information in each sequence record. As the focus of researchers moves from the genome to the proteins encoded by it, these databases will play an even more important role as central comprehensive resources of protein information. Several of the leading protein sequence databases are discussed here, with special emphasis on the databases now provided by the Universal Protein Knowledgebase (UniProt) consortium. PMID:15036160

  4. Evolution of DNA Sequencing

    Sanger and coworkers first introduced DNA sequencing in the 1970s. The method relied principally on termination of the growing nucleotide chain when a dideoxynucleoside triphosphate (such as dideoxythymidine triphosphate, ddTTP) was incorporated into it. Terminated fragments were detected by autoradiography after polyacrylamide gel electrophoresis (PAGE). Improvements to the original Sanger method over time include replacement of autoradiography with fluorescence, use of a separate fluorescent marker for each nucleotide, use of capillary electrophoresis instead of polyacrylamide gel electrophoresis, and then the introduction of capillary array electrophoresis. However, the technique suffered from a few inherent limitations, such as decreased sensitivity for low-level mutant alleles, complexity in analyzing highly polymorphic regions such as the Major Histocompatibility Complex (MHC), and the high DNA concentrations required. Several Next Generation Sequencing (NGS) technologies introduced by Roche, Illumina and other commercial manufacturers, which tend to overcome these limitations of Sanger sequencing, are reviewed. The introduction of NGS into clinical research and medical diagnostics is expected to change the entire diagnostic approach. Applications include the study of cancer variants, detection of minimal residual disease, exome sequencing, detection of Single Nucleotide Polymorphisms (SNPs) and their disease association, epigenetic regulation of gene expression, and sequencing of microorganism genomes. (author)
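
    In the classic chain-termination format, four separate reactions each contain one dideoxynucleotide, and the sequence is read by ordering the terminated fragments from all four lanes by length. The sketch below models that idealized readout for a synthesized strand (the complement of the template); it assumes every position terminates and every fragment is detected, ignoring band compression and noise, and its function names are hypothetical.

```python
def termination_lanes(strand: str) -> dict:
    """Fragment lengths produced in each ddNTP reaction for a synthesized strand (idealized)."""
    lanes = {base: [] for base in "ACGT"}
    for pos, base in enumerate(strand.upper(), start=1):
        lanes[base].append(pos)  # a fragment of length `pos` ends in dideoxy-`base`
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the sequence by ordering all bands by length (shortest fragments migrate furthest)."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = termination_lanes("ACGTTG")
print(lanes)
print(read_gel(lanes))
```

    Using a separate fluorescent marker for each nucleotide, one of the improvements listed above, collapses the four lanes into one, because each band's base can then be identified by its colour rather than by its lane.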

  5. Mapping sequences by parts

    Guziolowski Carito

    2007-09-01

    Full Text Available Abstract Background: We present the N-map method, a pairwise and asymmetrical approach which allows us to compare sequences by taking into account evolutionary events that produce shuffled, reversed or repeated elements. Basically, the optimal N-map of a sequence s over a sequence t is the best way of partitioning the first sequence into N parts and placing them, possibly complementary reversed, over the second sequence in order to maximize the sum of their gapless alignment scores. Results: We introduce an algorithm computing an optimal N-map with time complexity O(|s| × |t| × N) using O(|s| × |t| × N) memory space. Among all the numbers of parts taken in a reasonable range, we select the value N for which the optimal N-map has the most significant score. To evaluate this significance, we study the empirical distributions of the scores of optimal N-maps and show that they can be approximated by normal distributions with reasonable accuracy. We test the functionality of the approach on random sequences to which we apply artificial evolutionary events. Practical Application: The method is illustrated with four case studies of pairs of sequences involving non-standard evolutionary events.

  6. HIV Sequence Compendium 2015

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  7. Biological sequence analysis

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose;

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and...
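    Pairwise alignment is the first method the book lists. As a minimal illustration (not code from the book), the optimal global alignment score under a linear gap penalty can be computed with the Needleman-Wunsch dynamic programme; the default scores (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b with a linear gap penalty."""
    m = len(b)
    # prev[j] is the best score for aligning the previous prefix of a with b[:j]
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # best of: (mis)match, gap in b, gap in a
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]
```

    Only two rows of the score table are kept, so memory is linear in the length of `b`; recovering the alignment itself would require keeping the full table or a traceback.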

  8. Yeast genome sequencing:

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  9. Text Mining: (Asynchronous Sequences)

    Sheema Khan

    2014-12-01

    In this paper we tried to correlate text sequences that share common topics as semantic clues. We propose a two-step method for asynchronous text mining. Step one checks for common topics in the sequences and isolates them together with their timestamps. Step two takes a topic and tries to assign a timestamp to the text document. After multiple repetitions of step two, we could obtain the optimum result.

  10. Biological sequence analysis

    Speed, T. P.

    2003-01-01

    This talk will review a little over a decade's research on applying certain stochastic models to biological sequence analysis. The models themselves have a longer history, going back over 30 years, although many novel variants have arisen since that time. The function of the models in biological sequence analysis is to summarize the information concerning what is known as a motif or a domain in bioinformatics, and to provide a tool for discovering instances of that motif or domain in a separa...
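    The talk concerns stochastic models that summarize a motif and are then used to find new instances of it. A much simpler relative of such models is a position weight matrix: per-position log-odds scores estimated from aligned example sites and summed over each window of a target sequence. The sketch below assumes a uniform background and a pseudocount of 1, and the example sites are hypothetical; it is not the specific models reviewed in the talk.

```python
import math

def log_odds_matrix(sites, alphabet="ACGT", pseudocount=1.0):
    """Build per-position log2-odds scores (vs a uniform background) from aligned sites."""
    background = 1.0 / len(alphabet)
    matrix = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(column) + pseudocount * len(alphabet)
        matrix.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in alphabet
        })
    return matrix

def scan(sequence, matrix):
    """Return (start, score) for every window, scoring by summing per-position log-odds."""
    width = len(matrix)
    return [
        (start, sum(matrix[k][sequence[start + k]] for k in range(width)))
        for start in range(len(sequence) - width + 1)
    ]

sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]  # hypothetical aligned motif instances
pwm = log_odds_matrix(sites)
hits = scan("GGCTATAATGCC", pwm)
best = max(hits, key=lambda hit: hit[1])
```

    Here `best` picks out the window starting at index 3, which matches the consensus `TATAAT`.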

  11. HIV Sequence Compendium 2010

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  12. Nanopore Sequencing with MspA

    Gundlach, Jens H.

    2011-10-01

    Nanopore sequencing is the simplest concept of converting the sequence of a single DNA molecule directly into an electronic signal. We introduced the protein pore MspA, derived from Mycobacterium smegmatis, to nanopore sequencing [1]. MspA has a single, narrow (~1.2 nm) and short […] MspA is reproducible with sub-nanometer precision and is engineerable using genetic mutations. DNA moves through the pore at rates exceeding 1 nt/µs, too fast to observe the passage of each nucleotide. However, when DNA is held with double-stranded DNA sections or an avidin anchor, single nucleotides resident in MspA's constriction can be identified with highly resolved current differences. We have provided proof of principle of a nanopore sequencing method [2] in which we use DNA modified by inserting double-stranded DNA sections between every nucleotide. The double-stranded sections are designed to halt translocation for long enough to sequentially read the sequence of the original DNA molecule. Prospects and developments to sequence unmodified native DNA using MspA will be discussed.
    [1] T. Z. Butler et al., PNAS 105, 20647 (2008).
    [2] I. M. Derrington et al., PNAS 107, 16060 (2010).
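    Because the double-stranded sections pause translocation, the ionic current shows a series of plateaus, one per held nucleotide. A minimal sketch of turning such a trace into per-plateau current levels, assuming a simple jump threshold detector (the threshold, minimum plateau length, and current values are hypothetical, not the authors' analysis):

```python
from statistics import mean

def segment_levels(trace, jump=5.0, min_len=3):
    """Split a current trace into plateaus wherever consecutive samples differ by more than `jump`.

    Returns the mean current of every plateau that is at least `min_len` samples long;
    shorter runs are treated as transients and dropped.
    """
    levels, current = [], [trace[0]]
    for prev, sample in zip(trace, trace[1:]):
        if abs(sample - prev) > jump:  # a step: close the current plateau
            if len(current) >= min_len:
                levels.append(mean(current))
            current = []
        current.append(sample)
    if len(current) >= min_len:
        levels.append(mean(current))
    return levels
```

    Each returned level would then be compared with known per-nucleotide current levels to call a base.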

  13. Adaptive Processing for Sequence Alignment

    Zidan, Mohammed Affan

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
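    The distribution step described above (shorter database sequences to the CPU, longer ones to the GPU, according to a splitting ratio) can be sketched as follows. Interpreting `ratio` as the fraction of sequences sent to the CPU is an assumption; the scoring on each device is not shown.

```python
def split_database(sequences, ratio):
    """Partition sequences by length: the shortest `ratio` fraction goes to the CPU, the rest to the GPU.

    `ratio` is assumed to be the fraction (0..1) of database sequences assigned to the CPU.
    """
    ordered = sorted(sequences, key=len)
    cut = int(len(ordered) * ratio)
    cpu_portion, gpu_portion = ordered[:cut], ordered[cut:]
    return cpu_portion, gpu_portion
```

    Every sequence in the CPU portion is then no longer than any sequence in the GPU portion, which is the property the claim states.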

  14. What is next generation sequencing?

    Behjati, Sam; Tarpey, Patrick S.

    2013-01-01

    Next generation sequencing (NGS), massively parallel or deep sequencing are related terms that describe a DNA sequencing technology which has revolutionised genomic research. Using NGS an entire human genome can be sequenced within a single day. In contrast, the previous Sanger sequencing technology, used to decipher the human genome, required over a decade to deliver the final draft. Although in genome research NGS has mostly superseded conventional Sanger sequencing, it has not yet translat...

  15. Program Synthesizes UML Sequence Diagrams

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse-engineering function that aids in the design documentation of the target Java program. Whereas previously the construction of sequence diagrams was a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.

  16. Ternary Chaotic Pulse Compression Sequences

    J. B. Seventline

    2010-09-01

    In this paper, a method for generating ternary sequences is discussed. These sequences are useful in many applications, specifically in the synchronization of block codes and in pulse compression in radar. The ternary sequences are derived from chaotic maps. Using the proposed ternary sequences, superior performance in both detection range and range resolution can be achieved simultaneously. Properties of these sequences, such as the autocorrelation function, Peak Side Lobe Ratio (PSLR), ambiguity diagram, and performance under an AWGN noise background, have been studied. The generation of these sequences is much simpler, and the available number of sequences is virtually infinite and not limited by the length of the sequence.
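    The abstract does not specify which chaotic map or quantization rule is used. As an illustration only, the sketch below assumes the logistic map with r = 4 and thresholds at 1/3 and 2/3 to produce a ternary (-1, 0, +1) sequence, and computes the PSLR from the aperiodic autocorrelation (reported here as a negative dB value; some texts quote its magnitude).

```python
import math

def logistic_ternary(length, x0=0.3, r=4.0, low=1 / 3, high=2 / 3):
    """Iterate the logistic map (an assumed choice of chaotic map) and quantize to -1, 0 or +1."""
    x, seq = x0, []
    for _ in range(length):
        x = r * x * (1 - x)
        seq.append(-1 if x < low else (0 if x < high else 1))
    return seq

def pslr_db(seq):
    """Peak sidelobe ratio in dB: largest aperiodic autocorrelation sidelobe relative to the mainlobe."""
    n = len(seq)
    acf = [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]
    if acf[0] == 0:
        raise ValueError("sequence has zero energy")
    peak_sidelobe = max((abs(v) for v in acf[1:]), default=0)
    if peak_sidelobe == 0:
        return float("-inf")
    return 20 * math.log10(peak_sidelobe / acf[0])
```

    As a sanity check, the binary Barker code of length 13 has sidelobes of magnitude at most 1, so its PSLR is 20·log10(1/13), about -22.3 dB.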

  17. Sequencing BPS Spectra

    Gukov, Sergei; Saberi, Ingmar; Stosic, Marko; Sulkowski, Piotr

    2015-01-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using the (refined) modular $S$-matrix. This leads to the identifi...

  18. DNA sequencing: chemical methods

    Limited base-specific or base-selective cleavage of a defined DNA fragment yields polynucleotide products, the length of which correlates with the positions of the particular base (or bases) in the original fragment. Sverdlov and co-workers recognized the possibility of using this principle for the determination of DNA sequences. In 1977 a fully elaborated method was introduced based on this principle, which allowed routine analysis of DNA sequences over distances greater than 100 nucleotide units from a defined, radiolabeled terminus. Six procedures for partial cleavage were described. Simultaneous parallel resolution of an appropriate set of partial cleavage mixtures by polyacrylamide gel electrophoresis, followed by visualization of the radioactive bands by autoradiography, allows the nucleotide sequence to be deduced.
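    The deduction step can be sketched by reading band positions across lanes. This example assumes the classic four-lane scheme (G, A+G, C+T, C) rather than all six procedures mentioned above, and simplifies by letting a band at fragment length i stand for the base at position i from the labeled end.

```python
def read_maxam_gilbert(lanes, length):
    """Deduce a sequence from band positions in the G, A+G, C+T and C lanes.

    `lanes` maps a lane name to the set of fragment lengths (bands) observed in it.
    Simplification: a band at length i is taken to mark the base at position i.
    """
    seq = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            seq.append("G")          # band in G (and A+G) lane
        elif pos in lanes["A+G"]:
            seq.append("A")          # band in A+G but not G
        elif pos in lanes["C"]:
            seq.append("C")          # band in C (and C+T) lane
        elif pos in lanes["C+T"]:
            seq.append("T")          # band in C+T but not C
        else:
            seq.append("N")          # no band: position unreadable
    return "".join(seq)
```

    For the fragment GATC, the lanes would contain G = {1}, A+G = {1, 2}, C+T = {3, 4} and C = {4}, from which the function reads back `GATC`.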

  19. Next-generation sequencing

    Rieneck, Klaus; Bak, Mads; Jønson, Lars;

    2013-01-01

    information obtained allows well for statistical analysis of the data. This general approach can be integrated into current laboratory practice and has numerous applications. Besides DNA-based predictions of blood group phenotypes, platelet phenotypes, or sickle cell anemia, and the determination of zygosity......, Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence...

  20. Quaternions and Rotation Sequences

    Kuipers, Jack B.

    2000-01-01

    In this paper we introduce and define the quaternion; we give a brief introduction to its properties and algebra, and we show (what appears to be) its primary application—the quaternion rotation operator. The quaternion rotation operator competes with the conventional matrix rotation operator in a variety of rotation sequences.
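    The quaternion rotation operator can be illustrated by rotating a vector v with the product q v q*, where q = (cos θ/2, u sin θ/2) for a unit axis u. This is a minimal sketch of one common convention for rotating points; the paper may also discuss the frame-rotation form q* v q.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

def rotate(v, axis, angle):
    """Rotate 3-vector v by `angle` radians about the unit vector `axis` via q v q*."""
    half = angle / 2
    s = math.sin(half)
    q = (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)
    q_conj = (q[0], -q[1], -q[2], -q[3])
    # embed v as a pure quaternion, conjugate by q, and drop the (zero) scalar part
    _, x, y, z = qmul(qmul(q, (0.0, *v)), q_conj)
    return (x, y, z)
```

    Rotating (1, 0, 0) by 90 degrees about the z axis gives approximately (0, 1, 0).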

  1. THE RHIC SEQUENCER

    The Relativistic Heavy Ion Collider (RHIC) has a high-level asynchronous timeline driven by a controlling program called the "Sequencer". Most high-level magnet and beam-related issues are orchestrated by this system. The system also plays an important role in coordinated data acquisition and saving. We present the program, operator interface, operational impact and experience.

  2. Instruction sequence processing operators

    J.A. Bergstra; C.A. Middelburg

    2009-01-01

    This paper concerns instruction sequences whose execution involves the processing of instructions by an execution environment that offers a family of services and may yield a Boolean value at termination. We introduce a composition operator for families of services and three operators that have a di

  3. THE RHIC SEQUENCER.

    VAN ZEIJTS, J.; DOTTAVIO, T.; FRAK, B.; MICHNOFF, R.

    2001-06-18

    The Relativistic Heavy Ion Collider (RHIC) has a high-level asynchronous timeline driven by a controlling program called the "Sequencer". Most high-level magnet and beam-related issues are orchestrated by this system. The system also plays an important role in coordinated data acquisition and saving. We present the program, operator interface, operational impact and experience.

  4. A Sequence of Cylinders

    Johnson, Erica

    2006-01-01

    Hoping to develop in her students an understanding of mathematics as a way of thinking more than a way of doing, the author of this article describes how her students worked on a spatial reasoning problem stemming from an iteratively constructed sequence of cylinders. She presents an activity of making cylinders out of paper models, and for every…

  5. Properties of Semijoin Sequences

    Beng C. Ooi; B. Srinivasan

    1989-01-01

    The problem of finding the optimum semijoin sequence for an arbitrary query under a linear cost function for the transmission cost is NP-hard. Hence, heuristic algorithms with desirable properties are explored. In this paper, four properties of semijoin programs for distributed query processing are identified. The use of these properties in constructing semijoin sequences is justified. An existing algorithm is modified to incorporate these properties. Empirical comparison with existing algorithms shows the superiority of the proposed algorithm.
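    The abstract does not state the four properties or the modified algorithm. To show what a semijoin and a linear transmission cost look like, the sketch below implements R ⋉ S and one greedy step in the spirit of the classic SDD-1 heuristic (pick the semijoin with the largest benefit minus cost). This is a swapped-in illustration, not the paper's algorithm; the cost coefficients and the assumption that a row and a join value cost the same to ship are hypothetical.

```python
def semijoin(r, s, attr):
    """R semijoin S on `attr`: keep the rows of R whose attr value appears in S."""
    keys = {row[attr] for row in s}
    return [row for row in r if row[attr] in keys]

def transmission_cost(n_values, c0=1.0, c1=1.0):
    """Linear cost model: setup cost c0 plus c1 per shipped value (coefficients are assumptions)."""
    return c0 + c1 * n_values

def best_semijoin(relations, joins):
    """Greedy step: among candidate semijoins (target, source, attr), return the most profitable one or None."""
    best, best_gain = None, 0.0
    for target, source, attr in joins:
        reduced = semijoin(relations[target], relations[source], attr)
        benefit = len(relations[target]) - len(reduced)  # rows of the target no longer shipped
        cost = transmission_cost(len({row[attr] for row in relations[source]}))
        if benefit - cost > best_gain:
            best, best_gain = (target, source, attr), benefit - cost
    return best
```

    A full heuristic would apply the chosen semijoin, update the relations, and repeat until no semijoin has positive gain.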

  6. Learning academic formulaic sequences

    Peters, Elke; Pauwels, Paul

    2015-01-01

    This paper reports on a classroom-based study that explored the effect of explicit, vocabulary-focused instruction on English as a Foreign Language (EFL) students’ recognition, cued output and spontaneous use of academic formulaic sequences (FS). In addition, the study aimed to shed some light on which type of classroom activity might be most beneficial. Data were collected among second-year EFL business students (L1 = Dutch) in a classroom-based experiment during students’ regular English cl...

  7. Sequence Classification: 885394 [

    703); The expression pattern of this gene is described in PMID:12000842; possible frameshift detected when compare...Non-TMB TMH Non-TMB Non-TMB Non-TMB Non-TMB >gi|23619146|ref|NP_705108.1| Slight difference exist when compa...red to the published sequence of EBL-1 from Dd2 strain of P. falciparum (PMID:10613

  8. Oscillatory Nonautonomous Lucas Sequences

    José M. Ferreira

    2010-01-01

    The oscillatory behavior of the solutions of the second-order linear nonautonomous equation $x(n+1) = a(n)x(n) - b(n)x(n-1)$, $n \in \mathbb{N}_0$, where $a, b\colon \mathbb{N}_0 \to \mathbb{R}$, is studied. Under the assumption that the sequence $b(n)$ dominates $a(n)$ in some sense, the amplitude of the oscillations and the asymptotic behavior of the solutions are also analyzed.
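    The recurrence can be explored numerically by iterating it and counting sign changes of the solution. The sketch below is only an experiment aid, not the paper's analysis; the example coefficients a(n) = 0, b(n) = 1 are an illustrative case in which b clearly dominates a.

```python
def iterate(a, b, x0, x1, steps):
    """Iterate x(n+1) = a(n) x(n) - b(n) x(n-1), starting from x(0) = x0, x(1) = x1."""
    xs = [x0, x1]
    for n in range(1, steps):
        xs.append(a(n) * xs[n] - b(n) * xs[n - 1])
    return xs

def sign_changes(xs):
    """Count sign changes between consecutive nonzero terms."""
    signs = [x > 0 for x in xs if x != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)
```

    With a(n) = 0 and b(n) = 1 the solution starting from 1, 1 is 1, 1, -1, -1, 1, 1, ..., which oscillates with constant amplitude.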

  9. Information Theory of DNA Sequencing

    Motahari, Abolfazl; Tse, David

    2012-01-01

    DNA sequencing is the basic workhorse of modern day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and provides a fundamental limit to the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process. Using this framework, we also study the impact of noise in the read process on the sequencing capacity.
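    The shotgun setting (many randomly located short reads) can be simulated directly. The sketch below does not compute the paper's sequencing capacity; it checks the related classical Lander-Waterman estimate that, with coverage c = N·L/G, a fraction of about e^(-c) of the genome is covered by no read. The genome length, read length, read count, and circular-genome assumption are all illustrative choices.

```python
import math
import random

def uncovered_fraction(genome_len, read_len, n_reads, seed=0):
    """Place reads uniformly at random on a circular genome; return the fraction of bases no read covers."""
    rng = random.Random(seed)
    covered = bytearray(genome_len)
    for _ in range(n_reads):
        start = rng.randrange(genome_len)
        for k in range(read_len):
            covered[(start + k) % genome_len] = 1
    return 1 - sum(covered) / genome_len

coverage = 2000 * 100 / 100_000          # c = N * L / G = 2
simulated = uncovered_fraction(100_000, 100, 2000)
predicted = math.exp(-coverage)          # Lander-Waterman estimate, about 0.135
```

    Any base left uncovered cannot be recovered by any assembly algorithm, which is one simple reason a performance limit of this kind exists.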

  10. Image sequence analysis

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  11. Psychoacoustic Properties of Fibonacci Sequences

    J. Sokoll

    2008-01-01

    In 1202, Fibonacci set up one of the most interesting sequences in number theory. This sequence can be represented by the so-called Fibonacci numbers, and by a binary sequence of zeros and ones. If such a binary Fibonacci sequence is played back as an audio file, a very dissonant sound results. This is caused by the “almost-periodic”, “self-similar” property of the binary sequence. The ratio of zeros to ones converges to the golden ratio, as do the primary and secondary spectral components in their frequencies and amplitudes. These Fibonacci sequences will be characterized using listening tests and psychoacoustic analyses.
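    One standard way to build the binary Fibonacci sequence is the substitution 0 → 01, 1 → 0 applied repeatedly to "0". The sketch below, which does not reproduce the paper's audio processing, shows that the ratio of zeros to ones approaches the golden ratio, because the counts are consecutive Fibonacci numbers.

```python
def fibonacci_word(iterations):
    """Binary Fibonacci word obtained by applying 0 -> 01, 1 -> 0 repeatedly to "0"."""
    word = "0"
    for _ in range(iterations):
        word = "".join("01" if c == "0" else "0" for c in word)
    return word

word = fibonacci_word(20)
ratio = word.count("0") / word.count("1")   # approaches (1 + sqrt(5)) / 2
```

    For audio playback, each symbol would be mapped to a sample value (for example 0 to -1 and 1 to +1); that mapping is an assumption, not taken from the paper.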

  12. Allele Re-sequencing Technologies

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    The development of next-generation sequencing technologies has made sequencing an affordable approach for detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing still remains too high to be feasible for many plant species with large and...... complex genomes. Recent developments in strategies for target enrichment, transcriptome re-sequencing, and partial genome re-sequencing allow enrichment for regions of interest at a scale that is matched to the throughput of next-generation sequencing platforms, and have emerged as a promising...

  13. Sequence Maneuverer: tool for sequence extraction from genomes

    Yasmin, Tayyaba; Rehman, Inayat Ur; Ansari, Adnan Ahmad; Liaqat, Khurrum; Khan, Muhammad Irfan

    2012-01-01

    The availability of genomic sequences of many organisms has opened new challenges in many aspects, particularly in terms of genome analysis. Sequence extraction is a vital step and many tools have been developed to solve this issue. These tools are publicly available but have limitations with respect to sequence extraction, the length of the sequence to be extracted, organism specificity, and lack of a user-friendly interface. We have developed a Java-based software package having three modul...

  14. Pseudorandom Binary Sequences Generated by Sequences (n.alpha)

    Porubský, Štefan

    Marseille: CIRM, 2008, pp. 1-40. [International Conference on Uniform Distribution. Marseille, 21.01.2008-25.01.2008] Institutional research plan: CEZ:AV0Z10300504 Keywords : binary sequence * multiples of an irrational number * pseudorandom sequence * distribution properties of sequences Subject RIV: BA - General Mathematics

  15. Sequence repeats and protein structure

    Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos

    2012-11-01

    Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.
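    To make the two-letter (H/P) idea concrete, the sketch below scores a conformation in the standard 2D square-lattice HP model: -1 for every pair of H residues that are lattice neighbours but not consecutive along the chain. The lattice, dimension, and contact energy are assumptions of this textbook variant; the paper's coarse-grained model may differ.

```python
def hp_energy(sequence, coords):
    """Energy of a 2D square-lattice HP conformation: -1 per non-bonded H-H nearest-neighbour contact."""
    if len(sequence) != len(coords):
        raise ValueError("one lattice site per residue is required")
    position = {c: i for i, c in enumerate(coords)}
    if len(position) != len(coords):
        raise ValueError("conformation must be self-avoiding")
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = position.get(nb)
            # j > i + 1 skips chain neighbours and counts each contact once
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy
```

    Folding then amounts to searching over self-avoiding conformations for the lowest energy, which is where repeated and random sequences can behave differently.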

  16. Recent advances in nanopore sequencing

    Maitra, Raj D.; Kim, Jungsuk; Dunbar, William B.

    2012-01-01

    The prospect of nanopores as a next-generation sequencing (NGS) platform has been a topic of growing interest and considerable government-sponsored research for more than a decade. Oxford Nanopore Technologies recently announced the first commercial nanopore sequencing devices, to be made available by the end of 2012, while other companies (Life, Roche, IBM) are also pursuing nanopore sequencing approaches. In this paper, the state of the art in nanopore sequencing is reviewed, focusing on th...

  17. Region segmentation along image sequence

    A method to extract regions in a sequence of images is proposed. Regions are not matched from one image to the next. The result of a region segmentation is used as an initialization to segment the following image and thus to track the region along the sequence. The image sequence is exploited as a spatio-temporal event. (authors). 12 refs., 8 figs

  18. Sequence Maneuverer: tool for sequence extraction from genomes

    Yasmin, Tayyaba; Rehman, Inayat Ur; Ansari, Adnan Ahmad; Liaqat, Khurrum; Khan, Muhammad Irfan

    2012-01-01

    The availability of genomic sequences of many organisms has opened new challenges in many aspects, particularly in terms of genome analysis. Sequence extraction is a vital step and many tools have been developed to solve this issue. These tools are publicly available but have limitations with respect to sequence extraction, the length of the sequence to be extracted, organism specificity, and lack of a user-friendly interface. We have developed a Java-based software package having three modules which can be used independently or sequentially. The tool efficiently extracts sequences from large datasets in a few simple steps. It can efficiently extract multiple sequences of any desired length from a genome of any organism. The results were cross-checked against published data. Availability URL 1: http://ww3.comsats.edu.pk/bio/ResearchProjects.aspx URL 2: http://ww3.comsats.edu.pk/bio/SequenceManeuverer.aspx PMID:23275734
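    The core operation, pulling a region of given length out of a genome file, can be sketched in a few lines. This is not Sequence Maneuverer's code (which is Java); it assumes FASTA input, record IDs taken from the first word of the header, and 1-based inclusive coordinates.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record ID (first word of the header) to sequence."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def extract(records, name, start, end):
    """Return bases start..end of record `name` (1-based, inclusive coordinates)."""
    return records[name][start - 1:end]

fasta = ">chr1 example\nACGTAC\nGGTT\n>chr2\nTTTT\n"
records = read_fasta(fasta)
region = extract(records, "chr1", 3, 6)
```

    Here `region` is `GTAC`; multiple regions can be extracted by calling `extract` once per coordinate pair.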

  19. Rapid Polymer Sequencer

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.
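    The comparison step, matching each measured current change against a database of reference changes, can be sketched as a nearest-reference lookup. The reference values below are hypothetical and the nearest-neighbour rule is an assumption; the patent does not state how the comparison is scored.

```python
def identify(measured_changes, references):
    """Assign each measured current change to the component whose reference change is closest."""
    return [
        min(references, key=lambda comp: abs(references[comp] - delta))
        for delta in measured_changes
    ]

references = {"A": -12.0, "C": -8.0, "G": -15.0, "T": -5.0}  # hypothetical reference changes (pA)
calls = identify([-11.6, -5.3, -14.8, -8.1], references)
```

    In practice the reference database would hold distributions rather than single values, so a likelihood-based rule could replace the absolute-difference distance.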

  20. Complete Convergence of Exchangeable Sequences

    George Stoica

    2011-01-01

    We prove that exchangeable sequences converge completely in the Baum-Katz sense under the same conditions as i.i.d. sequences do. Problem statement: The research was needed because the rate of convergence in the law of large numbers for exchangeable sequences had previously been obtained only under restricted hypotheses. Approach: We applied powerful techniques involving inequalities for independent sequences of random variables. Results: We obtained the maximal rate of convergence and provided an example to show that our findings are sharp. Conclusion/Recommendations: The technique used in the paper may be adapted to similar studies of identically distributed sequences.

  1. Weyl Spreading Sequence Optimizing CDMA

    Tsuda, Hirofumi; Umeno, Ken

    2016-01-01

    Recently, a new spreading sequence based on the Weyl sequence has been proposed for CDMA systems. Its cross-correlation function is $O(\frac{1}{N})$, where $N$ is the code length of the spreading sequence. In this paper, we optimize the design of the Weyl sequence codes assigned to each user in CDMA systems and analytically calculate the theoretical SIR (Signal to Interference Noise Ratio). It is theoretically proven that CDMA systems using this spreading sequence have about 2.5 times larger ca...
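    The O(1/N) behaviour can be checked numerically. The sketch assumes complex codes of the form c[n] = exp(2πi(nθ + φ)) for an irrational θ per user; the paper's optimized assignment of θ and φ is not given in the abstract. For two such codes the zero-lag correlation is a geometric sum with magnitude |sin(πNδ)| / (N·|sin(πδ)|), where δ = θ1 - θ2, so it is bounded by 1 / (N·|sin(πδ)|).

```python
import cmath
import math

def weyl_code(theta, n_chips, phase=0.0):
    """Complex spreading code c[n] = exp(2*pi*i*(n*theta + phase)), n = 0..n_chips-1 (assumed form)."""
    return [cmath.exp(2j * math.pi * (n * theta + phase)) for n in range(n_chips)]

def cross_correlation(u, v):
    """Normalized zero-lag correlation |(1/N) * sum(u[n] * conj(v[n]))|."""
    return abs(sum(a * b.conjugate() for a, b in zip(u, v))) / len(u)

N = 1000
theta1, theta2 = math.sqrt(2) % 1, math.sqrt(3) % 1
code1, code2 = weyl_code(theta1, N), weyl_code(theta2, N)
bound = 1 / (N * abs(math.sin(math.pi * (theta1 - theta2))))
```

    Doubling N roughly halves the bound, which is the O(1/N) decay quoted above; only the zero-lag value is computed here, not the full correlation function.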

  2. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique "electronic fingerprints" of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic states of the nucleobases shift depending on the pH, with the most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (here, methylation). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of the beta-lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  3. Comparative statistics for DNA and protein sequences: multiple sequence analysis.

    Karlin, S.; Ghandour, G

    1985-01-01

    Concepts and methods [Karlin, S. & Ghandour, G. (1985) Proc. Natl. Acad. Sci. USA 82, 5800-5804] for the analysis of patterns and relationships are extended to multiple DNA and protein sequences. Functionals include multiple sequence common word occurrence distributions, characterizations of high frequency shared words, and ascertainment of long block identities. Various comparisons of sequences using natural alphabets obtained from grouping nucleotides or amino acids by their chemical and fu...

  4. Physical Complexity of Symbolic Sequences

    Adami, C

    2008-01-01

    A practical measure for the complexity of sequences of symbols ("strings") is introduced that is rooted in automata theory but avoids the problems of Kolmogorov-Chaitin complexity. This physical complexity can be estimated for ensembles of sequences, for which it reverts to the difference between the maximal entropy of the ensemble and the actual entropy given the specific environment within which the sequence is to be interpreted. Thus, the physical complexity measures the amount of information about the environment that is coded in the sequence, and is conditional on such an environment. In practice, an estimate of the complexity of a string can be obtained by counting the number of loci per string that are fixed in the ensemble, while the volatile positions represent, again with respect to the environment, randomness. We apply this measure to tRNA sequence data. - Substantially improved and clarified version, includes application to EMBL tRNA sequence data
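    The ensemble estimate described above (maximal entropy minus actual entropy) can be written per site: with entropies measured in units of log(alphabet size), a length-L string has maximal entropy L, so C ≈ L - Σ H_i. Fixed loci contribute 1 and fully volatile loci contribute 0. This minimal sketch omits the finite-sample corrections a careful estimate would need.

```python
import math
from collections import Counter

def physical_complexity(ensemble, alphabet_size=4):
    """Estimate C = L - sum_i H_i, with per-site entropy H_i in units of log(alphabet_size)."""
    length = len(ensemble[0])
    total_entropy = 0.0
    for i in range(length):
        counts = Counter(seq[i] for seq in ensemble)
        n = sum(counts.values())
        total_entropy -= sum((c / n) * math.log(c / n, alphabet_size) for c in counts.values())
    return length - total_entropy
```

    For example, an ensemble in which the first site is always A and the second site is A, C, G or T equally often has complexity close to 1: one fixed locus and one random one.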

  5. Turtle Graphics of Morphic Sequences

    Zantema, Hans

    2016-02-01

    The simplest infinite sequences that are not ultimately periodic are pure morphic sequences: fixed points of particular morphisms mapping single symbols to strings of symbols. A basic way to visualize a sequence is by a turtle curve: for every alphabet symbol fix an angle, and then consecutively for all sequence elements draw a unit segment and turn the drawing direction by the corresponding angle. This paper investigates turtle curves of pure morphic sequences. In particular, criteria are given for turtle curves being finite (consisting of finitely many segments), and for being fractal or self-similar: it contains an up-scaled copy of itself. Also space-filling turtle curves are considered, and a turtle curve that is dense in the plane. As a particular result we give an exact relationship between the Koch curve and a turtle curve for the Thue-Morse sequence, where until now for such a result only approximations were known.
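    The turtle construction in the abstract (for each symbol, draw a unit segment, then turn by that symbol's angle) can be sketched directly, here applied to a prefix of the Thue-Morse sequence. The angle values are a free choice of the user; none are taken from the paper.

```python
import math

def thue_morse(n):
    """First n symbols of the Thue-Morse sequence (parity of the number of 1 bits in k)."""
    return [bin(k).count("1") % 2 for k in range(n)]

def turtle_curve(sequence, angles):
    """Visited points: for each symbol draw a unit segment, then turn by angles[symbol] radians."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for symbol in sequence:
        x += math.cos(heading)
        y += math.sin(heading)
        points.append((x, y))
        heading += angles[symbol]
    return points
```

    For instance, four copies of a single symbol with a 90 degree turn trace a unit square and return to the origin; plotting `turtle_curve(thue_morse(2 ** 10), angles)` for various angle pairs is a quick way to explore the behaviour the paper classifies.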

  6. Graphene nanodevices for DNA sequencing

    Heerema, Stephanie J.; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology.

  7. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs, are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by
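    CpG avoidance of the kind noted above is commonly quantified with the observed/expected CpG ratio, count(CG) × length / (count(C) × count(G)), the Gardiner-Garden and Frommer formula. The sketch below computes that standard ratio; it is not necessarily the statistic the authors used for their motif comparisons.

```python
def cpg_observed_expected(seq):
    """CpG observed/expected ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # count overlapping CG dinucleotides on this strand
    cg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cg * len(seq) / (c * g)
```

    Values well below 1 indicate CpG depletion relative to what the C and G content alone would predict.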

  8. Spatiotemporal correlations of aftershock sequences

    Peixoto, Tiago P.; Doblhoff-Dier, Katharina; Davidsen, Jörn

    2010-01-01

    Aftershock sequences are of particular interest in seismic research since they may condition seismic activity in a given region over long time spans. While they are typically identified with periods of enhanced seismic activity after a large earthquake as characterized by the Omori law, our knowledge of the spatiotemporal correlations between events in an aftershock sequence is limited. Here, we study the spatiotemporal correlations of two aftershock sequences from California (Parkfield and H...

  9. Application of difference sequences theory

    Khantarzhiev, Georgii

    2013-01-01

    The results of difference sequences theory are applied to analytic function theory and Diophantine equations. As a result, we obtain an equation that connects the $n$-th derivative of a function with the difference sequence of the values of this function. The results of difference sequences theory also help to reveal some features of a whole class of Diophantine equations. The method presented allows one to find the limits where a Diophantine equation has no integer solutions. The higher pow...
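The abstract does not state the paper's equation. A standard link between derivatives and difference sequences is that, for $f$ differentiable $n$ times, $\Delta_h^n f(x) = h^n f^{(n)}(\xi)$ for some $\xi$ in $(x, x+nh)$. The sketch below computes the $n$-th forward difference and the corresponding derivative estimate; it illustrates that textbook identity, not the author's result.

```python
from math import comb
from typing import Callable


def forward_difference(f: Callable[[float], float], x: float, h: float, n: int) -> float:
    """n-th forward difference: sum_{k=0}^{n} (-1)^(n-k) * C(n, k) * f(x + k*h)."""
    return sum((-1) ** (n - k) * comb(n, k) * f(x + k * h) for k in range(n + 1))


def derivative_estimate(f: Callable[[float], float], x: float, h: float, n: int) -> float:
    """Approximate f^(n) near x by the scaled difference Delta_h^n f(x) / h^n."""
    return forward_difference(f, x, h, n) / h ** n


# The third difference of a cubic is constant and equals 3! times the leading coefficient.
print(derivative_estimate(lambda t: t ** 3, 2.0, 1.0, 3))  # 6.0
```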

  10. Cutting sequences on translation surfaces

    Davis, Diana

    2013-01-01

    We analyze the cutting sequences associated with geodesic flow on a large class of translation surfaces, including Bouw-Moller surfaces. We give a combinatorial rule that relates the cutting sequence corresponding to a given trajectory to the cutting sequence corresponding to the image of that trajectory under the parabolic element of the Veech group. This extends previous work for regular polygon surfaces to a larger class of translation surfaces. We find that the combinatorial rule is the same...

  11. SNMR pulse sequence phase cycling

    Walsh, David O; Grunewald, Elliot D

    2013-11-12

    Technologies applicable to SNMR pulse sequence phase cycling are disclosed, including SNMR acquisition apparatus and methods, SNMR processing apparatus and methods, and combinations thereof. SNMR acquisition may include transmitting two or more SNMR pulse sequences and applying a phase shift to a pulse in at least one of the pulse sequences, according to any of a variety of cycling techniques. SNMR processing may include combining SNMR signals from a plurality of pulse sequences comprising pulses of different phases, so that desired signals are preserved and undesired signals are canceled.
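The cancellation principle can be shown with a toy two-step phase cycle. This is a minimal sketch under a hypothetical signal model, not the patented method: the desired signal's phase is assumed to follow the transmitted pulse phase, while an unwanted component is assumed not to. Acquiring with pulse phases 0 and π and subtracting the second record keeps the first and cancels the second.

```python
import cmath


def acquire(pulse_phase: float, desired: complex, unwanted: complex) -> complex:
    """Toy acquisition model (hypothetical): the desired signal's phase follows the
    transmitted pulse phase, while an unwanted component does not."""
    return desired * cmath.exp(1j * pulse_phase) + unwanted


def two_step_phase_cycle(desired: complex, unwanted: complex) -> complex:
    """Acquire with pulse phases 0 and pi, then combine with alternating receiver sign."""
    s0 = acquire(0.0, desired, unwanted)
    s180 = acquire(cmath.pi, desired, unwanted)
    # Desired terms add (d - (-d) = 2d); the phase-independent term cancels (u - u = 0).
    return (s0 - s180) / 2


print(two_step_phase_cycle(1.0 + 0.5j, 3.0 - 2.0j))  # approximately (1+0.5j)
```

Longer cycles (e.g. four phase steps) follow the same idea and can suppress components with other phase dependences.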

  12. Fast global sequence alignment technique

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics databases are growing exponentially in size. Processing this large amount of data may take hours even on supercomputers. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known sequence alignment algorithms 'GAP' (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 51% enhancement in alignment score when compared with the GAP algorithm. The evaluations are conducted using DNA sequences of different lengths. © 2011 IEEE.
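The abstract does not describe ABS in enough detail to reproduce it, but the optimal baseline it is compared against, Needleman-Wunsch global alignment, is standard. The sketch below uses an assumed scoring scheme (match +1, mismatch -1, linear gap -1); published comparisons may use different scores.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Optimal global alignment score and one optimal alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

The quadratic time and memory of this table is what fast heuristic aligners such as the one described above aim to avoid.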

  13. Blazar Sequence in Fermi Era

    Liang Chen

    2014-09-01

    In this paper, we review the latest research results on the topic of the blazar sequence. It seems that the blazar sequence is ruled out phenomenologically, while the theoretical blazar sequence still holds. We point out that black hole mass is a dominant parameter accounting for high-power high-synchrotron-peaked and low-power low-synchrotron-peaked blazars. Because most blazars have emission regions of similar size, the theoretical blazar sequence implies that the break in the spectral energy distribution (SED) is a cooling break in nature.

  14. ABS: Sequence alignment by scanning

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost any computational biology research. Alignment tools process large sequence databases and are considered heavy consumers of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 76% enhancement in alignment score when compared with the FASTA algorithm. The evaluations are conducted using DNA sequences of different lengths. © 2011 IEEE.

  15. Transcriptional sequencing: A method for DNA sequencing using RNA polymerase

    Sasaki, Nobuya; Izawa, Masaki; Watahiki, Masanori; Ozawa, Kaori; Tanaka, Takumi; Yoneda, Yuko; Matsuura, Shuji; Carninci, Piero; Muramatsu, Masami; Okazaki, Yasushi; Hayashizaki, Yoshihide

    1998-01-01

    We have developed a sequencing method based on the RNA polymerase chain termination reaction with rhodamine dye attached to 3′-deoxynucleoside triphosphate (3′-dNTP). This method enables us to conduct a rapid isothermal sequencing reaction in

  16. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon;

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS)...

  17. Sequence Algebra, Sequence Decision Diagrams and Dynamic Fault Trees

    Rauzy, Antoine B., E-mail: Antoine.Rauzy@lix.polytechnique.f [LIX-CNRS, Computer Science, Ecole Polytechnique, 91128 Palaiseau Cedex (France)]

    2011-07-15

    Considerable attention has been focused on Dynamic Fault Trees in the past few years. By adding new gates to static (regular) Fault Trees, Dynamic Fault Trees aim to take into account dependencies among events. Merle et al. recently proposed an algebraic framework to give a formal interpretation to these gates. In this article, we extend Merle et al.'s work by adopting a slightly different perspective. We introduce Sequence Algebras that can be seen as Algebras of Basic Events, representing failures of non-repairable components. We show how to interpret Dynamic Fault Trees within this framework. Finally, we propose a new data structure to encode sets of sequences of Basic Events: Sequence Decision Diagrams. Sequence Decision Diagrams are very much inspired by Minato's Zero-Suppressed Binary Decision Diagrams. We show that all operations of Sequence Algebras can be performed on this data structure.
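As an example of what "dependencies among events" means for a dynamic gate, the Priority-AND (PAND) gate, a standard Dynamic Fault Tree gate not named in the abstract, fails only if all its inputs fail in a given order. The sketch checks that condition on a single recorded sequence of basic events. It is only an illustration of gate semantics over sequences; it does not implement the paper's Sequence Algebras or Sequence Decision Diagrams.

```python
from typing import Sequence


def pand_fails(inputs: Sequence[str], failure_order: Sequence[str]) -> bool:
    """Priority-AND gate: True only if every input event occurs, in the listed order.

    `failure_order` lists basic events in the order they occurred; components are
    non-repairable, so each event is assumed to appear at most once.
    """
    position = {event: idx for idx, event in enumerate(failure_order)}
    if any(event not in position for event in inputs):
        return False
    times = [position[event] for event in inputs]
    return all(earlier < later for earlier, later in zip(times, times[1:]))


print(pand_fails(["pump", "valve"], ["pump", "sensor", "valve"]))  # True
print(pand_fails(["pump", "valve"], ["valve", "pump"]))            # False
```

A static AND gate would report failure in both cases; the ordering constraint is exactly what a set-based (static) encoding cannot express.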

  18. Aircraft Mechanics: Scope and Sequence.

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an aircraft mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and…

  19. AMPLIFICATION OF RIBOSOMAL RNA SEQUENCES

    This book chapter offers an overview of the use of ribosomal RNA sequences. A history of the technology traces the evolution of techniques to measure bacterial phylogenetic relationships and recent advances in obtaining rRNA sequence information. The manual also describes procedu...

  20. PERIODIC COMPLEMENTARY BINARY SEQUENCE PAIRS

    Xu Chengqian; Zhao Xiaoqun

    2002-01-01

    A new set of binary sequences, the Periodic Complementary Binary Sequence Pair (PCSP), is proposed. A new class of block design, the Difference Family Pair (DFP), is also proposed. The relationship between PCSP and DFP, the properties and existence conditions of PCSP, and the recursive constructions for PCSP are given.

  1. PERIODIC COMPLEMENTARY BINARY SEQUENCE PAIRS

    Xu Chengqian; Zhao Xiaoqun

    2002-01-01

    A new set of binary sequences, the Periodic Complementary Binary Sequence Pair (PCSP), is proposed. A new class of block design, the Difference Family Pair (DFP), is also proposed. The relationship between PCSP and DFP, the properties and existence conditions of PCSP, and the recursive constructions for PCSP are given.
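Assuming the usual definition (two ±1 sequences of equal length whose periodic autocorrelations sum to zero at every nonzero shift), a candidate pair can be checked directly. This sketch only verifies the defining property; it does not reproduce the paper's DFP-based or recursive constructions.

```python
from typing import Sequence


def periodic_autocorrelation(seq: Sequence[int], shift: int) -> int:
    """Sum of seq[i] * seq[(i + shift) mod n] over one period."""
    n = len(seq)
    return sum(seq[i] * seq[(i + shift) % n] for i in range(n))


def is_periodic_complementary_pair(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if the periodic autocorrelations of a and b sum to zero at every nonzero shift."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return all(
        periodic_autocorrelation(a, s) + periodic_autocorrelation(b, s) == 0
        for s in range(1, len(a))
    )


print(is_periodic_complementary_pair([1, 1, 1, -1], [1, 1, -1, 1]))  # True
```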

  2. Approaches to Sequence Similarity Representation

    Sokolov, Artem; Rachkovskij, Dmitri

    2006-01-01

    We discuss several approaches to similarity preserving coding of symbol sequences and possible connections of their distributed versions to metric embeddings. Interpreting sequence representation methods with embeddings can help develop an approach to their analysis and may lead to discovering useful properties.

  3. Finite extensions of Bessel sequences

    Bakić, Damir; Berić, Tomislav

    2015-01-01

    The paper studies finite extensions of Bessel sequences in infinite-dimensional Hilbert spaces. We provide a characterization of Bessel sequences that can be extended to frames by adding finitely many vectors. We also characterize frames that can be converted to Parseval frames by finite-dimensional perturbations. Finally, some results on excesses of frames and near-Riesz bases are derived.
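For reference, the standard notions the abstract relies on are the following, for a sequence $(x_n)$ in a Hilbert space $H$: a Bessel sequence satisfies only the upper bound, a frame satisfies both bounds, and a Parseval frame has both bounds equal to 1. "Extending a Bessel sequence to a frame" therefore means adding vectors until a positive lower bound $A$ exists.

```latex
\begin{aligned}
&\text{Bessel sequence:} && \textstyle\sum_{n} |\langle x, x_n \rangle|^2 \le B\,\|x\|^2 \quad \text{for all } x \in H,\ B > 0,\\
&\text{frame:} && A\,\|x\|^2 \le \textstyle\sum_{n} |\langle x, x_n \rangle|^2 \le B\,\|x\|^2, \quad 0 < A \le B,\\
&\text{Parseval frame:} && \textstyle\sum_{n} |\langle x, x_n \rangle|^2 = \|x\|^2 \quad (A = B = 1).
\end{aligned}
```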

  4. Rapid Diagnostics of Onboard Sequences

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides an auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command
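The logging scheme described above (store each generated SCMF with its RML input and dictionary name, keyed by a hash that later appears in the EVRs) can be sketched as follows. Everything here is an assumption for illustration: SQLite stands in for MySQL, SHA-256 stands in for whatever hash the server uses, and the table and column names are hypothetical.

```python
import hashlib
import sqlite3
from typing import Optional, Tuple


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical SCMF audit log; SQLite stands in for MySQL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scmf_log ("
        " scmf_hash TEXT PRIMARY KEY,"
        " scmf BLOB NOT NULL,"
        " rml_source TEXT NOT NULL,"
        " dictionary_name TEXT NOT NULL)"
    )
    return conn


def log_scmf(conn: sqlite3.Connection, scmf: bytes, rml_source: str,
             dictionary_name: str) -> str:
    """Store a compiled SCMF with its RML input and dictionary name, keyed by a hash."""
    digest = hashlib.sha256(scmf).hexdigest()  # hash choice is an assumption
    conn.execute(
        "INSERT OR IGNORE INTO scmf_log VALUES (?, ?, ?, ?)",
        (digest, scmf, rml_source, dictionary_name),
    )
    conn.commit()
    return digest


def recover_inputs(conn: sqlite3.Connection, digest: str) -> Optional[Tuple[str, str]]:
    """Given a hash reported in a command EVR, return (RML source, dictionary name)."""
    return conn.execute(
        "SELECT rml_source, dictionary_name FROM scmf_log WHERE scmf_hash = ?",
        (digest,),
    ).fetchone()
```

Keying by a content hash means an EVR only has to carry a short fixed-length value, and identical SCMFs are stored once.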

  5. Spatiotemporal correlations of aftershock sequences

    Peixoto, Tiago P; Davidsen, Jörn

    2010-01-01

    Aftershock sequences are of particular interest in seismic research since they may condition seismic activity in a given region over long time spans. While they are typically identified with periods of enhanced seismic activity after a large earthquake as characterized by the Omori law, our knowledge of the spatiotemporal correlations between events in an aftershock sequence is limited. Here, we study the spatiotemporal correlations of two aftershock sequences from California (Parkfield and Hector Mine) using the recently introduced concept of "recurrent" events. We find that both sequences have very similar properties and that most of them are captured by the space-time epidemic-type aftershock sequence (ETAS) model if one takes into account catalog incompleteness. However, the stochastic model does not capture the spatiotemporal correlations leading to the observed structure of seismicity on small spatial scales.
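The Omori law cited above is usually written in its modified (Omori-Utsu) form $n(t) = K/(t+c)^p$, where $K$, $c$ and $p$ are fitted per sequence. The sketch evaluates the rate and the expected number of aftershocks in a time window using the closed-form integral. It only illustrates the law named in the abstract; the study's "recurrent" event analysis and the ETAS model are not implemented here.

```python
import math


def omori_utsu_rate(t: float, K: float, c: float, p: float) -> float:
    """Aftershock rate n(t) = K / (t + c)^p at time t >= 0 after the mainshock."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return K / (t + c) ** p


def expected_aftershocks(t1: float, t2: float, K: float, c: float, p: float) -> float:
    """Integral of the Omori-Utsu rate over [t1, t2] (closed form)."""
    if not 0 <= t1 <= t2:
        raise ValueError("need 0 <= t1 <= t2")
    if p == 1:
        return K * math.log((t2 + c) / (t1 + c))
    return K * ((t1 + c) ** (1 - p) - (t2 + c) ** (1 - p)) / (p - 1)
```

With $p$ close to 1, as is typical, the expected count grows only logarithmically with time, which is why aftershock activity can persist over long time spans.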

  6. Multilocus sequence typing of total-genome-sequenced bacteria.

    Larsen, Mette V; Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W; Aarestrup, Frank M; Lund, Ole

    2012-04-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST. PMID:22238442
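The final step described above, determining the sequence type from the combination of identified alleles, amounts to a lookup of the allele profile in the scheme's profile table. In this sketch the locus names, allele numbers and sequence types are invented for illustration, and the best-matching allele calls are assumed to come from an upstream step (the BLAST-based ranking in the abstract).

```python
from typing import Dict, Optional, Tuple

# Hypothetical seven-locus scheme; real schemes define their own loci and profiles.
LOCI: Tuple[str, ...] = ("locA", "locB", "locC", "locD", "locE", "locF", "locG")
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 2, 1, 1, 3, 1, 1): 2,
    (4, 2, 1, 5, 3, 1, 2): 3,
}


def sequence_type(alleles: Dict[str, int],
                  loci: Tuple[str, ...] = LOCI,
                  profiles: Dict[Tuple[int, ...], int] = PROFILES) -> Optional[int]:
    """Return the ST for the allele called at each locus, or None if the profile
    is incomplete or not (yet) in the scheme's profile table."""
    if any(locus not in alleles for locus in loci):
        return None
    profile = tuple(alleles[locus] for locus in loci)
    return profiles.get(profile)


calls = dict(zip(LOCI, (1, 2, 1, 1, 3, 1, 1)))
print(sequence_type(calls))  # 2
```

Because the profile table changes as schemes are curated, a service like the one described must refresh it regularly, which matches the monthly database updates mentioned above.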

  7. Default processing of event sequences.

    Hymel, Alicia; Levin, Daniel T; Baker, Lewis J

    2016-02-01

    In a wide range of circumstances, it is important to perceive and represent the sequence of events. For example, sequence perception is necessary to learn statistical contingencies between events, and to generate predictions about events when segmenting actions. However, viewers' awareness of event sequence is rarely tested, and at least some means of encoding event sequence are likely to be resource-intensive. Therefore, previous research may have overestimated the degree to which viewers are aware of specific event sequences. In the experiments reported here, we tested viewers' ability to detect anomalies during visual event sequences. Participants viewed videos containing events that either did or did not contain an out-of-order action. Participants were unable to consistently detect the misordered events, and performance on the task decreased significantly to very low levels when participants performed a secondary task. In addition, participants almost never detected misorderings in an incidental version of the task, and performance increased when videos ended immediately after the misordering. We argue that these results demonstrate that viewers can effectively perceive the elements of events, but do not consistently test their expectations about the specific sequence of natural events unless bidden to do so by task-specific demands. (PsycINFO Database Record) PMID:26348070

  8. Variational Sequences, Representation Sequences and Applications in Physics

    Palese, Marcella; Rossi, Olga; Winterroth, Ekkehart; Musilová, Jana

    2015-01-01

    This paper is a review containing new original results on the finite order variational sequence and its different representations with emphasis on applications in the theory of variational symmetries and conservation laws in physics.

  9. Binary Sequences Generated by Sequences {n}, n = 1, 2, . .

    Porubský, Štefan; Strauch, O.

    2010-01-01

    Vol. 77, no. 1-2 (2010), pp. 139-170. ISSN 0033-3883. R&D Projects: GA ČR GA201/07/0191. Other grants: VEGA(SK) 2/7138/27. Institutional research plan: CEZ:AV0Z10300504. Keywords: pseudorandomness * binary sequence * measures of pseudorandomness * well distribution * uniform distribution * correlation * Sturmian sequence. Subject RIV: BA - General Mathematics. Impact factor: 0.568, year: 2010

  10. Sequence to Sequence Learning for Optical Character Recognition

    Sahu, Devendra Kumar; Sukhwani, Mohak

    2015-01-01

    We propose an end-to-end recurrent encoder-decoder based sequence learning approach for printed text Optical Character Recognition (OCR). In contrast to present-day state-of-the-art OCR solutions, which use a connectionist temporal classification (CTC) output layer, our approach makes minimalistic assumptions on the structure and length of the sequence. We use a two-step encoder-decoder approach: (a) a recurrent encoder reads a variable-length printed text word image and encodes it to a f...

  11. New Orthogonal Small Set Kasami Code Sequence

    I Nyoman Pramaita; I G.A.G.K. Diafari; DNKP Negara; Agus Dharma

    2015-01-01

    In this paper, the authors propose the design of a new orthogonal small set Kasami code sequence generated using a combination of a non-orthogonal m-sequence and a small set Kasami code sequence. The authors demonstrate that the proposed code sequence has auto-correlation function (ACF), cross-correlation function (CCF), and peak cross-correlation values comparable to those of the existing orthogonal small set Kasami code sequence. Though the proposed code sequence has fewer code sequence sets than th...
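m-sequences, the building block of Kasami code sets, are generated by a linear feedback shift register whose feedback polynomial is primitive. The sketch below uses the primitive polynomial $x^3 + x + 1$ (an assumed example, period $2^3 - 1 = 7$) and checks the property that makes m-sequences attractive for spreading codes: in ±1 form, the periodic autocorrelation is $N$ at zero shift and $-1$ at every other shift. It does not reproduce the authors' orthogonal Kasami construction.

```python
from functools import reduce
from operator import xor
from typing import List, Sequence


def lfsr_sequence(seed: Sequence[int], taps: Sequence[int], length: int) -> List[int]:
    """Binary sequence from the linear recurrence s[k+L] = XOR of s[k+t] for t in taps."""
    L = len(seed)
    s = list(seed)
    while len(s) < length:
        k = len(s) - L
        s.append(reduce(xor, (s[k + t] for t in taps)))
    return s[:length]


def periodic_autocorrelation(bits: Sequence[int]) -> List[int]:
    """Periodic autocorrelation of the +/-1 version of a bit sequence (0 -> +1, 1 -> -1)."""
    x = [1 - 2 * b for b in bits]
    n = len(x)
    return [sum(x[i] * x[(i + shift) % n] for i in range(n)) for shift in range(n)]


# x^3 + x + 1 is primitive over GF(2): recurrence s[k+3] = s[k+1] XOR s[k], period 7.
m_seq = lfsr_sequence([1, 0, 0], taps=(0, 1), length=7)
print(m_seq)                            # [1, 0, 0, 1, 0, 1, 1]
print(periodic_autocorrelation(m_seq))  # [7, -1, -1, -1, -1, -1, -1]
```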

  12. WebLogo: A Sequence Logo Generator

    Crooks, G. E.; Hon, G; Chandonia, J. M.; Brenner, Steven E.

    2004-01-01

    WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measure...
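In the usual sequence-logo formulation, the height of each stack is the column's information content, $\log_2 N$ minus the Shannon entropy of the column ($N = 4$ for DNA, 20 for protein), and each letter's height is its frequency times that value. The sketch below computes both for single alignment columns. It omits the small-sample correction applied by logo tools and is not WebLogo's own code.

```python
from collections import Counter
from math import log2
from typing import Dict


def column_information(column: str, alphabet_size: int = 4) -> float:
    """Stack height in bits: log2(alphabet_size) minus the Shannon entropy of the column.

    The small-sample correction used by sequence-logo tools is omitted here.
    """
    counts = Counter(column.upper())
    total = sum(counts.values())
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return log2(alphabet_size) - entropy


def letter_heights(column: str, alphabet_size: int = 4) -> Dict[str, float]:
    """Each letter's height is its frequency times the column's information content."""
    counts = Counter(column.upper())
    total = sum(counts.values())
    r = column_information(column, alphabet_size)
    return {letter: (c / total) * r for letter, c in counts.items()}


# Columns of a toy DNA alignment (one string per position).
for col in ("AAAA", "AACC", "ACGT"):
    print(col, column_information(col), letter_heights(col))
```

A fully conserved DNA column reaches the maximum of 2 bits, while a column with all four bases equally frequent has height 0.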

  13. A Criterion for Regular Sequences

    D P Patil; U Storch; J Stückrad

    2004-05-01

    Let $R$ be a commutative noetherian ring and $f_1,\ldots,f_r \in R$. In this article we give (cf. the Theorem in §2) a criterion for $f_1,\ldots,f_r$ to be a regular sequence for a finitely generated module over $R$, which strengthens and generalises a result in [2]. As an immediate consequence we deduce that if $V(g_1,\ldots,g_r) \subseteq V(f_1,\ldots,f_r)$ in $\operatorname{Spec} R$ and if $f_1,\ldots,f_r$ is a regular sequence in $R$, then $g_1,\ldots,g_r$ is also a regular sequence in $R$.
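For reference, in the usual convention a sequence $f_1,\ldots,f_r \in R$ is regular on an $R$-module $M$ ($M$-regular) when each element is a nonzerodivisor modulo the previous ones and the quotient does not vanish:

```latex
f_i \ \text{is a nonzerodivisor on}\ M/(f_1,\ldots,f_{i-1})M \quad (i = 1,\ldots,r),
\qquad M \neq (f_1,\ldots,f_r)M .
```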

  14. Modified Genetic Algorithm for DNA Sequence Assembly by Shotgun and Hybridization Sequencing Techniques

    Prof. Narayan Kumar Sahu; Prof. Somesh Dewangan; Prof. Akash Wanjari

    2012-01-01

    Since the advent of rapid DNA sequencing methods in 1976, scientists have faced the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hy...
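The "pairwise fragment overlap" idea behind conventional shotgun assembly can be sketched with the simple greedy heuristic: repeatedly merge the two fragments with the longest suffix/prefix overlap. This is a swapped-in illustration of the conventional approach, not the paper's modified genetic algorithm, and it assumes error-free fragments from one strand.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 1) -> List[str]:
    """Repeatedly merge the pair of fragments with the largest pairwise overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments wholly contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: return the separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


print(greedy_assemble(["ATTAGAC", "GACCTG", "CTGAAT"]))  # ['ATTAGACCTGAAT']
```

Greedy merging can be misled by repeats, which is one motivation for global search methods such as genetic algorithms.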

  15. Pythagorean Triples from Harmonic Sequences.

    DiDomenico, Angelo S.; Tanner, Randy J.

    2001-01-01

    Shows how all primitive Pythagorean triples can be generated from harmonic sequences. Uses inductive and deductive reasoning to explore how Pythagorean triples are connected with another area of mathematics. (KHR)
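One simple instance of the connection (not necessarily the article's full construction) adds two terms $1/m$ and $1/(m+2)$ of the harmonic sequence: the numerator and denominator of the sum are the legs of a right triangle. For odd $m$ the legs are coprime, giving primitive triples such as $(4, 3, 5)$ for $m = 1$ and $(8, 15, 17)$ for $m = 3$.

```latex
\frac{1}{m} + \frac{1}{m+2} = \frac{2m+2}{m(m+2)}, \qquad
(2m+2)^2 + \bigl(m(m+2)\bigr)^2 = m^4 + 4m^3 + 8m^2 + 8m + 4 = (m^2 + 2m + 2)^2 .
```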

  16. On certain sequence spaces II

    Hüsnü Kizmaz

    1995-12-01

    In this paper we define the space $c_0(\Delta) = \{x = (x_k) : x_k - x_{k-1} \to 0 \ (k \to \infty),\ x_0 = 0,\ x_k \in \mathbb{C}\}$ and compute its duals (continuous dual, $\beta$-dual and $N$-dual). The aim of this paper is to give some results about matrix mappings of $c_0(\Delta)$ into other sequence spaces, including the spaces of convergent, null, and bounded sequences.

  17. Guitars, Violins, and Geometric Sequences

    Barger, Rita; Haehl, Martha

    2007-01-01

    This article describes middle school mathematics activities that relate measurement, ratios, and geometric sequences to finger positions or the placement of frets on stringed musical instruments. (Contains 2 figures and 2 tables.)
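The geometric sequence behind fret placement can be computed directly, assuming 12-tone equal temperament: each fret shortens the vibrating string by the same ratio $2^{-1/12}$, so the remaining lengths form a geometric sequence and the 12th fret falls at half the scale length. The 650 mm scale length below is an assumed example value, not taken from the article.

```python
from typing import List


def fret_positions(scale_length: float, frets: int) -> List[float]:
    """Distance from the nut to frets 1..frets under 12-tone equal temperament.

    The vibrating length beyond fret n is scale_length * r**n with r = 2**(-1/12),
    a geometric sequence; the fret sits at scale_length minus that length.
    """
    r = 2 ** (-1 / 12)
    return [scale_length * (1 - r ** n) for n in range(1, frets + 1)]


positions = fret_positions(650.0, 12)  # 650 mm is an assumed, typical guitar scale length
print([round(p, 1) for p in positions])
```

Unfretted instruments such as the violin follow the same ratios for finger positions, even though nothing marks them on the fingerboard.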

  18. Mining protein sequences for motifs.

    Narasimhan, Giri; Bu, Changsong; Gao, Yuan; Wang, Xuning; Xu, Ning; Mathee, Kalai

    2002-01-01

    We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. Statistical significance of the detection results is ensured by statistically determining the various parameters of the algorithm. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with existing programs. In addition, the GYM program provides much useful information about a given protein sequence. PMID:12487759

  19. Sequencer for n accelerator facilities

    Operation of machines like telescopes and accelerators requires the efficient and reproducible execution of many different types of procedures. These machines consist of different subsystems whose operation entails the execution of many tasks with strict constraints on the order and duration of execution. To improve operational reliability and efficiency, automated execution of procedures is required. Creating a single robust sequencing application streamlines this process and offers many benefits. At the same time, a drive for greater efficiency, a tendency toward more complex accelerator operations and a need to reduce the risk of 'operator error' have rendered these tools essential. This paper presents the design of a Sequencer tool for an Indian accelerator facility. It cites examples of such tools used at different international accelerator facilities. The features considered desirable in a good sequencer and a description of the tools created to aid in sequence construction and diagnosis are discussed. (author)
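The core requirement described above, executing tasks in a strict order with limits on their duration, can be sketched as a minimal sequencer loop. The `Step` structure, its fields and the log strings are hypothetical; the paper's actual tool design is not reproduced here. Durations are checked only after a step returns, so a production sequencer would also need timeouts that can interrupt a hung step.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One hypothetical procedure step: an action that reports success and a time budget."""
    name: str
    action: Callable[[], bool]
    max_seconds: float


def run_sequence(steps: List[Step]) -> List[str]:
    """Execute steps strictly in order; stop at the first failure or overrun."""
    log: List[str] = []
    for step in steps:
        start = time.monotonic()
        ok = step.action()
        elapsed = time.monotonic() - start
        if not ok:
            log.append(f"FAILED {step.name}")
            break
        if elapsed > step.max_seconds:
            # Checked after the step returns; a real tool would also need preemption.
            log.append(f"OVERRUN {step.name} ({elapsed:.3f}s)")
            break
        log.append(f"OK {step.name}")
    return log


print(run_sequence([
    Step("check vacuum", lambda: True, 1.0),
    Step("ramp magnet", lambda: False, 1.0),
    Step("inject beam", lambda: True, 1.0),
]))
```

Stopping at the first failed step, rather than continuing, is the conservative choice when later steps depend on earlier ones.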

  20. On train track splitting sequences

    Masur, Howard; Schleimer, Saul

    2010-01-01

    We show that the subsurface projection of a train track splitting sequence is an unparameterized quasi-geodesic in the curve complex of the subsurface. For the proof we introduce induced tracks, efficient position, and wide curves. This result is an important step in the proof that the disk complex is Gromov hyperbolic. As another application we show that train track sliding and splitting sequences give quasi-geodesics in the train track graph, generalizing a result of Hamenstaedt [Invent. Math.].

  1. Integrated sequence analysis. Final report

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man-machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence-specific and more general findings. The sequence-specific results are discussed together with each sequence description. The general lessons are discussed in a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  2. Sequence diversity and functional conformity

    de Lange, Orlando

    2015-01-01

    At least four phylogenetically distinct groups of bacteria encode repeat proteins with the common ability to bind specific DNA sequences with a unique but conserved code. Each repeat binds a single DNA base, and specificity is determined by the amino acid residue at position 13 of each repeat. Repeats are typically 33-35 amino acids long. Comparing repeat sequences across all groups reveals that only three positions are hyper-conserved. Repeats are in most cases functionally compatible such t...
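The one-repeat-one-base code can be made concrete with a small sketch that reads the residue at position 13 of each repeat and maps it to a base. The mapping assumes the commonly cited TAL effector code (I to A, D to C, G to T, N to G or A, written here as IUPAC "R"); other repeat families mentioned in the abstract may differ. The 34-residue repeat template is hypothetical and only positions 12-13 matter for the sketch.

```python
from typing import List

# Assumed canonical TAL effector code, keyed on the residue at repeat position 13
# (1-based): I -> A, D -> C, G -> T, N -> G or A (IUPAC "R").
RESIDUE13_TO_BASE = {"I": "A", "D": "C", "G": "T", "N": "R"}


def predict_target(repeats: List[str]) -> str:
    """Predict the bound DNA sequence, one base per repeat; unknown residues give 'N'."""
    target = []
    for repeat in repeats:
        if len(repeat) < 13:
            raise ValueError("repeat shorter than 13 residues")
        target.append(RESIDUE13_TO_BASE.get(repeat[12].upper(), "N"))
    return "".join(target)


def make_repeat(rvd: str) -> str:
    """Hypothetical 34-residue repeat with the two-residue variable site at positions 12-13."""
    return "LTPEQVVAIAS" + rvd + "GGKQALETVQRLLPVLCQAHG"


print(predict_target([make_repeat(r) for r in ("HD", "NI", "NG", "NN")]))  # CATR
```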

  3. Thread extraction for polyadic instruction sequences

    J.A. Bergstra; C.A. Middelburg

    2009-01-01

    Instruction sequences are often fragmented. An important reason for instruction sequence fragmentation is that the execution architecture at hand to execute instruction sequences sets bounds on the size of instruction sequences. In this paper, we study instruction sequences that have been split into

  4. Optimization of sequence alignment for simple sequence repeat regions

    Ogbonnaya Francis C

    2011-07-01

    Background: Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, consisting of tandem copies of specific sequences no longer than six bases, that are distributed throughout the genome. SSRs have been used as molecular markers because they are easy to detect, and they are used in a range of applications, including genetic diversity studies, genome mapping, and marker-assisted selection. SSRs are also highly mutable because of DNA polymerase slippage during DNA replication. This unique mutation mechanism raises the frequency of insertion/deletion (INDEL) mutations to a high level, higher than for other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all sequence alignment algorithms have been designed to fit the vast majority of genomic sequence, without treating microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units, which result in false evolutionary relationships. Findings: To overcome the limitation of the alignment algorithm when dealing with SSR loci, a new algorithm was developed using a PERL script with a Tk graphical interface. This program aligns sequences after first determining the repeat units and the positions of the last SSR nucleotides. This results in a shifting process according to the inserted repeat unit type. When phylogenetic relations were studied before and after applying the new algorithm, the trees showed more differences as SSR length and complexity increased. However, smaller distances between different lineages were observed after applying the new algorithm. Conclusions: The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic
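The first step of the approach, determining the repeat unit of each SSR before aligning, can be sketched with a regular-expression scan for tandem copies of 1- to 6-base units. The copy threshold and the rule that skips non-primitive units (e.g. reporting `A` rather than `AA` for a homopolymer) are assumptions; this is not the authors' PERL/Tk program and does not perform the alignment shift itself.

```python
import re
from typing import List, Tuple


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter string (e.g. 'AT' yes, 'ATAT' no)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_ssrs(seq: str, max_unit: int = 6, min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, repeat unit, copy number) for tandem repeats of 1..max_unit bases."""
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # A k-base unit followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("GGCATATATATCC"))  # [(3, 'AT', 4)]
```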

  5. ARC Code TI: sequenceMiner

    National Aeronautics and Space Administration — The sequenceMiner was developed to address the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. sequenceMiner works...

  6. A simple method for global sequence comparison.

    Pizzi, E; Attimonelli, M; Liuni, S; Frontali, C.; Saccone, C.

    1992-01-01

    A simple method of sequence comparison, based on a correlation analysis of oligonucleotide frequency distributions, is here shown to be a reliable test of overall sequence similarity. The method does not involve sequence alignment procedures and permits the rapid screening of large amounts of sequence data. It identifies those sequences which deserve more careful analysis of sequence similarity at the level of resolution of the single nucleotide. It uses observed quantities only and does not ...
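An alignment-free comparison in this spirit can be sketched as the correlation between the oligonucleotide frequency profiles of two sequences. The abstract does not give the exact statistic, so Pearson correlation of relative k-mer frequencies (with an assumed default $k = 3$) is used here for illustration.

```python
from itertools import product
from math import sqrt
from typing import List


def kmer_frequencies(seq: str, k: int) -> List[float]:
    """Relative frequencies of all 4**k oligonucleotides, in a fixed (lexicographic) order."""
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in words]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; NaN if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def oligo_similarity(a: str, b: str, k: int = 3) -> float:
    """Alignment-free similarity: correlation of the two k-mer frequency profiles."""
    return pearson(kmer_frequencies(a, k), kmer_frequencies(b, k))
```

Because it needs only one pass over each sequence, such a score is cheap enough to screen large collections before running a full alignment on the promising pairs.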

  7. Sequence classification using statistical pattern recognition

    Iglesias, José Antonio; Ledezma, Agapito; Sanchis, Araceli

    2007-01-01

    Sequence classification is a significant problem that arises in many different real-world applications. The purpose of a sequence classifier is to assign a class label to a given sequence. Obtaining the pattern that characterizes the sequence is also usually very useful. In this paper, a technique to discover a pattern from a given sequence is presented, followed by a novel general method to classify the sequence. This method considers mainly the dependencies among the neighbouring elements o...

  8. Supercyclicity with Respect to a Sequence on Special Sequence Spaces

    S. Foroutan, and B. Yousef

    2007-06-01

    Let $\{\beta(n)\}_{n=-\infty}^{\infty}$ be a sequence of positive numbers such that $\beta(0) = 1$ and let $1 < p < \infty$. We consider the space of all formal Laurent series $f(z) = \sum_{n=-\infty}^{\infty} \hat{f}(n) z^n$ such that $\sum_{n=-\infty}^{\infty} |\hat{f}(n)|^p \beta(n)^p < \infty$. We investigate the supercyclicity with respect to a sequence on the Banach spaces of formal Laurent series

  9. Statistical properties of DNA sequences

    Peng, C.-K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-02-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range: indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the “non-stationarity” feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33 301 coding and 29 453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
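Detrended fluctuation analysis can be sketched in a few steps: map the sequence to a ±1 walk (here the purine/pyrimidine rule commonly used for DNA walks, an assumption on our part), integrate it after subtracting the mean, split the profile into boxes of size $n$, fit a least-squares line in each box, and report the root-mean-square residual $F(n)$. This is a minimal illustration, not the authors' implementation.

```python
from math import sqrt
from typing import List


def dna_walk_profile(seq: str) -> List[float]:
    """Integrated, mean-subtracted purine/pyrimidine walk: u = +1 for A/G, -1 for C/T."""
    u = [1 if b in "AG" else -1 for b in seq.upper() if b in "ACGT"]
    if not u:
        raise ValueError("no A/C/G/T symbols in sequence")
    mean = sum(u) / len(u)
    profile, total = [], 0.0
    for value in u:
        total += value - mean
        profile.append(total)
    return profile


def dfa_fluctuation(seq: str, box: int) -> float:
    """F(n): RMS deviation of the profile from a least-squares line fitted in each box of size n."""
    y = dna_walk_profile(seq)
    if box < 2 or box > len(y):
        raise ValueError("box size must be between 2 and the sequence length")
    n_boxes = len(y) // box  # trailing remainder is ignored in this sketch
    xs = range(box)
    x_mean = (box - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    squared = 0.0
    for b in range(n_boxes):
        seg = y[b * box:(b + 1) * box]
        y_mean = sum(seg) / box
        slope = sum((x - x_mean) * (v - y_mean) for x, v in zip(xs, seg)) / sxx
        intercept = y_mean - slope * x_mean
        squared += sum((v - (intercept + slope * x)) ** 2 for x, v in zip(xs, seg))
    return sqrt(squared / (n_boxes * box))
```

Repeating this over a range of box sizes and taking the slope of $\log F(n)$ against $\log n$ gives the scaling exponent $\alpha$; $\alpha \approx 0.5$ indicates no long-range correlation, while $\alpha > 0.5$ indicates long-range correlation.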

  10. Sequencing Needs for Viral Diagnostics

    Gardner, S N; Lam, M; Mulakken, N J; Torres, C L; Smith, J R; Slezak, T

    2004-01-26

    We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives ("near neighbors") that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near-neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near-neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. SARS and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near-neighbor sequences are urgently needed. Our results also indicate that double-stranded DNA viruses are more conserved among strains than RNA viruses are, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.
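
    The two selection criteria (conserved among target strains, unique relative to near neighbors) can be illustrated with exact k-mer sets. This is a simplified, hypothetical analogue of the pipeline described above. The function names and the exact-match k-mer criterion are assumptions. A real signature design would also handle mismatches and assay constraints.

```python
def kmers(seq, k):
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_signatures(targets, near_neighbors, k):
    """k-mers present in every target genome and absent from every near-neighbor genome."""
    conserved = set.intersection(*(kmers(g, k) for g in targets))  # reliability
    background = set().union(*(kmers(g, k) for g in near_neighbors))  # specificity
    return conserved - background


print(sorted(candidate_signatures(["ACGTACGT", "TTACGTAA"], ["GGACGTGG"], k=4)))  # ['CGTA', 'TACG']
```

    Adding target genomes can only shrink the conserved set. Adding near neighbors can only shrink it further. This is why the number of genomes of each kind matters.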

  11. Sequence Patterns of Identity Authentication Protocols

    Tao Hongcai; He Dake

    2006-01-01

    From the viewpoint of protocol sequence, the sequence patterns of possible identity authentication protocols are analysed for two cases: with and without a trusted third party (TTP). Ten feasible sequence patterns of authentication protocols with a TTP and 5 sequence patterns without a TTP are obtained. These sequence patterns meet the requirements for identity authentication and cover almost all current authentication protocols with and without a TTP. All of the sequence patterns obtained are classified as unilateral or bilateral authentication. Then, according to sequence symmetry, several good sequence patterns with a TTP are evaluated. The results can provide a reference for the design of new identity authentication protocols.

  12. Vector sequences - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata]

    Budding yeast cDNA sequencing project. Data name: Vector sequences. Description of data contents: vector sequences used for sequencing. Multi-FASTA format, 7 entries. Data file: file name: vec

  13. Comparison of novel decoy database designs for optimizing protein identification searches using ABRF sPRG2006 standard MS/MS data sets.

    Blanco, Luca; Mead, Jennifer A; Bessant, Conrad

    2009-04-01

    Decoy database searches are used to filter out false positive protein identifications derived from search engines, but there is no consensus about which decoy is "the best". We evaluate nine different decoy designs using public data sets from samples of known composition. Statistically significant performance differences were found, but no single decoy stood out among the best performers. Ultimately, we recommend peptide level reverse decoys searched independently from the target. PMID:19714810
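
    The recommended setup can be sketched in three parts: peptide-level reverse decoys, separate target and decoy searches, and an FDR estimate. Several details are assumptions, not taken from the paper. The digest uses the common simplified trypsin rule. The decoy keeps the C-terminal K/R in place so it still looks tryptic. The FDR estimate at a score threshold is decoy hits divided by target hits, which is one common convention for separate searches. All names are hypothetical.

```python
def tryptic_peptides(protein):
    """Simplified trypsin digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def reverse_decoy(peptide):
    """Peptide-level reverse decoy that keeps the C-terminal residue in place."""
    if len(peptide) < 2:
        return peptide
    return peptide[-2::-1] + peptide[-1]


def separate_search_fdr(target_scores, decoy_scores, threshold):
    """Estimated FDR at a threshold when target and decoy databases are searched separately."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0


print(tryptic_peptides("MKAPKRLLR"))  # ['MK', 'APK', 'R', 'LLR']
print(reverse_decoy("PEPTIDEK"))      # EDITPEPK
print(separate_search_fdr([10, 8, 6, 4], [7, 3, 2], 5))  # one decoy hit over three target hits
```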

  14. Sequences and series involving the sequence of composite numbers

    Panayiotis Vlamos

    2002-01-01

    Denoting by $p_n$ and $c_n$ the $n$th prime number and the $n$th composite number, respectively, we prove that both the sequence $(x_n)_{n\geq 1}$, defined by $x_n=\sum_{k=1}^{n}\frac{c_{k+1}-c_k}{k}-\frac{p_n}{n}$, and the series $\sum_{n=1}^{\infty}\frac{p_{c_n}-c_{p_n}}{n\,p_n}$ are convergent.

  15. A repetitive sequence assembler based on next-generation sequencing.

    Lian, S; Tu, Y; Wang, Y; Chen, X; Wang, L

    2016-01-01

    Repetitive sequences of variable length are common in almost all eukaryotic genomes; most of them are presumed to have important biomedical functions, and they can cause genomic instability. Next-generation sequencing (NGS) technologies make it possible to identify and capture these repetitive sequences directly from NGS data. In this study, we used three real NGS datasets to assess how well leading assemblers, such as Velvet, SOAPdenovo, SGA, MSR-CA, Bambus2, ALLPATHS-LG and ABySS, capture repeats. Our results indicated that most of them performed poorly in capturing the repeats. Consequently, we proposed a repetitive sequence assembler, named NGSReper, for capturing repeats from NGS data. Simulated datasets were used to validate the feasibility of NGSReper; the results indicate that the completeness of repeat capture reaches up to 99%. Cross-validation was performed on three real NGS datasets, and extensive comparisons indicate that NGSReper performed best in terms of completeness and accuracy in capturing repeats. In conclusion, NGSReper is a suitable tool for capturing repeats directly from NGS data. PMID:27525861
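
    A common first step toward finding repeats directly in read data is to count k-mers and flag those seen far more often than the rest. The sketch below shows only that counting step, under stated assumptions. It does not describe NGSReper. The fixed `min_count` threshold is hypothetical. In practice the threshold would be set relative to sequencing depth, and sequencing errors would need handling.

```python
from collections import Counter


def repeated_kmers(reads, k, min_count):
    """k-mers seen at least `min_count` times across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer: c for kmer, c in counts.items() if c >= min_count}


reads = ["ACGTACGTAC", "TTACGTAGG"]
print(repeated_kmers(reads, k=4, min_count=3))  # {'ACGT': 3, 'CGTA': 3}
```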

  16. DNA Sequencing Using Capillary Electrophoresis

    Dr. Barry Karger

    2011-05-09

    The overall goal of this program was to develop capillary electrophoresis as the tool to be used to sequence the human genome for the first time. Our program was part of the Human Genome Project. In this work, we were highly successful: the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome on the MegaBase multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and successes. We began by using capillary electrophoresis to separate double-stranded oligonucleotides in cross-linked polyacrylamide gels in fused-silica capillaries. This work showed the potential of the methodology. However, such cross-linked gel capillaries were difficult to prepare reproducibly and, even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g., from sample injection) was imposed on the polymer matrix, and this relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of linear polyacrylamide, so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, after testing many linear polymers, we selected linear polyacrylamide as the best matrix because it was the most hydrophilic polymer available. Under our DOE program, we initially demonstrated the success of linear polyacrylamide in separating double-stranded DNA. We note that the method is still used today to assay the purity of double-stranded DNA fragments. Our focus, of course, was on the separation of single-stranded DNA for sequencing. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other

  17. Integrated sequence analysis. Final report

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subproject 3, 'integrated sequence analysis' (ISA), was formulated with the overall objective of developing and testing integrated methodologies for evaluating event sequences with a significant contribution from human actions. The term 'methodology' denotes not only technical tools but also methods for integrating different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man-machine system simulation software, human reliability analysis (HRA) and expert judgement. After the surveys, specific event sequences were selected for applying and testing a number of ISA methods. The event sequences discussed in the report were cold overpressure of a BWR, shutdown LOCA of a BWR, steam generator tube rupture of a PWR, and a disturbed signal view in the control room of a BWR after an external event. Different teams analysed these sequences using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence-specific findings and more general findings. The sequence-specific results are discussed together with each sequence description. The general lessons are discussed in a separate chapter by comparing the different case studies. These lessons range from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  18. Hahn Sequence Space of Modals

    Balasubramanian, T.; Zion Chella Ruth, S.

    2014-01-01

    The history of modal intervals goes back to the very first publications on interval calculus. Modal interval analysis is used in computer graphics and computer-aided design (CAD), for example to compute narrow bounds on Bézier and B-spline curves. Since modal intervals are used in many fields, we introduce a new sequence space h(gI), called the Hahn sequence space of modal intervals. We give some new definitions and theorems. Some inclusion relation and some topologi...

  19. $\delta$-Quasi-Cauchy Sequences

    Cakalli, Huseyin

    2010-01-01

    Recently, the concepts of forward continuity and forward compactness were introduced, in the sense that a function $f$ is forward continuous if $\lim_{n\to\infty} \Delta f(x_{n})=0$ whenever $\lim_{n\to\infty} \Delta x_{n}=0$, and a subset $E$ of $\textbf{R}$ is forward compact if any sequence $\textbf{x}=(x_{n})$ of points in $E$ has a subsequence $\textbf{z}=(z_{k})=(x_{n_{k}})$ of the sequence $\textbf{x}$ such that $\lim_{k\to \infty} \Delta z_{k}=0$, where $\Delta z_{k}=z_{k+1}$...
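
    The condition $\lim_{n\to\infty}\Delta x_n = 0$ is strictly weaker than convergence. The following illustrative example (not taken from the paper) uses $x_n=\sqrt{n}$ and $\Delta x_n = x_{n+1}-x_n$:

```latex
\Delta x_n = \sqrt{n+1}-\sqrt{n} = \frac{1}{\sqrt{n+1}+\sqrt{n}} \longrightarrow 0,
\qquad\text{while}\qquad x_n=\sqrt{n}\longrightarrow\infty .
```

    So $(\sqrt{n})$ is quasi-Cauchy but not Cauchy, which is why forward continuity is a genuinely different requirement from ordinary sequential continuity.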

  20. Farey Sequences and Resistor Networks

    Sameen Ahmed Khan

    2012-05-01

    In this article, we employ the Farey sequence and Fibonacci numbers to establish strict upper and lower bounds for the order of the set of equivalent resistances for a circuit constructed from equal resistors combined in series and in parallel. The method is applicable for networks involving bridge and non-planar circuits.
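
    The set whose order is being bounded can be enumerated directly for small circuits. The sketch below collects the distinct equivalent resistances obtainable from exactly n unit resistors using series and parallel combinations only. It does not cover the bridge and non-planar circuits the abstract also mentions. It uses exact rationals from Python's `fractions` module. The function name `resistances` is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def resistances(n):
    """Distinct equivalent resistances using exactly n unit resistors in series/parallel."""
    if n == 1:
        return frozenset({Fraction(1)})
    values = set()
    for a in range(1, n // 2 + 1):  # split n = a + (n - a); order does not matter
        for x in resistances(a):
            for y in resistances(n - a):
                values.add(x + y)            # series
                values.add(x * y / (x + y))  # parallel
    return frozenset(values)


print([len(resistances(n)) for n in range(1, 5)])  # [1, 2, 4, 9]
print(sorted(resistances(3)))
```

    The sets are closed under taking reciprocals, because swapping series and parallel maps a resistance R to 1/R. This is one reason Farey-type fractions appear naturally when bounding them.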

  1. Ideal-quasi-Cauchy sequences

    Cakalli, Huseyin

    2012-01-01

    An ideal $I$ is a family of subsets of the positive integers $\textbf{N}$ that is closed under taking finite unions and subsets of its elements. A sequence $(x_n)$ of real numbers is said to be $I$-convergent to a real number $L$ if, for each $\varepsilon>0$, the set $\{n:|x_{n}-L|\geq \varepsilon\}$ belongs to $I$. We introduce $I$-ward compactness of a subset of $\textbf{R}$, the set of real numbers, and $I$-ward continuity of a real function, in the sense that a subset $E$ of $\textbf{R}$ is $I$-ward compact if any sequence $(x_{n})$ of points in $E$ has an $I$-quasi-Cauchy subsequence, and a real function is $I$-ward continuous if it preserves $I$-quasi-Cauchy sequences, where a sequence $(x_{n})$ is called $I$-quasi-Cauchy when $(\Delta x_{n})$ is $I$-convergent to 0. We obtain results related to $I$-ward continuity, $I$-ward compactness, ward continuity, ward compactness, ordinary compactness, ordinary continuity, $\delta$-ward continuity, and slowly oscillating continuity.
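
    One standard choice of ideal is the family of subsets of $\textbf{N}$ with natural density zero. With it, $I$-convergence reduces to statistical convergence. That choice is an assumption for this illustration, not necessarily the ideal studied in the paper. The sketch measures the density of the exceptional set $\{n:|x_n-L|\geq\varepsilon\}$ up to a cutoff. It uses the indicator of perfect squares, which does not converge but is $I$-convergent to 0 for this ideal. A finite computation only illustrates the shrinking density. It does not prove it.

```python
import math


def exceptional_density(x, limit, eps, n_max):
    """Fraction of indices 1 <= n <= n_max with |x(n) - limit| >= eps."""
    bad = sum(1 for n in range(1, n_max + 1) if abs(x(n) - limit) >= eps)
    return bad / n_max


def indicator_of_squares(n):
    """x_n = 1 when n is a perfect square, else 0."""
    r = math.isqrt(n)
    return 1.0 if r * r == n else 0.0


for n_max in (100, 10_000, 1_000_000):
    print(n_max, exceptional_density(indicator_of_squares, 0.0, 0.5, n_max))
```

    The same sequence is also $I$-quasi-Cauchy for this ideal. Its differences $\Delta x_n$ are nonzero only at the two indices next to each square, a set of density zero.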

  2. Sequences in language and text

    Mikros, George K

    2015-01-01

    The aim of this volume is to present the diverse but highly interesting area of the quantitative analysis of the sequence of various linguistic structures. The collected articles present a wide spectrum of quantitative analyses of linguistic syntagmatic structures and explore novel sequential linguistic entities. This volume will be interesting to all researchers studying linguistics using quantitative methods.

  3. On primes in Lucas sequences

    Křížek, Michal; Somer, L.

    2015-01-01

    Vol. 53, No. 1 (2015), pp. 1-23. ISSN 0015-0517. R&D Projects: GA ČR GA14-02067S. Institutional support: RVO:67985840. Keywords: Lucas sequence; primes. Subject RIV: BA - General Mathematics. http://www.fq.math.ca/Abstracts/53-1/somer.pdf

  4. Fractals in DNA sequence analysis

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance, and even biology. There has been increasing interest in unravelling the mysteries of DNA; for example, distinguishing coding from noncoding sequences, classifying organisms and determining their evolutionary relationships are key problems in bioinformatics. Although much research has taken into consideration the long-range correlations in DNA sequences, and other researchers have used the global fractal dimension in these works, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view) to DNA sequence analysis. We have also used the fractal dimension, correlation dimension, Hurst exponent and dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.
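
    Of the quantities listed, the Hurst exponent is the simplest to sketch. The version below is a classical rescaled-range (R/S) estimate on a numeric series, such as a DNA sequence mapped to +1/-1 steps. It is not necessarily the estimator the authors use, and the window sizes are assumptions. R/S is known to be biased upward for short windows on uncorrelated data, so values somewhat above 0.5 are expected there.

```python
import math
import random


def rescaled_range(chunk):
    """R/S of one window: range of mean-adjusted cumulative sums over the standard deviation."""
    mean = sum(chunk) / len(chunk)
    running, cum = 0.0, []
    for v in chunk:
        running += v - mean
        cum.append(running)
    r = max(cum) - min(cum)
    s = math.sqrt(sum((v - mean) ** 2 for v in chunk) / len(chunk))
    return r / s if s > 0 else None  # constant windows carry no information


def hurst_rs(series, windows=(16, 32, 64, 128, 256)):
    """Hurst exponent H: slope of log mean(R/S) versus log window size."""
    xs, ys = [], []
    for n in windows:
        ratios = [rescaled_range(series[i:i + n])
                  for i in range(0, len(series) - n + 1, n)]
        ratios = [v for v in ratios if v is not None]
        xs.append(math.log(n))
        ys.append(math.log(sum(ratios) / len(ratios)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


random.seed(1)
steps = [random.choice((-1, 1)) for _ in range(20000)]
print(round(hurst_rs(steps), 2))  # a little above 0.5 for uncorrelated steps
```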

  5. Epigenome Sequencing Comes of Age

    Zhu, Jian-Kang

    2008-01-01

    Epigenetic states are responsive to developmental and environmental signals, and as a consequence a eukaryotic cell can have many different epigenomes. In this issue of Cell, Lister et al. (2008) present the floral epigenome of Arabidopsis using next-generation sequencing technology to analyze both DNA methylation at single-base resolution and the expression of small RNAs.

  6. A virtual substitution of Brouwer choice sequence

    Lange, Klaus

    2009-01-01

    Step by step, a substitution for the well-known Brouwer choice sequence is constructed. It begins by establishing quasi-alternating prime number series, followed by the construction of a virtual sequence in the sense of the virtual set definition. The last step gives reasons why this virtual sequence substitutes for the choice sequence created by L. E. J. Brouwer.

  7. Efficient algorithms for molecular sequence analysis.

    Karlin, S.; Morris, M.; Ghandour, G; Leung, M Y

    1988-01-01

    Efficient (linear time) algorithms are described for identifying global molecular sequence features allowing for errors including repeats, matches between sequences, dyad symmetry pairings, and other sequence patterns. A multiple sequence alignment algorithm is also described. Specific applications are given to hepatitis B viruses and the J5-C (J, joining; C, constant) region of the immunoglobulin kappa gene.
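
    Dyad symmetry means a stretch of DNA that reads the same as its own reverse complement. As a minimal sketch of what is being searched for, the code below scans for exact dyad-symmetric windows of a fixed length. This is an illustrative stand-in. The paper's algorithms are linear-time and tolerate errors, while this scan costs O(n·L) and allows no mismatches. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def dyad_sites(seq, length):
    """Start positions of windows of the given length that equal their own reverse complement."""
    return [i for i in range(len(seq) - length + 1)
            if seq[i:i + length] == reverse_complement(seq[i:i + length])]


print(dyad_sites("AAGAATTCTT", 6))   # [2]  (GAATTC)
print(dyad_sites("AAGAATTCTT", 10))  # [0]
```

    Only even lengths can match exactly, because the central base of an odd-length window would have to equal its own complement.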

  8. Sequence-structure relations of biopolymers

    Barrett, Christopher; Reidys, Christian M

    2015-01-01

    Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we ask to what extent sequence and structure information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded "patterns" in DNA and RNA sequences. Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence-structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for three PDB structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold ...
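
    To make "partition function of sequences with respect to a fixed structure" concrete, here is a sketch under a deliberately toy energy model. Each base pair contributes an energy that depends only on its two nucleotides, and unpaired bases contribute nothing. Under that assumption, Z factorizes over pairs and unpaired positions, and Boltzmann sampling can pick each pair independently. Realistic RNA energy models (e.g. loop-based nearest-neighbor parameters) do not factorize this way, so this is not the paper's computation. The energy values and function names are hypothetical.

```python
import math
import random

# Canonical Watson-Crick and wobble pairs allowed at each paired position.
PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]


def parse_dot_bracket(structure):
    """Return (i, j) base pairs of a dot-bracket string such as '((..))'."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def partition_function(structure, pair_energy, kT=0.6):
    """Z over all sequences compatible with the structure under the per-pair toy energy."""
    pairs = parse_dot_bracket(structure)
    unpaired = len(structure) - 2 * len(pairs)
    z_pair = sum(math.exp(-pair_energy[p] / kT) for p in PAIRS)
    return z_pair ** len(pairs) * 4 ** unpaired


def sample_sequence(structure, pair_energy, kT=0.6, rng=random):
    """Draw one sequence from the Boltzmann distribution of the same toy model."""
    seq = [rng.choice("ACGU") for _ in structure]
    weights = [math.exp(-pair_energy[p] / kT) for p in PAIRS]
    for i, j in parse_dot_bracket(structure):
        seq[i], seq[j] = rng.choices(PAIRS, weights=weights)[0]
    return "".join(seq)


# Illustrative energies in kcal/mol (hypothetical values, not a published parameter set).
energy = {"GC": -3.0, "CG": -3.0, "AU": -2.0, "UA": -2.0, "GU": -1.0, "UG": -1.0}
print(partition_function("((..))", energy))
print(sample_sequence("((..))", energy, rng=random.Random(7)))
```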

  9. Analysis of repetitive sequence elements containing tRNA-like sequences.

    Lawrence, C B; McDonnell, D P; Ramsey, W J

    1985-01-01

    Several repetitive sequence elements from diverse species share extensive sequence homology with tRNA molecules. Analysis of the tRNA-like sequences within these elements suggests that they originated from authentic tRNA sequences. Elements containing tRNA-like sequences can be divided into three distinct groups whose members share extensive sequence homology, have similar sequence organization and have unique species distributions. We suggest that these three groups represent independent ...

  10. Picoeukaryotic sequences in the Sargasso Sea metagenome

    Piganeau, Gwenael; Desdevises, Yves; Derelle, Evelyne; Moreau, Herve

    2008-01-01

    Background With genome sequencing becoming more and more affordable, environmental shotgun sequencing of the microorganisms present in an environment generates a challenging amount of sequence data for the scientific community. These sequence data enable the diversity of the microbial world and the metabolic pathways within an environment to be investigated, a previously unthinkable achievement when using traditional approaches. DNA sequence data assembled from extracts of 0.8 μm filtered Sar...

  11. Orthogonal Basis Spreading Sequence for Optimal CDMA

    Tsuda, Hirofumi; Umeno, Ken

    2016-01-01

    Recently, new spreading sequences have been proposed to increase user capacity. In particular, Weyl spreading sequences support a larger number of users than Gold codes. This paper shows that Weyl spreading sequences appear in a bit-restoring model and that they form an orthogonal basis in a particular situation. This result explains why they have a large user capacity, and it shows that any spreading sequence can be expressed as a sum of Weyl spreading sequences.
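
    The sketch below illustrates the orthogonal-basis claim in its most familiar form. It assumes complex chips exp(2πi·n·θ) with rational θ = k/N. This is an assumption about the construction, since the abstract does not define Weyl spreading sequences. Under that assumption, the N sequences are the discrete Fourier basis: they are mutually orthogonal, and any length-N spreading sequence is a weighted sum of them. Names are hypothetical.

```python
import cmath


def weyl_sequence(theta, length):
    """Complex Weyl-type chips exp(2*pi*i*n*theta) for n = 0..length-1."""
    return [cmath.exp(2j * cmath.pi * n * theta) for n in range(length)]


def inner(u, v):
    """Complex inner product <u, v> = sum of u_n * conj(v_n)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def expand(x):
    """Coefficients of x in the basis {weyl_sequence(k/N, N)}, k = 0..N-1."""
    n = len(x)
    basis = [weyl_sequence(k / n, n) for k in range(n)]
    coeffs = [inner(x, b) / n for b in basis]
    return coeffs, basis


chips = [1, -1, 1, 1, -1, -1, 1, -1]
coeffs, basis = expand(chips)
rebuilt = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(len(chips))]
print(max(abs(r - x) for r, x in zip(rebuilt, chips)))  # only floating-point round-off
```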

  12. Hardware Acceleration of Bioinformatics Sequence Alignment Applications

    Hasan, L.

    2011-01-01

    Biological sequence alignment is an important and challenging task in bioinformatics. Alignment may be defined as an arrangement of two or more DNA or protein sequences to highlight the regions of their similarity. Sequence alignment is used to infer the evolutionary relationship between a set of protein or DNA sequences. An accurate alignment can provide valuable information for experimentation on the newly found sequences. It is indispensable in basic research as well as in practical applic...
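
    Highlighting regions of similarity is what local alignment scores measure. A minimal Smith-Waterman score with a linear gap penalty is sketched below as a software reference point. The scoring parameters are arbitrary assumptions. The function keeps only two rows of the dynamic-programming matrix, so it returns the best score but not the alignment itself.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty, two-row DP)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)  # 0 restarts the local alignment
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman("TTACGTTT", "GGACGTGG"))  # 8: the shared ACGT block
print(smith_waterman("AAAA", "TTTT"))          # 0: no similar region
```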

  13. Hardware Accelerated Sequence Alignment with Traceback

    Scott Lloyd; Snell, Quinn O

    2009-01-01

    Biological sequence alignment is an essential tool used in molecular biology and biomedical applications. The growing volume of genetic data and the complexity of sequence alignment present a challenge in obtaining alignment results in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is pres...
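
    For reference, a plain software version of global alignment with traceback (Needleman-Wunsch, linear gap penalty) is sketched below. It stores the full (m+1)×(n+1) score matrix. That O(mn) memory cost is the kind of bottleneck a space-efficient hardware design aims to avoid. This sketch is not the architecture described in the abstract, and the scoring values are assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score plus one optimal alignment recovered by traceback."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right cell, preferring diagonal moves.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("GATTACA", "GATACA"))
```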

  14. Some Special Sequences in Fuzzy Graphs

    Jill K. Mathew

    2016-03-01

    In a fuzzy graph, edges are mainly classified into α-, β- and δ-edges. In this paper, some sequences in fuzzy graphs based on this classification of edges are introduced. In addition, characterizations of blocks in fuzzy graphs and of fuzzy trees are obtained. It is shown that the β-sequence of a fuzzy tree is a zero sequence and that the α-sequence of a block is a binary sequence.

  15. Sequence analysis and editing for bisulphite genomic sequencing projects

    Carr, IM; Valleley, EMA; Cordery, SF; Markham, AF; Bonthron, DT

    2007-01-01

    Bisulphite genomic sequencing is a widely used technique for detailed analysis of the methylation status of a region of DNA. It relies upon the selective deamination of unmethylated cytosine to uracil after treatment with sodium bisulphite, usually followed by PCR amplification of the chosen target region. Since this two-step procedure replaces all unmethylated cytosine bases with thymine, PCR products derived from unmethylated templates contain only three types of nucleotide, in unequal prop...
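
    The conversion logic can be simulated in a few lines. After bisulphite treatment and PCR, an unmethylated C reads as T and a methylated C stays C. Comparing reads with the reference then gives a per-cytosine methylation call. The sketch assumes complete conversion and looks only at the top strand. The example also assumes every CpG is methylated. All names are hypothetical.

```python
def bisulphite_convert(ref, methylated):
    """In-silico bisulphite + PCR: unmethylated C reads as T; C at `methylated` positions stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(ref)
    )


def call_methylation(ref, read):
    """For each reference C: True if the read shows C (methylated), False if it shows T."""
    calls = {}
    for i, (r, q) in enumerate(zip(ref, read)):
        if r == "C":
            if q == "C":
                calls[i] = True
            elif q == "T":
                calls[i] = False
    return calls


ref = "ACGTCCGA"
cpg = {i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"}  # assume every CpG is methylated
read = bisulphite_convert(ref, cpg)
print(read)                         # ACGTTCGA
print(call_methylation(ref, read))  # {1: True, 4: False, 5: True}
```

    Because most C bases become T, converted fragments are depleted in C. This is the source of the unequal nucleotide proportions the abstract goes on to mention.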

  16. Nonparametric Inference for Periodic Sequences

    Sun, Ying

    2012-02-01

    This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
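
    The sketch below is a hedged reading of a leave-out-one-cycle criterion, not necessarily the article's exact estimator. For each candidate integer period p, it predicts every observation in a held-out cycle by the mean of the same phase in the remaining cycles, averages the squared errors, and picks the p with the smallest error. A trailing partial cycle is ignored. With noisy data, a multiple of the true period leaves fewer cycles per phase mean, so its CV error tends to be larger in expectation. In exact ties this sketch keeps the smaller period.

```python
import math


def loco_cv_error(y, period):
    """Leave-out-one-cycle CV: predict each cycle from the phase means of the other cycles."""
    n_cycles = len(y) // period
    if n_cycles < 2:
        return math.inf
    err = 0.0
    for c in range(n_cycles):
        for phase in range(period):
            others = [y[k * period + phase] for k in range(n_cycles) if k != c]
            pred = sum(others) / len(others)
            err += (y[c * period + phase] - pred) ** 2
    return err / (n_cycles * period)


def estimate_period(y, max_period):
    """Candidate period with minimal CV error (ties go to the smaller period)."""
    best_p, best_err = None, math.inf
    for p in range(1, max_period + 1):
        e = loco_cv_error(y, p)
        if e < best_err:
            best_p, best_err = p, e
    return best_p


signal = [1.0, 5.0, 3.0] * 10
print(estimate_period(signal, max_period=8))  # 3
```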

  17. Cassini Mission Sequence Subsystem (MSS)

    Alland, Robert

    2011-01-01

    This paper describes my work with the Cassini Mission Sequence Subsystem (MSS) team during the summer of 2011. It gives some background on the motivation for this project and describes the expected benefit to the Cassini program. It then introduces the two tasks that I worked on - an automatic system auditing tool and a series of corrections to the Cassini Sequence Generator (SEQ_GEN) - and the specific objectives these tasks were to accomplish. Next, it details the approach I took to meet these objectives and the results of this approach, followed by a discussion of how the outcome of the project compares with my initial expectations. The paper concludes with a summary of my experience working on this project, lists what the next steps are, and acknowledges the help of my Cassini colleagues.

  18. Text analysis with sequence matching

    Ferme, Marko; Ojsteršek, Milan

    2012-01-01

    This article describes some common problems faced in natural language processing. The main problem consists of matching a user-given sentence against an existing knowledge base of semantically described words or phrases. Some of the main difficulties in this process are outlined, and the most common solutions used in natural language processing are reviewed. A sequence matching algorithm is introduced as an alternative solution and its advantages over the existing approaches ...
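
    Matching a user sentence against a knowledge base of phrases can be sketched with token-level sequence matching. The example uses Python's standard-library `difflib.SequenceMatcher` as a stand-in. The article's own sequence matching algorithm is not described in the abstract, so this is not a reconstruction of it. Lowercasing and whitespace tokenization are simplifying assumptions, and `best_match` is a hypothetical name.

```python
from difflib import SequenceMatcher


def best_match(sentence, knowledge_base):
    """Return the knowledge-base phrase with the highest token-level similarity ratio, and the ratio."""
    tokens = sentence.lower().split()
    scored = [
        (SequenceMatcher(None, tokens, phrase.lower().split()).ratio(), phrase)
        for phrase in knowledge_base
    ]
    score, phrase = max(scored)
    return phrase, score


kb = ["what is the weather today", "play some music"]
print(best_match("Show me the weather today", kb))  # ('what is the weather today', 0.6)
```

    Working on token lists rather than characters makes word order matter while ignoring spelling-level overlap between different words.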

  19. Computing with Hereditarily Finite Sequences

    Tarau, Paul

    2011-01-01

    We use Prolog as a flexible meta-language to provide executable specifications of some fundamental mathematical objects and their transformations. In the process, isomorphisms are unraveled between natural numbers and combinatorial objects (rooted ordered trees representing hereditarily finite sequences and rooted ordered binary trees representing Gödel's System T types). This paper focuses on an application that can be seen as an unexpected "paradigm shift": we provide recursive defin...
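
    One concrete bijection between natural numbers and hereditarily finite sequences is sketched below in Python rather than Prolog. It is an illustrative assumption, not necessarily the encoding used in the paper. It maps 0 to the empty list and writes any n > 0 uniquely as 2^a(2m+1), which becomes the list with head a and tail decoded from m. Applying it recursively to every element turns each natural number into a rooted ordered tree. Both a and m are smaller than n, so the recursion terminates.

```python
def to_list(n):
    """Bijection N -> finite lists of N: 0 -> [], n = 2**a * (2*m + 1) -> [a] + to_list(m)."""
    out = []
    while n > 0:
        a = 0
        while n % 2 == 0:
            n //= 2
            a += 1
        out.append(a)
        n = (n - 1) // 2
    return out


def from_list(xs):
    """Inverse of to_list."""
    n = 0
    for a in reversed(xs):
        n = 2 ** a * (2 * n + 1)
    return n


def to_hfseq(n):
    """Decode n as a hereditarily finite sequence (nested lists bottoming out in [])."""
    return [to_hfseq(a) for a in to_list(n)]


print(to_list(12))   # [2, 0]
print(to_hfseq(12))  # [[[[]]], []]
```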

  20. Sequencing Trade and Monetary Integration

    Richard Pomfret

    2005-01-01

    Regional integration for at least the last sixty years has focused on trade integration. Balassa’s canonical taxonomy of regional trading arrangements is often interpreted as a sequence from free trade area through customs union and common market to economic union. In the 1980s the concept of deep integration went beyond trade with its focus on policy harmonization, which came to include monetary integration, but it presupposed trade integration as the first step in the regional integration s...

  1. Biological sequence classification with multivariate string kernels.

    Kuksa, Pavel P

    2013-01-01

    String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. They often exhibit state-of-the-art performance on many practical tasks of sequence analysis such as biological sequence classification, remote homology detection, or protein superfamily and fold prediction. However, typical string kernel methods rely on the analysis of discrete 1D string data (e.g., DNA or amino acid sequences). In this paper, we address multiclass biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physicochemical descriptors) and a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels show significant improvements of 15-20 percent compared to existing state-of-the-art sequence classification methods. PMID:24384708
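
    The "typical" discrete string kernel that the multivariate kernels generalize can be illustrated with the k-spectrum kernel. It is the inner product of the two sequences' k-mer count vectors. This sketch shows only that baseline. It is not the multivariate kernel proposed in the paper, and the function names are hypothetical.

```python
from collections import Counter


def spectrum(seq, k):
    """Counts of every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k):
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    fs, ft = spectrum(s, k), spectrum(t, k)
    return sum(c * ft[kmer] for kmer, c in fs.items())


print(spectrum_kernel("ABAB", "ABAB", 2))  # 5
print(spectrum_kernel("ABAB", "BABA", 2))  # 4
```

    Each k-mer position is an exact symbol match here. A multivariate kernel would instead compare the feature vectors at each position.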

  2. Fluorescence-detected DNA sequencing

    Haugland, R.P.

    1990-01-01

    Our research effort funded by this grant primarily focused on developing suitable fluorescent dyes for DNA sequencing studies. Prior to our efforts, the dyes used in commercial DNA sequencers were various versions of fluorescein dyes for the shorter wavelengths and of rhodamine dyes for the longer wavelengths. Our initial goal was to synthesize a set of four dyes that could all be excited by the 488 nm and 514 nm lines of the argon laser and that have emission spectra with minimal spectral overlap. The specific result sought was higher fluorescence intensity than was available with existing dyes, particularly for the longest-wavelength dyes. Another important property of the desired set of dyes was uniform ionic charge, to minimize interference with electrophoretic mobility during sequencing. During the period of this grant we prepared and characterized four types of dyes: fluorescent bifluorophores, derivatives of rhodamine dyes, derivatives of rhodol dyes and derivatives of boron dipyrromethene difluoride (BODIPY™) dyes.

  3. Sequencing technologies to maximize recovery

    Dusseault, M.B. [Waterloo Univ., ON (Canada)]

    2006-07-01

    The deliberate sequencing of extraction technologies for viscous oil may optimize financial returns. Sequencing is based on an understanding that some technologies improve reservoir transport parameters through phenomena such as shear dilation; fracturing of shales through high pressure injection; thermal consolidation; shear rupture of clay dustings and shale laminae; and the dislodging of pore blocking agents. Cold heavy oil production with sand (CHOPS) increases reservoir permeability, porosity and compressibility, and can alter stress fields and flow paths, which can in turn make subsequent thermal or gravity methods of extraction more effective. High pressure thermal extraction processes such as steam flood (SF) and cyclic steam stimulation (CSS) generate reservoir dilation and viscosity reduction, which means that subsequent gravity methods such as VAPEX will achieve increased extraction efficiency. Massive dilation during cyclic injection phases means that recompaction drive mechanisms will improve recovery rates and recovery factors. Successful planning requires that the physics and reservoir impacts of all commercial technologies be understood, and initial systems of exploitation should be designed to minimize future well needs. It was concluded that, if implemented correctly, sequencing will have a considerable impact on estimates of technically recoverable reserves in heavy oil. Increases in recoverable reserves exceeding a trillion barrels are anticipated. 12 refs., 8 figs.

  5. Multineuronal Spike Sequences Repeat with Millisecond Precision

    Yuji Ikegaya

    2013-06-01

    Cortical microcircuits are nonrandomly wired by neurons. As a natural consequence, spikes emitted by microcircuits are also nonrandomly patterned in time and space. One of the prominent spike organizations is a repetition of fixed patterns of spike series across multiple neurons. However, several questions remain unsolved, including how precisely spike sequences repeat, how the sequences are spatially organized, how many neurons participate in sequences, and how different sequences are functionally linked. To address these questions, we monitored spontaneous spikes of hippocampal CA3 neurons ex vivo using a high-speed functional multineuron calcium imaging technique that allowed us to monitor spikes with millisecond resolution and to record the location of spiking and nonspiking neurons. Multineuronal spike sequences were overrepresented in spontaneous activity compared to the statistical chance level. Approximately 75% of neurons participated in at least one sequence during our observation period. The participants were sparsely dispersed and did not show specific spatial organization. The number of sequences relative to the chance level decreased when larger time frames were used to detect sequences. Thus, sequences were precise at the millisecond level. Sequences often shared common spikes with other sequences; parts of sequences were subsequently relayed by following sequences, generating complex chains of multiple sequences.

  6. Static multiplicities in heterogeneous azeotropic distillation sequences

    Esbjerg, Klavs; Andersen, Torben Ravn; Jørgensen, Sten Bay; Müller, Dirk; Marquardt, Wolfgang

    In this paper the results of a bifurcation analysis on heterogeneous azeotropic distillation sequences are given. Two sequences suitable for ethanol dehydration are compared: the 'direct' and the 'indirect' sequence. It is shown that the two sequences, despite their similarities, exhibit very different static behavior. The method of Petlyuk and Avet'yan (1971) and Bekiaris et al. (1993), which assumes infinite reflux and an infinite number of stages, is extended to heterogeneous azeotropic distillation sequences and applied to them. The predictions are substantiated through simulations. The static sequence...

  7. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    Kirkness Ewen; Lundeberg Joakim; Angleby Helen; Oskarsson Mattias CR; Natanaelsson Christian; Savolainen Peter

    2006-01-01

    Background Population genetic studies of dogs have so far been based mainly on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, studies of biallelic Y-chromosome polymorphisms are needed. However, no biallelic polymorphisms have been reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence has been deposited in GenBank, necessitating the identification of dog Y chromo...

  8. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    Marta Brozynska; Agnelo Furtado; Robert James Henry

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technologies) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genom...

  9. Arithmetic Self-Similarity of Infinite Sequences

    Hendriks, Dimitri; Endrullis, Joerg; Dow, Mark; Klop, Jan Willem

    2012-01-01

    We define the arithmetic self-similarity (AS) of a one-sided infinite sequence sigma to be the set of arithmetic progressions through sigma which are a vertical shift of sigma. We classify the AS of several well-known sequences, such as the Thue-Morse sequence, the period doubling sequence, and the regular paperfolding sequence. The latter two are examples of (completely) additive sequences as well as of Toeplitz words. We investigate the intersection of these families. We give a complete characterization of single-gap patterns that yield additive Toeplitz words, and classify their AS. Moreover, we show that every arithmetic progression through a Toeplitz word generated by a one-gap pattern is again a Toeplitz word. Finally, we establish that generalized Morse sequences are specific sum-of-digits sequences, and show that their first difference is a Toeplitz word.
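
    As a minimal illustration of the arithmetic self-similarity idea, the sketch below checks, on a finite prefix only, whether an arithmetic progression n -> a*n + b through the Thue-Morse sequence reproduces the sequence up to a vertical shift. It assumes that a vertical shift of a binary sequence means adding a constant modulo 2 (so either the identity or the bitwise complement); the helper names `thue_morse` and `is_vertical_shift` are hypothetical, and a finite check is evidence, not a proof.

```python
from typing import Callable, Optional


def thue_morse(n: int) -> int:
    """n-th Thue-Morse term: parity of the number of 1 bits in n."""
    return bin(n).count("1") % 2


def is_vertical_shift(seq: Callable[[int], int], a: int, b: int,
                      length: int) -> Optional[int]:
    """Return c if seq(a*n + b) == (seq(n) + c) mod 2 for all n < length,
    else None. Assumption: over {0, 1}, c = 1 means 'complement the bits'."""
    for c in (0, 1):
        if all(seq(a * n + b) == (seq(n) + c) % 2 for n in range(length)):
            return c
    return None


if __name__ == "__main__":
    N = 1024  # finite prefix: a check on N terms, not a proof
    for a, b in [(2, 0), (2, 1), (3, 0), (4, 3)]:
        print((a, b), is_vertical_shift(thue_morse, a, b, N))
```

    On the first 1024 terms, the progressions 2n and 4n + 3 return shift 0, 2n + 1 returns shift 1 (the complement), and 3n returns None.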

  10. On Paranorm Zweier -Convergent Sequence Spaces

    Vakeel A. Khan

    2013-01-01

    In this paper, we introduce the paranorm Zweier -convergent sequence spaces , , and , a sequence of positive real numbers. We study some topological properties, prove the decomposition theorem, and study some inclusion relations on these spaces.

  11. The Art of Gymnastics: Creating Sequences.

    Rovegno, Inez

    1988-01-01

    Offering students opportunities for creating movement sequences in gymnastics allows them to understand the essence of gymnastics, have creative experiences, and learn about themselves. The process of creating sequences is described. (MT)

  12. Active learning of manipulation sequences

    Martínez Martínez, David; Alenyà Ribas, Guillem; Jimenez Schlegl, Pablo; Torras, Carme; Rossmann, Jürgen; Wantia, Nils; Eren Erdal, Aksoy; Haller, Simon; Piater, Justus

    2014-01-01

    We describe a system allowing a robot to learn goal-directed manipulation sequences such as steps of an assembly task. Learning is based on a free mix of exploration and instruction by an external teacher, and may be active in the sense that the system tests actions to maximize learning progress and asks the teacher if needed. The main component is a symbolic planning engine that operates on learned rules, defined by actions and their pre- and postconditions. Learned by model-based reinforcem...

  13. Infinite matrices and sequence spaces

    Cooke, Richard G

    2014-01-01

    This clear and correct summation of basic results from a specialized field focuses on the behavior of infinite matrices in general, rather than on properties of special matrices. Three introductory chapters guide students to the manipulation of infinite matrices, covering definitions and preliminary ideas, reciprocals of infinite matrices, and linear equations involving infinite matrices. From the fourth chapter onward, the author treats the application of infinite matrices to the summability of divergent sequences and series from various points of view. Topics include consistency, mutual consi

  14. Mappings of Type Special Space of Sequences

    Awad A. Bakery

    2016-01-01

    We give sufficient conditions on a special space of sequences defined by Mohamed and Bakery (2013) such that the finite rank operators are dense in the complete space of operators whose approximation numbers belong to this sequence space. Hence, under a few conditions, every compact operator can be approximated by finite rank operators. We apply this to the sequence space defined by Tripathy and Mahanta (2003). Our results match those known for p-absolutely summable sequences of reals.

  15. Goldbach conjecture sequences in quantum mechanics

    Prudencio, Thiago; Silva, Edilberto O.

    2013-01-01

    We show that there is a correspondence between Goldbach conjecture sequences (GCS) and expectation values of the number operator in Fock states. We demonstrate that, depending on whether or not the Fock state superpositions are normalized, we obtain sequences that are equivalent and sequences that are not equivalent to GCS. We propose an algorithm where sequences equivalent to GCS can be derived in terms of expectation values with normalized states. Defining states whose projections generate GCS, we r...

  16. Automated preparation of DNA sequences for publication.

    Shapiro, M B; Senapathy, P

    1986-01-01

    A computer program which draws DNA sequences is described. A simple method is used which enables the user to highlight or annotate specific parts of a sequence. The sizes of the characters in the sequence to be drawn are specified by the user. In addition, vertical spacing between lines and horizontal spacing between characters can be specified. Sequences can be prepared and high quality output produced on a plotter in a short period of time, making the program advantageous to use over typing...

  17. Discovering motifs that induce sequencing errors

    Allhoff, Manuel; Schoenhuth, Alexander; Martin, M.(Deutsches Elektronen-Synchrotron, Hamburg, Germany); Costa, I.G.; Rahmann, S.; Marschall, Tobias

    2013-01-01

    Background Elevated sequencing error rates are the predominant obstacle in single-nucleotide polymorphism (SNP) detection, which is a major goal in the bulk of current studies using next-generation sequencing (NGS). Beyond routinely handled generic sources of errors, certain base calling errors relate to specific sequence patterns. Statistically principled ways to associate sequence patterns with base calling errors have not been previously described. Extant approaches either incur decis...

  18. Analysis of Sequence Conservation at Nucleotide Resolution

    Asthana, Saurabh; Roytberg, Mikhail; Stamatoyannopoulos, John; Sunyaev, Shamil R.

    2007-01-01

    One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence ...

  19. Overcoming Sequence Misalignments with Weighted Structural Superposition

    Khazanov, Nickolay A.; Damm-Ganamet, Kelly L.; Quang, Daniel X.; Carlson, Heather A.

    2012-01-01

    An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian-weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD ov...

  20. Operational analysis of sequence diagram specifications

    2007-01-01

    This thesis is concerned with operational analysis of UML 2.x sequence diagram specifications. By operational analysis we mean analysis based on a characterization of the executions of sequence diagrams, or in other words an operational semantics for sequence diagrams. We define two methods for analysis of sequence diagram specifications – refinement verification and refinement testing – and both are implemented in an analysis tool we have named ‘Escalator’. Further, we make the first steps i...

  1. Quantitative phenotyping via deep barcode sequencing

    Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

    2009-01-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied...

  2. Nanopore DNA sequencing with MspA

    Derrington, Ian M.; Butler, Tom Z.; Collins, Marcus D.; Manrao, Elizabeth; Pavlenok, Mikhail; Niederweis, Michael; Gundlach, Jens H.

    2010-01-01

    Nanopore sequencing has the potential to become a direct, fast, and inexpensive DNA sequencing technology. The simplest form of nanopore DNA sequencing utilizes the hypothesis that individual nucleotides of single-stranded DNA passing through a nanopore will uniquely modulate an ionic current flowing through the pore, allowing the record of the current to yield the DNA sequence. We demonstrate that the ionic current through the engineered Mycobacterium smegmatis porin A, MspA, has the ability...

  3. Cutting sequences on square-tiled surfaces

    Johnson, Charles C.

    2016-01-01

    We characterize cutting sequences of infinite geodesics on square-tiled surfaces by considering interval exchanges on specially chosen intervals on the surface. These interval exchanges can be thought of as skew products over a rotation, and we convert cutting sequences to symbolic trajectories of these interval exchanges to show that special types of combinatorial lifts of Sturmian sequences completely describe all cutting sequences on a square-tiled surface. Our results extend the list of f...

  4. Approximate word matches between two random sequences

    Burden, Conrad J.; Kantorovitz, Miriam R; Wilson, Susan R

    2008-01-01

    Given two sequences over a finite alphabet $\mathcal{L}$, the D2 statistic is the number of m-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag database searches. Here we study a generalization of the D2 statistic in the context of DNA sequences, under the assumption of strand symmetric Bernoulli text. For k
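
    The exact D2 statistic described above can be computed by counting the m-letter words in each sequence and summing the products of matching counts. This is a sketch of the exact-match statistic only; the approximate-match generalization studied in the paper is not reproduced, and the function names are hypothetical.

```python
from collections import Counter


def word_counts(seq: str, m: int) -> Counter:
    """Count every overlapping m-letter word in seq."""
    return Counter(seq[i:i + m] for i in range(len(seq) - m + 1))


def d2(x: str, y: str, m: int) -> int:
    """D2: the number of position pairs (i, j) at which the m-letter word of x
    starting at i equals the m-letter word of y starting at j."""
    cx, cy = word_counts(x, m), word_counts(y, m)
    # Summing count_x(w) * count_y(w) over the words of x avoids an
    # O(|x| * |y|) double loop over positions; missing words count as 0.
    return sum(n * cy[w] for w, n in cx.items())
```

    For example, `d2("ACGT", "ACGA", 2)` returns 2, one match each for the shared words AC and CG.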

  5. Comparative genomics beyond sequence-based alignments

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.;

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  6. PacBio Sequencing and Its Applications

    Anthony Rhoads; Kin Fai Au

    2015-01-01

    Single-molecule, real-time sequencing developed by Pacific Biosciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio’s sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable than PacBio sequencing alone, especially for small laboratories. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone.

  7. RNAome sequencing delineates the complete RNA landscape

    K.W.J. Derks (Kasper); J. Pothof (Joris)

    2015-01-01

    Standard RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species. For example, small and large RNAs from the same sample cannot be sequenced in a single sequence run. We designed RNAome sequencing, which is a strand-specif

  8. The recurrence sequence via the Fibonacci groups

    Aküzüm, Yeşim; Deveci, Ömür

    2016-04-01

    This work develops properties of the recurrence sequence defined with the aid of the relation matrix of the Fibonacci groups. The study of this sequence modulo m yields cyclic groups and semigroups from the generating matrix. Finally, we extend the sequence to groups and obtain its period in the Fibonacci groups.
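
    The abstract does not define the relation-matrix sequence itself, so the sketch below substitutes the ordinary Fibonacci recurrence to show what "the period of a recurrence modulo m" means. It walks the pair (F(n), F(n+1)) modulo m until the starting pair (0, 1) reappears. The result is the classical Pisano period, not the period the authors obtain in the Fibonacci groups, and `pisano_period` is a hypothetical name.

```python
def pisano_period(m: int) -> int:
    """Period of the ordinary Fibonacci sequence modulo m (m >= 2).

    Stand-in example: this is the classical Fibonacci recurrence,
    not the relation-matrix sequence of the paper."""
    if m < 2:
        raise ValueError("m must be at least 2")
    a, b = 0, 1  # (F(0), F(1)) mod m
    for n in range(1, 6 * m + 1):  # the Pisano period never exceeds 6m
        a, b = b, (a + b) % m
        if (a, b) == (0, 1):
            return n
    raise AssertionError("unreachable: the period is at most 6m")
```

    For example, `pisano_period(10)` returns 60, the well-known period of the last decimal digit of the Fibonacci numbers.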

  9. Artificial sequences and complexity measures

    Baronchelli, Andrea; Caglioti, Emanuele; Loreto, Vittorio

    2005-04-01

    In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools for extracting, in an automatic and agnostic way, information from a generic string of characters. We introduce in particular a class of methods which use in a crucial way data compression techniques in order to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques could be used to introduce the notion of dictionary of a given sequence and of artificial text and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method that applies to any kind of corpora of character strings independently of the type of coding behind them. We consider as a case study linguistically motivated problems and we present results for automatic language recognition, authorship attribution and self-consistent classification.
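
    The paper derives its distance from how well one text compresses given another. As a closely related but different technique, the sketch below computes the normalized compression distance (NCD) popularized by Cilibrasi and Vitányi, using Python's standard `zlib` module as the compressor. It is not the authors' estimator, and the helper names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes, used as a proxy for information content."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: near 0 when y adds little
    information beyond x, approaching 1 for unrelated inputs."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

    One design caveat: DEFLATE only looks back about 32 KB, so for inputs much longer than that the compressor cannot exploit shared content between the two halves, and the distance is overestimated.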

  10. Problems of Sequence Stratigraphy in China

    2000-01-01

    A comprehensive study of outcrop sequence stratigraphy in China began in the early 1990s. The investigated strata range from Mesoproterozoic to Quaternary and the studied areas cover the three platforms and margins, the Southern Himalayas and the East China and South China seas. Problems of general concern in the sequence stratigraphy of China are discussed. These are: the hierarchy for sequence stratigraphy, the third-order Sequence and eustasy, the chronostratigraphic boundaries and GSSP, and the International Stratigraphic Chart and the sequence chronostratigraphy of China. The average time interval of the Mesosequence (25-40 Ma) and of the Sequence (2-5 Ma) is suggested and the minor sequences below the Sequence are discussed. The time interval of the Sequence shows no evident decrease with time, but several epochs with remarkably short intervals occur in the Phanerozoic, which may represent a planetary behavior denoting special development stages in the earth's evolution. Sea level change curves are given separately for the three platforms and the different regions. The Global Stratotype Section and Point (GSSP) concept and practice are discussed, and a comparison between the first appearance point of a biozone and the first flooding surface in the Sequence is made for designation of the chronostratigraphic boundary. It is suggested that the chronostratigraphic boundaries might be set at the first flooding surface in the Sequence for easy recognition. The idea of sequence chronostratigraphy is recommended, and a comparison between the International Stratigraphic Chart and the sequence chronostratigraphy of China is made. The close relation between chronostratigraphy and sequence stratigraphy makes it possible for sequence stratigraphy to improve chronostratigraphic research. It is pointed out that multidisciplinary study in chronostratigraphy is necessary and should be promising and profitable.

  11. Chip-based sequencing nucleic acids

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  12. Permutation Entropy for Random Binary Sequences

    Lingfeng Liu

    2015-12-01

    In this paper, we generalize the permutation entropy (PE) measure, which is based on Shannon’s entropy, to binary sequences, and theoretically analyze this measure for random binary sequences. We deduce the theoretical value of PE for random binary sequences, which can be used to measure the randomness of binary sequences. We also reveal the relationship between this PE measure and other randomness measures, such as Shannon’s entropy and Lempel–Ziv complexity. The results show that PE is consistent with these two measures. Furthermore, we use PE as one of the randomness measures to evaluate the randomness of chaotic binary sequences.

  13. MatrixPlot: visualizing sequence constraints

    Gorodkin, Jan; Stærfeldt, Hans Henrik; Lund, Ole;

    1999-01-01

    Summary: MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information about the sequences (e.g. a sequence logo profile) along the edges of the plot, as well as zoom in on any region in the plot. Availability: MatrixPlot can be obtained on request, and can also be accessed online at http://www.cbs.dtu.dk/services/MatrixPlot. Contact: gorodkin@cbs.dtu.dk...

  14. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    de Souza, S J; Camargo, A A; Briones, M R;

    2000-01-01

    by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48...

  15. Nimble Protein Sequence Alignment in Grid (NPSAG)

    K. Somasundaram

    2008-01-01

    In bioinformatics applications, protein sequence analysis is a computation-driven science with rapidly growing biological data. The databases used in these applications are heterogeneous, and aligning protein sequences with physical techniques is expensive and slow, with results that are not always accurate. These applications therefore require cross-platform, cost-effective algorithms with more computing power for sequence matching and for searching a sequence in a database. Grid computing is a prominent emerging, cost-effective computing paradigm for a large class of data- and compute-intensive applications, enabling large-scale aggregation and sharing of computational data and other resources across institutional boundaries. We propose a Grid architecture for searching distributed, heterogeneous genomic databases containing protein sequences, to speed up the analysis of large-scale sequence data, and we perform sequence alignment for residue matching.

  16. Design of Digital Hybrid Chaotic Sequence Generator

    Rao, Nini; Zeng, Dong

    2004-01-01

    The feasibility of hybrid chaotic sequences as spreading codes in a code division multiple access (CDMA) system is analyzed. The design and realization of a digital hybrid chaotic sequence generator in the very high speed integrated circuit hardware description language (VHDL) are described. A valid hazard-cancellation method is presented. Computer simulations show that stable digital sequence waveforms can be produced. The correlations of the digital hybrid chaotic sequences are compared with those of m-sequences, and the results show that they are almost as good as those of m-sequences. This work opens a path toward practical applications of chaos.
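
    The abstract does not specify the chaotic maps, but the m-sequence baseline it is compared against is standard. The sketch below produces a maximal-length sequence from a 4-stage Fibonacci linear feedback shift register and computes its periodic autocorrelation, which for an m-sequence of period N equals N at zero shift and -1 at every other shift. The taps (4, 3), which give the recurrence s[n+4] = s[n+1] XOR s[n] (primitive polynomial x^4 + x + 1), and the seed are illustrative choices; the code is Python, not the VHDL used in the paper, and the function names are hypothetical.

```python
def lfsr_msequence(taps, state, length):
    """Fibonacci LFSR over GF(2).

    taps:   1-based register positions XORed together to form the feedback bit.
    state:  initial register contents (must not be all zeros).
    Returns `length` output bits, taken from the last register cell each step.
    """
    reg = list(state)
    out = []
    for _ in range(length):
        out.append(reg[-1])
        fb = 0
        for t in taps:
            fb ^= reg[t - 1]
        reg = [fb] + reg[:-1]  # shift right, feed back into the first cell
    return out


def periodic_autocorrelation(bits):
    """Map 0/1 to +1/-1 and correlate the sequence with each cyclic shift."""
    n = len(bits)
    x = [1 - 2 * b for b in bits]
    return [sum(x[i] * x[(i + k) % n] for i in range(n)) for k in range(n)]


if __name__ == "__main__":
    seq = lfsr_msequence((4, 3), (1, 0, 0, 0), 15)  # one full period (2^4 - 1)
    print(seq)
    print(periodic_autocorrelation(seq))
```

    Any nonzero seed yields a cyclic shift of the same period-15 sequence, so the two-valued autocorrelation does not depend on the seed.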

  17. GATA: a graphic alignment tool for comparative sequence analysis

    Nix David A; Eisen Michael B

    2005-01-01

    Background Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collineari...

  18. Secondary-task effects on sequence learning.

    Heuer, H; Schmidtke, V

    1996-01-01

    With a repeated sequence of stimuli, performance in a serial reaction-time task improves more than with a random sequence. The difference has been taken as a measure of implicit sequence learning. Implicit sequence learning is impaired when a secondary task is added to the serial RT task. In the first experiment, secondary-task effects on different types of sequences were studied to test the hypothesis that the learning of unique sequences (where each sequence element has a unique relation to the following one) is not impaired by the secondary task, while the learning of ambiguous sequences is. The sequences were random up to a certain order of sequential dependencies, where they became deterministic. Contrary to the hypothesis, secondary-task effects on the learning of unique sequences were as strong or stronger than such effects on the learning of ambiguous sequences. In the second experiment a hybrid sequence (with unique as well as ambiguous transitions) was used with different secondary tasks. A visuo-spatial and a verbal memory task did not interfere with the learning of the sequence, but interference was observed with an auditory go/no-go task in which high- and low-pitched tones were presented after each manual response and a foot pedal had to be pressed in response to high-pitched tones. Thus, interference seems to be specific to certain secondary tasks and may be related to memory processes (but most likely not to visuo-spatial and verbal memory) or to the organization of sequences, consistent with previous suggestions. PMID:8810586

  19. The RNA world, automatic sequences and oncogenetics

    We construct a model of the RNA world in terms of naturally evolving nucleotide sequences, assuming only Crick-Watson base pairing and self-cleaving/splicing capability. These sequences have the following properties. 1) They are recognizable by an automaton (or automata). That is, for each k-sequence there exists a k-automaton which accepts, recognizes or generates the k-sequence. These are known as automatic sequences. Fibonacci and Morse-Thue sequences are the most natural outcome of pre-biotic chemical conditions. 2) Infinite (resp. large) sequences are self-similar (resp. nearly self-similar) under certain rewrite rules and consequently give rise to fractal (resp. fractal-like) structures. Computationally, such sequences can also be generated by their corresponding deterministic parallel rewrite system, known as a D0L system. The self-similar sequences are fixed points of their respective rewrite rules. Some of these automatic sequences can read or 'accept' other sequences, while others can detect errors and trigger error-correcting mechanisms. They can be enlarged and have block and/or palindrome structure. Linear recurring sequences such as the Fibonacci sequence correspond to feedback shift registers, a well-known model of information processing machines. We show that a mutation of any rewrite rule can cause a combinatorial explosion of errors, and we relate this to oncogenetic behavior. On the other hand, a mutation of sequences that are not rewrite rules leads to normal evolutionary change. Known experimental results support our hypothesis. (author). Refs
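
    To make the rewrite-rule view concrete, the sketch below iterates two D0L (deterministic, context-free Lindenmayer) systems: a -> ab, b -> a, which generates the Fibonacci word, and 0 -> 01, 1 -> 10, which generates the Thue-Morse sequence. Because the image of the starting symbol begins with that symbol, every iterate is a prefix of the next, so the infinite limit is a fixed point of the rules. The rule tables and helper names are illustrative, not taken from the paper.

```python
def rewrite(rules: dict, word: str) -> str:
    """One D0L step: every symbol is rewritten simultaneously."""
    return "".join(rules[ch] for ch in word)


def iterate_d0l(rules: dict, axiom: str, steps: int) -> str:
    """Apply the rewrite rules `steps` times, starting from `axiom`."""
    word = axiom
    for _ in range(steps):
        word = rewrite(rules, word)
    return word


def is_prefix_fixed(rules: dict, word: str) -> bool:
    """True if rewriting `word` once yields a word that starts with `word`,
    the finite-prefix signature of a fixed point of the rules."""
    return rewrite(rules, word).startswith(word)


FIB = {"a": "ab", "b": "a"}  # Fibonacci word morphism
TM = {"0": "01", "1": "10"}  # Thue-Morse morphism

if __name__ == "__main__":
    for n in range(6):
        print(n, iterate_d0l(FIB, "a", n), iterate_d0l(TM, "0", n))
```

    The Fibonacci-word lengths 1, 2, 3, 5, 8, 13, ... follow the Fibonacci numbers, because each iterate is the concatenation of the previous two.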

  20. Computing with Hereditarily Finite Sequences

    Tarau, Paul

    2011-01-01

    We use Prolog as a flexible meta-language to provide executable specifications of some fundamental mathematical objects and their transformations. In the process, isomorphisms are unraveled between natural numbers and combinatorial objects (rooted ordered trees representing hereditarily finite sequences and rooted ordered binary trees representing Gödel's System T types). This paper focuses on an application that can be seen as an unexpected "paradigm shift": we provide recursive definitions showing that the resulting representations are directly usable to perform arbitrary-length integer computations symbolically. Besides the theoretically interesting fact of "breaking the arithmetic/symbolic barrier", the arithmetic operations performed with symbolic objects like trees or types turn out to be genuinely efficient; we derive implementations with asymptotic performance comparable to ordinary bitstring implementations of arbitrary-length integer arithmetic. The source code of the paper, organized as a ...

  1. Automatic Sequencing for Experimental Protocols

    Hsieh, Paul F.; Stern, Ivan

    We present a paradigm and implementation of a system for the specification of the experimental protocols to be used for the calibration of AXAF mirrors. For the mirror calibration, several thousand individual measurements need to be defined. For each measurement, over one hundred parameters need to be tabulated for the facility test conductor and several hundred instrument parameters need to be set. We provide a high level protocol language which allows for a tractable representation of the measurement protocol. We present a procedure dispatcher which automatically sequences a protocol more accurately and more rapidly than is possible by an unassisted human operator. We also present back-end tools to generate printed procedure manuals and database tables required for review by the AXAF program. This paradigm has been tested and refined in the calibration of detectors to be used in mirror calibration.

  2. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm

    Kumar, Manish

    2015-01-01

    One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic multiple sequence alignment problem is to determine the most biologically plausible alignment of protein or DNA sequences. In this paper, an alignment method using a genetic algorithm for multiple sequence alignment is proposed. Two genetic operators, namely crossover and mutation, were defined and implemented with the proposed method in order to know the pop...

  3. Software for pre-processing Illumina next-generation sequencing short read sequences

    Chen, Chuming; Khaleel, Sari S; Huang, Hongzhan; Cathy H Wu

    2014-01-01

    Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been devel...

  4. Next-Generation Sequencing Techniques for Eukaryotic Microorganisms: Sequencing-Based Solutions to Biological Problems▿

    Nowrousian, Minou

    2010-01-01

    Over the past 5 years, large-scale sequencing has been revolutionized by the development of several so-called next-generation sequencing (NGS) technologies. These have drastically increased the number of bases obtained per sequencing run while at the same time decreasing the costs per base. Compared to Sanger sequencing, NGS technologies yield shorter read lengths; however, despite this drawback, they have greatly facilitated genome sequencing, first for prokaryotic genomes and within the las...

  5. A fast algorithm for exact sequence search in biological sequences using polyphase decomposition

    Srikantha, Abhilash; Bopardikar, Ajit S.; Kaipa, Kalyan Kumar; Venkataraman, Parthasarathy; Lee, KyuSang; Ahn, TaeJin; Narayanan, Rangavittal

    2010-01-01

    Motivation: Exact sequence search allows a user to search for a specific DNA subsequence in a larger DNA sequence or database. It serves as a vital building block in many areas such as Pharmacogenetics, Phylogenetics and Personal Genomics. As sequencing of genomic data becomes increasingly affordable, the amount of sequence data that must be processed will also increase exponentially. In this context, fast sequence search algorithms will play an important role in exploiting the information contained i...

  6. Underlying Data for Sequencing the Mitochondrial Genome with the Massively Parallel Sequencing Platform Ion Torrent™ PGM™

    Seo, Seung Bum; Zeng, Xiangpei; King, Jonathan L.; Larue, Bobby L; Assidi, Mourad; Al-Qahtani, Mohamed H; Sajantila, Antti; Budowle, Bruce

    2015-01-01

    Abstract Background Massively parallel sequencing (MPS) technologies have the capacity to sequence targeted regions or whole genomes of multiple nucleic acid samples with high coverage by sequencing millions of DNA fragments simultaneously. Compared with Sanger sequencing, MPS also can reduce labor and cost on a per nucleotide basis and indeed on a per sample basis. In this study, whole genomes of human mitochondria (mtGenome) were sequenced on the Personal Genome Machine (PGM™) (L...

  7. Underlying Data for Sequencing the Mitochondrial Genome with the Massively Parallel Sequencing Platform Ion Torrent™ PGM™

    Seo, Seung Bum; Zeng, Xiangpei; King, Jonathan L.; Larue, Bobby L; Assidi, Mourad; Al-Qahtani, Mohamed H; Sajantila, Antti; Budowle, Bruce

    2015-01-01

    Background Massively parallel sequencing (MPS) technologies have the capacity to sequence targeted regions or whole genomes of multiple nucleic acid samples with high coverage by sequencing millions of DNA fragments simultaneously. Compared with Sanger sequencing, MPS also can reduce labor and cost on a per nucleotide basis and indeed on a per sample basis. In this study, whole genomes of human mitochondria (mtGenome) were sequenced on the Personal Genome Machine (PGM™) (Life Technologies, S...

  8. Lygus hesperus polygalacturonase Characterization and Role in Plant Damage

    The amino terminus of a Lygus hesperus salivary gland protein revealing polygalacturonase (PG) activity in an SDS-PAGE activity gel assay has been sequenced via Edman degradation. The N-terminal amino acid sequence shares homology with the predicted amino acid sequence for putative L. lineolaris P...

  9. Variable copy number DNA sequences in rice.

    Kikuchi, S; Takaiwa, F; Oono, K

    1987-12-01

    We have cloned two types of variable copy number DNA sequences from the rice embryo genome. One of these sequences, which was cloned in pRB301, was amplified about 50-fold during callus formation and diminished in copy number to the embryonic level during regeneration. The other clone, named pRB401, showed the reciprocal pattern. The copy numbers of both sequences were changed even in the early developmental stage and eliminated from nuclear DNA along with growth of the plant. Sequencing analysis of the pRB301 insert revealed some open reading frames and direct repeat structures, but corresponding sequences were not identified in the EMBL and LASL DNA databases. Sequencing of the nuclear genomic fragment cloned in pRB401 revealed the presence of the 3'rps12-rps7 region of rice chloroplast DNA. Our observations suggest that during callus formation (dedifferentiation), regeneration and the growth process the copy numbers of some DNA sequences are variable and that nuclear integrated chloroplast DNA acts as a variable copy number sequence in the rice genome. Based on data showing a common sequence in mitochondria and chloroplast DNA of maize (Stern and Lonsdale 1982) and that the rps12 gene of tobacco chloroplast DNA is a divided gene (Torazawa et al. 1986), it is suggested that the sequence on the inverted repeat structure of chloroplast DNA may have the character of a movable genetic element. PMID:3481021

  10. Assembly Sequence Planning for Mechanical Products

    1999-01-01

    A method for assembly sequence planning is proposed in this paper. First, two methods for assembly sequence planning, the indirect method and the direct method, are compared. Then, the limitations of previous assembly planning systems are pointed out. On the basis of the indirect method, an improved method for assembly sequence planning is put forward. This method is composed of four parts: assembly modeling for products, assembly sequence representation, assembly sequence planning, and evaluation and optimization. The assembly model is established by human-machine interaction, and it contains the components' information and the assembly relations among the components. Assembly sequence planning is based on breaking up the assembly model. An AND/OR graph is used to represent the set of assembly sequences. Every component that satisfies the disassembly condition is recorded as a node of the AND/OR graph. After the disassembly sequence AND/OR graph is generated, the heuristic AO* algorithm is used to search it, and optimal assembly sequence planning is realized. This method is shown to be effective in a prototype system, which is a sub-project of China's state 863/CIMS research project ‘Concurrent Engineering’.
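    To make the AND/OR representation concrete, the sketch below uses a hypothetical three-part product. OR-branches are alternative ways to split a subassembly, AND-branches are the parts each split produces, and the cheapest complete disassembly is found by exhaustive recursive evaluation. This is a simpler stand-in for AO*, which targets the same optimum while expanding fewer nodes with the help of a heuristic; the graph, costs and names are invented for illustration.

```python
from functools import lru_cache

# Hypothetical AND/OR graph for a three-part product {A, B, C}.
# Each key is a (sub)assembly; its value lists alternative ways to split it
# (OR-branches). Each alternative is (cost, children), and every child must
# itself be fully disassembled (AND-branch). Single parts have no splits.
GRAPH = {
    "ABC": [(2, ("AB", "C")), (3, ("A", "BC"))],
    "AB": [(1, ("A", "B"))],
    "BC": [(4, ("B", "C"))],
    "A": [],
    "B": [],
    "C": [],
}


@lru_cache(maxsize=None)
def best_disassembly(node):
    """Return (minimum total cost, disassembly steps) for `node`.

    Every OR-branch is evaluated exhaustively; results are memoized so shared
    subassemblies are solved once.
    """
    if not GRAPH[node]:
        return 0, ()
    candidates = []
    for cost, children in GRAPH[node]:
        total = cost
        steps = [(node, children)]
        for child in children:
            child_cost, child_steps = best_disassembly(child)
            total += child_cost
            steps.extend(child_steps)
        candidates.append((total, tuple(steps)))
    return min(candidates)  # cheapest alternative wins


cost, steps = best_disassembly("ABC")
# Reversing the disassembly steps yields an assembly sequence.
assembly = [f"join {' + '.join(parts)} -> {whole}" for whole, parts in reversed(steps)]
print(cost, assembly)
```

    Here the split into AB and C (total cost 3) beats the split into A and BC (total cost 7), so the assembly sequence is to join A with B, then add C.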

  11. Randomness in Sequence Evolution Increases over Time.

    Wang, Guangyu; Sun, Shixiang; Zhang, Zhang

    2016-01-01

    The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, here we detect sequence randomness based on a collection of eight statistical randomness tests and investigate the randomness variation of coding sequences with an application to Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results clearly show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness indeed increases over time, consistent with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study presents an important finding, for the first time, that sequence randomness increases over time, which may provide profound insights for unveiling the underlying mechanisms of molecular sequence evolution. PMID:27224236
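    The abstract does not name its eight randomness tests. As one standard example of such a test, the sketch below applies the Wald-Wolfowitz runs test to a DNA string reduced to a binary strong/weak (G/C vs. other) alphabet; the encoding and function name are assumptions for illustration, not the study's procedure.

```python
import math


def runs_test_z(dna):
    """Wald-Wolfowitz runs test z-score for a DNA string.

    Bases are reduced to a binary sequence (G/C = 1, anything else = 0).
    A large negative z means fewer runs than expected (clustering); a large
    positive z means more runs than expected (alternation).
    """
    bits = [1 if base in "GC" else 0 for base in dna.upper()]
    n1 = sum(bits)
    n2 = len(bits) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("sequence must contain both G/C and other bases")
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("sequence too short for the runs test")
    return (runs - expected) / math.sqrt(variance)


print(runs_test_z("GCGCATAT"), runs_test_z("GAGAGAGA"))
```

    The first string has only two runs (clustered), the second eight (strictly alternating); both deviate from the five runs expected by chance, in opposite directions.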

  12. Novel bioinformatic developments for exome sequencing.

    Lelieveld, Stefan H; Veltman, Joris A; Gilissen, Christian

    2016-06-01

    With the widespread adoption of next generation sequencing technologies by the genetics community and the rapid decrease in costs per base, exome sequencing has become a standard within the repertoire of genetic experiments for both research and diagnostics. Although bioinformatics now offers standard solutions for the analysis of exome sequencing data, many challenges still remain; especially the increasing scale at which exome data are now being generated has given rise to novel challenges in how to efficiently store, analyze and interpret exome data of this magnitude. In this review we discuss some of the recent developments in bioinformatics for exome sequencing and the directions in which they are taking us. With these developments, exome sequencing is paving the way for the next big challenge, the application of whole genome sequencing. PMID:27075447

  13. Predicting Contextual Sequences via Submodular Function Maximization

    Dey, Debadeepta; Hebert, Martial; Bagnell, J Andrew

    2012-01-01

    Sequence optimization, where the items in a list are ordered to maximize some reward, has many applications such as web advertisement placement, search, and control libraries in robotics. Previous work in sequence optimization produces a static ordering that does not take any features of the item or context of the problem into account. In this work, we propose a general approach to order the items within the sequence based on the context (e.g., perceptual information, environment description, and goals). We take a simple, efficient, reduction-based approach where the choice and order of the items are established by repeatedly learning simple classifiers or regressors for each "slot" in the sequence. Our approach leverages recent work on submodular function maximization to provide a formal regret reduction from submodular sequence optimization to simple cost-sensitive prediction. We apply our contextual sequence prediction algorithm to optimize control libraries and demonstrate results on two robotics problems: ...
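    For contrast with the contextual method, the static baseline can be sketched as the classic greedy rule for a monotone submodular reward: fill each slot with the item of largest marginal gain, which for the first k slots carries a (1 - 1/e) approximation guarantee relative to the best set of k items. The coverage reward, item sets and names below are hypothetical; the paper's approach instead learns a classifier or regressor per slot from context features.

```python
def greedy_order(items, k):
    """Greedily fill k list slots to maximize coverage.

    `items` maps a name to the set of elements it covers. The reward of a list
    is the number of distinct elements its items cover, a monotone submodular
    function, so each slot takes the item with the largest marginal gain.
    Ties go to the alphabetically first name.
    """
    remaining = dict(items)
    covered = set()
    order = []
    for _ in range(min(k, len(remaining))):
        name = max(sorted(remaining), key=lambda n: len(remaining[n] - covered))
        order.append(name)
        covered |= remaining.pop(name)
    return order, covered


# Hypothetical items and the elements each one covers.
ITEMS = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {1, 2}, "d": {6, 7}}
order, covered = greedy_order(ITEMS, 3)
print(order, sorted(covered))
```

    After "a" is chosen, "c" adds nothing new, so the greedy list moves on to "d" and then "b"; the resulting order is the same for every context, which is exactly the limitation the abstract points out.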

  14. Evolutionarily conserved sequences on human chromosome 21

    Frazer, Kelly A.; Sheehan, John B.; Stokowski, Renee P.; Chen, Xiyin; Hosseini, Roya; Cheng, Jan-Fang; Fodor, Stephen P.A.; Cox, David R.; Patil, Nila

    2001-09-01

    Comparison of human sequences with the DNA of other mammals is an excellent means of identifying functional elements in the human genome. Here we describe the utility of high-density oligonucleotide arrays as a rapid approach for comparing human sequences with the DNA of multiple species whose sequences are not presently available. High-density arrays representing approximately 22.5 Mb of nonrepetitive human chromosome 21 sequence were synthesized and then hybridized with mouse and dog DNA to identify sequences conserved between humans and mice (human-mouse elements) and between humans and dogs (human-dog elements). Our data show that sequence comparison of multiple species provides a powerful empiric method for identifying actively conserved elements in the human genome. A large fraction of these evolutionarily conserved elements are present in regions on chromosome 21 that do not encode known genes.

  15. Nanopore DNA sequencing with MspA.

    Derrington, Ian M; Butler, Tom Z; Collins, Marcus D; Manrao, Elizabeth; Pavlenok, Mikhail; Niederweis, Michael; Gundlach, Jens H

    2010-09-14

    Nanopore sequencing has the potential to become a direct, fast, and inexpensive DNA sequencing technology. The simplest form of nanopore DNA sequencing utilizes the hypothesis that individual nucleotides of single-stranded DNA passing through a nanopore will uniquely modulate an ionic current flowing through the pore, allowing the record of the current to yield the DNA sequence. We demonstrate that the ionic current through the engineered Mycobacterium smegmatis porin A, MspA, has the ability to distinguish all four DNA nucleotides and resolve single-nucleotides in single-stranded DNA when double-stranded DNA temporarily holds the nucleotides in the pore constriction. Passing DNA with a series of double-stranded sections through MspA provides proof of principle of a simple DNA sequencing method using a nanopore. These findings highlight the importance of MspA in the future of nanopore sequencing. PMID:20798343

  16. Locomotor sequence learning in visually guided walking

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-01-01

    Voluntary limb modifications must be integrated with basic walking patterns during visually guided walking. Here we tested whether voluntary gait modifications can become more automatic with practice. We challenged walking control by presenting visual stepping targets that instructed subjects to modify step length from one trial to the next. Our sequence learning paradigm is derived from the serial reaction-time (SRT) task that has been used in upper limb studies. Both random and ordered sequences of step lengths were used to measure sequence-specific and sequence non-specific learning during walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 years, N = 20) could learn a specific sequence of...

  17. Sequencing intractable DNA to close microbial genomes.

    Richard A Hurt

    Advancement in high-throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled "intractable", resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation and provide a step-wise method that reduces the effort required for genome finishing.

  18. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22, with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 (30.4%) of the 148 EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full-length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus, despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by GENSCAN (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  19. Improving Recurrent Neural Networks For Sequence Labelling

    Dinarelli, Marco; Tellier, Isabelle

    2016-01-01

    In this paper we study different types of Recurrent Neural Networks (RNN) for sequence labeling tasks. We propose two new variants of RNNs integrating improvements for sequence labeling, and we compare them to the more traditional Elman and Jordan RNNs. We compare all models, either traditional or new, on four distinct tasks of sequence labeling: two on Spoken Language Understanding (ATIS and MEDIA); and two of POS tagging for the French Treebank (FTB) and the Penn Treebank (PTB) corpora. The...
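    The difference between the two classic recurrences can be shown with scalar toy models: an Elman network feeds its previous hidden state back into the recurrence, while a Jordan network feeds back its previous output. The sketch below is illustrative only (real models use vectors, learned weights and usually a softmax output over labels); it is not one of the paper's proposed variants, and the function names are hypothetical.

```python
import math


def elman(xs, w, u, v, h0=0.0):
    """Scalar Elman RNN: the hidden state is fed back into the next step."""
    h, ys = h0, []
    for x in xs:
        h = math.tanh(w * x + u * h)  # recurrence on the previous hidden state
        ys.append(v * h)              # linear output layer
    return ys


def jordan(xs, w, u, v, y0=0.0):
    """Scalar Jordan RNN: the previous output is fed back instead."""
    y, ys = y0, []
    for x in xs:
        h = math.tanh(w * x + u * y)  # recurrence on the previous output
        y = v * h
        ys.append(y)
    return ys


print(elman([1.0, 0.0], 1.0, 1.0, 2.0))
print(jordan([1.0, 0.0], 1.0, 1.0, 2.0))
```

    With an output weight of 1 the two networks coincide, because output and hidden state are then equal; with any other output weight their feedback signals, and hence their predictions from the second step on, differ.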

  20. A classification of periodic turtle sequences

    J. Holdener; A. Wagaman

    2003-01-01

    A turtle sequence is a word constructed from an alphabet of two letters: F, which represents the forward motion of a turtle in the plane, and L, which represents a counterclockwise turn. In this paper, we investigate such sequences and establish links between the combinatoric properties of words and the geometric properties of the curves they generate. In particular, we classify periodic turtle sequences in terms of their closure (or lack thereof).
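    Because the abstract does not fix the turn angle, the sketch below assumes every L is a 90-degree counterclockwise turn, which keeps coordinates exact. Under that assumption a periodic sequence closes within at most four periods: if one period's net turn is not a multiple of 360 degrees, the displacements of the rotated periods sum to zero once the heading cycles back, and otherwise the curve closes only if a single period already returns to the start. The function names are hypothetical.

```python
# Headings in counterclockwise order, assuming every L is a 90-degree turn.
DIRECTIONS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def walk(word, repeats=1):
    """Trace `word` repeated `repeats` times; return final position and heading."""
    x = y = heading = 0
    for letter in word * repeats:
        if letter == "F":
            dx, dy = DIRECTIONS[heading]
            x, y = x + dx, y + dy
        elif letter == "L":
            heading = (heading + 1) % 4
        else:
            raise ValueError(f"unexpected letter {letter!r}")
    return (x, y), heading


def is_closed(word):
    """True if repeating `word` returns the turtle to its starting position
    and heading; with 90-degree turns, at most four periods are needed."""
    return any(walk(word, r) == ((0, 0), 0) for r in range(1, 5))


for w in ("FL", "F", "FFLL", "FLLLL"):
    print(w, is_closed(w))
```

    For example, "FL" traces a unit square after four periods, while "FLLLL" turns a full circle each period and so drifts forever along a straight line.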

  1. Mutations in the K+ channel signature sequence.

    Heginbotham, L; Lu, Z; Abramson, T; MacKinnon, R

    1994-01-01

    Potassium channels share a highly conserved stretch of eight amino acids, a K+ channel signature sequence. The conserved sequence falls within the previously defined P-region of voltage-activated K+ channels. In this study we investigate the effect of mutations in the signature sequence of the Shaker channel on K+ selectivity determined under bi-ionic conditions. Nonconservative substitutions of two threonine residues and the tyrosine residue leave selectivity intact. In contrast, mutations a...

  2. Predicting Contextual Sequences via Submodular Function Maximization

    Dey, Debadeepta; Liu, Tian Yu; Hebert, Martial; Bagnell, J. Andrew

    2012-01-01

    Sequence optimization, where the items in a list are ordered to maximize some reward, has many applications such as web advertisement placement, search, and control libraries in robotics. Previous work in sequence optimization produces a static ordering that does not take any features of the item or context of the problem into account. In this work, we propose a general approach to order the items within the sequence based on the context (e.g., perceptual information, environment description, ...

  3. A statistical study of aftershock sequences

    Giorgio Ranalli

    2010-01-01

    A comprehensive statistical study of the phenomenology of aftershock sequences is made in this paper. The spatial distribution of aftershocks indicates that they are mainly crustal events; however, deeper sequences also take place. The analysis of the distribution of aftershocks in 15 sequences with respect to time and magnitude leads to the statistical confirmation of a set of phenomenological laws describing the process, namely, the time-frequency law of hyperbolic decay of aftershock activ...

  4. Principles for modelling of manufacturing sequences

    Werke, Mats

    2015-01-01

    The manufacturing sequence influences, to a large extent, component properties such as fatigue life, shape accuracy and manufacturability. By simulating the manufacturing sequence, using numerical or empirical models, and extracting important accumulated data, like residual stress, hardness and shape, the possibilities of early analysis of a design concept and the associated manufacturing sequence will increase. An established methodology has the potential of reducing physical testing and the tim...

  5. Scalable Transcriptome Preparation for Massive Parallel Sequencing

    Henrik Stranneheim; Beata Werne; Ellen Sherwood; Joakim Lundeberg

    2011-01-01

    BACKGROUND: The tremendous output of massive parallel sequencing technologies requires automated robust and scalable sample preparation methods to fully exploit the new sequence capacity. METHODOLOGY: In this study, a method for automated library preparation of RNA prior to massively parallel sequencing is presented. The automated protocol uses precipitation onto carboxylic acid paramagnetic beads for purification and size selection of both RNA and DNA. The automated sample preparation was co...

  6. Accurate and comprehensive sequencing of personal genomes

    Ajay, Subramanian S.; Parker, Stephen C.J.; Ozel Abaan, Hatice; Fuentes Fajardo, Karin V.; Margulies, Elliott H.

    2011-01-01

    As whole-genome sequencing becomes commoditized and we begin to sequence and analyze personal genomes for clinical and diagnostic purposes, it is necessary to understand what constitutes a complete sequencing experiment for determining genotypes and detecting single-nucleotide variants. Here, we show that the current recommendation of ∼30× coverage is not adequate to produce genotype calls across a large fraction of the genome with acceptably low error rates. Our results are based on analyses...
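    As a rough baseline, if reads landed uniformly at random, per-base depth would be Poisson-distributed with mean equal to the average coverage (the same idealization used in Lander-Waterman coverage analysis). The sketch below computes the expected fraction of bases below a depth threshold k under that model; real data are less uniform, the threshold is an illustrative parameter rather than a value from the paper, and this is not the paper's analysis.

```python
import math


def fraction_below(mean_coverage, k):
    """P(depth < k) when per-base depth is Poisson(mean_coverage).

    Idealized model: reads are placed uniformly at random along the genome.
    """
    return sum(
        math.exp(-mean_coverage) * mean_coverage ** i / math.factorial(i)
        for i in range(k)
    )


for mean in (10, 30):
    print(mean, fraction_below(mean, 10))
```

    Under this model only a tiny fraction of bases falls below depth 10 at 30x mean coverage, so an empirical shortfall at 30x would have to come from how real data depart from the uniform-coverage assumption and from error rates, which is what the paper examines.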

  7. Scalable Transcriptome Preparation for Massive Parallel Sequencing

    Stranneheim, Henrik; Werne, Beata; Sherwood, Ellen; Lundeberg, Joakim

    2011-01-01

    Background The tremendous output of massive parallel sequencing technologies requires automated robust and scalable sample preparation methods to fully exploit the new sequence capacity. Methodology In this study, a method for automated library preparation of RNA prior to massively parallel sequencing is presented. The automated protocol uses precipitation onto carboxylic acid paramagnetic beads for purification and size selection of both RNA and DNA. The automated sample preparation was comp...

  8. Effects of Sequence Partitioning on Compression Rate

    Alagoz, B. Baykant

    2010-01-01

    This paper theoretically investigates the effects of splitting a data sequence into packs of data. We prove that a partitioning of a data sequence can be found such that the entropy rate of each subsequence is lower than the entropy rate of the source. The effects of sequence partitioning on the overall compression rate are argued on the basis of partitioning statistics, and then an optimization problem for an optimal partition is defined to improve the overall compression rate of a...
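    A toy example shows why partitioning can help. The sketch below uses order-0 empirical entropy as a simple stand-in for entropy rate and ignores the cost of encoding where each subsequence begins; both simplifications, and the function names, are assumptions rather than the paper's formulation.

```python
import math
from collections import Counter


def empirical_entropy(seq):
    """Order-0 empirical Shannon entropy of `seq`, in bits per symbol."""
    n = len(seq)
    return sum(c / n * math.log2(n / c) for c in Counter(seq).values())


def total_bits(parts):
    """Ideal order-0 code length summed over independently coded parts
    (ignores the side information describing the partition itself)."""
    return sum(len(p) * empirical_entropy(p) for p in parts)


data = "AAAABBBB"
print(total_bits([data]), total_bits(["AAAA", "BBBB"]))
```

    Coded as a whole, `AAAABBBB` needs 1 bit per symbol (8 bits in total), while each half has zero empirical entropy; in practice the saving must outweigh the cost of signalling the partition.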

  9. Some properties of generalized Fibonacci sequence

    Chong, Chin-Yoon; Ho, C. K.

    2015-12-01

    For all non-negative integers n and real constants a, b, p and q, the generalized Fibonacci sequence {U_n} is defined by U_{n+2} = pU_{n+1} + qU_n with the initial values U_0 = a and U_1 = b. Throughout the paper, we study some properties of the generalized Fibonacci sequence. Our results will motivate some new research problems concerning the contribution of the generalized sequence.
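    The recurrence is straightforward to evaluate directly. In the sketch below, a, b, p and q follow the abstract's notation and the function name is hypothetical; the classical Fibonacci, Lucas and Pell sequences appear as special cases.

```python
def generalized_fibonacci(a, b, p, q, count):
    """First `count` terms of U_{n+2} = p*U_{n+1} + q*U_n with U_0 = a, U_1 = b."""
    terms = [a, b]
    while len(terms) < count:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms[:count]


print(generalized_fibonacci(0, 1, 1, 1, 10))  # Fibonacci: a=0, b=1, p=q=1
print(generalized_fibonacci(2, 1, 1, 1, 6))   # Lucas: a=2, b=1, p=q=1
print(generalized_fibonacci(0, 1, 2, 1, 6))   # Pell: a=0, b=1, p=2, q=1
```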

  10. The diploid genome sequence of Candida albicans

    Jones, Ted; Federspiel, Nancy A.; Chibana, Hiroji; Dungan, Jan; Kalman, Sue; Magee, B. B.; Newport, George; Thorstenson, Yvonne R.; Agabian, Nina; Magee, P T; Davis, Ronald W.; Scherer, Stewart

    2004-01-01

    We present the diploid genome sequence of the fungal pathogen Candida albicans. Because C. albicans has no known haploid or homozygous form, sequencing was performed as a whole-genome shotgun of the heterozygous diploid genome in strain SC5314, a clinical isolate that is the parent of strains widely used for molecular analysis. We developed computational methods to assemble a diploid genome sequence in good agreement with available physical mapping data. We provide a whole-genome description ...

  11. cis sequence effects on gene expression

    Jacobs, Kevin

    2007-08-01

    Abstract Background Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects) in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature. Results We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning) to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p cis sequence effects in our study, respectively. Conclusion Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.
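    A minimal additive model of the kind described regresses log-transformed expression on the allele dosage (0, 1 or 2 copies) of a single SNP; its R² is the proportion of expression variation explained by that SNP. The sketch below assumes the natural logarithm and ordinary least squares; the function name is hypothetical, it is not the study's exact models, and it does not implement Templeton's Tree Scanning.

```python
import math


def additive_fit(dosages, expression):
    """Least-squares fit of log(expression) = intercept + slope * dosage.

    dosages: allele counts (0, 1 or 2) per cell line.
    expression: positive expression values for one gene, same order.
    Returns (slope, intercept, r_squared), where r_squared is the share of
    log-expression variance explained by the SNP.
    """
    y = [math.log(e) for e in expression]
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(dosages, y))
    if sxx == 0 or syy == 0:
        raise ValueError("need variation in both genotype and expression")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Hypothetical data: each extra allele multiplies expression by e**0.5.
genotypes = [0, 1, 2, 0, 1, 2]
levels = [math.exp(1.0 + 0.5 * d) for d in genotypes]
print(additive_fit(genotypes, levels))
```

    The model is "additive" on the log scale: each additional copy of the allele shifts log-expression by the same amount, the fitted slope.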

  12. EGNAS: an exhaustive DNA sequence design algorithm

    Kick, Alfred

    2012-06-01

    Abstract Background Molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows the generation of sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm can generate larger sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.
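    Three of the per-sequence constraints listed above (a GC-content window, G/C ends, and excluding words that equal their own reverse complement) can be illustrated with brute-force enumeration of all 4^n words. This is a hypothetical Python sketch, not the EGNAS C++ implementation; it omits the cross-hybridization, uniqueness and hairpin constraints, and full enumeration is only practical for short lengths.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_sequences(length, gc_min, gc_max):
    """Enumerate all 4**length DNA words and keep those passing simple filters."""
    kept = []
    for letters in product("ACGT", repeat=length):
        seq = "".join(letters)
        gc = (seq.count("G") + seq.count("C")) / length
        if not gc_min <= gc <= gc_max:
            continue  # GC content outside the requested window
        if seq[0] not in "GC" or seq[-1] not in "GC":
            continue  # require G/C ends to reduce fraying
        if seq == reverse_complement(seq):
            continue  # fully self-complementary word could pair with itself
        kept.append(seq)
    return kept


words = candidate_sequences(4, 0.5, 0.5)
print(len(words), words[:5])
```

    A design tool must then choose a subset of such candidates that are also mutually compatible, which is where the pairwise constraints omitted here come in.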

  13. Automated correction of genome sequence errors

    Gajer, Pawel; Schatz, Michael; Salzberg, Steven L

    2004-01-01

    By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the overall accuracy of genome sequences and facilitates the use of these sequences for polymorphism discovery. We describe the algorithm and its application in a large set of recent genome sequencing projects. The number of erroneous base calls in these projects was reduced by 80%. In an analysis of ov...

  14. Repetitive sequence environment distinguishes housekeeping genes

    Eller, C. Daniel; Regelson, Moira; Merriman, Barry; Nelson, Stan; Horvath, Steve; Marahrens, York

    2006-01-01

    Housekeeping genes are expressed across a wide variety of tissues. Since repetitive sequences have been reported to influence the expression of individual genes, we employed a novel approach to determine whether housekeeping genes can be distinguished from tissue-specific genes by their repetitive sequence context. We show that Alu elements are more highly concentrated around housekeeping genes while various longer (>400-bp) repetitive sequences ("repeats"), including Long Interspersed Nuclear E...

  15. Next-generation sequencing: applications beyond genomes

    Marguerat, Samuel; Wilhelm, Brian T.; Bähler, Jürg

    2008-01-01

    The development of DNA sequencing more than 30 years ago has profoundly impacted biological research. In the last couple of years, remarkable technological innovations have emerged that allow the direct and cost-effective sequencing of complex samples at unprecedented scale and speed. These next-generation technologies make it feasible to sequence not only static genomes, but also entire transcriptomes expressed under different conditions. These and other powerful applications of next-generat...

  16. BAC ends library generation for Illumina sequencing

    sprotocols

    2015-01-01

    Bacterial artificial chromosome (BAC) libraries are still a valuable tool for de novo assembly of complex genomes, such as many plant genomes. Shotgun sequencing of BACs, individually or by pools, produces first assemblies which usually need further improvement towards finished quality. We developed a new approach to obtain BAC ends libraries for Illumina sequencing (BES), overcoming the expensive and time-consuming Sanger sequencing of BAC ends. This new method could be useful for improving de...

  17. Compressing DNA sequence databases with coil

    Hendy, Michael D; White, W Timothy J

    2008-01-01

    Abstract Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequenc...
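    A common point of comparison for DNA-specific compression is a fixed 2-bit code: four bases need only two bits each, versus eight bits per character in an ASCII flat file. The sketch below packs four bases per byte; it handles only A, C, G and T (no N or other IUPAC codes), the names are hypothetical, and it is not coil's method. The `zlib` call gives a general-purpose Lempel-Ziv comparison on random input.

```python
import random
import zlib

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack an ACGT string at 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        byte = 0
        for offset, base in enumerate(seq[start:start + 4]):
            byte |= CODE[base] << (2 * offset)  # base i of a chunk -> bits 2i..2i+1
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Inverse of pack(); `length` is needed because the last byte may be partial."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length))


random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(10_000))
packed = pack(seq)
assert unpack(packed, len(seq)) == seq
print(len(seq), len(packed), len(zlib.compress(seq.encode())))
```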

  18. On the Origin of Sequence

    Peter T. S. van der Gulik

    2015-11-01

    Three aspects which make planet Earth special, and which must be taken into consideration with respect to the emergence of peptides, are the mineralogical composition, the Moon which is in the same size class, and the triple environment consisting of ocean, atmosphere, and continent. GlyGly is a remarkable peptide because it stimulates peptide bond formation in the Salt-Induced Peptide Formation reaction. The role glycine and aspartic acid play in the active site of RNA polymerase is remarkable too. GlyGly might have been the original product of coded peptide synthesis because of its importance in stimulating the production of oligopeptides with a high aspartic acid content, which protected small RNA molecules by binding Mg2+ ions. The feedback loop, which is closed by having RNA molecules producing GlyGly, is proposed as the essential element fundamental to life. Having this system running, longer sequences could evolve, gradually solving the problem of error catastrophe. The basic structure of the standard genetic code (8 fourfold degenerate codon boxes and 8 split codon boxes) is an example of the way information concerning the emergence of life is frozen in the biological constitution of organisms: the structure of the code contains historical information.

  19. Interfaces in sequence permutated multilayers

    Balogh, J; Bujdoso, L; Kaptas, D; Kiss, L F; Kemeny, T; Vincze, I, E-mail: baloghj@szfki.h [Research Institute for Solid State Physics and Optics, 1525 Budapest PO Box 49 (Hungary)]

    2010-03-01

    Sequence permutation of three-building-block multilayers was recently suggested as a new approach to studying the bottom and top interfaces formed by a given layer with either of the other two elements. It was applied to Fe-B-Ag multilayers with 5 nm Ag layers separating the Fe and B layers. Here we examine how chemical mixing, and the resulting amorphous phase formation, depend on the nominal thickness of the Ag layers in [2 nm B / 2 nm Fe / x nm Ag]₄ multilayers with 0.2 ≤ x ≤ 10. The ratio of the non-alloyed Fe layer to the amorphous Fe-B interface compound changes only below x = 5 nm. This is attributed to discontinuities in the Ag layer due to its three-dimensional island growth on the bcc-Fe layer. The variation of the hyperfine field distribution of the amorphous Fe-B layers also confirms that the top interfaces of Fe with B are more B-rich than the bottom ones.

  20. Genomic sequencing of Pleistocene cave bears

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  1. Hardware Accelerated Sequence Alignment with Traceback

    Scott Lloyd

    2009-01-01

    in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop computer is demonstrated on sequence lengths of 16000. For greater performance, the architecture is scalable to more processing elements.
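    The forward scan and traceback mentioned in this abstract can be illustrated in ordinary software. The sketch below is a plain Needleman-Wunsch global alignment in Python with a linear gap penalty; the scoring values (match = 1, mismatch = -1, gap = -1) and the function name are illustrative assumptions, and it stores the full score matrix rather than modeling the space-efficient FPGA architecture described in the paper.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # Forward scan: fill the (n+1) x (m+1) score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Traceback: walk from the bottom-right cell back to the origin,
    # preferring the diagonal move when several moves explain the score.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

    Keeping the whole O(nm) matrix for the traceback is what limits naive implementations on long sequences, which is the memory bottleneck the paper's design targets.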

  2. Researches on Sequence of Plant Cystatin: Phytocystatin

    QIN Qingfeng; HE Wei; LIANG Jun; ZHANG Xingyao

    2005-01-01

    Plant cystatins, or phytocystatins, are cysteine proteinase inhibitors that exist widely in different plant species. Because they can kill insects by inhibiting the digestive function of cysteine proteinases in the gut, they are believed to play an important role in plant defense against pests. Phytocystatins contain the conserved QXVXG motif and show sequence features that differ from those of animal cystatins. Through direct protein sequencing and cDNA cloning, a large number of plant cystatins have been characterized. A multiple alignment with BLAST and a detailed analysis of 38 phytocystatins show that phytocystatins possess a specific conserved amino acid sequence, [LRVI]-[AGT]-[RQKE]-[FY]-[AS]-[VI]-X-[EGHDQV]-[HYFQ]-N, which differs from the conserved sequence reported by Margis in 1998. This conserved sequence is sufficient to detect phytocystatin sequences exclusively in protein databases. A classification of these phytocystatins is performed: they can be divided into 3 groups according to features of their amino acid sequence, and group I can be further divided into 3 subgroups based on features of their amino acid and genomic sequences. Using CLUSTALX, the most conserved nucleotide sequences of phytocystatins were found, which could be used to design degenerate primers to search for new phytocystatins by PCR.
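    The conserved phytocystatin signature above is written in a PROSITE-like notation, where X stands for any residue. A minimal Python sketch, assuming only that simple notation (bracketed alternatives, single residues and X, separated by hyphens), converts such a pattern to a regular expression and scans a protein sequence for it. The helper names, the hyphenated form of QXVXG and the demo sequence are hypothetical and only illustrate the idea; this is not the authors' search procedure.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple pattern like '[LRVI]-[AGT]-X-N' to a regex string.

    Only bracketed alternatives, single residues and the wildcard X are handled;
    full PROSITE syntax (repeats, exclusions, anchors) is not supported.
    """
    return "".join("." if element == "X" else element for element in pattern.split("-"))

def find_motif(sequence, pattern):
    """Return (0-based start, matched substring) for every match, overlapping ones included."""
    # A capturing group inside a zero-width lookahead lets finditer report overlaps.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]

PHYTOCYSTATIN = "[LRVI]-[AGT]-[RQKE]-[FY]-[AS]-[VI]-X-[EGHDQV]-[HYFQ]-N"
QXVXG = "Q-X-V-X-G"

demo = "MAAALARFAVAEHNGGQAVAG"  # hypothetical sequence, not a real phytocystatin
print(find_motif(demo, PHYTOCYSTATIN))  # [(4, 'LARFAVAEHN')]
print(find_motif(demo, QXVXG))          # [(16, 'QAVAG')]
```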

  3. Detecting Emotions from Connected Action Sequences

    Bernhardt, Daniel; Robinson, Peter

    In this paper we deal with the problem of detecting emotions from the body movements produced by naturally connected action sequences. Although action sequences are one of the most common forms of body motions in everyday scenarios their potential for emotion recognition has not been explored in the past. We show that there are fundamental differences between actions recorded in isolation and in natural sequences and demonstrate a number of techniques which allow us to correctly label action sequences with one of four emotions up to 86% of the time. Our results bring us an important step closer to recognizing emotions from body movements in natural scenarios.

  4. A note on bifix-free sequences

    Nielsen, Peter Tolstrup

    1973-01-01

    A bifix of an L-ary n-tuple is a sequence that is both a prefix and a suffix of that n-tuple. The practical importance of bifix-free patterns is emphasized, and we devise a systematic way of generating all such sequences and determine their number.
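    To make the definition concrete, a small Python sketch can test whether a word is bifix-free and count such words by brute-force enumeration. It cross-checks the count against the standard recurrence for bifix-free (unbordered) words, b(1) = L, b(2k) = L·b(2k-1) - b(k), b(2k+1) = L·b(2k). The recurrence is included as a known result for comparison, not as a reconstruction of the paper's generation method, and all function names are illustrative.

```python
from functools import lru_cache
from itertools import product

def is_bifix_free(word):
    """True if no proper, non-empty prefix of `word` equals the suffix of the same length."""
    n = len(word)
    return all(word[:k] != word[n - k:] for k in range(1, n))

def bifix_free_words(L, n):
    """Yield every bifix-free n-tuple over the alphabet {0, ..., L-1} (brute force)."""
    for word in product(range(L), repeat=n):
        if is_bifix_free(word):
            yield word

def count_bruteforce(L, n):
    return sum(1 for _ in bifix_free_words(L, n))

@lru_cache(maxsize=None)
def count_recurrence(L, n):
    """b(1) = L, b(2k) = L*b(2k-1) - b(k), b(2k+1) = L*b(2k)."""
    if n == 1:
        return L
    if n % 2 == 0:
        return L * count_recurrence(L, n - 1) - count_recurrence(L, n // 2)
    return L * count_recurrence(L, n - 1)

print([count_recurrence(2, n) for n in range(1, 9)])  # [2, 2, 4, 6, 12, 20, 40, 74]
```

    Enumeration grows as L**n, so the brute-force path is only practical for small cases; the recurrence is what makes counting longer patterns cheap.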

  5. Massively parallel sequencing of forensic STRs

    Parson, Walther; Ballard, David; Budowle, Bruce;

    2016-01-01

    The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data...... accessible genome assembly, and in place before the uptake of MPS by the general forensic community starts to generate sequence data on a large scale. While the established nomenclature for CE-based STR analysis will remain unchanged in the future, the nomenclature of sequence-based STR genotypes will need...

  6. Generalized Identities of Companion Fibonacci-Like Sequences

    Shikha Bhatnagar

    2013-06-01

    The Fibonacci, Lucas, Pell, Pell-Lucas, Jacobsthal and Jacobsthal-Lucas sequences are the most prominent examples of second-order recursive sequences. In this paper, we deal with two companion Fibonacci-Like sequences, which are generalizations of the Fibonacci-Like sequence. We further obtain some generalized identities among the terms of companion Fibonacci-Like sequences and the Jacobsthal and Jacobsthal-Lucas sequences through Binet's formulae.

  7. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Marta Brozynska

    Direct sequencing of total plant DNA using next-generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technologies) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, 102 polymorphisms, including 90 SNPs, were predicted by both platforms. Indel calls varied more between sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNPs but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  8. Compressing DNA sequence databases with coil

    Hendy, Michael D

    2008-05-01

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  9. Representing objects, relations, and sequences.

    Gallant, Stephen I; Okaywe, T Wendy

    2013-08-01

    Vector symbolic architectures (VSAs) are high-dimensional vector representations of objects (e.g., words, image parts), relations (e.g., sentence structures), and sequences for use with machine learning algorithms. They consist of a vector addition operator for representing a collection of unordered objects, a binding operator for associating groups of objects, and a methodology for encoding complex structures. We first develop constraints that machine learning imposes on VSAs; for example, similar structures must be represented by similar vectors. The constraints suggest that current VSAs should represent phrases ("The smart Brazilian girl") by binding sums of terms, in addition to simply binding the terms directly. We show that matrix multiplication can be used as the binding operator for a VSA, and that matrix elements can be chosen at random. A consequence for living systems is that binding is mathematically possible without the need to specify, in advance, precise neuron-to-neuron connection properties for large numbers of synapses. A VSA that incorporates these ideas, Matrix Binding of Additive Terms (MBAT), is described that satisfies all constraints. With respect to machine learning, for some types of problems appropriate VSA representations permit us to prove learnability rather than relying on simulations. We also propose dividing machine (and neural) learning and representation into three stages, with differing roles for learning in each stage. For neural modeling, we give representational reasons for nervous systems to have many recurrent connections, as well as for the importance of phrases in language processing. Sizing simulations and analyses suggest that VSAs in general, and MBAT in particular, are ready for real-world applications. PMID:23607563

  10. Fibonacci-triple sequences and some fundamental properties

    Bijendra Singh

    2010-12-01

    The Fibonacci sequence stands out as a kind of super sequence with remarkable properties. This note presents Fibonacci-Triple sequences, which may also be called 3-F sequences. This is an explosive development in the field of Fibonacci sequences. The purpose of this paper is to demonstrate fundamental properties of the Fibonacci-Triple sequence.

  11. A Model for Pairs of Beatty Sequences

    Ginosar, Y

    2011-01-01

    Two Beatty sequences are recorded by athletes running in opposite directions in a round stadium. This approach suggests a nice interpretation of well-known partitioning criteria: such sequences (eventually) partition the integers essentially when the athletes have the same starting point.
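    The partition property behind this abstract is the classical Rayleigh-Beatty theorem: for irrational α, β > 1 with 1/α + 1/β = 1, the sequences ⌊nα⌋ and ⌊nβ⌋ together contain every positive integer exactly once. The Python sketch below checks this numerically for the assumed pair α = √2, β = 2 + √2; it does not reproduce the paper's stadium model, and the function names are illustrative. Integer square roots keep the floors exact, avoiding floating-point rounding.

```python
from math import isqrt

def beatty_sqrt2(n):
    """floor(n * sqrt(2)), computed exactly as isqrt(2 * n**2)."""
    return isqrt(2 * n * n)

def beatty_complement(n):
    """floor(n * (2 + sqrt(2))) = 2n + floor(n * sqrt(2)); 1/sqrt(2) + 1/(2 + sqrt(2)) = 1."""
    return 2 * n + beatty_sqrt2(n)

def partitions_up_to(limit):
    """Check the two sequences are disjoint and jointly cover 1..floor(limit * sqrt(2))."""
    top = beatty_sqrt2(limit)
    a = {beatty_sqrt2(n) for n in range(1, limit + 1)}
    b = {beatty_complement(n) for n in range(1, limit + 1)}
    return not (a & b) and set(range(1, top + 1)) <= (a | b)

print([beatty_sqrt2(n) for n in range(1, 9)])       # [1, 2, 4, 5, 7, 8, 9, 11]
print([beatty_complement(n) for n in range(1, 5)])  # [3, 6, 10, 13]
print(partitions_up_to(10_000))                     # True
```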

  12. Genome Sequence of Pseudomonas chlororaphis Strain 189

    Town, Jennifer; Audy, Patrice; Boyetchko, Susan M.

    2016-01-01

    Pseudomonas chlororaphis strain 189 is a potent inhibitor of the growth of the potato pathogen Phytophthora infestans. We determined the complete, finished sequence of the 6.8-Mbp genome of this strain, consisting of a single contiguous molecule. Strain 189 is closely related to previously sequenced strains of P. chlororaphis. PMID:27340063

  13. Multilocus Sequence Typing Tool for Cyclospora cayetanensis

    Guo, Yaqiong; Roellig, Dawn M.; Li, Na; Tang, Kevin; Frace, Michael; Ortega, Ynes; Arrowood, Michael J.; Qvarnstrom, Yvonne; Wang, Lin; Moss, Delynn M.; Zhang, Longxian; Xiao, Lihua

    2016-01-01

    Because the lack of typing tools for Cyclospora cayetanensis has hampered outbreak investigations, we sequenced its genome and developed a genotyping tool. We observed 2 to 10 geographically segregated sequence types at each of 5 selected loci. This new tool could be useful for case linkage and infection/contamination source tracking. PMID:27433881

  14. Archaebacterial rhodopsin sequences: Implications for evolution

    Lanyi, J. K.

    1991-01-01

    It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immunoreactions of the protein and of synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.

  15. About a new family of sequences

    Diniz, Felipe Bottega

    2016-01-01

    First we define a new kind of function over $\mathbb{N}$. For each $i\in\mathbb{N}$ there is an associated function, which we call $S_i$. We then define a new kind of sequence built from the functions $S_i$. Finally, we show that some of these sequences have a self-similarity feature.

  16. From Arithmetic Sequences to Linear Equations

    Matsuura, Ryota; Harless, Patrick

    2012-01-01

    The first part of the article focuses on deriving the essential properties of arithmetic sequences by appealing to students' sense making and reasoning. The second part describes how to guide students to translate their knowledge of arithmetic sequences into an understanding of linear equations. Ryota Matsuura originally wrote these lessons for…

  17. Towards a reference pecan genome sequence

    The cost of generating DNA sequence data has declined dramatically over the previous 15 years as a result of the Human Genome Project and the potential applications of genome sequencing for human medicine. This cost reduction has generated renewed interest among crop breeding scientists in applying...

  18. Interactive computer programs in sequence data analysis.

    Jagadeeswaran, P; McGuire, P M

    1982-01-01

    We present interactive computer programs for the analysis of nucleic acid sequences. Only minimal computer experience is needed to use these programs. The nucleotide sequence of the human gamma globin gene complex is used as an example to illustrate the data analysis.

  19. SPARSE SEQUENCE CONSTRUCTION OF LDPC CODES

    2005-01-01

    This letter proposes a novel and simple construction of regular Low-Density Parity-Check (LDPC) codes using sparse binary sequences. It uses the cyclic cross-correlation function of sparse sequences to generate codes with girth 8. The new codes perform well under sum-product decoding. Low encoding complexity can also be achieved owing to the inherent quasi-cyclic structure of the codes.

  20. Stochastic Modelling of Daily Rainfall sequences

    Buishand, T.A.

    1977-01-01

    Rainfall series from different climatic regions were analysed with the aim of generating daily rainfall sequences. A survey of the data is given in I, 1. When analysing daily rainfall sequences one must be aware of the following points: a. Seasonality. Because of seasonal variation of features of the r

  1. Project Report: Automatic Sequence Processor Software Analysis

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence, and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow begins by sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system composed of scripts written in Perl, C shell, and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary and then sends the resulting information to be radiated to the spacecraft.

  2. Bonobos extract meaning from call sequences.

    Zanna Clay

    Studies on language-trained bonobos have revealed their remarkable abilities in representational and communication tasks. Surprisingly, however, corresponding research into their natural communication has largely been neglected. We address this issue with a first playback study on the natural vocal behaviour of bonobos. Bonobos produce five acoustically distinct call types when finding food, which they regularly mix together into longer call sequences. We found that individual call types were relatively poor indicators of food quality, while context specificity was much greater at the call sequence level. We therefore investigated whether receivers could extract meaning about the quality of food encountered by the caller by integrating across different call sequences. We first trained four captive individuals to find two types of food, kiwi (preferred) and apples (less preferred), at two different locations. We then conducted naturalistic playback experiments during which we broadcast sequences of four calls, originally produced by a familiar individual responding to either kiwi or apples. All sequences contained the same number of calls but varied in the composition of call types. Following playbacks, we found that subjects devoted significantly more search effort to the field indicated by the call sequence. Rather than attending to individual calls, bonobos attended to the entire sequences to make inferences about the food encountered by a caller. These results provide the first empirical evidence that bonobos are able to extract information about external events by attending to vocal sequences of other individuals and highlight the importance of call combinations in their natural communication system.

  3. Molecular selection in a unified evolutionary sequence

    Fox, S. W.

    1986-01-01

    Guided by experiments and observations that indicate internally limited phenomena, an outline of a unified evolutionary sequence is inferred. Such unification is not visible in a context of random matrix and random mutation. The sequence proceeds from the Big Bang through prebiotic matter and protocells, through the evolving cell via molecular and natural selection, to mind, behavior, and society.

  4. Pig genome sequence - analysis and publication strategy

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol;

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  5. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 ...

  6. EXTENSION OF SEQUENCE STRATIGRAPHIC MODEL AND SEQUENCE STRATIGRAPHIC ANALYSIS IN LIMNIC DEPOSITIONAL BASIN

    李增学; 魏久传; 王民镇; 李守春; 李青山; 金秀昆; 兰恒星

    1996-01-01

    The architectural patterns of sedimentary successions are diverse in different depositional basins. The sedimentary architecture and geological conditions of basins such as epicontinental seas and intraplate limnic basins differ clearly from those of continental margin basins. Studies of these various depositional basins therefore require the classical sequence model to be extended, complemented and refined. For this reason, this paper sets out the rationale and principles of sequence division, and the methodology and techniques for studying sequence stratigraphy in epicontinental and limnic basins.

  7. Fat-suppressed sequences and gadolinium

    This paper determines the role of fat-suppressed sequences in medullary bone disorders. Fifteen patients with peripheral bone disorders and normal signal intensity (SI) of medullary spaces on T1- and T2-weighted images were studied with (1) a short inversion recovery (STIR) sequence (1,000/140/20) and (2) a T1-weighted SE sequence with fat suppression by spectral presaturation (SPIR, Gyroscan, 1.5 T). Both sequences and T1-weighted images were obtained before and after contrast material injection. In five patients, STIR images revealed areas of high SI, resulting from subtle edema undetected with SE imaging. No SI changes were observed with contrast-enhanced STIR sequence in these areas

  8. Smgcd: metrics for biological sequence data

    In bioinformatics, the key challenges are to manage, store and retrieve biological data efficiently. These data can be classified into structured, unstructured and semi-structured content; semi-structured biological data typically consist of biological sequences. Complex biological sequences produce huge volumes of data, which in turn make management, storage and retrieval harder. This paper proposes metrics, namely a symmetry measure, molecular weight measure, similarity or diversity measure, size base measure, size gap measure, complexity measure and size complexity diversity measure, to address these problems in biological sequence data. These metrics measure sequence complexity, molecular weight, length with and without gaps, symmetry and similarity through mathematical formulations. The metrics are demonstrated and validated using the proposed hybrid technique, which combines empirical evidence with theoretical formulation. This research opens new avenues for efficiently measuring the functionality and quality of metadata for single and multiple biological sequences. (author)

  9. Transcriptome sequencing goals, assembly, and assessment.

    Wheat, Christopher W; Vogel, Heiko

    2011-01-01

    Transcriptome sequencing provides quick, direct access to the mRNA. With this information, one can design primers for PCR of thousands of different genes, SNP markers, probes for microarrays and qPCR, or just use the sequence data itself in comparative studies. Transcriptome sequencing, while getting cheaper, is still an expensive endeavor, with an examination of data quality and its assembly infrequently performed in depth. Here, we outline many of the important issues we think need consideration when starting a transcriptome sequencing project. We also walk the reader through a detailed analysis of an example transcriptome dataset, highlighting the importance of both within-dataset analysis and comparative inferences. Our hope is that with greater attention focused upon assessing assembly performance, advances in transcriptome assembly will increase as prices continue to drop and new technologies, such as Illumina sequencing, start to be used. PMID:22065435

  10. Sequence Affects the Cyclization of DNA Minicircles.

    Wang, Qian; Pettitt, B Montgomery

    2016-03-17

    Understanding how the sequence of a DNA molecule affects its dynamic properties is a central problem affecting biochemistry and biotechnology. The process of cyclizing short DNA, as a critical step in molecular cloning, lacks a comprehensive picture of the kinetic process containing sequence information. We have elucidated this process by using coarse-grained simulations, enhanced sampling methods, and recent theoretical advances. We are able to identify the types and positions of structural defects during the looping process at a base-pair level. Correlations along a DNA molecule dictate critical sequence positions that can affect the looping rate. Structural defects change the bending elasticity of the DNA molecule from a harmonic to subharmonic potential with respect to bending angles. We explore the subelastic chain as a possible model in loop formation kinetics. A sequence-dependent model is developed to qualitatively predict the relative loop formation time as a function of DNA sequence. PMID:26938490

  11. Nucleotide sequence preservation of human mitochondrial DNA

    Recombinant DNA techniques have been used to quantitate the amount of nucleotide sequence divergence in the mitochondrial DNA population of individual normal humans. Mitochondrial DNA was isolated from the peripheral blood lymphocytes of five normal humans and cloned in M13 mp11; 49 kilobases of nucleotide sequence information were obtained from 248 independently isolated clones from the five normal donors. Both between- and within-individual differences were identified. Between-individual differences were found at approximately 1 in 200 nucleotides. In contrast, only one within-individual difference was identified in 49 kilobases of nucleotide sequence information. This high degree of mitochondrial nucleotide sequence homogeneity in human somatic cells is in marked contrast to the rapid evolutionary divergence of human mitochondrial DNA and suggests the existence of mechanisms for the concerted preservation of mammalian mitochondrial DNA sequences in single organisms

  12. Sequencing and comparing whole mitochondrial genomes of animals

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  13. M-sequences in ophthalmic electrophysiology.

    Müller, Philipp L; Meigen, Thomas

    2016-01-01

    The aim of this review is to use the multimedia aspects of a purely digital online publication to explain and illustrate the highly capable technique of m-sequences in multifocal ophthalmic electrophysiology. M-sequences have been successfully applied in clinical routines during the past 20 years. However, the underlying mathematical rationale is often daunting. These mathematical properties of m-sequences allow one not only to separate the responses from different fields but also to analyze adaptational effects and impacts of former events. By explaining the history, the formation, and the different aspects of application, a better comprehension of the technique is intended. With this review we aim to clarify the opportunities of m-sequences in order to motivate scientists to use m-sequences in their future research. PMID:26818968
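    M-sequences are usually generated with a linear feedback shift register whose feedback taps correspond to a primitive polynomial over GF(2). The Python sketch below assumes the degree-4 primitive polynomial x^4 + x + 1 (period 2^4 - 1 = 15) purely for illustration; clinical multifocal systems use much longer sequences, and their stimulus parameters are not modeled here. It also checks the two-valued cyclic autocorrelation (N at zero shift, -1 at every other shift), the property commonly cited as the basis for separating responses from different stimulus fields. Function names are illustrative.

```python
def m_sequence(coeffs, init):
    """One period of the binary recurrence s[n+m] = (c_0*s[n] + ... + c_{m-1}*s[n+m-1]) mod 2.

    The output is a true m-sequence (period 2**m - 1) only if the characteristic
    polynomial x**m + c_{m-1}*x**(m-1) + ... + c_0 is primitive; that is assumed here.
    """
    m = len(coeffs)
    if len(init) != m or not any(init):
        raise ValueError("init must hold m bits and must not be all zero")
    period = 2 ** m - 1
    s = list(init)
    while len(s) < period:
        n = len(s) - m
        s.append(sum(c * s[n + k] for k, c in enumerate(coeffs)) % 2)
    return s

def cyclic_autocorrelation(bits, shift):
    """Correlate the +/-1 version of `bits` (0 -> +1, 1 -> -1) with its cyclic shift."""
    signs = [1 - 2 * b for b in bits]
    N = len(signs)
    return sum(signs[i] * signs[(i + shift) % N] for i in range(N))

seq = m_sequence([1, 1, 0, 0], [1, 0, 0, 0])  # recurrence for x^4 + x + 1
print(seq)  # [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
print([cyclic_autocorrelation(seq, k) for k in range(3)])  # [15, -1, -1]
```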

  14. Genotyping-by-Sequencing in Plants

    Gregory D. May

    2012-09-01

    The advent of next-generation DNA sequencing (NGS) technologies has led to the development of rapid genome-wide single nucleotide polymorphism (SNP) detection applications in various plant species. Recent improvements in sequencing throughput, combined with an overall decrease in cost per gigabase of sequence, are allowing NGS to be applied not only to the evaluation of small subsets of parental inbred lines but also to the mapping and characterization of traits of interest in much larger populations. Such an approach, in which sequences are used simultaneously to detect and score SNPs and which therefore bypasses the entire marker assay development stage, is known as genotyping-by-sequencing (GBS). This review will summarize the current state of GBS in plants and the promises it holds as a genome-wide genotyping application.

  15. The DNA sequence of human chromosome 7.

    Hillier, Ladeana W; Fulton, Robert S; Fulton, Lucinda A; Graves, Tina A; Pepin, Kymberlie H; Wagner-McPherson, Caryn; Layman, Dan; Maas, Jason; Jaeger, Sara; Walker, Rebecca; Wylie, Kristine; Sekhon, Mandeep; Becker, Michael C; O'Laughlin, Michelle D; Schaller, Mark E; Fewell, Ginger A; Delehaunty, Kimberly D; Miner, Tracie L; Nash, William E; Cordes, Matt; Du, Hui; Sun, Hui; Edwards, Jennifer; Bradshaw-Cordum, Holland; Ali, Johar; Andrews, Stephanie; Isak, Amber; Vanbrunt, Andrew; Nguyen, Christine; Du, Feiyu; Lamar, Betty; Courtney, Laura; Kalicki, Joelle; Ozersky, Philip; Bielicki, Lauren; Scott, Kelsi; Holmes, Andrea; Harkins, Richard; Harris, Anthony; Strong, Cynthia Madsen; Hou, Shunfang; Tomlinson, Chad; Dauphin-Kohlberg, Sara; Kozlowicz-Reilly, Amy; Leonard, Shawn; Rohlfing, Theresa; Rock, Susan M; Tin-Wollam, Aye-Mon; Abbott, Amanda; Minx, Patrick; Maupin, Rachel; Strowmatt, Catrina; Latreille, Phil; Miller, Nancy; Johnson, Doug; Murray, Jennifer; Woessner, Jeffrey P; Wendl, Michael C; Yang, Shiaw-Pyng; Schultz, Brian R; Wallis, John W; Spieth, John; Bieri, Tamberlyn A; Nelson, Joanne O; Berkowicz, Nicolas; Wohldmann, Patricia E; Cook, Lisa L; Hickenbotham, Matthew T; Eldred, James; Williams, Donald; Bedell, Joseph A; Mardis, Elaine R; Clifton, Sandra W; Chissoe, Stephanie L; Marra, Marco A; Raymond, Christopher; Haugen, Eric; Gillett, Will; Zhou, Yang; James, Rose; Phelps, Karen; Iadanoto, Shawn; Bubb, Kerry; Simms, Elizabeth; Levy, Ruth; Clendenning, James; Kaul, Rajinder; Kent, W James; Furey, Terrence S; Baertsch, Robert A; Brent, Michael R; Keibler, Evan; Flicek, Paul; Bork, Peer; Suyama, Mikita; Bailey, Jeffrey A; Portnoy, Matthew E; Torrents, David; Chinwalla, Asif T; Gish, Warren R; Eddy, Sean R; McPherson, John D; Olson, Maynard V; Eichler, Evan E; Green, Eric D; Waterston, Robert H; Wilson, Richard K

    2003-07-10

    Human chromosome 7 has historically received prominent attention in the human genetics community, primarily related to the search for the cystic fibrosis gene and the frequent cytogenetic changes associated with various forms of cancer. Here we present more than 153 million base pairs representing 99.4% of the euchromatic sequence of chromosome 7, the first metacentric chromosome completed so far. The sequence has excellent concordance with previously established physical and genetic maps, and it exhibits an unusual amount of segmentally duplicated sequence (8.2%), with marked differences between the two arms. Our initial analyses have identified 1,150 protein-coding genes, 605 of which have been confirmed by complementary DNA sequences, and an additional 941 pseudogenes. Of genes confirmed by transcript sequences, some are polymorphic for mutations that disrupt the reading frame. PMID:12853948

  16. Phase Shift Sequences for an Adding Interferometer

    Hyland, Peter; Bunn, Emory F

    2008-01-01

    Cosmic microwave background (CMB) polarimetry has the potential to provide revolutionary advances in cosmology. Future experiments to detect the very weak B mode signal in CMB polarization maps will require unprecedented sensitivity and control of systematic errors. Bolometric interferometry may provide a way to achieve these goals. In a bolometric interferometer (or other adding interferometer), phase shift sequences are applied to the inputs in order to recover the visibilities. Noise is minimized when the phase shift sequences corresponding to all visibilities are orthogonal. We present a systematic method for finding sequences that produce this orthogonality, approximately minimizing both the length of the time sequence and the number of discrete phase shift values required. When some baselines are geometrically equivalent, we can choose sequences that read out those baselines simultaneously, which has been shown to improve signal to noise ratio.
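
    The orthogonality condition can be checked numerically. The Python sketch below rests on a simplifying assumption: the detected power of an ideal adding interferometer contains, for every ordered input pair (i, j), a term proportional to V_ij exp(i(φ_i(t) - φ_j(t))), plus a constant total-power term. A set of phase shift sequences is then treated as orthogonal when all of these modulation sequences have zero mutual inner product over the time steps. This is only a checker, not the authors' search method, and the function names are hypothetical.

```python
import cmath
import itertools
import math


def baseline_modulations(phases):
    """phases[i][t] is the phase (radians) applied to input i at time step t.

    Returns the constant term (key None) and, for each ordered pair (i, j) with
    i != j, the sequence exp(1j * (phi_i(t) - phi_j(t))).
    """
    n_steps = len(phases[0])
    mods = {None: [1.0 + 0j] * n_steps}
    for i, j in itertools.permutations(range(len(phases)), 2):
        mods[(i, j)] = [cmath.exp(1j * (phases[i][t] - phases[j][t]))
                        for t in range(n_steps)]
    return mods


def is_orthogonal(phases, tol=1e-9):
    """True if every pair of modulation sequences has a (near-)zero inner product."""
    mods = baseline_modulations(phases)
    for a, b in itertools.combinations(mods.values(), 2):
        inner = sum(x * y.conjugate() for x, y in zip(a, b))
        if abs(inner) > tol:
            return False
    return True


# Two inputs: input 0 fixed, input 1 stepped through quarter-wave phases.
quarter_steps = [[0.0] * 4, [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]]
print(is_orthogonal(quarter_steps))
```

    With only two phase values (0 and π) on input 1, the sequences for (0, 1) and (1, 0) coincide, so they cannot be separated. This illustrates why the number of discrete phase values matters as well as the sequence length.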

  17. Value of a newly sequenced bacterial genome

    Barbosa, Eudes; Aburjaile, Flavia F; Ramos, Rommel Tj;

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome...

  18. SOME GEOMETRIC PROPERTIES OF A NEW DIFFERENCE SEQUENCE SPACE INVOLVING LACUNARY SEQUENCES

    Murat KARAKAŞ; Mikail ET; Vatan KARAKAYA

    2013-01-01

    In this paper, we define a new generalized difference sequence space involving lacunary sequences. We then examine the k-NUC property and property (β) for this space and show that it is not rotund, where $p = (p_r)$ is a bounded sequence of positive real numbers with $p_r \ge 1$ for all $r \in \mathbb{N}$.
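
    For readers unfamiliar with the term, the standard definition of a lacunary sequence from the summability literature is recalled below. It is not taken from this abstract, and the paper's specific difference sequence space is not reproduced here.

$$\theta = (k_r)_{r \ge 0} \subset \mathbb{Z}_{\ge 0} \text{ increasing}, \qquad k_0 = 0, \qquad h_r = k_r - k_{r-1} \to \infty \quad (r \to \infty),$$

$$I_r = (k_{r-1}, k_r], \qquad q_r = \frac{k_r}{k_{r-1}}.$$

    For example, $k_r = 2^r - 1$ is lacunary, since $h_r = 2^r - 2^{r-1} = 2^{r-1} \to \infty$.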

  19. Targeted next-generation sequencing can replace Sanger sequencing in clinical diagnostics

    Sikkema-Raddatz, B.; Johansson, L.F.; de Boer, E.N.; Almomani, R.; Boven, L.G.; van den Berg, M.P.; van Spaendonck-Zwarts, K.Y.; van Tintelen, J.P.; Sijmons, R.H.; Jongbloed, J.D.H.; Sinke, R.J.

    2013-01-01

    Mutation detection through exome sequencing allows simultaneous analysis of all coding sequences of genes. However, it cannot yet replace Sanger sequencing (SS) in diagnostics because of incomplete representation and coverage of exons leading to missing clinically relevant mutations. Targeted next-g

  20. Sequencing and Analysis of a Genomic Fragment Provide an Insight into the Dunaliella viridis Genomic Sequence

    Xiao-Ming SUN; Yuan-Ping TANG; Xiang-Zong MENG; Wen-Wen ZHANG; Shan LI; Zhi-Rui DENG; Zheng-Kai XU; Ren-Tao SONG

    2006-01-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for studying stress tolerance. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to sequences in the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to sequences in GenBank. The average GC content of this sequence was 51.1%, much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment; the duplicated sequences accounted for about 35.7% of the entire region. Large numbers of simple sequence repeats (microsatellites) were found, with a strong bias toward the (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made Dunaliella genomic DNA difficult to sequence. Further investigation is needed to reveal the biological significance of these unique sequence features.
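
    The two sequence statistics reported here, GC content and (AC)n microsatellites, are straightforward to compute. The minimal Python sketch below shows one way to do it. It is not the authors' pipeline. The minimum of five repeat copies is a hypothetical threshold, only one strand is scanned (an (AC)n run on one strand reads as (GT)n on the other), and the function names are illustrative.

```python
import re


def gc_content(seq):
    """Percentage of G or C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def tandem_repeats(seq, unit="AC", min_copies=5):
    """(start, end) spans of (unit)n runs with at least min_copies copies."""
    # Builds e.g. the pattern "(?:AC){5,}"; finditer returns non-overlapping runs.
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


print(gc_content("GGCCAATT"))
print(tandem_repeats("TTACACACACACGG"))
```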

  1. Sequencing and analysis of a genomic fragment provide an insight into the Dunaliella viridis genomic sequence.

    Sun, Xiao-Ming; Tang, Yuan-Ping; Meng, Xiang-Zong; Zhang, Wen-Wen; Li, Shan; Deng, Zhi-Rui; Xu, Zheng-Kai; Song, Ren-Tao

    2006-11-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for studying stress tolerance. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to sequences in the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to sequences in GenBank. The average GC content of this sequence was 51.1%, much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment; the duplicated sequences accounted for about 35.7% of the entire region. Large numbers of simple sequence repeats (microsatellites) were found, with a strong bias toward the (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made Dunaliella genomic DNA difficult to sequence. Further investigation is needed to reveal the biological significance of these unique sequence features. PMID:17091199

  2. Draft Genome Sequence of Neisseria gonorrhoeae Sequence Type 1407, a Multidrug-Resistant Clinical Isolate.

    Anselmo, A; Ciammaruconi, A; Carannante, A; Neri, A; Fazio, C; Fortunato, A; Palozzi, A M; Vacca, P; Fillo, S; Lista, F; Stefanelli, P

    2015-01-01

    Gonorrhea may become untreatable due to the spread of resistant or multidrug-resistant strains. Cefixime-resistant gonococci belonging to sequence type 1407 have been described worldwide. We report the genome sequence of Neisseria gonorrhoeae strain G2891, a multidrug-resistant isolate of sequence type 1407, collected in Italy in 2013. PMID:26272575

  3. Comparison of two Next Generation sequencing platforms for full genome sequencing of Classical Swine Fever Virus

    Fahnøe, Ulrik; Pedersen, Anders Gorm; Höper, Dirk;

    2013-01-01

    Next Generation Sequencing (NGS) is increasingly being adopted in viral research and will be the preferred technology in the years to come. We have recently sequenced several strains of Classical Swine Fever Virus (CSFV) by NGS on both the Genome Sequencer FLX (GS FLX) and Ion Torrent PGM platforms. In...

  4. Whole-Genome Shotgun Sequencing of a Colonizing Multilocus Sequence Type 17 Streptococcus agalactiae Strain

    Singh, Pallavi; Springman, A. Cody; Davies, H. Dele

    2012-01-01

    This report highlights the whole-genome shotgun draft sequence for a Streptococcus agalactiae strain representing multilocus sequence type (ST) 17, isolated from a colonized woman at 8 weeks postpartum. This sequence represents an important addition to the published genomes and will promote comparative genomic studies of S. agalactiae recovered from diverse sources. PMID:23045509

  5. Whole-Genome Shotgun Sequencing of a Colonizing Multilocus Sequence Type 17 Streptococcus agalactiae Strain

    Singh, Pallavi; Springman, A. Cody; Davies, H Dele; Manning, Shannon D.

    2012-01-01

    This report highlights the whole-genome shotgun draft sequence for a Streptococcus agalactiae strain representing multilocus sequence type (ST) 17, isolated from a colonized woman at 8 weeks postpartum. This sequence represents an important addition to the published genomes and will promote comparative genomic studies of S. agalactiae recovered from diverse sources.

  6. Reading biological processes from nucleotide sequences

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. 
Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical
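
    The "thermodynamic" approach to regulatory sequences can be illustrated with a deliberately simplified Python sketch. An additive energy matrix scores a binding site, and a two-state (Fermi-like) expression converts that energy into the probability that the site is occupied, relative to a chemical-potential-like offset mu. This is a common starting point for such models. It is not the thesis's inferred model, which also accounts for interactions between DNA-binding proteins and RNA polymerase. All names and parameter values below are hypothetical.

```python
import math

BASES = "ACGT"


def site_energy(site, energy_matrix):
    """Additive binding energy in units of kT: one term per position and base."""
    return sum(energy_matrix[pos][BASES.index(base)]
               for pos, base in enumerate(site.upper()))


def occupancy(energy_kt, mu_kt):
    """Two-state probability that the site is bound: 1 / (1 + exp(E - mu))."""
    return 1.0 / (1.0 + math.exp(energy_kt - mu_kt))


# Hypothetical 3-position energy matrix (rows: positions; columns: A, C, G, T).
toy_matrix = [
    [-1.0, 0.0, 0.0, 0.5],
    [0.0, -1.5, 0.5, 0.0],
    [0.5, 0.0, -1.0, 0.0],
]
for site in ("ACG", "TTT"):
    e = site_energy(site, toy_matrix)
    print(site, e, occupancy(e, mu_kt=-1.0))
```

    Inferring such a model from data amounts to fitting the entries of the energy matrix (and mu) so that predicted occupancies track measured expression across a library of mutagenized sequences.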

  7. Comparison of metagenomic samples using sequence signatures

    Jiang Bai

    2012-12-01

    Background: Sequence signatures, as defined by the frequencies of k-tuples (or k-mers, k-grams), have been used extensively to compare genomic sequences of individual organisms, to identify cis-regulatory modules, and to study the evolution of regulatory sequences. Recently, many next-generation sequencing (NGS) read data sets of metagenomic samples from a variety of different environments have been generated. The assembly of these reads can be difficult, and analysis methods based on mapping reads to genes or pathways are also restricted by the availability and completeness of existing databases. Sequence-signature-based methods, however, do not need complete genomes or existing databases and thus can potentially be very useful for the comparison of metagenomic samples using NGS read data. Still, the applications of sequence signature methods for the comparison of metagenomic samples have not been well studied. Results: We studied several dissimilarity measures for the comparison of metagenomic samples using sequence signatures. These include $d_2$, $d_2^*$, and $d_2^S$, recently developed by our group; a measure (hereinafter denoted Hao) used in CVTree, developed by Hao's group (Qi et al., 2004); measures based on relative di-, tri-, and tetranucleotide frequencies as in Willner et al. (2009); and standard $\ell_p$ measures between the frequency vectors. We compared their performance using a series of extensive simulations and three real NGS metagenomic datasets: 39 fecal samples from 33 mammalian host species, 56 marine samples across the world, and 13 fecal samples from human individuals. The results showed that the dissimilarity measure $d_2^S$ can achieve superior performance when comparing metagenomic samples, both in clustering them into different groups and in recovering environmental gradients affecting microbial samples. New insights into the environmental factors affecting microbial compositions in metagenomic samples
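
    As a concrete illustration of a sequence-signature dissimilarity, the Python sketch below computes k-mer count vectors and the cosine-normalized $d_2$ measure. It assumes the definition $d_2 = \tfrac{1}{2}\left(1 - D_2 / (\lVert X\rVert\,\lVert Y\rVert)\right)$ with $D_2 = \sum_w X_w Y_w$. It uses raw counts, does not merge reverse complements, and does not implement $d_2^*$ or $d_2^S$, which additionally require a background model to center the counts. For a metagenome, counts would be summed over all reads of a sample. The function names are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Counts of all overlapping length-k words in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_dissimilarity(x, y, k):
    """0.5 * (1 - D2 / (||X|| * ||Y||)), with D2 = sum over words of X_w * Y_w."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    if not cx or not cy:
        raise ValueError("both sequences must be at least k long")
    d2_raw = sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())
    norm = (math.sqrt(sum(v * v for v in cx.values()))
            * math.sqrt(sum(v * v for v in cy.values())))
    return 0.5 * (1.0 - d2_raw / norm)


print(d2_dissimilarity("ACGTACGTTGCA", "ACGTTGCAACGT", 3))
print(d2_dissimilarity("AAAA", "CCCC", 2))
```

    The measure is 0 for identical signatures and 0.5 when the two samples share no k-mers.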

  8. An Integrated Sequence-Structure Database incorporating matching mRNA sequence, amino acid sequence and protein three-dimensional structure data.

    Adzhubei, I A; Adzhubei, A. A.; Neidle, S.

    1998-01-01

    We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD) which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi,psi angles assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for ...
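
    The core of such an entry is a residue-by-residue pairing of codons with amino acids, to which structural annotations can then be attached. The Python sketch below is a hypothetical record layout, not the ISSD schema. It checks each codon against the standard genetic code and leaves the secondary-structure and φ/ψ fields to be filled from structure data. It assumes the coding sequence excludes the stop codon.

```python
from dataclasses import dataclass
from typing import List, Optional

# Standard genetic code, indexed by the TCAG ordering of the three codon positions.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


@dataclass
class ResidueRecord:
    codon: str
    residue: str
    sec_struct: Optional[str] = None  # hypothetical field, e.g. "H" or "E"
    phi: Optional[float] = None       # backbone torsion angles in degrees
    psi: Optional[float] = None


def align_codons(cds: str, protein: str) -> List[ResidueRecord]:
    """Pair each codon of a coding sequence (stop codon removed) with its residue."""
    cds = cds.upper()
    if len(cds) != 3 * len(protein):
        raise ValueError("coding sequence length must be 3 x protein length")
    records = []
    for n, aa in enumerate(protein):
        codon = cds[3 * n:3 * n + 3]
        if CODON_TABLE.get(codon) != aa:
            raise ValueError(f"codon {codon} at residue {n} does not encode {aa}")
        records.append(ResidueRecord(codon, aa))
    return records


for rec in align_codons("ATGGCTTGG", "MAW"):
    print(rec)
```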

  9. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV)

    Martin, A. C. R.

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and ’dotifying’ repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI li...
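
    The "dotifying" display option can be illustrated independently of JSAV. In the Python sketch below (not JSAV's code or API), every residue that is identical to the residue in the same column of a reference sequence is replaced by a dot, so that only differences stand out. It assumes the aligned sequences all have the same length. The function name and the choice of the first sequence as the reference are hypothetical.

```python
def dotify(alignment, reference_index=0):
    """Replace residues identical to the reference column residue with '.'."""
    ref = alignment[reference_index]
    if any(len(seq) != len(ref) for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for idx, seq in enumerate(alignment):
        if idx == reference_index:
            out.append(seq)  # the reference itself is shown in full
        else:
            out.append("".join("." if a == r else a for a, r in zip(seq, ref)))
    return out


for row in dotify(["ACDEFG", "ACDEYG", "GCD-FG"]):
    print(row)
```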

  10. Large Zero Autocorrelation Zone of Golay Sequences and $4^q$-QAM Golay Complementary Sequences

    Gong, Guang; Yang, Yang

    2011-01-01

    Sequences with good correlation properties have been widely adopted in modern communications, radar, and sonar applications. In this paper, we present new findings on constructions of single $H$-ary Golay sequences and $4^q$-QAM Golay complementary sequences with a large zero autocorrelation zone, where $H \ge 2$ is an arbitrary even integer and $q \ge 2$ is an arbitrary integer. These new results on Golay sequences and QAM Golay complementary sequences can be exploited during synchronization and detection at the receiver end and thus improve the performance of the communication system.
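
    The property that defines a Golay complementary pair is that the aperiodic autocorrelations of the two sequences sum to zero at every nonzero shift. The Python sketch below demonstrates this property using the classic binary (±1) recursive construction a → a|b, b → a|−b. This is not the paper's $H$-ary or $4^q$-QAM construction, and it does not demonstrate the zero autocorrelation zone. The function names are hypothetical.

```python
def golay_pair(m):
    """Binary (+/-1) Golay complementary pair of length 2**m."""
    a, b = [1], [1]
    for _ in range(m):
        # Concatenation step: (a, b) -> (a|b, a|-b) preserves complementarity.
        a, b = a + b, a + [-v for v in b]
    return a, b


def aperiodic_autocorrelation(x, u):
    """C_x(u) = sum over i of x[i] * x[i + u], for a shift 0 <= u < len(x)."""
    return sum(x[i] * x[i + u] for i in range(len(x) - u))


a, b = golay_pair(3)  # length-8 pair
sums = [aperiodic_autocorrelation(a, u) + aperiodic_autocorrelation(b, u)
        for u in range(len(a))]
print(a)
print(b)
print(sums)
```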

  11. Value of a newly sequenced bacterial genome.

    Barbosa, Eudes Gv; Aburjaile, Flavia F; Ramos, Rommel Tj; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2014-05-26

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information. PMID:24921006

  12. PHASE TRANSITION IN SEQUENCE UNIQUE RECONSTRUCTION

    Li XIA; Chan ZHOU

    2007-01-01

    In this paper, sequence unique reconstruction refers to the property that a sequence is uniquely reconstructable from all its K-tuples. We propose and study the phase transition behavior of the probability P(K) of unique reconstruction with regard to tuple size K in random sequences (iid model). In Monte Carlo experiments, artificial proteins generated from the iid model exhibit a phase transition in which P(K) abruptly jumps from a low-value phase (e.g., < 0.1) to a high-value phase (e.g., > 0.9). Generalizing to any alphabet, we prove that for a random sequence of length L, when L is large enough, P(K) undergoes a sharp phase transition when p ≤ 0.1015, where p = P(two random letters match). In addition, formulas are derived to estimate the transition points, which may be of practical use in sequencing DNA by hybridization. Our study indicates that most proteins do not deviate greatly from random sequences in the sense of sequence unique reconstruction, while there are some "stubborn" proteins that only become uniquely reconstructable at a very large K, which probably has biological implications.
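
    For very short sequences, P(K) can be computed exactly rather than by Monte Carlo. Two sequences are indistinguishable from their K-tuples when their K-spectra (the multisets of all length-K substrings) coincide, so a sequence is uniquely reconstructable exactly when no other sequence of the same length shares its spectrum. The Python sketch below is a brute-force illustration under the uniform iid model. It is feasible only for tiny lengths and is not the authors' method. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def spectrum(seq, k):
    """Multiset of all length-k substrings, represented as a sorted tuple."""
    return tuple(sorted(seq[i:i + k] for i in range(len(seq) - k + 1)))


def prob_unique(length, k, alphabet="01"):
    """Exact P(K) for uniformly random sequences of the given length."""
    seqs = ["".join(p) for p in product(alphabet, repeat=length)]
    spectra = [spectrum(s, k) for s in seqs]
    counts = Counter(spectra)
    # A sequence is uniquely reconstructable iff no other sequence shares its spectrum.
    unique = sum(1 for sp in spectra if counts[sp] == 1)
    return unique / len(seqs)


for k in range(1, 11):
    print(k, prob_unique(10, k))
```

    Under the uniform model, p = 1/|alphabet|, so the binary alphabet used here (p = 0.5) lies outside the p ≤ 0.1015 regime in which the paper proves a sharp transition.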

  13. Gelada vocal sequences follow Menzerath's linguistic law.

    Gustison, Morgan L; Semple, Stuart; Ferrer-I-Cancho, Ramon; Bergman, Thore J

    2016-05-10

    Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath's law is a linguistic law stating that the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath's law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size, and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath's law reflects compression, the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language. PMID:27091968
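
    The core Menzerath prediction is a negative relationship between construct size and mean constituent size. The Python sketch below reduces each vocal sequence to (number of calls, mean call duration) and computes a Pearson correlation between the two. It shows only the direction of that test. The study's own statistical models are not reproduced, and the input values are made up purely to exercise the code. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def menzerath_correlation(sequences):
    """sequences: one list of call durations per vocal sequence."""
    sizes = [len(s) for s in sequences]
    mean_durations = [sum(s) / len(s) for s in sequences]
    return pearson(sizes, mean_durations)


# Made-up call durations (seconds), for illustration only.
toy = [[0.9], [0.8, 0.7], [0.6, 0.7, 0.5], [0.5, 0.4, 0.6, 0.5]]
print(menzerath_correlation(toy))
```

    A negative coefficient is consistent with Menzerath's law.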

  14. Fungal genome sequencing: basic biology to biotechnology.

    Sharma, Krishna Kant

    2016-08-01

    Genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeasts. The genome sequence of the budding yeast Saccharomyces cerevisiae, with its small genome size, unicellular growth, and rich history of genetic and molecular analyses, was a milestone of early genomics in the 1990s. The subsequent completion of the genomes of the fission yeast Schizosaccharomyces pombe and the genetic model Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. Over time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research in medical science, agricultural science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics has great potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes for organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides a road map for basic and applied research. PMID:25721271

  15. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    Arabi E. keshk

    2014-05-01

    The merging of biology and computer science has created a new field called computational biology, which explores the capacity of computers to gain knowledge from biological data (bioinformatics). Computational biology is rooted in the life sciences as well as in computer and information sciences and technologies. A central problem in computational biology is sequence alignment, a way of arranging DNA, RNA, or protein sequences to identify regions of similarity and relationships between sequences. This paper introduces an enhanced dynamic algorithm for genome sequence alignment, called EDAGSA. It fills only the three main diagonals of the scoring matrix rather than the entire matrix, avoiding computation of unused cells. It obtains the optimal solution while reducing execution time, thereby improving performance. To illustrate the effectiveness of the proposed algorithm, it is compared with traditional methods such as the Needleman-Wunsch, Smith-Waterman, and longest common subsequence algorithms. In addition, a database is implemented so that the algorithm can be used in multiple-sequence alignments to search for the sequence that optimally matches a given sequence.
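
    Filling only the three main diagonals corresponds to banded global alignment with a band half-width of 1. The Python sketch below is a minimal banded Needleman-Wunsch scorer. It is not EDAGSA itself, whose further details are not given here. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions. In general, a banded score equals the unrestricted optimum only when some optimal alignment path stays inside the band.

```python
def banded_global_score(a, b, band=1, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score restricted to matrix cells with |i - j| <= band.

    band=1 fills only the main diagonal and its two neighbouring diagonals.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("length difference exceeds band; no in-band path exists")
    score = {(0, 0): 0}  # only in-band cells are stored
    for i in range(n + 1):
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if i == 0 and j == 0:
                continue
            best = float("-inf")
            if i > 0 and j > 0 and (i - 1, j - 1) in score:
                s = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, score[(i - 1, j - 1)] + s)
            if i > 0 and (i - 1, j) in score:  # gap in b
                best = max(best, score[(i - 1, j)] + gap)
            if j > 0 and (i, j - 1) in score:  # gap in a
                best = max(best, score[(i, j - 1)] + gap)
            score[(i, j)] = best
    return score[(n, m)]


print(banded_global_score("GATTACA", "GATTCCA"))
print(banded_global_score("GATTACA", "GATACA"))
```

    The band reduces the work from O(nm) cells to roughly O((2·band + 1)·n), which is the source of the speed-up. The trade-off is that alignments requiring long runs of gaps fall outside the band.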

  16. Analysis of heterogeneous boron dilution sequences

    Within the scope of the international SETH project (focused on boron dilution sequences), the Spanish Nuclear Regulatory Commission (CSN) and the Spanish electric energy industry (UNESA) have promoted a national project in Spain for the analysis and application of the SETH results to Spanish nuclear power plants. As part of this project, our team has reviewed and analyzed the different sequences that could lead to boron dilution in the primary circuit of a pressurized water reactor (PWR). In the first stage of the project, we analyzed the different sequences and phenomenologies that could lead to inadvertent boron dilution in the primary system (about twenty different sequences are described in the literature), the core damage frequency of each, the projects and experiments carried out at several experimental facilities, and the modifications performed to avoid or mitigate these kinds of sequences. In the second stage, we reviewed the relations among the operating procedures, Westinghouse-design reactors, and these kinds of sequences. Finally, we analyzed the problems of simulating these kinds of sequences and performed several numerical simulations with the TRAC-M (TRACE) code applied to numerical benchmarks and also to a 3D vessel model. (author)

  17. Exploration of noncoding sequences in metagenomes.

    Fabián Tobar-Tosse

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and associated processes are related to specific environments. Identification of ORFs and their functional categories is the most common method for associating functional and environmental features. However, analysis based on finding ORFs misses noncoding sequences, and therefore some regulatory or structural information in the metagenome could be discarded. In this work, we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: G+C content, codon usage (Cd), trinucleotide usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded by common similarity-based methods in metagenomics, and of the kind of relevant information present in them. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose that noncoding sequences carry relevant information for describing metagenomes that could be considered in whole-metagenome analysis in order to improve their organization, classification protocols, and relation with the environment.
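
    Trinucleotide usage differs from codon usage in that it counts every overlapping three-base window, whether or not the window lies in an ORF or in frame, so it applies equally to noncoding sequence. The Python sketch below computes such a 64-dimensional relative-frequency vector. It is a sketch under stated assumptions: the paper's exact definitions of Cd and Tn are not given in the abstract, windows containing N or other symbols are simply ignored, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def trinucleotide_usage(seq):
    """Relative frequency of each of the 64 trinucleotides over overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    trinucs = ["".join(p) for p in product("ACGT", repeat=3)]
    total = sum(counts[t] for t in trinucs)  # windows with non-ACGT symbols excluded
    return {t: (counts[t] / total if total else 0.0) for t in trinucs}


usage = trinucleotide_usage("ATGCGCGCANNATG")
print({t: f for t, f in usage.items() if f > 0})
```

    Vectors like this, one per metagenome, can then be clustered or compared with any of the usual distance measures.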

  18. CATEGORIZATION OF EVENT SEQUENCES FOR LICENSE APPLICATION

    G.E. Ragan; P. Mecheret; D. Dexheimer

    2005-04-14

    The purposes of this analysis are: (1) Categorize (as Category 1, Category 2, or Beyond Category 2) internal event sequences that may occur before permanent closure of the repository at Yucca Mountain. (2) Categorize external event sequences that may occur before permanent closure of the repository at Yucca Mountain. This includes examining DBGM-1 seismic classifications and upgrading to DBGM-2, if appropriate, to ensure Beyond Category 2 categorization. (3) State the design and operational requirements that are invoked to make the categorization assignments valid. (4) Indicate the amount of material put at risk by Category 1 and Category 2 event sequences. (5) Estimate frequencies of Category 1 event sequences at the maximum capacity and receipt rate of the repository. (6) Distinguish occurrences associated with normal operations from event sequences. It is beyond the scope of the analysis to propose design requirements that may be required to control radiological exposure associated with normal operations. (7) Provide a convenient compilation of the results of the analysis in tabular form. The results of this analysis are used as inputs to the consequence analyses in an iterative design process that is depicted in Figure 1. Categorization of event sequences for permanent retrieval of waste from the repository is beyond the scope of this analysis. Cleanup activities that take place after an event sequence and other responses to abnormal events are also beyond the scope of the analysis.

  19. Value of a newly sequenced bacterial genome

    Barbosa, Eudes GV; Aburjaile, Flavia F; Ramos, Rommel TJ; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information.

  20. Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality.

    Churchill, Jennifer D; King, Jonathan L; Chakraborty, Ranajit; Budowle, Bruce

    2016-09-01

    Massively parallel sequencing (MPS) offers substantial improvements over current forensic DNA typing methodologies such as increased resolution, scalability, and throughput. The Ion PGM™ is a promising MPS platform for analysis of forensic biological evidence. The system employs a sequencing-by-synthesis chemistry on a semiconductor chip that measures a pH change due to the release of hydrogen ions as nucleotides are incorporated into the growing DNA strands. However, implementation of MPS into forensic laboratories requires a robust chemistry. Ion Torrent's Hi-Q™ Sequencing Chemistry was evaluated to determine if it could improve on the quality of the generated sequence data in association with selected genetic marker targets. The whole mitochondrial genome and the HID-Ion STR 10-plex panel were sequenced on the Ion PGM™ system with the Ion PGM™ Sequencing 400 Kit and the Ion PGM™ Hi-Q™ Sequencing Kit. Concordance, coverage, strand balance, noise, and deletion ratios were assessed in evaluating the performance of the Ion PGM™ Hi-Q™ Sequencing Kit. The results indicate that reliable, accurate data are generated and that sequencing through homopolymeric regions can be improved with the use of Ion Torrent's Hi-Q™ Sequencing Chemistry. Overall, the quality of the generated sequencing data supports the potential for use of the Ion PGM™ in forensic genetic laboratories. PMID:27025714

  1. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Wu, Jian

    2012-11-01

    Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb, and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16, and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrate that sequencing whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.
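
    Coverage figures such as "more than 99.96% of the cp genome" are the fraction of reference positions spanned by at least one aligned read. A minimal sketch of that calculation, assuming alignments are already available as half-open (start, end) reference intervals; the function name `covered_fraction` is hypothetical and is not part of the study's pipeline.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of reference positions [0, genome_length) covered by >= 1 interval.

    `intervals` are half-open (start, end) alignment spans on the reference.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        # Clip each span to the reference; skip spans that fall outside it.
        start, end = max(0, start), min(genome_length, end)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Gap before this span: close the current merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching span: extend the merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Three reads on a 100-bp reference: positions 0-79 and 90-99 are covered.
print(covered_fraction([(0, 50), (40, 80), (90, 100)], 100))  # 0.9
```

    Merging overlapping spans before summing avoids counting a position twice when reads overlap, which is what makes the result a true fraction of the reference.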

  2. Quantitative texton sequences for legible bivariate maps.

    Ware, Colin

    2009-01-01

    Representing bivariate scalar maps is a common but difficult visualization problem. One solution has been to use two-dimensional color schemes, but the results are often hard to interpret and inaccurately read. An alternative is to use a color sequence for one variable and a texture sequence for the other. This approach has been used, for example, in geology, but it has been much less studied than the two-dimensional color scheme, although theory suggests that it should lead to easier perceptual separation of the information relating to the two variables. To make a texture sequence more clearly readable, the concept of the quantitative texton sequence (QTonS) is introduced. A QTonS is defined as a sequence of small graphical elements, called textons, where each texton represents a different numerical value and sets of textons can be densely displayed to produce visually differentiable textures. An experiment was carried out to compare two bivariate color coding schemes with two schemes using QTonS for one bivariate map component and a color sequence for the other. Two different key designs were investigated (a key being a sequence of colors or textures used in obtaining quantitative values from a map). The first design used two separate keys, one for each dimension, in order to measure how accurately subjects could independently estimate the underlying scalar variables. The second key design was two-dimensional and intended to measure the overall integral accuracy that could be obtained. The results show that accuracy is substantially higher for the QTonS/color sequence schemes. The hypothesis that texture/color sequence combinations are better for independent judgments of mapped quantities was supported. A second experiment probed the limits of spatial resolution for QTonSs. PMID:19834229

  3. Robot Sequencing and Visualization Program (RSVP)

    Cooper, Brian K.; Maxwell, Scott A.; Hartman, Frank R.; Wright, John R.; Yen, Jeng; Toole, Nicholas T.; Gorjian, Zareh; Morrison, Jack C

    2013-01-01

    The Robot Sequencing and Visualization Program (RSVP) is being used in the Mars Science Laboratory (MSL) mission for downlink data visualization and command sequence generation. RSVP reads and writes downlink data products from the operations data server (ODS) and writes uplink data products to the ODS. The primary users of RSVP are members of the Rover Planner team (part of the Integrated Planning and Execution Team (IPE)), who use it to perform traversability/articulation analyses, take activity plan input from the Science and Mission Planning teams, and create a set of rover sequences to be sent to the rover every sol. The primary inputs to RSVP are downlink data products and activity plans in the ODS database. The primary outputs are command sequences to be placed in the ODS for further processing prior to uplink to each rover. RSVP is composed of two main subsystems. The first, called the Robot Sequence Editor (RoSE), understands the MSL activity and command dictionaries and takes care of converting incoming activity level inputs into command sequences. The Rover Planners use the RoSE component of RSVP to put together command sequences and to view and manage command level resources like time, power, temperature, etc. (via a transparent realtime connection to SEQGEN). The second component of RSVP is called HyperDrive, a set of high-fidelity computer graphics displays of the Martian surface in 3D and in stereo. The Rover Planners can explore the environment around the rover, create commands related to motion of all kinds, and see the simulated result of those commands via its underlying tight coupling with flight navigation, motor, and arm software. This software is the evolutionary replacement for the Rover Sequencing and Visualization software used to create command sequences (and visualize the Martian surface) for the Mars Exploration Rover mission.

  4. Pittosporum cryptic virus 1: genome sequence completion using next-generation sequencing.

    Elbeaino, Toufic; Kubaa, Raied Abou; Tuzlali, Hasan Tuna; Digiaro, Michele

    2016-07-01

    Next-generation sequencing (NGS) was applied to dsRNAs extracted from an Italian pittosporum plant infected with pittosporum cryptic virus 1 (PiCV1). NGS allowed assembly of the full genome sequence of PiCV1, comprising dsRNA1 (1.9 kbp) and dsRNA2 (1.5 kbp), which encode the RNA-dependent RNA polymerase and capsid protein genes, respectively. Phylogenetic and sequence analyses confirmed that PiCV1 is a new member of the genus Deltapartitivirus, family Partitiviridae. From the same plant, NGS also permitted assembly of the complete genome sequence of eggplant mottled dwarf virus (EMDV), which shared 86% to 98% nucleotide sequence identity with complete and partial sequences (ca. 6750 nt) of other known EMDV isolates available in the GenBank database. PMID:27087112

  5. Engineering RNA sequence specificity of Pumilio repeats

    Cheong, Cheom-Gil; Hall, Traci M. Tanaka

    2006-01-01

    Puf proteins bind RNA sequence specifically and regulate translation and stability of target mRNAs. A “code” for RNA recognition has been deduced from crystal structures of the Puf protein, human Pumilio1, where each of eight repeats binds an RNA base via a combination of three side chains at conserved positions. Here, we report the creation of seven soluble mutant proteins with predictably altered sequence specificity, including one that binds tightly to adenosine-uracil-rich element RNA. These data show that Pumilio1 can be used as a scaffold to engineer RNA-binding proteins with designed sequence specificity. PMID:16954190

  6. Generation of Goldbach sequences in quantum mechanics

    Prudencio, Thiago

    2013-01-01

    This Letter reports for the first time on the problem of generating Goldbach sequences in terms of Fock states and examines their equivalence with the corresponding Goldbach conjecture terms. We show that the expectation values of number operators in Fock states cannot generate these sequences without ambiguity, owing to the normalization of superpositions of Fock states. Instead, we show that a complete correspondence between the Goldbach conjecture terms and the Goldbach sequences generated by Fock states can be achieved by means of projections involving the number operator and superpositions of Fock states. Our proposal offers an alternative way to interpret measurements in fundamental quantum mechanics.

  7. How Long is an Aftershock Sequence?

    Godano, Cataldo; Tramelli, Anna

    2016-07-01

    The occurrence of a mainshock is always followed by aftershocks spatially distributed within the fault area. The decay of the aftershock rate with time is described by the empirical Omori law, which was inferred from catalogue analysis. Discriminating sequences within catalogues is not straightforward, especially for low-magnitude mainshocks. Here, we describe the rate decay given by the Omori law when different sequence discrimination tools are used, and we find that, when background seismicity is excluded, the sequences tend to last for the entire time span of the catalogue.
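
    The empirical Omori law is commonly written in its modified (Omori–Utsu) form, n(t) = K / (c + t)^p, where t is the time since the mainshock and K, c, p are fitted constants. The sketch below evaluates that rate and its integral, the expected number of aftershocks in a time window. The function names and the parameter values in the example are illustrative assumptions, not values from this study.

```python
import math


def omori_rate(t, K, c, p):
    """Aftershock rate n(t) = K / (c + t)**p at time t after the mainshock."""
    return K / (c + t) ** p


def expected_count(t1, t2, K, c, p):
    """Expected number of aftershocks in [t1, t2]: the integral of omori_rate."""
    if math.isclose(p, 1.0):
        # p = 1: the antiderivative is logarithmic.
        return K * math.log((c + t2) / (c + t1))
    return K / (1.0 - p) * ((c + t2) ** (1.0 - p) - (c + t1) ** (1.0 - p))


# Illustrative parameters only (time unit: days).
print(omori_rate(0.0, K=100.0, c=1.0, p=1.0))               # 100.0
print(round(expected_count(0.0, 9.0, 100.0, 1.0, 1.0), 2))  # 230.26
```

    For p <= 1 the integral grows without bound as t2 increases, so under this functional form the expected cumulative count never saturates; for p > 1 it converges to a finite total.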

  8. Large margin filtering for signal sequence labeling

    Flamary, Rémi; Rakotomamonjy, Alain

    2011-01-01

    Signal sequence labeling consists of predicting a sequence of labels given an observed sequence of samples. A naive approach is to filter the signal to reduce the noise and then apply a classification algorithm to the filtered samples. In this paper, we propose to learn the filter jointly with the classifier, leading to large margin filtering for classification. This method makes it possible to learn the optimal cutoff frequency and phase of the filter, which may differ from zero. Two methods are proposed and tested on a toy dataset and on a real-life BCI dataset from BCI Competition III.

  9. Cloned endogenous retroviral sequences from human DNA.

    Bonner, T I; O'Connell, C; Cohen, M.

    1982-01-01

    We have screened a human DNA library using as probe a chimpanzee sequence that contains homology to the polymerase gene of the endogenous baboon virus. One set of overlapping clones spans about 20 kilobases and contains regions of DNA sequence homology to the gag p30, gag p15, and polymerase genes of Moloney murine leukemia virus. Furthermore, the spacings are the same as in Moloney virus between these sequences and a 480-nucleotide region that has the structural characteristics of a 3' copy ...

  10. How Long is an Aftershock Sequence?

    Godano, Cataldo; Tramelli, Anna

    2016-06-01

    The occurrence of a mainshock is always followed by aftershocks spatially distributed within the fault area. The decay of the aftershock rate with time is described by the empirical Omori law, which was inferred from catalogue analysis. Discriminating sequences within catalogues is not straightforward, especially for low-magnitude mainshocks. Here, we describe the rate decay given by the Omori law when different sequence discrimination tools are used, and we find that, when background seismicity is excluded, the sequences tend to last for the entire time span of the catalogue.

  11. Scale-PC shielding analysis sequences

    The SCALE computational system is a modular code system for analyses of nuclear fuel facility and package designs. With the release of SCALE-PC Version 4.3, the radiation shielding analysis community now has the capability to execute the SCALE shielding analysis sequences contained in the control modules SAS1, SAS2, SAS3, and SAS4 on an MS-DOS personal computer (PC). In addition, SCALE-PC includes two new sequences, QADS and ORIGEN-ARP. The capabilities of each sequence are presented, along with example applications.

  12. Sequence analysis of the AAA protein family.

    Beyer, A.

    1997-01-01

    The AAA protein family, a recently recognized group of Walker-type ATPases, has been subjected to an extensive sequence analysis. Multiple sequence alignments revealed the existence of a region of sequence similarity, the so-called AAA cassette. The borders of this cassette were localized and within it, three boxes of a high degree of conservation were identified. Two of these boxes could be assigned to substantial parts of the ATP binding site (namely, to Walker motifs A and B); the third ma...

  13. Iterative Method for Generating Correlated Binary Sequences

    Usatenko, O V; Apostolov, S S; Makarov, N M; Krokhin, A A

    2014-01-01

    We propose a new efficient iterative method for generating random correlated binary sequences with a prescribed correlation function. The method is based on consecutive linear modulations of an initially uncorrelated sequence into a correlated one. Each modulation step increases the correlations until the desired level has been reached. The robustness and efficiency of the proposed algorithm are tested by generating sequences with inverse power-law correlations. The substantial increase in the strength of correlation in the iterative method with respect to single-step filtering generation is shown for all studied correlation functions. Our results can be used for the design of disordered superlattices, waveguides, and surfaces with selective transport properties.
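
    To check a generated sequence against a prescribed correlation function, one can estimate the empirical pair correlation K(r) = <a_i a_{i+r}> - <a>^2 of the 0/1 sequence at each lag r. The sketch below assumes this common normalization (some authors divide by the variance instead). It is only a checking tool, not the iterative modulation method itself, and the function name is hypothetical.

```python
def correlation_function(seq, max_lag):
    """Empirical K(r) = <a_i a_{i+r}> - <a>^2 for a 0/1 sequence, r = 1..max_lag."""
    n = len(seq)
    if not 0 < max_lag < n:
        raise ValueError("need 0 < max_lag < len(seq)")
    mean = sum(seq) / n
    out = []
    for r in range(1, max_lag + 1):
        pairs = n - r
        # Average product over all positions that have a partner r steps ahead.
        out.append(sum(seq[i] * seq[i + r] for i in range(pairs)) / pairs - mean ** 2)
    return out


# A perfectly alternating sequence is anticorrelated at odd lags.
print(correlation_function([0, 1] * 50, 2))  # [-0.25, 0.25]
```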

  14. Initial retrieval sequence and blending strategy

    Pemwell, D.L.; Grenard, C.E.

    1996-09-01

    This report documents the initial retrieval sequence and the methodology used to select it. Waste retrieval, storage, pretreatment, and vitrification were modeled for candidate single-shell tank retrieval sequences. Performance of the sequences was measured by a set of metrics (for example, high-level waste glass volume, relative risk, and schedule). Computer models were used to evaluate estimated glass volumes, process rates, retrieval dates, and blending strategy effects. The models were based on estimates of component inventories and concentrations, sludge wash factors and timing, retrieval annex limitations, etc.

  15. Deep sequencing increases hepatitis C virus phylogenetic cluster detection compared to Sanger sequencing.

    Montoya, Vincent; Olmstead, Andrea; Tang, Patrick; Cook, Darrel; Janjua, Naveed; Grebely, Jason; Jacka, Brendan; Poon, Art F Y; Krajden, Mel

    2016-09-01

    Effective surveillance and treatment strategies are required to control the hepatitis C virus (HCV) epidemic. Phylogenetic analyses are powerful tools for reconstructing the evolutionary history of viral outbreaks and identifying transmission clusters. These studies often rely on Sanger sequencing, which typically generates a single consensus sequence for each infected individual. For rapidly mutating viruses such as HCV, consensus sequencing underestimates the complexity of the viral quasispecies population and could therefore generate different phylogenetic tree topologies. Although deep sequencing provides a more detailed quasispecies characterization, in-depth phylogenetic analyses are challenging due to dataset complexity and computational limitations. Here, we apply deep sequencing to a characterized population to assess its ability to identify phylogenetic clusters compared with consensus Sanger sequencing. For deep sequencing, a sample-specific threshold, determined by the 50th percentile of the patristic distance distribution for all variants within each individual, was used to identify clusters. Among seven patristic distance thresholds, ranging from 0.005 to 0.06, tested for the Sanger sequence phylogeny, a threshold of 0.03 was found to provide the best balance between positive agreement (samples in a cluster) and negative agreement (samples not in a cluster) relative to the deep sequencing dataset. From 77 HCV seroconverters, 10 individuals were identified in phylogenetic clusters using both methods. Relative to Sanger sequencing, deep sequencing analysis identified an additional 4 individuals and excluded 8 others. This deep sequencing approach could be a more effective tool than Sanger sequencing for understanding onward HCV transmission dynamics, since the incorporation of minority sequence variants improves the discrimination of phylogenetically linked clusters. PMID:27282472
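
    The Sanger-based clustering step amounts to grouping individuals whose patristic (tree-path) distance falls at or below a cutoff such as 0.03. A simplified, flat sketch of that idea links every pair within the threshold and takes connected components. It assumes pairwise patristic distances are precomputed in a dict keyed by sample pairs; the helper name is hypothetical, and published phylogenetic cluster definitions often add further criteria (for example, subtree or bootstrap support) that are not modeled here.

```python
def threshold_clusters(names, dist, threshold):
    """Connected components of the graph linking pairs with distance <= threshold.

    `dist` maps (name_a, name_b) pairs to patristic distances. Only groups with
    two or more members are returned, since a singleton is not a cluster.
    """
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


d = {("A", "B"): 0.01, ("B", "C"): 0.025, ("C", "D"): 0.05, ("A", "D"): 0.2}
print(threshold_clusters(["A", "B", "C", "D"], d, 0.03))  # [['A', 'B', 'C']]
```

    Because linkage is transitive, A and C end up in the same cluster through B even though their own distance was never compared; this single-linkage behavior is a design choice of the sketch.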

  16. Papaya (Carica papaya) lysozyme is a member of the family 19 (Basic, class II) chitinases

    Subroto, T; Sufiati, S; Beintema, JJ

    1999-01-01

    The most comprehensive studies on a plant lysozyme (EC 3.2.1.17) are those on the enzyme from papaya (Carica papaya) latex, published in 1967 and 1969. However, the five-residue N-terminal amino acid sequence of this enzyme, determined by manual Edman degradation, did not allow assign

  17. A Clinician's perspective on clinical exome sequencing.

    O'Donnell-Luria, Anne H; Miller, David T

    2016-06-01

    Clinical exome sequencing has clearly improved our ability as clinicians to identify the cause of a wide variety of disorders. Prior to exome sequencing, a majority of patients with apparent syndromes never received a specific molecular genetic diagnosis despite extensive diagnostic odysseys. Even for those receiving an answer to the question of what caused their disorder, the diagnostic odyssey often spanned years to decades. Determining the particular genetic cause in an individual patient can be challenging due to inherent phenotypic and genetic heterogeneity of disease, technical limitations of testing or both. Blended phenotypes, due to multiple monogenic disorders in the same patient, are true dilemmas for traditional genetic evaluations, but are increasingly being diagnosed through clinical exome sequencing. New sequencing technologies have increased the proportion of patients receiving molecular diagnoses, while significantly shortening the time scale, providing multiple benefits for the health-care team, patient and family. PMID:27126233

  18. Long range correlations in DNA sequences

    Mohanty, A K

    2002-01-01

    The so-called long-range correlation properties of DNA sequences are studied using variance analysis of the density distribution of a single nucleotide or a group of nucleotides in a model-independent way. This new method, which was suggested earlier, has been applied to extract slope parameters that characterize the correlation properties of several intron-containing and intronless DNA sequences. An important aspect of all the DNA sequences is the property of complementarity, by virtue of which any two complementary distributions (e.g., GA is complementary to TC, or G is complementary to ATC) have identical fluctuations at all scales, although their distribution functions need not be identical. Because of this complementarity, the well-known DNA walk representation, whose statistical interpretation is still unresolved, is shown to be a special case of the present formalism with a density distribution corresponding to a purine or a pyrimidine group. Another interesting aspect of most of the DNA sequences is that the factorial m...
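
    The DNA walk maps each purine (A, G) to a step of +1 and each pyrimidine (C, T) to a step of -1, and takes the running sum. A minimal sketch, assuming an A/C/G/T string without ambiguity codes; `dna_walk` is a hypothetical name. Because the indicator of the group {C, T} is one minus the indicator of the complementary group {G, A}, the counts of the two groups in any window sum to the window length, so their fluctuations are identical.

```python
from itertools import accumulate

PURINES = set("AG")
PYRIMIDINES = set("CT")


def dna_walk(seq):
    """Cumulative purine/pyrimidine walk: +1 for A/G, -1 for C/T."""
    steps = []
    for base in seq.upper():
        if base in PURINES:
            steps.append(1)
        elif base in PYRIMIDINES:
            steps.append(-1)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return list(accumulate(steps))


print(dna_walk("AGCTA"))  # [1, 2, 1, 0, 1]
```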

  19. Special Issue: Next Generation DNA Sequencing

    Paul Richardson

    2010-10-01

    Next Generation Sequencing (NGS) refers to technologies that do not rely on traditional dideoxy-nucleotide (Sanger) sequencing, in which labeled DNA fragments are physically resolved by electrophoresis. These new technologies rely on different strategies, but essentially all of them make use of real-time data collection of a base-level incorporation event across a massive number of reactions (on the order of millions, versus 96 for capillary electrophoresis, for instance). The major commercial NGS platforms available to researchers are the 454 Genome Sequencer (Roche), the Illumina (formerly Solexa) Genome Analyzer, the SOLiD system (Applied Biosystems/Life Technologies), and the Heliscope (Helicos Corporation). The techniques and different strategies utilized by these platforms are reviewed in a number of the papers in this special issue. These technologies are enabling new applications that take advantage of the massive data produced by this next generation of sequencing instruments. [...]

  20. A spectral sequence for parallelized persistence

    Lipsky, David; Vejdemo-Johansson, Mikael

    2011-01-01

    We approach the problem of the computation of persistent homology for large datasets by a divide-and-conquer strategy. Dividing the total space into separate but overlapping components, we are able to limit the total memory residency for any part of the computation, while not degrading the overall complexity much. Locally computed persistence information is then merged from the components and their intersections using a spectral sequence generalizing the Mayer-Vietoris long exact sequence. We describe the Mayer-Vietoris spectral sequence and give details on how to compute with it. This allows us to merge local homological data into the global persistent homology. Furthermore, we detail how the classical topology constructions inherent in the spectral sequence adapt to a persistence perspective, as well as describe the techniques from computational commutative algebra necessary for this extension. The resulting computational scheme suggests a parallelization scheme, and we discuss the communication steps invol...

  1. Generating target probability sequences and events

    Ella, Vaignana Spoorthy

    2013-01-01

    Cryptography and simulation of systems require that events of pre-defined probability be generated. This paper presents methods to generate target probability events based on the oblivious transfer protocol and target probabilistic sequences using probability distribution functions.
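
    The paper builds its target-probability events on the oblivious transfer protocol, which is not reproduced here. For comparison only, a standard way to produce an event of arbitrary probability p from fair coin flips alone is to compare random bits, one at a time, against the binary expansion of p: the first position where they differ decides the outcome, and on average two flips suffice. The sketch below implements that swapped-in technique; the function name is hypothetical.

```python
import random


def bernoulli_from_fair_bits(p, rng=random):
    """Return True with probability p using only fair coin flips.

    Compares a uniform random binary fraction U = 0.b1 b2 ... with the binary
    expansion of p, bit by bit; U < p exactly when, at the first differing
    bit, U has 0 and p has 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    while True:
        # Shift out the next binary digit of p.
        p *= 2
        p_bit = 1 if p >= 1 else 0
        p -= p_bit
        coin = rng.getrandbits(1)
        if coin != p_bit:
            return coin < p_bit


rng = random.Random(0)
hits = sum(bernoulli_from_fair_bits(0.3, rng) for _ in range(10000))
print(hits / 10000)  # close to 0.3
```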

  2. Network of tRNA Gene Sequences

    WEI Fang-ping; LI Sheng; MA Hong-ru

    2008-01-01

    A network of 3719 tRNA gene sequences was constructed using the simplest alignment. Its topology, degree distribution, and clustering coefficient were studied. The network's behavior shifts from a fluctuating distribution to a scale-free distribution as the similarity degree of the tRNA gene sequences increases. The tRNA gene sequences with the same anticodon identity are more self-organized than those with different anticodon identities and form local clusters in the network. Some vertices of a local cluster are highly connected with other local clusters, and a probable reason for this is given. Moreover, a network constructed from the same number of random tRNA sequences was used for comparison. The relationships between the properties of the tRNA similarity network and the characteristics of tRNA evolutionary history are discussed.
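
    A similarity network of this kind can be built by linking two sequences whenever their similarity reaches a threshold, then reading off degrees and clustering coefficients. The sketch assumes a simple ungapped, position-by-position identity score as a stand-in for the paper's "simplest alignment", whose exact definition is not given in the abstract; all function names are hypothetical.

```python
from itertools import combinations


def similarity(a, b):
    """Ungapped identity: fraction of matching positions over the shorter length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def build_network(seqs, threshold):
    """Adjacency sets: link i and j when similarity(seqs[i], seqs[j]) >= threshold."""
    adj = {i: set() for i in range(len(seqs))}
    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


adj = build_network(["ACGT", "ACGA", "ACGG", "TTTT"], threshold=0.75)
print([len(adj[v]) for v in adj])      # degrees: [2, 2, 2, 0]
print(clustering_coefficient(adj, 0))  # 1.0
```

    Raising the threshold removes weaker links, which is the knob ("similarity degree") along which the abstract reports the degree distribution changing.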

  3. I and I* convergent function sequences

    Gezer, F.; Karakuş, S.

    2005-01-01

    In this paper, we introduce the concepts of I-pointwise convergence, I-uniform convergence, I*-pointwise convergence, and I*-uniform convergence of function sequences, and then we examine the relations between them.
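
    For reference, these four notions are usually stated as follows for functions f_n, f : X -> R, an admissible ideal I of subsets of N (one containing all singletons but not N itself), and its dual filter F(I). These are the standard forms from the ideal-convergence literature; the paper's own notation may differ.

```latex
\begin{aligned}
&\mathcal{F}(\mathcal{I}) = \{ A \subseteq \mathbb{N} : \mathbb{N} \setminus A \in \mathcal{I} \} \\[4pt]
&\text{$\mathcal{I}$-pointwise:}\quad \forall x \in X,\ \forall \varepsilon > 0:\
  \{ n \in \mathbb{N} : |f_n(x) - f(x)| \ge \varepsilon \} \in \mathcal{I} \\
&\text{$\mathcal{I}$-uniform:}\quad \forall \varepsilon > 0:\
  \{ n \in \mathbb{N} : \exists x \in X,\ |f_n(x) - f(x)| \ge \varepsilon \} \in \mathcal{I} \\
&\text{$\mathcal{I}^*$-pointwise:}\quad \forall x \in X\ \exists M_x = \{ m_1 < m_2 < \cdots \} \in \mathcal{F}(\mathcal{I}):\
  \lim_{k \to \infty} f_{m_k}(x) = f(x) \\
&\text{$\mathcal{I}^*$-uniform:}\quad \exists M = \{ m_1 < m_2 < \cdots \} \in \mathcal{F}(\mathcal{I}):\
  f_{m_k} \rightrightarrows f \ \text{on } X \ (k \to \infty)
\end{aligned}
```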

  4. Fibonacci Sequence and Supramolecular Structure of DNA.

    Shabalkin, I P; Grigor'eva, E Yu; Gudkova, M V; Shabalkin, P I

    2016-05-01

    We proposed a new model of supramolecular DNA structure. Like our previously developed model of primary DNA structure [11-15], the 3D structure of the DNA molecule is assembled in accordance with a mathematical rule known as the Fibonacci sequence. Unlike the primary DNA structure, the supramolecular 3D structure is assembled from complex moieties, including a regular tetrahedron and a regular octahedron, consisting of monomers that are elements of the primary DNA structure. The moieties of the supramolecular DNA structure, which form fragments of a regular spatial lattice, are bound via linker (joint) sequences of the DNA chain. The lattice perceives and transmits information signals over a considerable distance without acoustic aberrations. Linker sequences expand the conformational space between lattice segments, allowing the segments to slide relative to each other under the action of external forces. In this case, sliding is provided by stretching of the stacked linker sequences. PMID:27265133

  5. "X"-tending the Fibonacci Sequence.

    Moran, Glenn T.

    2002-01-01

    Outlines a lesson on the Fibonacci and Lucas sequences that captures student interest by presenting the opportunity for computation practice, mental mathematics, and proof for algebra students. Discusses an extension for solving simultaneous equations. (YDS)
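
    A small script of the kind such a lesson might use generates both sequences and checks identities numerically before students are asked to prove them. The two identities tested, L(n) = F(n-1) + F(n+1) and F(2n) = F(n) * L(n), are standard; the lesson's own extension to simultaneous equations is not reproduced, and `fib_lucas` is a hypothetical name.

```python
def fib_lucas(n):
    """Return (F, L): Fibonacci F[0..n] (F0=0, F1=1) and Lucas L[0..n] (L0=2, L1=1)."""
    F, L = [0, 1], [2, 1]
    for _ in range(2, n + 1):
        # Both sequences obey the same recurrence; only the seeds differ.
        F.append(F[-1] + F[-2])
        L.append(L[-1] + L[-2])
    return F[: n + 1], L[: n + 1]


F, L = fib_lucas(20)
print(F[:11])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(L[:11])  # [2, 1, 3, 4, 7, 11, 18, 29, 47, 76, 123]

# Check two classical identities numerically before asking students to prove them.
assert all(L[n] == F[n - 1] + F[n + 1] for n in range(1, 20))
assert all(F[2 * n] == F[n] * L[n] for n in range(0, 11))
```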

  6. New stopping criteria for segmenting DNA sequences

    Li, W

    2001-01-01

    We propose a solution to the problem of choosing a stopping criterion when segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on the Bayesian Information Criterion (BIC) in the model selection framework. When this stopping criterion is applied to a left telomere sequence of the yeast Saccharomyces cerevisiae and the complete genome sequence of the bacterium Escherichia coli, borders of biologically meaningful units were identified (e.g., subtelomeric units, replication origin, and replication terminus), and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength, which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.
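
    One step of BIC-based segmentation can be sketched as follows: fit a single base-composition model to the whole sequence, compare it with the best two-segment model, and accept the cut only if the BIC improvement is positive, recursing on each part until no cut qualifies. The sketch assumes an i.i.d. multinomial model per segment and counts K - 1 extra frequency parameters plus one breakpoint parameter; this parameter count and the helper names are assumptions, not necessarily the paper's exact formulation.

```python
import math
from collections import Counter


def seg_loglik(seq):
    """Maximized log-likelihood of seq under an i.i.d. multinomial base model."""
    n = len(seq)
    return sum(c * math.log(c / n) for c in Counter(seq).values())


def best_split(seq, k=4):
    """Best single cut of seq as (delta_bic, position); cut only if delta_bic > 0.

    delta_bic = 2 * (logL_two_segments - logL_one_segment) - extra * ln(N),
    with extra = (k - 1) new frequency parameters + 1 breakpoint (an assumption).
    """
    n = len(seq)
    ll_one = seg_loglik(seq)
    extra = (k - 1) + 1
    best = (-math.inf, None)
    for i in range(1, n):
        ll_two = seg_loglik(seq[:i]) + seg_loglik(seq[i:])
        delta = 2 * (ll_two - ll_one) - extra * math.log(n)
        if delta > best[0]:
            best = (delta, i)
    return best


print(best_split("A" * 50 + "C" * 50)[1])  # 50: a clear composition change
print(best_split("ACGT" * 25)[0] < 0)      # True: homogeneous, so stop
```

    The ln(N) penalty grows with sequence length, which is what keeps the recursion from splitting long, nearly homogeneous stretches into many spurious domains.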

  7. Using mobile sequencers in an academic classroom.

    Zaaijer, Sophie; Erlich, Yaniv

    2016-01-01

    The advent of mobile DNA sequencers has made it possible to generate DNA sequencing data outside of laboratories and genome centers. Here, we report our experience of using the MinION, a mobile sequencer, in a 13-week academic course for undergraduate and graduate students. The course consisted of theoretical sessions that presented fundamental topics in genomics and several applied hackathon sessions. In these hackathons, the students used MinION sequencers to generate and analyze their own data and gain hands-on experience in the topics discussed in the theoretical classes. The manuscript describes the structure of our class, the educational material, and the lessons we learned in the process. We hope that the knowledge and material presented here will provide the community with useful tools to help educate future generations of genome scientists. PMID:27054412

  8. Supervised Sequence Labelling with Recurrent Neural Networks

    Graves, Alex

    2012-01-01

    Supervised sequence labelling is a vital area of machine learning, encompassing tasks such as speech, handwriting and gesture recognition, protein secondary structure prediction, and part-of-speech tagging. Recurrent neural networks are powerful sequence learning tools (robust to input noise and distortion, and able to exploit long-range contextual information) that would seem ideally suited to such problems. However, their role in large-scale sequence labelling systems has so far been auxiliary. The goal of this book is a complete framework for classifying and transcribing sequential data with recurrent neural networks only. Three main innovations are introduced in order to realise this goal. Firstly, the connectionist temporal classification output layer allows the framework to be trained with unsegmented target sequences, such as phoneme-level speech transcriptions; this is in contrast to previous connectionist approaches, which were dependent on error-prone prior segmentation. Secondly, multidimensional...
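
    The connectionist temporal classification (CTC) output layer relates frame-level label paths, which include a special blank symbol, to label sequences through a many-to-one map: consecutive repeats are merged, then blanks are removed. Training sums the probabilities of all paths that map to the target, and the simplest decoder (best-path decoding) takes the most probable symbol at each frame and collapses the result. A minimal sketch of the collapse and best-path steps, assuming '-' as the blank and per-frame probability lists; the function names are hypothetical.

```python
def ctc_collapse(path, blank="-"):
    """CTC many-to-one map: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)


def best_path_decode(frame_probs, alphabet, blank="-"):
    """Pick the most probable symbol per frame, then collapse the path."""
    path = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    return ctc_collapse(path, blank)


print(ctc_collapse("--hh-e-ll-ll-oo"))  # hello (the blank separates the two l's)
print(best_path_decode([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]], "-a"))  # aa
```

    Best-path decoding is only an approximation: the most probable single path does not always collapse to the most probable labeling, which is why beam-search decoders are also used in practice.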

  9. ASAP: Amplification, sequencing & annotation of plastomes

    Folta, Kevin M

    2005-12-01

    Background: Availability of DNA sequence information is vital for pursuing structural, functional, and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome requires sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures such as chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA) is currently used to amplify the chloroplast genome from purified chloroplast DNA, and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal, high-throughput, rapid PCR-based technique to amplify, sequence, and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1–1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera. Results: 100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato, and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine, and Equisetum. Sequence from the inverted repeat region of the strawberry and peach plastomes was obtained, annotated, and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to the tobacco plastome, thus exhibiting the utility of this method for structural and

  10. Visually lossless compression of digital hologram sequences

    Darakis E.; Kowiel M.; Nasanen R.; Naughton T.J.

    2010-01-01

    Digital hologram sequences have great potential for the recording of 3D scenes of moving macroscopic objects as their numerical reconstruction can yield a range of perspective views of the scene. Digital holograms inherently have large information content and lossless coding of holographic data is rather inefficient due to the speckled nature of the interference fringes they contain. Lossy coding of still holograms and hologram sequences has shown promising results. By definition,...

  11. Learning Interpretable SVMs for Biological Sequence Classification

    Sonnenburg, Sören; Rätsch, Gunnar; Schäfer, Christin

    2006-01-01

    Background: Support Vector Machines (SVMs), using a variety of string kernels, have been successfully applied to biological sequence classification problems. While SVMs achieve high classification accuracy, they lack interpretability. In many applications, it is not enough for an algorithm merely to detect a biological signal in the sequence; it should also provide a means to interpret its solution in order to gain biological insight. Results: We propose novel and efficient algorith...
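
    As a concrete example of a string kernel, the k-spectrum kernel counts every length-k substring (k-mer) in each sequence and takes the dot product of the two count vectors. It is shown here only to make the idea of a string kernel tangible; it is not the paper's kernel or its interpretation method, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k):
    """k-spectrum kernel: dot product of the two k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[kmer] for kmer, n in ca.items())


print(spectrum_kernel("ACGT", "ACGA", 2))  # 2 (shared 2-mers: AC, CG)
```

    Because the kernel is an explicit sum over k-mers, each k-mer's contribution to an SVM decision can in principle be read off, which hints at why interpretability is tractable for this family of kernels.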

  12. Optimization of a sequence of reactors

    Vidal, Rene Victor Valqui

    1991-01-01

    Concerns the optimal production of sulphuric acid in a sequence of reactors. Using a suitable approximation to the objective function, this problem can easily be solved using the maximum principle. A numerical example documents the applicability of the suggested approach.

  13. Next Generation Sequencing and Its Latest Application

    Shang Shanshan; You Jianxin

    2013-01-01

    Next-generation sequencing, as an efficient and effective method, has been widely used in research and has helped researchers obtain novel results, so it is useful to summarize its characteristics and applications. This study presents a detailed review and introduction of next-generation sequencing (NGS) and compares the NGS platforms, which can help readers understand NGS. After introducing the NGS application, ...

  14. From Video Sequences to Motion Panoramas

    Bartoli, Adrien; Dalal, Navneet; Bose, Biswajit; Horaud, Radu

    2002-01-01

    We address the problem of constructing mosaics from video sequences taken by rotating cameras. In particular we investigate the widespread case where the scene is not only static but may also contain large dynamic areas, induced by moving or deforming objects. Most of the existing techniques fail to produce reliable results on such video sequences. For such alignment purposes, two classes of techniques may be used: feature-based and direct methods. We derive both of them in a unified statisti...

  15. Fluorescent signatures for variable DNA sequences

    Rice, John E.; Reis, Arthur H.; Rice, Lisa M.; Carver-Brown, Rachel K.; Wangh, Lawrence J.

    2012-01-01

    Life abounds with genetic variations writ in sequences that are often only a few hundred nucleotides long. Rapid detection of these variations for identification of genetic diseases, pathogens and organisms has become the mainstay of molecular science and medicine. This report describes a new, highly informative closed-tube polymerase chain reaction (PCR) strategy for analysis of both known and unknown sequence variations. It combines efficient quantitative amplification of single-stranded DN...

  16. Engineering RNA sequence specificity of Pumilio repeats

    Cheong, Cheom-Gil; Hall, Traci M. Tanaka

    2006-01-01

    Puf proteins bind RNA sequence specifically and regulate translation and stability of target mRNAs. A “code” for RNA recognition has been deduced from crystal structures of the Puf protein, human Pumilio1, where each of eight repeats binds an RNA base via a combination of three side chains at conserved positions. Here, we report the creation of seven soluble mutant proteins with predictably altered sequence specificity, including one that binds tightly to adenosine-uracil-rich element RNA. Th...

  17. Logarithm of Irrationals and Beatty Sequences

    E, Geremías Polanco

    2015-01-01

    In this paper we find an identity that gives a representation for the logarithm of any two irrational numbers $a, b >1$ in terms of a series whose terms are ratios of elements from the Beatty Sequences generated by these two numbers. We also show that Sturmian sequences can be defined in terms of these ratios. Furthermore, we find an identity for such series that bears a superficial resemblance to (a discrete version of) Frullani's Integral.
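    The paper's identity itself is not reproduced here. As a minimal sketch of the objects involved, the code below computes the Beatty sequences of $r = \sqrt{2}$ and $s = 2 + \sqrt{2}$, chosen because $1/r + 1/s = 1$, so by Beatty's (Rayleigh's) theorem they partition the positive integers. It also builds the characteristic Sturmian word $s_n = \lfloor (n+1)\alpha \rfloor - \lfloor n\alpha \rfloor$ for $\alpha = \sqrt{2} - 1$. The choice of $\sqrt{2}$ and all function names are illustrative assumptions; using `math.isqrt` keeps the floors exact instead of relying on floating point.

```python
from math import isqrt


def beatty_sqrt2(n: int) -> int:
    """n-th term of the Beatty sequence of sqrt(2): floor(n*sqrt(2)) = isqrt(2*n*n)."""
    return isqrt(2 * n * n)


def beatty_2_plus_sqrt2(n: int) -> int:
    """n-th term for r = 2 + sqrt(2): floor(n*r) = 2n + floor(n*sqrt(2))."""
    return 2 * n + beatty_sqrt2(n)


def sturmian_word(length: int) -> str:
    """Characteristic Sturmian word for alpha = sqrt(2) - 1 (0 < alpha < 1):
    s_n = floor((n + 1) * alpha) - floor(n * alpha) for n = 1, 2, ...
    """
    def floor_alpha(n: int) -> int:
        # floor(n * (sqrt(2) - 1)) = floor(n * sqrt(2)) - n, since n is an integer
        return beatty_sqrt2(n) - n

    return "".join(str(floor_alpha(n + 1) - floor_alpha(n)) for n in range(1, length + 1))


if __name__ == "__main__":
    print([beatty_sqrt2(n) for n in range(1, 11)])        # [1, 2, 4, 5, 7, 8, 9, 11, 12, 14]
    print([beatty_2_plus_sqrt2(n) for n in range(1, 5)])  # [3, 6, 10, 13]
    print(sturmian_word(10))                              # 0101001010
```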

  18. Next generation sequencing in cardiovascular diseases

    Faita, Francesca; Vecoli, Cecilia; Foffa, Ilenia; Andreassi, Maria Grazia

    2012-01-01

    In the last few years, the advent of next generation sequencing (NGS) has revolutionized the approach to genetic studies, making whole-genome sequencing a possible way of obtaining global genomic information. NGS has very recently been shown to be successful in identifying novel causative mutations of rare or common Mendelian disorders. At the present time, it is expected that NGS will be increasingly important in the study of inherited and complex cardiovascular diseases (CVDs). However, the...

  19. Bernuau spline wavelets and Sturmian sequences

    Andrle, Miroslav; Burdik, Cestmir; Gazeau, Jean-Pierre

    2003-01-01

    A construction of spline wavelets of class $C^n(\mathbb{R})$ supported by sequences of aperiodic discretizations of $\mathbb{R}$ is presented. The construction is based on a multiresolution analysis recently elaborated by G. Bernuau. At a given scale, we consider discretizations that are sets of left-hand ends of tiles in a self-similar tiling of the real line with finite local complexity. The corresponding tilings are determined by two-letter Sturmian substitution sequences. We illustrate the construction with examples ha...
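    Bernuau's multiresolution analysis is not reproduced here; the sketch below only builds the kind of discretization the abstract describes. It assumes the Fibonacci substitution $a \to ab$, $b \to a$ as the two-letter Sturmian substitution, with tile lengths $\varphi$ (golden ratio) for $a$ and $1$ for $b$. These lengths make the tiling self-similar with inflation factor $\varphi$, because $|ab| = \varphi + 1 = \varphi \cdot \varphi$ and $|a| = \varphi \cdot 1$. The left-hand ends of the tiles are then the discretization points. All names are hypothetical.

```python
from itertools import accumulate
from math import isclose, sqrt

PHI = (1 + sqrt(5)) / 2  # golden ratio, the inflation factor of the Fibonacci tiling

# Fibonacci substitution on the two-letter alphabet {a, b} (an assumed example)
SUBSTITUTION = {"a": "ab", "b": "a"}
# Tile lengths chosen so that substituting scales every tile by PHI
TILE_LENGTH = {"a": PHI, "b": 1.0}


def substitute(word: str, iterations: int) -> str:
    """Apply the substitution `iterations` times."""
    for _ in range(iterations):
        word = "".join(SUBSTITUTION[c] for c in word)
    return word


def left_ends(word: str) -> list[float]:
    """Left-hand ends of the tiles, with the first tile starting at 0."""
    return [0.0] + list(accumulate(TILE_LENGTH[c] for c in word))[:-1]


if __name__ == "__main__":
    w = substitute("a", 5)
    print(w)                                        # abaababaabaab
    print([round(x, 3) for x in left_ends(w[:4])])  # [0.0, 1.618, 2.618, 4.236]
```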

  20. Sequencing antibody repertoires: The next generation

    Fischer, Nicolas

    2011-01-01

    Genomic studies have been revolutionized by the use of next generation sequencing (NGS), which delivers huge amounts of sequence information in a short span of time. The number of applications for NGS is rapidly expanding and significantly transforming many areas of life sciences. The field of antibody research and discovery is no exception. Several recent studies have harnessed the power of NGS for analyzing natural or synthetic immunoglobulin repertoires with unprecedented resolution and ex...

  1. Symbolic Representations in Motor Sequence Learning

    Bo, J.; Peltier, S.J.; Noll, D.C.; Seidler, R.D.

    2010-01-01

    It has been shown that varying the spatial versus symbolic nature of stimulus presentation and response production, which affects stimulus-response (S-R) mapping requirements, influences the magnitude of implicit sequence learning (Koch & Hoffman, 2000). Here, we evaluated how spatial and symbolic stimuli and responses affect the neural bases of sequence learning. We selectively eliminated the spatial component of stimulus presentation (spatial versus symbolic), response execution (manual ver...

  2. Probabilistic error correction for RNA sequencing

    Le, Hai-Son; Schulz, Marcel H.; McCauley, Brenna M.; Hinman, Veronica F.; Bar-Joseph, Ziv

    2013-01-01

    Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing...
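    To illustrate why correctors designed for DNA struggle with non-uniform abundance, here is a minimal sketch of a simple count-threshold rule (not the authors' probabilistic method): k-mers seen fewer than `min_count` times are flagged as likely errors. The toy reads, k = 4 and `min_count = 3` are assumptions chosen for illustration. The rule correctly flags the k-mers introduced by a sequencing error, but it also flags every k-mer of a correct read from a lowly expressed transcript.

```python
from collections import Counter


def kmer_spectrum(reads: list[str], k: int) -> Counter:
    """Count every k-mer across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(reads: list[str], k: int, min_count: int) -> set[str]:
    """k-mers seen fewer than min_count times; a uniform-coverage corrector treats these as errors."""
    return {kmer for kmer, c in kmer_spectrum(reads, k).items() if c < min_count}


if __name__ == "__main__":
    reads = (
        ["GATTACA"] * 20   # highly expressed transcript
        + ["GATTGCA"]      # one copy carrying a sequencing error (A -> G)
        + ["CCCGGGT"]      # correct read from a lowly expressed transcript
    )
    print(sorted(weak_kmers(reads, k=4, min_count=3)))
    # ['ATTG', 'CCCG', 'CCGG', 'CGGG', 'GGGT', 'TGCA', 'TTGC']
```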

  3. Suicidal nucleotide sequences for DNA polymerization.

    Samadashwily, G M; Dayn, A; Mirkin, S M

    1993-01-01

    Studying the activity of T7 DNA polymerase (Sequenase) on open circular DNAs, we observed virtually complete termination within potential triplex-forming sequences. Mutations destroying the triplex potential of the sequences prevented termination, while compensatory mutations restoring triplex potential restored it. We hypothesize that strand displacement during DNA polymerization of double-helical templates brings three DNA strands (duplex DNA downstream of the polymerase plus a displaced ov...

  4. Comparison of Next-Generation Sequencing Systems

    Lin Liu; Yinhu Li; Siliang Li; Ni Hu; Yimin He; Ray Pong; Danni Lin; Lihua Lu; Maggie Law

    2012-01-01

    With the rapid development and wide application of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to help decode the mysteries of life, breed better crops, detect pathogens, and improve quality of life. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world’s ...

  5. A Sequencer for the LHC ERA

    Baggiolini, V; Gorbonosov, R; Khasbulatov, D; Lamont, M

    2009-01-01

    The Sequencer is a high level software application that helps operators and physicists to commission and control the LHC. It is an important operational tool for the LHC and a core part of the control system that interacts with all LHC sub-systems. This paper describes the architecture and design of the sequencer and illustrates some innovative parts of the implementation, based on modern Java technology.

  6. Chromatid interchanges at intrachromosomal telomeric DNA sequences

    Chinese hamster Don cells were exposed to X-rays, mitomycin C and teniposide (VM-26) to induce chromatid exchanges (quadriradials and triradials). After fluorescence in situ hybridization (FISH) of telomere sequences it was found that interstitial telomere-like DNA sequence arrays presented around five times more breakage-rearrangements than the genome overall. This high recombinogenic capacity was independent of the clastogen, suggesting that this susceptibility is not related to the initial mechanisms of DNA damage. (author)

  7. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    Arabi E. Keshk

    2014-01-01

    The merging of biology and computer science has created a new field called computational biology, or bioinformatics, that explores the capacity of computers to gain knowledge from biological data. Computational biology is rooted in the life sciences as well as in computer and information sciences and technologies. The main problem in computational biology is sequence alignment, which is a way of arranging DNA, RNA or protein sequences to identify regions of similarity and the relationship between se...
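    The "enhanced" algorithm itself is not described in the snippet. As a reference point, here is a minimal sketch of the classic dynamic-programming approach to global alignment (Needleman–Wunsch) with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment.
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

    The table has $(n+1)(m+1)$ cells, each filled in constant time, so time and memory are $O(nm)$; that quadratic cost is what most "enhanced" variants try to reduce.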

  8. Finding Sequential Patterns from Large Sequence Data

    Fazekas Gabor; Mahdi Esmaeili (1971-) (computer scientist)

    2010-01-01

    Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining finds sets of data items that occur together frequently in some sequences. Sequential pattern mining, which extracts frequent subsequences from a sequence database, has attracted a great deal of interest in recent data mining research because it is ...
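    The core notion is the support of a pattern: the number of database sequences that contain it as a subsequence (in order, gaps allowed). Below is a minimal brute-force sketch, assuming each sequence is a string of single-item events (real sequential pattern mining works with itemsets) and enumerating only length-2 patterns. Practical miners such as GSP or PrefixSpan avoid this brute force by pruning with the anti-monotonicity of support: a pattern can never be more frequent than any of its subpatterns. Names and data are illustrative.

```python
from itertools import product


def is_subsequence(pattern: str, sequence: str) -> bool:
    """True if pattern's items occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator past the first match, preserving order
    return all(item in it for item in pattern)


def support(pattern: str, database: list[str]) -> int:
    """Number of sequences in the database containing the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_pairs(database: list[str], min_support: int) -> dict[tuple[str, str], int]:
    """Brute-force all ordered length-2 patterns and keep the frequent ones."""
    items = sorted(set("".join(database)))
    result = {}
    for a, b in product(items, repeat=2):
        s = support(a + b, database)
        if s >= min_support:
            result[(a, b)] = s
    return result


if __name__ == "__main__":
    db = ["abcd", "acbd", "abd", "bcd"]
    print(frequent_pairs(db, min_support=3))
    # {('a', 'b'): 3, ('a', 'd'): 3, ('b', 'd'): 4, ('c', 'd'): 3}
```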

  9. A comparative evaluation of sequence classification programs

    Bazinet Adam L; Cummings Michael P

    2012-01-01

    Abstract Background A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for ’barcoding genes’ like 16S rRNA, or various types of protein-co...

  10. SEQOPTICS: a protein sequence clustering system

    2006-01-01

    Background Protein sequence clustering has been widely used as a part of the analysis of protein structure and function. In most cases single linkage or graph-based clustering algorithms have been applied. OPTICS (Ordering Points To Identify the Clustering Structure) is an attractive approach due to its emphasis on visualization of results and support for interactive work, e.g., in choosing parameters. However, OPTICS has not been used, as far as we know, for protein sequence clustering. Resu...

  11. Coverage statistics for sequence census methods

    Evans Steven N; Hower Valerie; Pachter Lior

    2010-01-01

    Abstract Background We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce a coding of the shape of the coverage depth function as a tree and explain how this can be used to detect regions with anomalous coverage. This modeling perspective is especially germane to current high-throughput sequencing experiments, where both s...
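    For orientation, here is a minimal sketch of the classic Lander–Waterman setting that the paper extends (fixed-length fragments only; the fragment-length distribution and the tree coding of coverage shapes are not reproduced). With $N$ reads of length $L$ on a genome of length $G$, the redundancy is $c = NL/G$, per-base depth is approximately Poisson($c$), and the expected uncovered fraction is $e^{-c}$. The simulation checks that approximation; all names and parameter values are assumptions.

```python
import math
import random


def expected_coverage(num_reads: int, read_len: int, genome_len: int) -> float:
    """Lander-Waterman redundancy c = N * L / G."""
    return num_reads * read_len / genome_len


def poisson_depth_pmf(depth: int, c: float) -> float:
    """P(a base is covered by exactly `depth` reads) under the Poisson approximation."""
    return math.exp(-c) * c ** depth / math.factorial(depth)


def simulate_uncovered_fraction(num_reads: int, read_len: int, genome_len: int, seed: int = 0) -> float:
    """Place fixed-length reads uniformly at random; return the fraction of bases with zero depth."""
    rng = random.Random(seed)
    diff = [0] * (genome_len + 1)  # difference array: +1 at a read start, -1 just past its end
    for _ in range(num_reads):
        start = rng.randrange(genome_len - read_len + 1)
        diff[start] += 1
        diff[start + read_len] -= 1
    depth = uncovered = 0
    for pos in range(genome_len):
        depth += diff[pos]
        if depth == 0:
            uncovered += 1
    return uncovered / genome_len


if __name__ == "__main__":
    c = expected_coverage(20_000, 100, 1_000_000)  # c = 2.0
    print(poisson_depth_pmf(0, c))                 # e^-2, about 0.135
    print(simulate_uncovered_fraction(20_000, 100, 1_000_000))
```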

  12. Fibonacci difference sequence spaces for modulus functions

    Kuldip Raj

    2015-05-01

    In the present paper we introduce the Fibonacci difference sequence spaces $l(F, \mathcal{F}, p, u)$ and $l_\infty(F, \mathcal{F}, p, u)$ by using a sequence of modulus functions and a new band matrix $F$. We also study some inclusion relations and topological and geometric properties of these spaces. Furthermore, the $\alpha$-, $\beta$- and $\gamma$-duals and matrix transformations of the space $l(F, \mathcal{F}, p, u)$ are determined.

  13. Proteomics-grade de novo sequencing approach.

    Savitski, Mikhail M; Nielsen, Michael L; Kjeldsen, Frank; Zubarev, Roman A

    2005-01-01

    The conventional approach in modern proteomics, which identifies proteins from the limited information provided by the molecular and fragment masses of their enzymatic degradation products, carries an inherent risk of both false positive and false negative identifications. For reliable identification of even known proteins, complete de novo sequencing of their peptides is desired. The main problems of conventional sequencing based on tandem mass spectrometry are incomplete backbone fragmentation and the frequent overlap of fragment masses. In this work, the first proteomics-grade de novo approach is presented, in which these problems are alleviated by the use of the complementary fragmentation techniques CAD and ECD. Implementation of a high-current, large-area dispenser cathode as a source of low-energy electrons provided efficient ECD of doubly charged peptides, the most abundant species (65-80%) in a typical trypsin-based proteomics experiment. A new linear de novo algorithm is developed that combines efficiency and speed, processing 1000 MS/MS data sets in 60 s on a conventional 3 GHz PC. More than 6% of all MS/MS data for doubly charged peptides yielded complete sequences, and another 13% gave nearly complete sequences with a maximum gap of two amino acid residues. These figures are comparable with the typical success rates (5-15%) of database identification. For peptides reliably found in the database (Mowse score ≥ 34), the agreement with de novo-derived full sequences was >95%. Full sequences were derived in 67% of the cases when full sequence information was present in MS/MS spectra. Thus the new de novo sequencing approach reached the same level of efficiency and reliability as conventional database-identification strategies. PMID:16335984
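    De novo sequencing reads the sequence off ladders of fragment masses. As a minimal sketch (not the authors' algorithm), the code below computes singly protonated b and y ions, the main CAD fragment types, from standard monoisotopic residue masses. The c/z• ions produced by ECD are omitted. The check at the end uses the complementarity $b_i + y_{n-i} = M + 2m_{\mathrm{H^+}}$, where $M$ is the neutral peptide mass. The example peptide and names are illustrative.

```python
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da

# Monoisotopic residue masses (Da). Leucine and isoleucine are identical,
# so b/y masses alone cannot tell them apart.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ions(peptide: str) -> list[float]:
    """Singly protonated b1..b(n-1): N-terminal fragments."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide: str) -> list[float]:
    """Singly protonated y1..y(n-1): C-terminal fragments, which retain the water."""
    masses, total = [], WATER
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(total + PROTON)
    return masses


if __name__ == "__main__":
    pep = "PEPTIDE"
    n = len(pep)
    b, y = b_ions(pep), y_ions(pep)
    neutral = sum(RESIDUE_MASS[aa] for aa in pep) + WATER
    for i in range(1, n):
        # b_i and y_(n-i) together account for the whole peptide plus two protons
        print(f"b{i} = {b[i - 1]:.4f}   y{n - i} = {y[n - i - 1]:.4f}   "
              f"b + y - (M + 2H+) = {b[i - 1] + y[n - i - 1] - (neutral + 2 * PROTON):+.1e}")
```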

  14. Calling SNPs without a reference sequence

    Schuster Stephan C; Hayes Vanessa M; Zhang Yu; Ratan Aakrosh; Miller Webb

    2010-01-01

    Abstract Background The most common application for the next-generation sequencing technologies is resequencing, where short reads from the genome of an individual are aligned to a reference genome sequence for the same species. These mappings can then be used to identify genetic differences among individuals in a population, and perhaps ultimately to explain phenotypic variation. Many algorithms capable of aligning short reads to the reference, and determining differences between them have b...

  15. Calling SNPs without a reference sequence

    Ratan, Aakrosh; Zhang, Yu; Hayes, Vanessa M.; Schuster, Stephan C.; Miller, Webb

    2010-01-01

    Background The most common application for the next-generation sequencing technologies is resequencing, where short reads from the genome of an individual are aligned to a reference genome sequence for the same species. These mappings can then be used to identify genetic differences among individuals in a population, and perhaps ultimately to explain phenotypic variation. Many algorithms capable of aligning short reads to the reference, and determining differences between them have been repor...

  16. Oral cortisol improves implicit sequence learning

    Sonja Roemer

    2009-01-01

    Medial temporal lobe structures are suggested to be essential for declarative memory (explicit memory for facts and events). Recent studies show that the medial temporal lobe (including the hippocampus) is also active in implicit sequence learning. Since glucocorticoids affect memory function via receptors in the hippocampus, the aim of the present study was to investigate implicit sequence learning after glucocorticoid administration. Oral cortisol (30 mg) was given to 29 healthy subjects,...

  17. Parallel motif extraction from very long sequences

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to much existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., on the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate, and combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scaling to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation, handles sequences 3 orders of magnitude longer, and scales up to 16,384 cores on a supercomputer.

  18. Blazar sequence - an artefact of Doppler boosting

    Nieppola, E.; Valtaoja, E.; Tornikoski, M.; Hovatta, T.; Kotiranta, M.

    2008-01-01

    The blazar sequence is a scenario in which the bolometric luminosity of the blazar governs the appearance of its spectral energy distribution. The most prominent result is the significant negative correlation between the synchrotron peak frequencies and the synchrotron peak luminosities of the blazar population. Observational studies of the blazar sequence have, in general, neglected the effect of Doppler boosting. We study the dependence of both the synchrotron peak frequency and luminosity ...

  19. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Bob Zimmermann

    BACKGROUND: SELEX is a well-established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set, with enriched sequences having structural stability similar to the neutral sequences but significantly different nucleotide selection. CONCLUSIONS/SIGNIFICANCE: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

  20. The accuracy, feasibility and challenges of sequencing short tandem repeats using next-generation sequencing platforms.

    Monika Zavodna

    To date we have little knowledge of how accurate next-generation sequencing (NGS) technologies are in sequencing repetitive sequences, beyond the known limitations in accurately sequencing homopolymers. Only a handful of previous reports have evaluated the potential of NGS for sequencing short tandem repeats (microsatellites), and no empirical study has compared and evaluated the performance of more than one NGS platform with the same dataset. Here we examined yeast microsatellite variants from both long-read (454 sequencing) and short-read (Illumina) NGS platforms and compared these to data derived through Sanger sequencing. In addition, we investigated any locus-specific biases and differences that might have resulted from variability in microsatellite repeat number, repeat motif or type of mutation. Out of 112 insertion/deletion variants identified among 45 microsatellite amplicons in our study, we found 87.5% agreement between the 454 platform and Sanger sequencing in frequency of variant detection after Benjamini-Hochberg correction for multiple tests. For a subset of 21 microsatellite amplicons derived from Illumina sequencing, the results of the short-read platform were highly consistent with the other two platforms, with 100% agreement with 454 sequencing and 93.6% agreement with the Sanger method after Benjamini-Hochberg correction. We found that the microsatellite attributes copy number, repeat motif and type of mutation did not have a significant effect on differences seen between the sequencing platforms. We show that both long-read and short-read NGS platforms can be used to sequence short tandem repeats accurately, which makes it feasible to consider the use of these platforms in high-throughput genotyping. It appears that the major requirement for achieving both high accuracy and rare variant detection in microsatellite genotyping is sufficient read depth coverage. This might be a challenge because each platform generates a consistent pattern of non
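    The Benjamini-Hochberg correction mentioned above controls the false discovery rate across many tests. Below is a minimal sketch of the standard step-up procedure; exactly which tests the authors corrected is not stated in the snippet, and the function name and example p-values are illustrative. Sort the $m$ p-values, find the largest rank $k$ with $p_{(k)} \le (k/m)\,\alpha$, and reject the $k$ smallest. Because of the step-up, a p-value can be rejected even if it misses its own threshold.

```python
def benjamini_hochberg(pvalues: list[float], alpha: float = 0.05) -> list[bool]:
    """Return, for each p-value in input order, whether it is rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up: find the largest rank k with p_(k) <= (k / m) * alpha ...
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # ... and reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # 0.02 misses its own threshold (0.05/3) but is rejected because 0.03 <= 2 * 0.05/3.
    print(benjamini_hochberg([0.02, 0.03, 0.5]))  # [True, True, False]
```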

  1. Sequence determinants of human microsatellite variability

    Jakobsson Mattias

    2009-12-01

    Abstract Background Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability, we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database. Results Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetranucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity. Conclusions These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability.
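    The two quantities the abstract relies on can be sketched as follows, under stated assumptions. First, the repeat number is inferred from a PCR fragment length, assuming (as the authors do) that any length difference from the reference comes from whole repeat units. Second, heterozygosity is computed as the expected heterozygosity $H = 1 - \sum_i p_i^2$ over allele frequencies $p_i$. The snippet does not say which heterozygosity estimator was used. The locus, fragment lengths and function names are hypothetical.

```python
from collections import Counter


def repeat_number(fragment_len: int, ref_fragment_len: int, ref_repeats: int, unit_len: int) -> float:
    """Infer repeat count from a PCR fragment length, assuming length differences
    relative to the reference come entirely from whole repeat units."""
    return ref_repeats + (fragment_len - ref_fragment_len) / unit_len


def expected_heterozygosity(alleles: list) -> float:
    """1 - sum of squared allele frequencies at one locus."""
    if not alleles:
        raise ValueError("need at least one allele")
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


if __name__ == "__main__":
    # Hypothetical tetranucleotide locus: the reference fragment is 200 bp and carries 10 repeats.
    calls = [200, 208, 204, 200, 212, 208]
    repeats = [repeat_number(length, 200, 10, 4) for length in calls]
    print(repeats)                                   # [10.0, 12.0, 11.0, 10.0, 13.0, 12.0]
    print(round(expected_heterozygosity(repeats), 3))  # 0.722
```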

  2. Global patterns of sequence evolution in Drosophila

    Marín Ignacio

    2007-11-01

    Abstract Background Sequencing of the genomes of several Drosophila species allows for the first precise analyses of how global sequence patterns change among multiple, closely related animal species. A basic question is whether there are characteristic features that differentiate chromosomes within a species or between different species. Results We explored the euchromatin of the chromosomes of seven Drosophila species to establish their global patterns of DNA sequence diversity. Between species, differences in the types and amounts of simple sequence repeats were found. Within each species, the autosomes have almost identical oligonucleotide profiles. However, X chromosomes and autosomes have, in all species, a qualitatively different composition. The X chromosomes are less complex than the autosomes, containing both a higher amount of simple DNA sequences and, in several cases, chromosome-specific repetitive sequences. Moreover, we show that the right arm of the X chromosome of Drosophila pseudoobscura, which evolved from an autosome 10–18 million years ago, has a composition identical to that of the original, left arm of the X chromosome. Conclusion The consistent differences among species, the differences between X chromosomes and autosomes, and the convergent evolution of X and neo-X chromosomes demonstrate that strong forces are acting on drosophilid genomes to generate peculiar chromosomal landscapes. We discuss the relationships of the observed patterns with differential recombination and mutation rates and with the process of dosage compensation.

  3. Comparison of Next-Generation Sequencing Systems

    Lin Liu

    2012-01-01

    With the rapid development and wide application of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to help decode the mysteries of life, breed better crops, detect pathogens, and improve quality of life. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world’s biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, the technologies of these systems are reviewed, and first-hand data from this experience are summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. Finally, applications of NGS are summarized.

  4. Sequencing proteins with transverse ionic transport

    Boynton, Paul; di Ventra, Massimiliano

    2015-03-01

    De novo protein sequencing is essential for understanding cellular processes that govern the function of living organisms. By obtaining the order of the amino acids that compose a given protein, one can determine both its secondary and tertiary structures through protein structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's disease. Mass spectrometry is the current technique of choice for de novo sequencing, but because some amino acids have the same mass, the sequence cannot be completely determined in many cases. In this paper we propose a new technique for de novo protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting perpendicular nanochannel, similar to the approach proposed for DNA sequencing. Indeed, we find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for de novo protein sequencing.

  5. Unlocking short read sequencing for metagenomics.

    Sébastien Rodrigue

    BACKGROUND: Different high-throughput nucleic acid sequencing platforms are currently available, but a trade-off exists between the cost and number of reads that can be generated and the read length that can be achieved. METHODOLOGY/PRINCIPAL FINDINGS: We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp, with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read. CONCLUSIONS/SIGNIFICANCE: This strategy is broadly applicable to sequencing applications that benefit from low-cost high-throughput sequencing but require longer read lengths. We demonstrate that our approach enables metagenomic analyses using the Illumina Genome Analyzer, with low error rates, and at a fraction of the cost of pyrosequencing.
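    The core idea of joining overlapping mates can be sketched as follows. This is not SHERA, whose scoring uses base qualities. The sketch reverse-complements the second mate, looks for the longest suffix/prefix overlap with at most a given fraction of mismatches, and keeps the first mate's base wherever the two disagree. It only handles inserts at least as long as a read (no read-through into adapter). Names, thresholds and the toy fragment are assumptions.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Join two mates into one composite read if read1's 3' end overlaps the
    reverse complement of read2; return None when no acceptable overlap exists."""
    r2 = reverse_complement(read2)
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for olap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        mismatches = sum(x != y for x, y in zip(read1[-olap:], r2[:olap]))
        if mismatches <= max_mismatch_frac * olap:
            # Keep read1's bases in the overlap; a real tool would weigh base qualities.
            return read1 + r2[olap:]
    return None


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTACCGATCGGATCCATG"   # 30-bp insert
    read1 = fragment[:20]                           # forward mate
    read2 = reverse_complement(fragment[-20:])      # reverse mate, as sequenced
    print(merge_pair(read1, read2, max_mismatch_frac=0.0) == fragment)  # True
```

    Trying the longest overlap first favours the most compact composite read, but with a permissive mismatch fraction it can accept a spurious long overlap in repetitive sequence; real tools add quality-aware scoring for exactly that reason.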

  6. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm.

    Kumar, Manish

    2015-01-01

    One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic multiple sequence alignment problem is to determine the most biologically plausible alignment of protein or DNA sequences. In this paper, an alignment method using a genetic algorithm for multiple sequence alignment is proposed. Two different genetic operators, namely crossover and mutation, were defined and implemented with the proposed method in order to monitor the population evolution and the quality of the aligned sequences. The proposed method is assessed with a protein benchmark dataset, e.g., BALIBASE, by comparing the obtained results to those obtained with other alignment algorithms, e.g., SAGA, RBT-GA, PRRP, HMMT, SB-PIMA, CLUSTALX, CLUSTAL W, DIALIGN and PILEUP8. Experiments on a wide range of data have shown that the proposed algorithm is much better (in terms of score) than previously proposed algorithms in its ability to achieve high alignment quality. PMID:27065770
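    A genetic algorithm for MSA needs a fitness function to rank candidate alignments. The abstract does not state which score the authors use. A common choice is the sum-of-pairs (SP) score, sketched below with assumed, illustrative weights: match +1, mismatch -1, residue-gap -2, and gap-gap pairs 0. In a GA, crossover and mutation produce candidate alignments, and selection would favour those with higher SP scores.

```python
from itertools import combinations


def sum_of_pairs(alignment: list[str], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum-of-pairs score: add up the pairwise score of every row pair in every column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for col in range(length):
        for a, b in combinations((row[col] for row in alignment), 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs are conventionally scored 0
            elif a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    # Column scores: AAA -> +3, CC- -> +1-2-2 = -3, -GG -> -2-2+1 = -3
    print(sum_of_pairs(["AC-", "ACG", "A-G"]))  # -3
```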

  7. Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences

    Osipov, V. Al.

    2016-05-01

    The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences naturally arising in the theory of dynamical systems. Aiming at the construction of analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for the study of the cluster hierarchy. The analytic power of the approach is demonstrated by the derivation of a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of the applied problem of constructing efficient DNA sequence assembly algorithms.
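    The two-fold extension and the paper's counting formula are not reproduced here. For orientation, here is a minimal sketch of the classic notion. A de Bruijn sequence $B(k, n)$ is a cyclic sequence in which every length-$n$ word over $k$ symbols appears exactly once as a window. The generator below is the well-known Lyndon-word (FKM) construction. A brute-force count confirms the classic formula $2^{2^{n-1}-n}$ for binary de Bruijn cycles up to rotation. Function names are illustrative.

```python
from itertools import product


def de_bruijn(k: int, n: int) -> list[int]:
    """Lexicographically least de Bruijn sequence B(k, n) via Lyndon words (FKM algorithm)."""
    a = [0] * (k * n)
    sequence: list[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def is_cyclic_de_bruijn(seq, k: int, n: int) -> bool:
    """Every length-n word over k symbols appears exactly once as a cyclic window."""
    if len(seq) != k ** n:
        return False
    windows = {tuple(seq[(i + j) % len(seq)] for j in range(n)) for i in range(len(seq))}
    return len(windows) == k ** n


def count_binary_de_bruijn(n: int) -> int:
    """Brute-force count of binary de Bruijn cycles, counted up to rotation."""
    linear = sum(is_cyclic_de_bruijn(s, 2, n) for s in product((0, 1), repeat=2 ** n))
    return linear // 2 ** n  # each cycle has 2**n distinct rotations


if __name__ == "__main__":
    print("".join(map(str, de_bruijn(2, 3))))  # 00010111
    print(count_binary_de_bruijn(3))           # 2
```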

  8. Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences

    Osipov, V. Al.

    2016-07-01

    The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences naturally arising in the theory of dynamical systems. Aiming at the construction of analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for the study of the cluster hierarchy. The analytic power of the approach is demonstrated by the derivation of a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of the applied problem of constructing efficient DNA sequence assembly algorithms.

  9. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene.

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most widely used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the 'CCCGCC' motif in the GFP coding sequence. PMID:27193250
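
    To relate a disfavoured motif to a coverage profile, one first needs its positions on both strands, since reads can derive from either. A minimal sketch, assuming a plain exact-match scan for 'CCCGCC' and its reverse complement 'GGCGGG'; the input is a made-up toy sequence, not the GFP coding sequence.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_positions(seq: str, motif: str) -> dict:
    """0-based start positions of motif and of its reverse complement in seq."""
    seq, motif = seq.upper(), motif.upper()

    def find_all(pattern: str) -> list:
        return [i for i in range(len(seq) - len(pattern) + 1)
                if seq.startswith(pattern, i)]

    return {"forward": find_all(motif),
            "reverse": find_all(reverse_complement(motif))}


# Hypothetical toy sequence, not the GFP coding sequence.
print(motif_positions("ATCCCGCCAGGCGGGT", "CCCGCC"))
# {'forward': [2], 'reverse': [9]}
```

    The returned positions can then be overlaid on a per-base depth track to check whether coverage dips coincide with motif occurrences.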

  10. Artificial duplicate reads in sequencing data of 454 Genome Sequencer FLX System

    Hui Dong; Yangyi Chen; Yan Shen; Shengyue Wang; Guoping Zhao; Weirong Jin

    2011-01-01

    The 454 Genome Sequencer (GS) FLX System is one of the next-generation sequencing systems, characterized by long reads, high accuracy, and ultra-high throughput. Given the emulsion PCR mechanism, each unique DNA template should generate only one unique sequence read after being amplified and sequenced on the GS FLX. However, biased amplification of DNA templates can occur during emulsion PCR, resulting in artificial duplicate reads. When each DNA template was unique, 3.49%-18.14% of the total reads in GS FLX sequencing data were found to be artificial duplicate reads. These duplicate reads may lead to misinterpretation of sequencing data, and special attention should be paid to the potential biases they introduce into the data.
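
    Because every template in that design was unique, reads that share the same start can be counted as artificial duplicates. A minimal sketch, assuming duplicates are defined as reads with an identical leading prefix; the prefix length is a hypothetical parameter, not the authors' criterion.

```python
from collections import Counter


def artificial_duplicate_fraction(reads, prefix_len=20):
    """Fraction of reads whose first prefix_len bases repeat those of another read."""
    if not reads:
        return 0.0
    groups = Counter(read[:prefix_len] for read in reads)
    duplicates = sum(count - 1 for count in groups.values())  # keep one read per group
    return duplicates / len(reads)


reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "ACGTACGT"]  # hypothetical reads
print(artificial_duplicate_fraction(reads, prefix_len=4))  # 0.5
```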

  11. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most widely used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the ‘CCCGCC’ motif in the GFP coding sequence. PMID:27193250

  12. Identification of 10 882 porcine microsatellite sequences and virtual mapping of 4528 of these sequences

    Karlskov-Mortensen, Peter; Hu, Z.L.; Gorodkin, Jan;

    2007-01-01

    ... the human genome (BLAST cut-off threshold = 1 × 10⁻⁵). All microsatellite sequences placed on the comparative map are accessible at http://www.animalgenome.org/QTLdb/pig.html. These sequences increase the number of identified microsatellites in the porcine genome by several orders of magnitude. ... They are a new resource of microsatellite sequences for generating markers to be used in linkage studies and in fine mapping and positional cloning of quantitative trait loci. ...

  13. TWIN REVERSED ARTERIAL PERFUSION SEQUENCE (TRAP SEQUENCE). THE ACARDIAC /ACEPHALIC TWIN

    S. Saritha; Sumedha S. Anjankar

    2013-01-01

    Twin reversed arterial perfusion (TRAP sequence) is a rare complication of monochorionic (MC) twins, i.e., twins sharing one placenta. TRAP sequence is also known as acardius or chorioangiopagus parasiticus. It occurs in 1% of monochorionic twin pregnancies and in 1 in 35,000 pregnancies. The risk of recurrence was estimated at 1:10,000. TRAP sequence is characterized by a structurally normal pump twin perfusing an anomalous twin. In TRAP syndrome, mortality and deformities can affect both twins. ...

  14. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Baldwin Stephen A

    2011-03-01

    Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  15. Enabling technologies of genomic-scale sequence enrichment for targeted high-throughput sequencing

    Summerer, Daniel

    2009-01-01

    Next-generation sequencing has still not reached its full potential due to the technical inability to effectively target desired genomic regions of interest. Once available, methods addressing this bottleneck will dramatically reduce cost and enable the efficient analysis of complex samples. Recently, a number of possible approaches for genomic-scale sequence enrichment have been reported using different strategies. All methods basically rely on sequence-specific nucleic acid hybridization,...

  16. Mixture of experts models to exploit global sequence similarity on biomolecular sequence labeling

    2009-01-01

    Background Identification of functionally important sites in biomolecular sequences has broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks. Experimental determination of such sites lags far behind the number of known biomolecular sequences. Hence, there is a need to develop reliable computational methods for identifying functionally important sites from biomolecular sequences. Results We present a mixture of experts approach to b...
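
    The abstract is truncated before the method details. In the generic mixture-of-experts formulation, a gating function assigns input-dependent weights g_i(x) that sum to one, and the combined prediction is the sum over i of g_i(x) f_i(x). The sketch below shows only that generic combination step; the experts and gating scores are hypothetical placeholders, not the authors' models.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, experts, gate):
    """Weighted average of expert predictions, with weights from a softmax gate."""
    weights = softmax(gate(x))
    return sum(w * expert(x) for w, expert in zip(weights, experts))


# Two hypothetical experts predicting P(site is functional) for a residue window x.
experts = [lambda x: 0.9, lambda x: 0.1]
gate = lambda x: [2.0, 0.0]  # hypothetical gating scores favouring the first expert
print(round(mixture_of_experts("ACDEF", experts, gate), 3))  # 0.805
```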

  17. Improving Multiple Sequence Alignments by Revising Sequence Families with Alignment Scoring Approaches

    Levchuk, Aleksandr O.

    2011-01-01

    Characterizing the functional, structural, and evolutionary relationships of biological sequences is an important task in modern genomics and computational biology. Most of these applications involve the assembly of sequence families by similarity searching, the subsequent formation of multiple sequence alignments (MSAs) and downstream phylogenetic analyses. In particular, MSAs play a central role in this modeling workflow. Thus, the quality of the MSAs is of critical importance for its success. In ...

  18. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    2011-01-01

    Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/. PMID:21385349

  19. Clinical evaluation of further-developed MRCP sequences in comparison with standard MRCP sequences

    The purpose of this study was to compare technically improved single-shot magnetic resonance cholangiopancreatography (MRCP) sequences with standard single-shot rapid acquisition with relaxation enhancement (RARE) and half-Fourier acquired single-shot turbo spin-echo (HASTE) sequences in evaluating the normal and abnormal biliary duct system. The bile duct system of 45 patients was prospectively investigated on a 1.5-T MRI system. The investigation was performed with RARE and HASTE MR cholangiography sequences with standard and high spatial resolutions, and with a delayed-echo half-Fourier RARE (HASTE) sequence. Findings of the improved MRCP sequences were compared with those of the standard MRCP sequences. Confidence in the diagnosis was graded on five levels. The Wilcoxon signed-rank test was applied at a significance level of p < 0.05. In 15 patients no pathology was found. The MRCP showed stenoses of the bile duct system in 10 patients and choledocholithiasis and cholecystolithiasis in 16 patients. In 12 patients a dilatation of the bile duct system was found. Comparison of the low- and high-spatial-resolution sequences and of the short and long TE times of the half-Fourier RARE (HASTE) sequence revealed no statistically significant differences regarding the accuracy of the examination. The diagnostic confidence level in assessing normal or pathological findings was significantly better for the high-resolution RARE and half-Fourier RARE (HASTE) sequences than for the standard sequences. For the delayed-echo half-Fourier RARE (HASTE) sequence no statistically significant difference was seen. The high-resolution RARE and half-Fourier RARE (HASTE) sequences had a higher confidence level, but there was no significant difference in diagnosis in terms of detection and assessment of pathological changes in the biliary duct system compared with standard sequences. (orig.)
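
    The Wilcoxon signed-rank test suits paired ordinal data such as per-patient confidence grades under two sequences. The minimal sketch below computes the rank sums W+ and W- (zero differences dropped, tied absolute differences given average ranks); the ratings are hypothetical, and the p-value step is left to a library such as SciPy's scipy.stats.wilcoxon.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # extend j over the block of tied absolute differences starting at i
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based ranks; ties share the average rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical five-level confidence grades for four patients: standard vs improved.
print(wilcoxon_signed_rank([2, 3, 3, 4], [3, 1, 5, 5]))  # (6.5, 3.5)
```

    The test statistic is usually reported as min(W+, W-), compared against tables or a normal approximation.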

  20. Size dependent complexity of sequences in protein families

    Li, J.; Wang, J.; Wang, W.

    2005-10-01

    The size-dependent complexity of protein sequences in various families in the FSSP database is characterized by sequence entropy, sequence similarity and sequence identity. As the average length L_f of sequences in the family increases, an increasing trend of the sequence entropy and a decreasing trend of the sequence similarity and sequence identity are found. As L_f increases beyond 250, a saturation of the sequence entropy, the sequence similarity and the sequence identity is observed. Such a saturated behavior of complexity is attributed to the saturation of the probability P_g of global (long-range) interactions in protein structures when L_f > 250. It is also found that the alphabet size of residue types describing the sequence diversity depends on the value of L_f, and becomes saturated at 12.
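
    The abstract does not spell out its definitions. A common per-column sequence entropy for an aligned family is H = -sum_a p_a ln p_a, averaged over columns, and pairwise identity is the share of identical residues among ungapped aligned positions. The sketch below assumes those definitions (natural logarithm; gaps counted as an ordinary symbol in the entropy); the toy alignment is hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the residue distribution in one column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def mean_sequence_entropy(alignment):
    """Column entropy averaged over all columns of an equal-length alignment."""
    columns = list(zip(*alignment))
    return sum(column_entropy(col) for col in columns) / len(columns)


def percent_identity(a, b):
    """Percent identical residues over positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


family = ["ACDE", "ACDF", "GCDE"]  # hypothetical aligned family
print(round(mean_sequence_entropy(family), 4))  # 0.3183
print(percent_identity("ACDE", "ACDF"))         # 75.0
```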

  1. Coverage statistics for sequence census methods

    Evans Steven N

    2010-08-01

    Background We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce a coding of the shape of the coverage depth function as a tree and explain how this can be used to detect regions with anomalous coverage. This modeling perspective is especially germane to current high-throughput sequencing experiments, where both sample preparation protocols and sequencing technology particulars can affect fragment length distributions. Results Under the mild assumptions that fragment start sites are Poisson distributed and successive fragment lengths are independent and identically distributed, we observe that, regardless of fragment length distribution, the fragments produced in a sequencing experiment can be viewed as resulting from a two-dimensional spatial Poisson process. We then study the successive jumps of the coverage function, and show that they can be encoded as a random tree that is approximately a Galton-Watson tree with generation-dependent geometric offspring distributions whose parameters can be computed. Conclusions We extend standard analyses of shotgun sequencing that focus on coverage statistics at individual sites, and provide a null model for detecting deviations from random coverage in high-throughput sequence census based experiments. Our approach leads to explicit determinations of the null distributions of certain test statistics, while for others it greatly simplifies the approximation of their null distributions by simulation. Our focus on fragments also leads to a new approach to visualizing sequencing data that is of independent interest.
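
    Under the same assumptions as the abstract (Poisson-distributed fragment starts, i.i.d. lengths), the classic Lander-Waterman result gives an expected uncovered fraction of e^(-c), where c is the mean coverage (start rate times mean fragment length), whatever the length distribution. The sketch below builds the coverage depth function from fragment intervals and checks that prediction by simulation; the exponential length distribution and all numbers are arbitrary choices, not values from the paper.

```python
import math
import random


def coverage_depth(fragments, genome_length):
    """Per-base depth from half-open (start, end) intervals, using a difference array."""
    delta = [0] * (genome_length + 1)
    for start, end in fragments:
        delta[max(start, 0)] += 1
        delta[min(end, genome_length)] -= 1
    depth, running = [], 0
    for pos in range(genome_length):
        running += delta[pos]
        depth.append(running)
    return depth


def simulate_fragments(genome_length, start_rate, mean_length, rng):
    """Poisson process of start sites; i.i.d. exponential lengths (an arbitrary choice)."""
    fragments, pos = [], 0.0
    while True:
        pos += rng.expovariate(start_rate)  # exponential gaps give Poisson-distributed starts
        if pos >= genome_length:
            return fragments
        length = rng.expovariate(1.0 / mean_length)
        fragments.append((int(pos), int(pos + length) + 1))


rng = random.Random(1)
G, rate, mean_len = 1_000_000, 0.01, 100
depth = coverage_depth(simulate_fragments(G, rate, mean_len, rng), G)
uncovered = sum(d == 0 for d in depth) / G
print(round(uncovered, 3), round(math.exp(-rate * mean_len), 3))  # simulated vs e^(-c)
```

    Runs of zero depth in the simulated track are the gaps whose count and length statistics the Lander-Waterman model also predicts.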

  2. Calling SNPs without a reference sequence

    Schuster Stephan C

    2010-03-01

    Background The most common application for the next-generation sequencing technologies is resequencing, where short reads from the genome of an individual are aligned to a reference genome sequence for the same species. These mappings can then be used to identify genetic differences among individuals in a population, and perhaps ultimately to explain phenotypic variation. Many algorithms capable of aligning short reads to the reference, and determining differences between them have been reported. Much less has been reported on how to use these technologies to determine genetic differences among individuals of a species for which a reference sequence is not available, which drastically limits the number of species that can easily benefit from these new technologies. Results We describe a computational pipeline, called DIAL (De novo Identification of Alleles), for identifying single-base substitutions between two closely related genomes without the help of a reference genome. The method works even when the depth of coverage is insufficient for de novo assembly, and it can be extended to determine small insertions/deletions. We evaluate the software's effectiveness using published Roche/454 sequence data from the genome of Dr. James Watson (to detect heterozygous positions) and recent Illumina data from orangutan, in each case comparing our results to those from computational analysis that uses a reference genome assembly. We also illustrate the use of DIAL to identify nucleotide differences among transcriptome sequences. Conclusions DIAL can be used for identification of nucleotide differences in species for which no reference sequence is available. Our main motivation is to use this tool to survey the genetic diversity of endangered species as the identified sequence differences can be used to design genotyping arrays to assist in the species' management. The DIAL source code is freely available at http://www.bx.psu.edu/miller_lab/.

  3. Sequence conservation on the Y chromosome

    Gibson, L.H.; Yang-Feng, L. [Yale Univ. School of Medicine, New Haven, CT (United States)]; Lau, C. [Univ. of California, San Francisco, CA (United States)]

    1994-09-01

    The Y chromosome is present in all mammals and is considered to be essential to sex determination. Despite intense genomic research, only a few genes have been identified and mapped to this chromosome in humans. Several of them, such as SRY and ZFY, have been demonstrated to be conserved and Y-located in other mammals. To address the issue of sequence conservation on the Y chromosome, we performed fluorescence in situ hybridization (FISH) with DNA from a human Y cosmid library as a probe to study the Y chromosomes of other mammalian species. Total DNA from 3,000-4,500 cosmid pools was labeled with biotinylated dUTP and hybridized to metaphase chromosomes. For human and primate preparations, human Cot-1 DNA was included in the hybridization mixture to suppress hybridization from repeat sequences. FISH signals were detected on the Y chromosomes of human, gorilla, orangutan and baboon (Old World monkey) and were absent on those of squirrel monkey (New World monkey), Indian muntjac, wood lemming, Chinese hamster, rat and mouse. Since sequence analysis suggested that specific genes, e.g. SRY and ZFY, are conserved between these two groups, the lack of detectable hybridization in the latter group implies either that conservation of the human Y sequences is limited to the Y chromosomes of the great apes and Old World monkeys, or that the size of the syntenic segment is too small to be detected at the resolution of FISH, or that homologous sequences have undergone considerable divergence. Further studies with reduced hybridization stringency are currently being conducted. Our results provide some clues as to Y-sequence conservation across species and demonstrate the limitations of cross-species FISH with total DNA sequences from a particular chromosome.

  4. Comparative sequence analysis for Brassica oleracea with similar sequences in B. rapa and Arabidopsis thaliana

    Qiu, Dan; Gao, Muqiang; Li, Genyi; Quiros, Carlos

    2009-01-01

    We sequenced five BAC clones of Brassica oleracea doubled haploid ‘Early Big’ broccoli containing major genes in the aliphatic glucosinolate pathway, and comparatively analyzed them with similar sequences in A. thaliana and B. rapa. Additionally, we included in the analysis published sequences from three other B. oleracea BAC clones and a contig of this species corresponding to segments in A. thaliana chromosomes IV and V. A total of 2,946 kb of B. oleracea, 1,069 kb of B. rapa sequence and 2...

  5. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  6. Microsatellite Length Scoring by Single Molecule Real Time Sequencing - Effects of Sequence Structure and PCR Regime.

    Liljegren, Mikkel Meyn; de Muinck, Eric Jacques; Trosvik, Pål

    2016-01-01

    Microsatellites are DNA sequences consisting of repeated, short (1-6 bp) sequence motifs that are highly mutable by enzymatic slippage during replication. Due to their high intrinsic variability, microsatellites have important applications in population genetics, forensics, genome mapping, as well as cancer diagnostics and prognosis. The current analytical standard for microsatellites is based on length scoring by high-precision electrophoresis, but owing to their increasing efficiency, next-generation sequencing techniques may provide a viable alternative. Here, we evaluated single molecule real time (SMRT) sequencing, implemented in the PacBio series of sequencing instruments, as a means of microsatellite length scoring. To this end, we carried out multiplexed SMRT sequencing of plasmid-carried artificial microsatellites of varying structure under different pre-sequencing PCR regimes. For each repeat structure, reads corresponding to the target length dominated. We found that pre-sequencing amplification had large effects on scoring accuracy and error distribution relative to controls, but that the effects of the number of amplification cycles were generally weak. In line with expectations, enzymatic slippage decreased proportionally with microsatellite repeat unit length and increased with repetition number. Finally, we determined directional mutation trends, showing that PCR and SMRT sequencing introduced consistent but opposing error patterns in contraction and expansion of the microsatellites on the repeat motif and single nucleotide level. PMID:27414800
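
    Length scoring from reads amounts to counting how many tandem copies of the repeat unit each read carries and then examining the distribution across reads, where the target length should dominate. A minimal sketch, assuming exact (error-free) motif copies; real SMRT reads would need tolerance for indels, and the reads below are hypothetical.

```python
from collections import Counter


def longest_repeat_run(read: str, motif: str) -> int:
    """Largest number of consecutive in-frame copies of motif found anywhere in read."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(read)):
        count, pos = 0, start
        while read.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


def length_profile(reads, motif):
    """Histogram of scored repeat numbers across reads."""
    return Counter(longest_repeat_run(read, motif) for read in reads)


reads = ["GGACACACACTT", "GGACACACTT", "GGACACACACTT"]  # hypothetical reads
profile = length_profile(reads, "AC")
print(profile.most_common(1))  # [(4, 2)]
```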

  7. Microsatellite Length Scoring by Single Molecule Real Time Sequencing - Effects of Sequence Structure and PCR Regime.

    Mikkel Meyn Liljegren

    Microsatellites are DNA sequences consisting of repeated, short (1-6 bp) sequence motifs that are highly mutable by enzymatic slippage during replication. Due to their high intrinsic variability, microsatellites have important applications in population genetics, forensics, genome mapping, as well as cancer diagnostics and prognosis. The current analytical standard for microsatellites is based on length scoring by high-precision electrophoresis, but owing to their increasing efficiency, next-generation sequencing techniques may provide a viable alternative. Here, we evaluated single molecule real time (SMRT) sequencing, implemented in the PacBio series of sequencing instruments, as a means of microsatellite length scoring. To this end, we carried out multiplexed SMRT sequencing of plasmid-carried artificial microsatellites of varying structure under different pre-sequencing PCR regimes. For each repeat structure, reads corresponding to the target length dominated. We found that pre-sequencing amplification had large effects on scoring accuracy and error distribution relative to controls, but that the effects of the number of amplification cycles were generally weak. In line with expectations, enzymatic slippage decreased proportionally with microsatellite repeat unit length and increased with repetition number. Finally, we determined directional mutation trends, showing that PCR and SMRT sequencing introduced consistent but opposing error patterns in contraction and expansion of the microsatellites on the repeat motif and single nucleotide level.

  8. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction.

    Laehnemann, David; Borkhardt, Arndt; McHardy, Alice Carolyn

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance on which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159
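
    One widely used idea among read error correctors, not specific to any single tool surveyed, is the k-mer spectrum: in high-coverage data, k-mers that occur only rarely are likely to contain sequencing errors. The sketch below flags such "weak" k-mers; k, the count threshold and the reads are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_positions(read, counts, k, min_count=2):
    """Start positions of k-mers in read that occur fewer than min_count times overall."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < min_count]


reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]  # last read carries a T->A error
counts = kmer_counts(reads, k=3)
print(weak_kmer_positions("ACGAAC", counts, k=3))  # [1, 2, 3]
```

    The flagged k-mers all overlap position 3, which localizes the likely error; a corrector would then try substitutions there that turn weak k-mers into frequent ones.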

  9. End Sequencing and Finger Printing of Human & Mouse BAC Libraries

    Fraser, C

    2005-09-27

    This project provided for continued end sequencing of existing and new BAC libraries constructed to support human sequencing, as well as for initiating BAC end sequencing of the mouse BAC libraries constructed to support mouse sequencing. The clones, the sequences, and the fingerprints are now an available resource for the community at large. Research and development of new methodologies for BAC end sequencing have reduced costs and increased throughput.

  10. Nucleotide sequence and genome organization of carnation mottle virus RNA.

    Guilley, H; Carrington, J C; Balàzs, E; Jonard, G; Richards, K; Morris, T J

    1985-01-01

    The complete nucleotide sequence of carnation mottle genomic RNA (4003 nucleotides) is presented. The sequence was determined for cloned cDNA copies of viral RNA containing over 99% of the sequence and was completed by direct sequence analysis of RNA and cDNA transcripts. The sequence contains two long open reading frames which together can account for observed translation products. One translation product would arise by suppression of an amber termination codon and the sequence raises the po...
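
    Open reading frames such as the two described can be located by scanning each frame for an ATG start followed by an in-frame stop codon. The sketch below also lets the amber codon (UAG, written TAG here) be read through, mimicking the suppression the abstract mentions. Only the given strand is scanned, the minimum ORF length is a hypothetical parameter, and the toy input is not the carnation mottle virus genome.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100, amber_readthrough=False):
    """Return (start, end, frame) for ATG-initiated ORFs on the given strand.

    end is exclusive and includes the stop codon; U is treated as T so RNA works too.
    """
    seq = seq.upper().replace("U", "T")
    stops = (STOP_CODONS - {"TAG"}) if amber_readthrough else STOP_CODONS
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


toy = "AUGAAAUAGCCCUAA"  # hypothetical RNA, not the CarMV genome
print(find_orfs(toy, min_codons=1))                          # [(0, 9, 0)]
print(find_orfs(toy, min_codons=1, amber_readthrough=True))  # [(0, 15, 0)]
```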

  11. Div-BLAST: Diversification of Sequence Search Results

    Eser, Elif; Can, Tolga; Ferhatosmanoğlu, Hakan

    2014-01-01

    Sequence similarity tools, such as BLAST, seek the sequences in a database that are most similar to a query. They return results that are significantly similar to the query sequence and typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, where the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools for this discipline. Some redundancy ...
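
    One standard way to trade relevance against redundancy is maximal marginal relevance (MMR): greedily pick the hit that maximizes λ·relevance minus (1 - λ)·(maximum similarity to hits already chosen). This is a swapped-in illustration, not necessarily Div-BLAST's algorithm; the hit names, relevance values and similarity function are hypothetical.

```python
def diversify(candidates, relevance, similarity, k, trade_off=0.5):
    """Greedy maximal-marginal-relevance selection of up to k candidates."""
    selected, remaining = [], list(candidates)

    def marginal_score(c):
        redundancy = max((similarity(c, s) for s in selected), default=0.0)
        return trade_off * relevance[c] - (1 - trade_off) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)
    return selected


relevance = {"hitA": 0.90, "hitB": 0.85, "hitC": 0.50}  # hypothetical normalized scores
near_duplicates = {frozenset({"hitA", "hitB"})}
similarity = lambda x, y: 1.0 if frozenset({x, y}) in near_duplicates else 0.0
print(diversify(relevance, relevance, similarity, k=2))  # ['hitA', 'hitC']
```

    With trade_off = 1 the function reduces to plain ranking by relevance, which would return the two near-duplicate hits.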

  12. Lossless Compression and Complexity of Chaotic Sequences

    Nagaraj, Nithin; T., Arjun Venugopal; Krishnan, Nithin

    2011-01-01

    We investigate the complexity of short symbolic sequences of chaotic dynamical systems by using lossless compression algorithms. In particular, we study Non-Sequential Recursive Pair Substitution (NSRPS), a lossless compression algorithm first proposed by W. Ebeling et al. [Math. Biosc. 52, 1980] and Jiménez-Montaño et al. [arXiv:cond-mat/0204134, 2002], which was subsequently shown to be optimal. NSRPS has also been used to estimate the entropy of written English (P. Grassberger [arXiv:physics/0207023, 2002]). We propose a new measure of complexity, defined as the number of iterations of NSRPS required to transform the input sequence into a constant sequence. We test this measure on symbolic sequences of the Logistic map for various values of the bifurcation parameter. The proposed measure of complexity is easy to compute and is observed to be highly correlated with the Lyapunov exponent of the original non-linear time series, even for very short symbolic sequences (as short as 50 samples). Finally, we ...
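
    The proposed measure can be sketched directly: repeatedly replace the most frequent adjacent pair with a fresh symbol and count the passes until the sequence is constant. This is a minimal sketch under stated simplifications: pairs are counted with overlaps, ties go to the pair seen first, and the logistic-map symbolization (split at x = 0.5) is an assumed convention rather than the paper's exact setup.

```python
from collections import Counter


def nsrps_step(seq, new_symbol):
    """One NSRPS pass: replace the most frequent adjacent pair, left to right, without overlaps."""
    pairs = Counter(zip(seq, seq[1:]))
    target = max(pairs, key=pairs.get)  # ties go to the pair seen first
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == target:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def nsrps_complexity(symbols):
    """Number of NSRPS iterations needed to turn the input into a constant sequence."""
    seq, steps = list(symbols), 0
    while len(set(seq)) > 1:
        seq = nsrps_step(seq, ("new", steps))  # fresh tuple symbols never equal a string character
        steps += 1
    return steps


def logistic_symbols(r, x0, n):
    """Binary symbolic sequence of the logistic map x -> r x (1 - x), split at x = 0.5."""
    symbols, x = [], x0
    for _ in range(n):
        symbols.append("1" if x >= 0.5 else "0")
        x = r * x * (1 - x)
    return "".join(symbols)


print(nsrps_complexity("01010101"))  # 1
print(nsrps_complexity(logistic_symbols(4.0, 0.3, 50)))
```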

  13. The IMGT/HLA sequence database.

    Robinson, J; Marsh, S G

    2000-01-01

    The IMGT/HLA database (www.ebi.ac.uk/imgt/hla/) specialises in sequences of the polymorphic genes of the HLA system, the human major histocompatibility complex (MHC). This complex is located within the 6p21.3 region on the short arm of human chromosome 6 and contains more than 220 genes of diverse function. Many of the genes encode proteins of the immune system and these include the 21 highly polymorphic HLA genes, which influence the outcome of clinical transplantation and confer susceptibility to a wide range of non-infectious diseases. The database contains sequences for all HLA alleles officially recognised by the WHO Nomenclature Committee for Factors of the HLA System and provides users with online tools and facilities for their retrieval and analysis. These include allele reports, alignment tools, and detailed descriptions of the source cells. The online submission tool allows both new and confirmatory sequences to be submitted directly to the WHO Nomenclature Committee. The latest version (release 1.10.0 April 2001) contains 1329 HLA alleles, 61 HLA related sequences, derived from around 3350 component sequences from the EMBL/GenBank/DDBJ databases. The IMGT/HLA database provides a model that will be extended to provide specialist databases for polymorphic MHC genes of other species. PMID:12361093

  14. BLEACHING EUCALYPTUS PULPS WITH SHORT SEQUENCES

    Flaviana Reis Milagres

    2011-03-01

    Eucalyptus spp. kraft pulp, due to its high content of hexenuronic acids, is quite easy to bleach. Therefore, investigations have attempted to decrease the number of stages in the bleaching process in order to minimize capital costs. This study focused on the evaluation of short ECF (Elemental Chlorine Free) and TCF (Totally Chlorine Free) sequences for bleaching oxygen-delignified Eucalyptus spp. kraft pulp to 90% ISO brightness: PMoDP (molybdenum-catalyzed acid peroxide, chlorine dioxide and hydrogen peroxide), PMoD/P (molybdenum-catalyzed acid peroxide, chlorine dioxide and hydrogen peroxide, without washing), PMoD(PO) (molybdenum-catalyzed acid peroxide, chlorine dioxide and pressurized peroxide), D(EPO)DP (chlorine dioxide, oxidative extraction with oxygen and peroxide, chlorine dioxide and hydrogen peroxide), PMoQ(PO) (molybdenum-catalyzed acid peroxide, DTPA and pressurized peroxide), and XPMoQ(PO) (enzyme, molybdenum-catalyzed acid peroxide, DTPA and pressurized peroxide). Uncommon pulp treatments, such as molybdenum-catalyzed acid peroxide (PMo) and xylanase (X) bleaching stages, were used. Among the ECF alternatives, the two-stage PMoD/P sequence proved highly cost-effective without affecting pulp quality relative to the traditional D(EPO)DP sequence, and it produced better-quality effluent than the reference. However, a four-stage sequence, XPMoQ(PO), was required to achieve full brightness using the TCF technology. This sequence was highly cost-effective, although it produced pulp of only acceptable quality.

  15. DNA sequence of the yeast transketolase gene.

    Fletcher, T S; Kwee, I L; Nakada, T; Largman, C; Martin, B M

    1992-02-18

    Transketolase (EC 2.2.1.1) is the enzyme that, together with aldolase, forms a reversible link between the glycolytic and pentose phosphate pathways. We have cloned and sequenced the transketolase gene from yeast (Saccharomyces cerevisiae). This is the first transketolase gene of the pentose phosphate shunt to be sequenced from any source. The molecular mass of the proposed translated protein is 73,976 daltons, in good agreement with the observed molecular mass of about 75,000 daltons. The 5'-nontranslated region of the gene is similar to other yeast genes. There is no evidence of 5'-splice junctions or branch points in the sequence. The 3'-nontranslated region contains the polyadenylation signal (AATAAA), 80 base pairs downstream from the termination codon. A high degree of homology is found between yeast transketolase and dihydroxyacetone synthase (formaldehyde transketolase) from the yeast Hansenula polymorpha. The overall sequence identity between these two proteins is 37%, with four regions of much greater similarity. The regions from amino acid residues 98-131, 157-182, 410-433, and 474-489 have sequence identities of 74%, 66%, 83%, and 82%, respectively. One of these regions (157-182) includes a possible thiamin pyrophosphate (TPP) binding domain, and another (410-433) may contain the catalytic domain. PMID:1737042

  16. SNAD: sequence name annotation-based designer

    Gorbalenya Alexander E

    2009-08-01

    Full Text Available Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often replaced with biologically meaningful names in various presentations to facilitate the use and dissemination of sequence-based knowledge. This substitution is commonly done manually, which can be tedious and prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer), which mediates the automatic conversion of sequence UIDs (associated with a multiple alignment or phylogenetic tree, or supplied as a plain-text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit the wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between the quality of sequence annotation and the efficiency of communication and knowledge dissemination among researchers.
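    Template-directed renaming can be pictured as filling named fields from an annotation record looked up by UID. The sketch below is not SNAD's implementation; the UIDs, record fields, and template syntax (Python's `str.format_map`) are hypothetical stand-ins chosen only to illustrate the idea, and UIDs that cannot be resolved are kept rather than dropped.

```python
from typing import Dict, List

# Hypothetical annotation records keyed by sequence UID. In SNAD these would be
# drawn from cognate entries in external databases; here they are made up.
Annotation = Dict[str, str]


def uids_to_names(uids: List[str],
                  annotations: Dict[str, Annotation],
                  template: str) -> List[str]:
    """Convert each UID to a name built from a template such as
    '{acronym}/{strain}'. A UID is left unchanged when it has no record or
    its record lacks a field the template needs, so no entry is lost."""
    names = []
    for uid in uids:
        record = annotations.get(uid)
        if record is None:
            names.append(uid)
            continue
        try:
            names.append(template.format_map(record))
        except KeyError:             # template field missing from this record
            names.append(uid)
    return names


annotations = {
    "UID001": {"acronym": "SARS-CoV", "strain": "Tor2"},
    "UID002": {"acronym": "HCoV-229E"},          # no strain field
}
names = uids_to_names(["UID001", "UID002", "UID003"], annotations,
                      "{acronym}/{strain}")
```

    Falling back to the original UID makes missing annotation visible in the output instead of silently producing an incomplete name.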

  17. DNA sequencing by synthesis with degenerate primers

    2008-01-01

    A degenerate primer-based sequencing-by-synthesis method (DP-SBS) was developed for high-throughput DNA sequencing, in which a set of degenerate primers is hybridized to arrayed DNA templates and extended by DNA polymerase on microarrays. In this method, different sets of degenerate primers, each containing a given number (n) of degenerate nucleotides at the 3' end, were annealed to the templates to be sequenced, which were immobilized on a solid surface. The nucleotide at position n+1 of each template sequence was determined by detecting the incorporation of fluorescently labeled nucleotides: after the enzymatic primer-extension reaction, a labeled nucleotide is incorporated into the primer in a base-specific manner, and nine bases were read out accurately. The main advantage of DP-SBS is that it uses only conventional biochemical reagents, avoiding the complicated special chemical reagents otherwise needed to remove the labels from incorporated nucleotides and to reactivate the primer for further extension. The present study shows that DP-SBS is reliable, simple, and cost-effective for laboratory sequencing of large numbers of short DNA fragments.
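    The read-out logic can be simulated in a few lines: reaction n uses primers with n degenerate 3' nucleotides, so the single labeled nucleotide incorporated reports template position n+1. This is a minimal sketch, assuming the template bases are given in the order the polymerase copies them and that each reaction incorporates exactly one complementary labeled base; chemistry, hybridization errors and signal detection are not modeled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def dp_sbs_read(template_region: str, max_degenerate: int = 8) -> str:
    """Simulate DP-SBS read-out.

    template_region: template bases immediately downstream of the primer
    binding site, in copying order (an assumption of this sketch).
    For n = 0..max_degenerate, the primer set with n degenerate 3'
    nucleotides ends n bases further along, so the labeled nucleotide
    incorporated in reaction n is complementary to template position n+1
    (index n here). max_degenerate = 8 gives the nine bases quoted above.
    """
    read = []
    for n in range(max_degenerate + 1):
        if n >= len(template_region):
            break                                  # template exhausted
        read.append(COMPLEMENT[template_region[n]])  # base detected in reaction n
    return "".join(read)
```

    Each reaction is independent, so no label removal or primer reactivation is needed between positions, which is the advantage the abstract highlights.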

  18. Viral genome sequencing by random priming methods

    Zhang Xinsheng

    2008-01-01

    Full Text Available Abstract Background Most emerging health threats are of zoonotic origin. For the overwhelming majority, their causative agents are RNA viruses, which include but are not limited to HIV, influenza, SARS, Ebola, dengue, and hantavirus. A better understanding of global viral diversity is therefore increasingly important for better surveillance and prediction of pandemic threats; this will require rapid and flexible methods for complete viral genome sequencing. Results We have adapted the SISPA (sequence-independent single-primer amplification) methodology [1–3] to genome sequencing of RNA and DNA viruses. We have demonstrated the utility of the method on various types and sources of viruses, obtaining near-complete genome sequences of viruses ranging in size from 3 to 15 kb with a median depth of coverage of 14.33. We used this technique to generate full viral genome sequences in the presence of host contaminants, using viral preparations from cell culture supernatant, allantoic fluid and fecal matter. Conclusion The method described is of great utility in generating whole genome assemblies for viruses with little or no available sequence information, viruses from greatly divergent families, and previously uncharacterized viruses, and for more fully describing mixed viral infections.
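    Median depth of coverage, as reported above, is the median over reference positions of the number of reads covering each position. A minimal sketch, assuming reads have already been aligned and are given as 0-based, half-open (start, end) intervals on the reference; it also reports breadth (the fraction of positions covered at least once), which is one way to quantify "near-complete".

```python
from statistics import median
from typing import Iterable, List, Tuple


def per_base_depth(genome_length: int,
                   reads: Iterable[Tuple[int, int]]) -> List[int]:
    """Depth at each reference position, computed with a difference array."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        start, end = max(0, start), min(genome_length, end)
        if start < end:
            diff[start] += 1         # coverage begins here
            diff[end] -= 1           # ... and ends just before here
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def coverage_summary(genome_length: int,
                     reads: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (median depth, fraction of positions covered at least once)."""
    depth = per_base_depth(genome_length, reads)
    breadth = sum(d > 0 for d in depth) / genome_length
    return median(depth), breadth
```

    The difference array keeps the cost linear in genome length plus the number of reads, rather than touching every base of every read.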

  19. Entropy estimation of very short symbolic sequences

    Lesne, Annick; Blanc, Jean-Luc; Pezard, Laurent

    2009-04-01

    While entropy per unit time is a meaningful index to quantify the dynamic features of experimental time series, its estimation is often hampered in practice by the finite length of the data. We here investigate the performance of entropy estimation procedures, relying either on block entropies or Lempel-Ziv complexity, when only very short symbolic sequences are available. Heuristic analytical arguments point at the influence of temporal correlations on the bias and statistical fluctuations, and put forward a reduced effective sequence length suitable for error estimation. Numerical studies are conducted using, as benchmarks, the wealth of different dynamic regimes generated by the family of logistic maps and stochastic evolutions generated by a Markov chain of tunable correlation time. Practical guidelines and validity criteria are proposed. For instance, block entropy leads to a dramatic overestimation for sequences of low entropy, whereas it outperforms Lempel-Ziv complexity at high entropy. As a general result, the quality of entropy estimation is sensitive to the sequence temporal correlation hence self-consistently depends on the entropy value itself, thus promoting a two-step procedure. Lempel-Ziv complexity is to be preferred in the first step and remains the best estimator for highly correlated sequences.
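    Block-entropy estimation, one of the two families compared above, can be sketched as follows: H_n is the Shannon entropy of the empirical distribution of overlapping n-blocks, and the differences H_{n+1} - H_n estimate the entropy per symbol. This minimal sketch does not include the Lempel-Ziv estimator or the effective-length corrections the abstract proposes. For short sequences, the number of distinct n-blocks quickly becomes limited by the sequence length, which is why the estimates become unreliable as n grows.

```python
import math
from collections import Counter
from typing import List


def block_entropy(seq: str, n: int) -> float:
    """Shannon entropy (bits) of the overlapping n-blocks of seq."""
    if n == 0:
        return 0.0
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    if not blocks:
        raise ValueError("sequence shorter than block length")
    total = len(blocks)
    # p * log2(1/p) avoids returning -0.0 for a single block type
    return sum((c / total) * math.log2(total / c)
               for c in Counter(blocks).values())


def entropy_rate_estimates(seq: str, max_n: int) -> List[float]:
    """Conditional estimates h_n = H_{n+1} - H_n for n = 0..max_n-1."""
    H = [block_entropy(seq, n) for n in range(max_n + 1)]
    return [H[n + 1] - H[n] for n in range(max_n)]
```

    For a strictly periodic sequence such as "0101...", h_0 is 1 bit but h_1 is close to 0, reflecting that each symbol is predictable from the previous one.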

  20. Sequence analysis by iterated maps, a review.

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) lie at a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better-established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized to non-genomic alphabets, as well as interest in its use for the graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there has been a surprising third act in the IM story: multiple reports have described gains in computational efficiency of several orders of magnitude over more conventional sequence analysis methodologies. The stage now appears set for a recasting of IMs with a central role in processing next-generation sequencing results. PMID:24162172
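    The basic iterated map behind Chaos Game Representation places each base at a corner of the unit square and moves halfway toward that corner for every symbol read. A minimal sketch; the particular corner assignment is an assumption (any fixed assignment behaves the same way). The helper `suffix_cell` illustrates the scale-free property: at grid resolution 2^-k, the cell containing a point depends only on the last k bases, so one map encodes all suffix lengths at once.

```python
from typing import List, Tuple

# Corner for each base; this assignment is an assumption of the sketch.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_representation(seq: str) -> List[Tuple[float, float]]:
    """Start at the centre of the unit square and, for each base, move
    halfway towards that base's corner; return every visited point."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def suffix_cell(point: Tuple[float, float], k: int) -> Tuple[int, int]:
    """Cell of a 2^k-by-2^k grid holding the point. When at least k bases
    have been read, this cell is fixed by the last k bases alone."""
    x, y = point
    scale = 2 ** k
    return int(x * scale), int(y * scale)
```

    Two sequences that share their final k bases always land in the same cell at resolution k, which is what lets CGR coordinates serve as a variable-order Markov representation.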

  1. OTU analysis using metagenomic shotgun sequencing data.

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, metagenomic shotgun sequencing yields 16S rRNA gene fragment data that are naturally immune to priming biases, and it therefore has the potential to uncover the true structure of microbial communities by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of a statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results based on 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that helps determine whether differences in the data are real or merely due to technical bias. The pipeline was built by using simulated data to find the optimal mapping and OTU prediction methods. Comparison of the simulated datasets showed that, with our proposed pipeline, a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence. This finding could serve as a good starting point for experimental design and makes comparisons between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.
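    OTU prediction is often done by clustering sequences at a fixed identity threshold. The sketch below is a generic greedy centroid clustering, not the pipeline the authors selected. It assumes the sequences are already aligned to equal length, scores identity as the fraction of matching positions, and uses 0.97 as a default threshold purely as an illustrative assumption.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions for two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: Dict[str, str], threshold: float = 0.97) -> List[List[str]]:
    """Each sequence joins the first existing centroid it matches at
    >= threshold identity; otherwise it founds a new OTU. Input order
    matters (tools often sort by abundance first)."""
    centroids: List[str] = []
    clusters: List[List[str]] = []
    for name, s in seqs.items():
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(name)
                break
        else:                        # no centroid was close enough
            centroids.append(s)
            clusters.append([name])
    return clusters
```

    Because each sequence is compared only with centroids, the cost grows with the number of OTUs rather than with all sequence pairs, at the price of order dependence.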

  2. Probabilistic Sequence Learning in Mild Cognitive Impairment

    Dezso Nemeth

    2013-07-01

    Full Text Available Mild Cognitive Impairment (MCI) causes slight but noticeable disruption in cognitive systems, primarily executive and memory functions. However, it is not clear whether, and if so how, the development of sequence learning is affected by an impaired cognitive system. The goal of our study was to investigate the development of probabilistic sequence learning, from initial acquisition to consolidation, in MCI and healthy elderly control groups. We used the Alternating Serial Reaction Time task (ASRT) to measure probabilistic sequence learning. Individuals with MCI showed weaker learning performance than the healthy elderly group. However, when using only the reaction times from the second half of each learning block, after the reactivation phase, we found intact learning in MCI. Based on the assumption that the first part of each learning block is related to reactivation/recall processes, we suggest that these processes are affected in MCI. The 24-hour offline period had no effect on sequence-specific learning in either group but did affect general skill learning: the healthy elderly group showed offline improvement in general reaction times, whereas individuals with MCI did not. Our findings deepen our understanding of the underlying mechanisms and time course of sequence acquisition and consolidation.

  3. Variable depth recursion algorithm for leaf sequencing

    The processes of extraction and sweep are basic segmentation steps used in leaf sequencing algorithms. A modified version of a commercial leaf sequencer changed the way the extracts are selected and expanded the search space, but the modification kept the basic search paradigm of evaluating multiple solutions, each consisting of up to 12 extracts and a sweep sequence. Although it generated the best solutions compared with other published algorithms, it used more computation time. A new, faster algorithm selects one extract at a time but calls itself as an evaluation function a user-specified number of times, after which it uses the bidirectional sweeping window algorithm as the final evaluation function. To achieve performance comparable to that of the modified commercial leaf sequencer, 2-3 calls were needed, and in all test cases there was only slight improvement beyond two calls. For the 13 clinical test maps, computation speed improved by a factor of 12 to 43, depending on the constraints, namely the ability to interdigitate and the avoidance of tongue-and-groove underdose. The new algorithm was compared with the original and modified versions of the commercial leaf sequencer. It was also compared with other published algorithms on 1,400 random 15 × 15 test maps with 3-16 intensity levels. In every case, the new algorithm provided the best solution.
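    The "sweep" idea can be illustrated on a single row of an intensity map. This is a textbook unidirectional leaf sweep, not the bidirectional sweeping window algorithm or the extract selection described above. The left leaf steps at each positive intensity increment and the right leaf at each negative one, so the number of unit-intensity apertures equals the sum of positive increments. Interdigitation and tongue-and-groove constraints are ignored in this sketch.

```python
from typing import List, Tuple


def sweep_segments(profile: List[int]) -> List[Tuple[int, int]]:
    """Decompose one row of non-negative integer intensities into
    unit-intensity apertures (left, right), half-open in bixel index."""
    if any(v < 0 for v in profile):
        raise ValueError("intensities must be non-negative")
    padded = [0] + list(profile) + [0]
    lefts, rights = [], []
    for j in range(1, len(padded)):
        step = padded[j] - padded[j - 1]
        pos = j - 1                           # bixel index of padded[j]
        if step > 0:
            lefts.extend([pos] * step)        # left leaf opens at bixel pos
        elif step < 0:
            rights.extend([pos] * (-step))    # right leaf closes before bixel pos
    # Both lists are built in ascending order; pairing the k-th entries
    # always gives left < right because intensities never go negative.
    return list(zip(lefts, rights))


def reconstruct(segments: List[Tuple[int, int]], width: int) -> List[int]:
    """Sum the unit apertures back into an intensity row (for checking)."""
    row = [0] * width
    for left, right in segments:
        for k in range(left, right):
            row[k] += 1
    return row
```

    Round-tripping through `reconstruct` is a quick way to check that a decomposition delivers exactly the prescribed intensities.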

  4. Extended sequence diagram for human system interaction

    Unified Modeling Language (UML) is a modeling language used in object-oriented software engineering. The sequence diagram is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a message sequence chart: it depicts the objects and classes involved in a scenario and the sequence of messages exchanged between the objects to carry out the functionality of the scenario. This paper proposes the Extended Sequence Diagram (ESD), which can depict human-system interaction in nuclear power plants and also support analysis of operators' cognitive processes. The conventional sequence diagram is limited to identifying the activities involved in human-system interactions; the ESD is extended to describe operators' cognitive processes in more detail. The ESD is expected to be used as a task analysis method for describing human-system interaction. It can also present the key steps that cause abnormal operations or failures, as well as diverse human errors based on cognitive conditions.

  5. Metal resistance sequences and transgenic plants

    Meagher, Richard Brian; Summers, Anne O.; Rugh, Clayton L.

    1999-10-12

    The present invention provides nucleic acid sequences encoding a metal ion resistance protein, which are expressible in plant cells. The metal resistance protein provides for the enzymatic reduction of metal ions including but not limited to divalent Cu, divalent mercury, trivalent gold, divalent cadmium, lead ions and monovalent silver ions. Transgenic plants which express these coding sequences exhibit increased resistance to metal ions in the environment as compared with plants which have not been so genetically modified. Transgenic plants with improved resistance to organometals including alkylmercury compounds, among others, are provided by the further inclusion of plant-expressible organometal lyase coding sequences, as specifically exemplified by the plant-expressible merB coding sequence. Furthermore, these transgenic plants which have been genetically modified to express the metal resistance coding sequences of the present invention can participate in the bioremediation of metal contamination via the enzymatic reduction of metal ions. Transgenic plants resistant to organometals can further mediate remediation of organic metal compounds, for example, alkylmetal compounds including but not limited to methyl mercury, methyl lead compounds, methyl cadmium and methyl arsenic compounds, in the environment by causing the freeing of mercuric or other metal ions and the reduction of the ionic mercury or other metal ions to the less toxic elemental mercury or other metals.

  6. Ion Torrent Semiconductor Sequencing Allows Rapid, Low Cost Sequencing of the Human Exome (7th Annual SFAF Meeting, 2012)

    Jenkins, David [EdgeBio]

    2012-06-01

    David Jenkins on "Ion Torrent semiconductor sequencing allows rapid, low-cost sequencing of the human exome" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  7. Generation of physical map contig-specific sequences useful for whole genome sequence scaffolding.

    Yanliang Jiang

    Full Text Available Along with the rapid advances of next-generation sequencing technologies, more and more species are being added to the list of organisms whose whole genomes have been sequenced. However, the assembled draft genomes of many organisms consist of numerous small contigs, owing to the short reads generated by next-generation sequencing platforms. To improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences within each physical contig. A two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, dramatically reducing the cost. A total of 94,111,841 100-bp reads and 315,277 assembled contigs were identified as containing physical map contig-specific tags. The physical map contig-specific sequences, along with the currently available BAC end sequences, were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of the whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources for linking the physical map, the genetic linkage map and the draft whole genome sequence; consequently, they can improve whole genome assembly and scaffolding, as well as genome-wide comparative analysis. The strategy developed in this study could also be adopted for other species whose whole genome assembly remains challenging.

  8. Sequence analysis of 16S rRNA from mycoplasmas by direct solid-phase DNA sequencing.

    Pettersson, B; Johansson, K. E.; Uhlén, M

    1994-01-01

    Automated solid-phase DNA sequencing was used for determination of partial 16S ribosomal DNA sequences of mycoplasmas. The sequence information was used to establish phylogenetic relationships of 11 different mycoplasmas whose 16S rRNA sequences had not been determined earlier. A biotinylated fragment corresponding to positions 344 to 939 in the Escherichia coli sequence was generated by PCR. The PCR product was immobilized onto streptavidin-coated paramagnetic beads, and direct sequencing wa...

  9. LARGE DEVIATIONS FOR SOME DEPENDENT SEQUENCES

    Hu Shuhe; Wang Xuejun

    2008-01-01

    Let $(X_i)$ be a martingale difference sequence and $S_n = \sum_{i=1}^{n} X_i$. Suppose $(X_i)$ is bounded in $L^p$. In the case $p \ge 2$, Lesigne and Volny (Stochastic Process. Appl. 96 (2001) 143) obtained the estimate $\mu(S_n > n) < c\,n^{-p/2}$; Yulin Li (Statist. Probab. Lett. 62 (2003) 317) generalized the result to the case $p \in (1, 2)$ and obtained $\mu(S_n > n) < c\,n^{1-p}$. These bounds are optimal in a certain sense. In this article, the authors study the large deviations of $S_n$ for some dependent sequences and obtain upper bounds for $\mu(S_n > n)$ of the same optimal order as those for martingale difference sequences.

  10. Degree sequences of k-multi-hypertournaments

    Pirzada S

    2009-01-01

    Let n and k (n ≥ k > 1) be two non-negative integers. A k-multi-hypertournament on n vertices is a pair (V, A), where V is a set of vertices with |V|= n, and A is a set of k-tuples of vertices, called arcs, such that for any k-subset S of V, A contains at least one (at most k!) of the k! k-tuples whose entries belong to S. The necessary and sufficient conditions for a non-decreasing sequence of non-negative integers to be the out-degree sequence (in-degree sequence) of some k-multi-hypertournament are given.

  11. Inferring interaction partners from protein sequences

    Bitbol, Anne-Florence; Colwell, Lucy J; Wingreen, Ned S

    2016-01-01

    Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multi-protein complexes, and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners. Hence, the sequences of interacting partners are correlated. Here we exploit these correlations to accurately identify which proteins are specific interaction partners from sequence data alone. Our general approach, which employs a pairwise maximum entropy model to infer direct couplings between residues, has been successfully used to predict the three-dimensional structures of proteins from sequences. Building on this approach, we introduce an iterative algorithm to predict specific interaction partners from among the members of two protein families. We assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. The algorithm proves successful without any a pri...

  12. Finding Sequential Patterns from Large Sequence Data

    Esmaeili, Mahdi

    2010-01-01

    Data mining is the task of discovering interesting patterns in large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining, which extracts frequent subsequences from a sequence database, finds sets of data items that occur together frequently in some sequences. It has attracted a great deal of interest in recent data mining research because it underlies many applications, such as web user analysis, stock trend prediction, DNA sequence analysis, finding linguistic patterns in natural language texts, and using the history of symptoms to predict certain kinds of disease. Because these applications are so diverse, it may not be possible to apply a single sequential pattern model to all of them; each application may require its own model and solution. A number of research projects have been established in recent years to develop meaningful sequential pattern...
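    The core of sequential pattern mining is counting support: how many sequences in the database contain a pattern as an ordered, possibly gapped subsequence. The sketch below is a generic level-wise (Apriori-style) miner, not any particular algorithm from the surveyed projects. It extends only frequent patterns, which is safe because a pattern can never be more frequent than its prefix. Items are plain strings, and itemset elements (several items at one time step) are not handled.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if pattern occurs in sequence in order, gaps allowed."""
    it = iter(sequence)
    # 'item in it' advances the iterator, so each match must follow the previous one
    return all(item in it for item in pattern)


def support(db: List[Sequence[str]], pattern: Sequence[str]) -> int:
    """Number of sequences in db that contain pattern."""
    return sum(contains(s, pattern) for s in db)


def frequent_patterns(db: List[Sequence[str]], min_support: int,
                      max_len: int = 3) -> List[Tuple[Pattern, int]]:
    """Level-wise search that extends only frequent patterns."""
    items = sorted({x for s in db for x in s})
    result: List[Tuple[Pattern, int]] = []
    frontier: List[Pattern] = [()]
    for _ in range(max_len):
        scored = [(p + (x,), support(db, p + (x,)))
                  for p in frontier for x in items]
        kept = [(p, s) for p, s in scored if s >= min_support]
        result.extend(kept)
        frontier = [p for p, _ in kept]
        if not frontier:
            break
    return result
```

    For example, in a web-log setting, each sequence could be one user's page visits, and a frequent pattern ('home', 'cart') would mean many users visited the cart at some point after the home page.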

  13. Microbial species delineation using whole genome sequences

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatis, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.
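    The two quantities used for the species cut-offs can be sketched from per-gene alignment results. This is a simplified, one-directional version under stated assumptions. Each aligned query gene contributes its aligned length and percent identity; gANI is the length-weighted identity over aligned genes, and AF is the aligned fraction of the query's total gene length. The cut-off values are left as parameters because the abstract does not state them, and the numbers in the usage example are placeholders.

```python
from typing import List, Tuple


def gani_and_af(gene_lengths: List[int],
                hits: List[Tuple[int, float]]) -> Tuple[float, float]:
    """gene_lengths: lengths of all genes in the query genome.
    hits: (aligned_length, percent_identity) for each query gene with a
    best hit in the subject genome.
    Returns (gANI, AF)."""
    aligned = sum(length for length, _ in hits)
    if aligned == 0:
        return 0.0, 0.0
    gani = sum(length * ident for length, ident in hits) / aligned
    af = aligned / sum(gene_lengths)
    return gani, af


def same_species(gani: float, af: float,
                 ani_cutoff: float, af_cutoff: float) -> bool:
    """Hard cut-off rule: both values must reach their thresholds."""
    return gani >= ani_cutoff and af >= af_cutoff


gani, af = gani_and_af([1000, 1000, 2000], [(1000, 99.0), (1000, 97.0)])
# Placeholder cut-offs, not values from the study:
verdict = same_species(gani, af, ani_cutoff=96.0, af_cutoff=0.6)
```

    Requiring both thresholds matters: in the example, the shared genes are highly identical (gANI 98), but only half of the gene content aligns, so the pair is not assigned to the same species.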

  14. A glass menagerie of low complexity sequences.

    Halfmann, Randal

    2016-06-01

    Remarkably simple proteins play outsize roles in the execution of developmental complexity within biological systems. Sequence information determines structure and hence function, so how do low complexity sequences fulfill their functions? Recent discoveries are raising the curtain on a new dimension of the sequence-structure paradigm. In it, function derives not from the structures of individual proteins, but instead, from dynamic material properties of entire ensembles of the proteins acting in unison through phase changes. These phases include liquids, one-dimensional crystals, and - as elaborated herein - even glasses. The peculiar thermodynamics of glass-like protein assemblies, in particular, illuminate new principles of information flow through and, at times, orthogonal to the central dogma of molecular biology. PMID:27258703

  15. Physical approaches to DNA sequencing and detection

    Zwolak, Michael

    2007-01-01

    With the continued improvement of sequencing technologies, the prospect of genome-based medicine is now at the forefront of scientific research. To realize this potential, however, we need a revolutionary sequencing method for the cost-effective and rapid interrogation of individual genomes. This capability is likely to be provided by a physical approach to probing DNA at the single-nucleotide level. This is in sharp contrast to current techniques and instruments, which probe length differences and terminating bases of DNA strands through chemical elongation, electrophoresis, and optical detection. In this Colloquium we review several physical approaches to DNA detection that have the potential to deliver fast and low-cost sequencing. Central to these approaches is the concept of nanochannels or nanopores, which allow for the spatial confinement of DNA molecules. In addition to their possible impact in medicine and biology, the methods offer ideal test beds to study open scientific issues and challenge...

  16. Developmental sequence of Cambrian embryo Markuelia

    DONG XiPing

    2007-01-01

    Based on additional, exquisitely preserved specimens of Markuelia hunanensis recently recovered from the Middle and Upper Cambrian of western Hunan, and using synchrotron-radiation X-ray tomographic microscopy, the developmental sequence of the Cambrian embryo Markuelia from cleavage through organogenesis to the pre-hatching stage is established. In particular, this includes the developmental sequence during the pre-hatching stage, i.e., from the earliest period, when the scalids and tail spines had only just taken shape, to the latest period (just before hatching). This developmental sequence provides a pattern of embryonic development during the pre-hatching stage that has not been established in living scalidophorans (priapulids, loriciferans and kinorhynchs). Thus, it not only enriches our knowledge of the embryonic development of the extant descendants of Markuelia, but also opens a new window on the evolution and development of the animal.

  17. TRAP Sequence - An Interesting Entity in Twins

    R H Srinivas Prasad

    2012-01-01

    Full Text Available Twin reversed arterial perfusion (TRAP) sequence is a rare malformation occurring in monozygotic multiple gestations. It is characterized by one well-developed, normal (pump) twin and another twin with absent cardiac structures (acardiac twin) that is hemodynamically dependent on the pump twin. The acardiac twin develops multiple anomalies that make survival difficult. The prognosis of the pump twin is variable, with mortality rates ranging from 50% to 70%. Complications that affect the prognosis of the pump twin include congestive cardiac failure due to increased cardiac demand, prematurity secondary to preterm delivery, and polyhydramnios. Because of these complications, prompt detection, follow-up, and treatment of this condition are very important. We report two cases of TRAP sequence that emphasize the importance of gray-scale and color Doppler imaging in the diagnosis, detection of poor prognostic features, follow-up, and management of TRAP sequence.

  18. Dynamics and Control of DNA Sequence Amplification

    Marimuthu, Karthikeyan

    2014-01-01

    DNA amplification is the process of replicating a specified DNA sequence in vitro through time-dependent manipulation of its external environment. A theoretical framework for determining the optimal dynamic operating conditions of DNA amplification reactions, for any specified amplification objective, is presented based on first-principles biophysical modeling and control theory. Amplification of DNA is formulated as a problem in control theory with optimal solutions that can differ considerably from strategies typically used in practice. Using the Polymerase Chain Reaction (PCR) as an example, sequence-dependent biophysical models for DNA amplification are cast as control systems, wherein the dynamics of the reaction are controlled by a manipulated input variable. Using these control systems, we demonstrate that there exists an optimal temperature cycling strategy for geometric amplification of any DNA sequence and formulate optimal control problems that can be used to derive the optimal tempe...
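    For contrast with the control-theoretic treatment above, the usual textbook baseline for geometric amplification assumes a constant per-cycle efficiency E, so that N_c = N_0 (1 + E)^c. This sketch implements only that baseline, not the sequence-dependent biophysical models or the optimal temperature cycling derived in the paper. Its purpose is to show what "geometric amplification" means quantitatively.

```python
def copies_after_cycles(n0: float, efficiency: float, cycles: int) -> float:
    """Constant-efficiency geometric model: each cycle multiplies the copy
    number by (1 + efficiency); efficiency = 1 means perfect doubling."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return n0 * (1.0 + efficiency) ** cycles


def cycles_needed(n0: float, target: float, efficiency: float) -> int:
    """Smallest number of whole cycles whose output reaches target copies."""
    if n0 <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need n0 > 0 and efficiency in (0, 1]")
    cycles, copies = 0, n0
    while copies < target:           # iterate instead of using logs to avoid rounding at exact powers
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles
```

    The paper's point is that efficiency is not a fixed constant but depends on sequence and on how temperature is cycled, which is what makes the cycling protocol a control input worth optimizing.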

  19. Spectral sequences in smooth generalized cohomology

    Grady, Daniel

    2016-01-01

    We consider spectral sequences in smooth generalized cohomology theories, including differential generalized cohomology theories. The main differential spectral sequences will be of the Atiyah-Hirzebruch (AHSS) type, where we provide a filtration by the Cech resolution of smooth manifolds. This allows for systematic study of torsion in differential cohomology. We apply this in detail to smooth Deligne cohomology, differential topological complex K-theory, and to a smooth extension of integral Morava K-theory that we introduce. In each case we explicitly identify the differentials in the corresponding spectral sequences, which exhibit an interesting and systematic interplay between (refinement of) classical cohomology operations, operations involving differential forms, and operations on cohomology with U(1) coefficients.

  20. Identifying driver mutations in sequenced cancer genomes

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla;

    2014-01-01

    ... and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein sequence or structure. Finally, we review techniques to identify recurrent combinations of somatic mutations, including approaches that examine mutations in known pathways or protein-interaction networks, as well as de novo approaches that identify combinations of mutations according to...
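    The simplest form of frequency-based driver prediction ranks genes by the fraction of samples in the cohort in which they are mutated. This is a minimal sketch, assuming mutation calls are given as (sample, gene) pairs and counting each sample at most once per gene. It deliberately omits the background-mutation-rate and gene-length corrections that real methods apply, so long genes can appear recurrent by chance. The gene names in the test are toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def recurrently_mutated(mutations: Iterable[Tuple[str, str]],
                        n_samples: int,
                        min_fraction: float) -> List[Tuple[str, float]]:
    """mutations: (sample_id, gene) pairs of somatic mutation calls.
    Returns (gene, fraction of samples mutated) for genes reaching
    min_fraction, most frequent first (ties broken by gene name)."""
    samples_per_gene: Dict[str, Set[str]] = defaultdict(set)
    for sample, gene in mutations:
        samples_per_gene[gene].add(sample)   # a sample counts once per gene
    hits = [(gene, len(samples) / n_samples)
            for gene, samples in samples_per_gene.items()
            if len(samples) / n_samples >= min_fraction]
    return sorted(hits, key=lambda gf: (-gf[1], gf[0]))
```

    Counting samples rather than mutations keeps a single hypermutated gene in one tumor from looking recurrent across the cohort.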

  1. Auditory sequence analysis and phonological skill.

    Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E; Turton, Stuart; Griffiths, Timothy D

    2012-11-01

    This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739

  2. Indexing for Large DNA Database Sequences

    S. M. Wohoush & M.H. Saheb

    2011-10-01

    Full Text Available Bioinformatics data comprise a huge amount of information because of the large number of sequences, the great length of those sequences, and the daily new additions. These data need to be accessed efficiently for many purposes. What makes one DNA data item distinct from another is its DNA sequence, which is a string over the four characters A, C, G and T and can vary in length. Using a suitable representation of DNA sequences, and a suitable index structure to hold this representation in main memory, allows efficient processing by accessing DNA sequences through the index and reduces the number of disk I/O accesses. Although some I/O operations are still needed at the end to eliminate false hits, pruning reduces the number of candidate DNA sequences that must be checked, so there is no need to search the whole database. A suitable index for searching DNA sequences efficiently must balance index size against search time, and the choice of the relation fields on which the index is built has a large effect on both. Our experiments apply the n-gram wavelet transformation to single-field and multi-field index structures in a relational DBMS environment. The results show that index size and search time must be weighed carefully when indexing. Increasing the window size decreases the number of I/O references. Both single-field and multi-field indexing are strongly affected by the window size. Increasing the window size leads to better search times with a special index type under single-field indexing, whereas with multi-field indexing the search time is good and nearly the same for most index types. The storage space needed for the RDBMS index types is about the same as, or greater than, that of the actual data.

  3. Image correlation method for DNA sequence alignment.

    Millaray Curilem Saldías

    Full Text Available The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics' most active research areas. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a pixel of fixed gray intensity. Query and known database sequences are coded into their pixel representation, and sequence alignment is handled as a problem of recognizing an object in a scene. Query and database become object and scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, a one-million-base-pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%) and specificity (98.99%) and outperformed BLAST when mutation numbers increased. However, the digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed of a real experimental optical correlator. By doing this, we expect to fully exploit the light properties of optical correlation. As the optical correlator works jointly with the computer, the digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods for sequence alignment.
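
    The pixel-coding and matching step can be sketched in one dimension. This is a simplification under stated assumptions: the gray levels in GRAY are hypothetical (the abstract does not give the actual intensities), the 2-D 100 x 100 scene is flattened into a single row, and the score is the sum of squared intensity differences, which expands into a cross-correlation term plus energy terms. It is not the paper's optical or digital correlator.

```python
# Hypothetical gray levels; the paper's actual intensity assignment is not given.
GRAY = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


def to_pixels(seq: str) -> list[float]:
    """Code each base as a pixel of fixed gray intensity."""
    return [GRAY[base] for base in seq.upper()]


def best_match(query: str, scene: str) -> tuple[int, float]:
    """Slide the query (object) over the scene; return (offset, score) of the best fit.

    score = sum((q - w)^2) = sum(q^2) + sum(w^2) - 2 * sum(q * w), i.e. a
    cross-correlation term plus energy terms. Lower is better; 0.0 is an exact match.
    """
    q, s = to_pixels(query), to_pixels(scene)
    if len(q) > len(s):
        raise ValueError("query longer than scene")
    best_offset, best_score = -1, float("inf")
    for offset in range(len(s) - len(q) + 1):
        window = s[offset:offset + len(q)]
        score = sum((a - b) ** 2 for a, b in zip(q, window))
        if score < best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

    A plain zero-mean normalized correlation was avoided here on purpose: it is invariant to intensity shifts, so with evenly spaced gray levels a window such as CGT would score as well as ACG.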

  4. BAC-pool 454-sequencing: A rapid and efficient approach to sequence complex tetraploid cotton genomes

    New and emerging next-generation sequencing technologies have promised to reduce sequencing costs, but not significantly for complex polyploid plant genomes such as cotton. The large and highly repetitive genome of G. hirsutum (~2.5 Gb) is less amenable to, and more cost-intensive with, traditional BAC-by...

  5. Cloning, sequencing, and sequence analysis of two novel plasmids from the thermophilic anaerobic bacterium Anaerocellum thermophilum

    Clausen, Anders; Mikkelsen, Marie Just; Schrøder, I.;

    2004-01-01

    The nucleotide sequences of two novel plasmids isolated from the extremely thermophilic anaerobic bacterium Anaerocellum thermophilum DSM6725 (A. thermophilum), which grows optimally at 70 °C, have been determined. pBAS2 was found to be a 3653 bp plasmid with a GC content of 43%, and the sequence...

  6. Sequence, Program Words, and Sequence Rationale for the 1971 Revised First-Grade Spelling Program.

    Berdiansky, Betty

    Pupils participating in the 1971-1972 tryout of the Southwest Regional Laboratory (SWRL) First Grade Spelling Program were taught to combine consonants and consonant clusters with word elements to form program words. This paper presents the sequence of instruction for these elements and the rationale used in deriving this sequence. In addition, it…

  7. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Miri eMichaeli

    2012-12-01

    Full Text Available High-throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion-Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  8. Complete Genome Sequence of Streptococcus agalactiae CNCTC 10/84, a Hypervirulent Sequence Type 26 Strain

    Hooven, Thomas A.; Randis, Tara M.; Daugherty, Sean C.; Narechania, Apurva; Planet, Paul J.; Tettelin, Hervé; Ratner, Adam J.

    2014-01-01

    Streptococcus agalactiae (group B Streptococcus [GBS]) is a human pathogen with a propensity to cause neonatal infections. We report the complete genome sequence of GBS strain CNCTC 10/84, a hypervirulent clinical isolate frequently used to study GBS pathogenesis. Comparative analysis of this sequence may shed light on novel pathogenic mechanisms.

  9. Sequence embedding for fast construction of guide trees for multiple sequence alignment

    Blackshields, Gordon

    2010-05-14

    Abstract Background The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pairwise distances. Conclusions We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.

  10. Sequence embedding for fast construction of guide trees for multiple sequence alignment

    Wilm Andreas

    2010-05-01

    Full Text Available Abstract Background The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pairwise distances. Conclusions We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
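
    One way to realize such an embedding (and, as we understand it, the idea behind mBed) is to pick a small set of seed sequences and describe every sequence by its vector of distances to the seeds, so only N x t distances are computed instead of N(N-1)/2. The sketch below is illustrative only: the k-mer Jaccard distance, random seed choice, and function names are hypothetical stand-ins, not the mbed.tgz code.

```python
import random


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Hypothetical cheap distance: 1 - Jaccard similarity of the k-mer sets."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def embed(seqs: list[str], n_seeds: int, rng_seed: int = 0) -> list[list[float]]:
    """Map each sequence to its distances from n_seeds randomly chosen seed sequences.

    Only len(seqs) * n_seeds distance evaluations are made, not all pairs.
    """
    rng = random.Random(rng_seed)
    seed_idx = rng.sample(range(len(seqs)), n_seeds)
    return [[kmer_distance(s, seqs[j]) for j in seed_idx] for s in seqs]


def euclid(u: list[float], v: list[float]) -> float:
    """Distance between two embedded vectors, usable by k-means or similar clustering."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5
```

    The embedded vectors can then be clustered with any vector-space method; the quality of the result depends on how well distances to the seeds preserve the original pairwise similarities.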

  11. Finding Common Sequence and Structure Motifs in a set of RNA sequences

    Gorodkin, Jan; Heyer, Laurie J.; Stormo, Gary D.

    maintains tractability by constructing multi-sequence alignments for pairwise comparisons. The overall method has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. Example solutions, and...... comparisons with other approaches, are provided. The solutions include finding consensus structure identical to published ones....

  12. Selection of DNA clones with enhancer sequences.

    Asoh, S; Lee-Kwon, W; Mouradian, M M; Nirenberg, M

    1994-01-01

    A method is described for selection of DNA clones that contain enhancer sequences that activate gene expression. An Escherichia coli-rodent cell shuttle vector, pPyE0, was used that contains polyoma viral DNA without the polyoma enhancer region. Replication of pPyE0 DNA in mouse cells is markedly reduced due to deletion of the polyoma enhancer region. Insertion of mouse genomic DNA fragments that contain putative enhancer sequences into pPyE0 adjacent to the polyoma origin of replication rest...

  13. ASTRAL, a hyperspectral imaging DNA sequencer

    O'Brien, Kevin M.; Wren, Jonathan; Davé, Varshal K.; Bai, Diane; Anderson, Richard D.; Rayner, Simon; Evans, Glen A.; Dabiri, Ali E.; Garner, Harold R.

    1998-05-01

    We are developing a prototype automatic DNA sequencer that uses polyacrylamide slab gels imaged through a novel optical detection system. The design of this prototype allows direct optical coupling over the entire read area of the gel and hyperspectrographic separation and detection of the fluorescence emission. The machine has no moving parts. All the major components incorporated in this prototype are currently available "off the shelf," thus reducing equipment development time and decreasing costs. Software developed for data acquisition, analysis, and conversion to other standard formats facilitates compatibility.

  14. Digital image sequence processing, compression, and analysis

    Reed, Todd R

    2004-01-01

    Contents:
    Introduction, by Todd R. Reed
    Content-Based Image Sequence Representation, by Pedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, and Charnchai Pluempitiwiriyawej
    The Computation of Motion, by Christoph Stiller, Sören Kammel, Jan Horn, and Thao Dang
    Motion Analysis and Displacement Estimation in the Frequency Domain, by Luca Lucchese and Guido Maria Cortelazzo
    Quality of Service Assessment in New Generation Wireless Video Communications, by Gaetano Giunta
    Error Concealment in Digital Video, by Francesco G.B. De Natale
    Image Sequence Restoration: A Wider Perspective, by Anil Kokaram
    Video Summarization, by Cuneyt M. Taskiran and Edward...

  15. Rates of Convergence of Recursively Defined Sequences

    Lambov, Branimir Zdravkov

    2005-01-01

    This paper gives a generalization of a result by Matiyasevich which gives explicit rates of convergence for monotone recursively defined sequences. The generalization is motivated by recent developments in fixed point theory and the search for applications of proof mining to the field. It relaxes...... the requirement for monotonicity to the form x_{n+1} ≤ (1 + a_n)x_n + b_n, where the parameter sequences have to be bounded in sum, and also provides means to treat computational errors. The paper also gives an example result, an application of proof mining to fixed point theory, that can be achieved by the...
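
    A standard first step for recursions of this form is a uniform bound, shown below. This is a textbook boundedness argument, not Lambov's explicit rate; it assumes x_n, a_n, b_n ≥ 0 and reads "bounded in sum" as A = Σ a_k < ∞ and B = Σ b_k < ∞. Since 1 + a_n ≤ e^{a_n} and e^{a_n} ≥ 1, each step satisfies the first line, and induction (using exp(Σ_{k<n} a_k) ≥ 1 to absorb b_n) gives the second.

```latex
x_{n+1} \le (1 + a_n)\,x_n + b_n \le e^{a_n}\,(x_n + b_n),
\qquad
x_n \le \exp\Big(\sum_{k<n} a_k\Big)\Big(x_0 + \sum_{k<n} b_k\Big) \le e^{A}\,(x_0 + B).
```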

  16. Comparative Amino Acid Sequences of Dengue Viruses

    Haishi, Shozo; TANAKA Mariko; Igarashi, Akira

    1990-01-01

    Amino acid (AA) sequences of the four serotypes of dengue virus, deduced from the nucleotide (nt) sequences of their genomic RNA, were analyzed for each genome segment and each stretch of 10 AA residues. The precursor of membrane protein (pM) and four nonstructural proteins (NS1, NS3, NS4B, NS5) were highly conserved, while another nonstructural protein (NS2A) was the least conserved among 5 strains of dengue viruses. When homology was compared among heterotypic viruses, type 1 and type 3 dengue viruses showed clo...
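
    The per-stretch comparison can be sketched as block-wise percent identity over two pre-aligned sequences. This is a minimal illustration under stated assumptions: the sequences are already aligned to equal length, gap characters ("-") count as mismatches, and the final block may be shorter than 10 residues; the function name block_identity is hypothetical.

```python
def block_identity(a: str, b: str, block: int = 10) -> list[float]:
    """Percent identity of two pre-aligned sequences in consecutive blocks.

    Gap characters ('-') never count as identities.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    result = []
    for start in range(0, len(a), block):
        x, y = a[start:start + block], b[start:start + block]
        same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
        result.append(100.0 * same / len(x))
    return result
```

    Plotting these per-block values along the polyprotein is one simple way to see which segments (for example pM versus NS2A) are more or less conserved.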

  17. IRF6 Sequencing in Interrupted Clefting.

    Cuddapah, Sanmati R; Kominek, Selma; Grant, John H; Robin, Nathaniel H

    2016-05-01

    In a retrospective review of patients seen at the University of Alabama at Birmingham Cleft and Craniofacial Center, four patients with rare interrupted clefting were identified who had undergone genetic testing. Each of these patients had a typical cleft lip, with intact hard palate and cleft of the soft palate. Given this picture of mixed clefting, IRF6 sequencing was done and was negative for mutations in all four patients. As genetic testing for single-gene mutations and exome sequencing become clinically available, it may be possible to identify novel mutations responsible for this previously unreported type of interrupted clefting. PMID:26090788

  18. Multiple Sequence Alignment Based on Chaotic PSO

    Lei, Xiu-Juan; Sun, Jing-Jing; Ma, Qian-Zhi

    This paper introduces a new improved algorithm, chaotic PSO (CPSO), based on the idea of chaos optimization, to solve multiple sequence alignment. First, chaotic variables between 0 and 1 are generated when initializing the population so that the particles are distributed uniformly in the solution space. Second, chaotic sequences are generated using the logistic mapping function in order to perform chaotic search and strengthen the diversity of the population. Simulation results on several benchmark data sets from BAliBASE show that the improved algorithm is effective and performs well on data sets with different degrees of similarity.
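
    The logistic mapping function is x_{t+1} = μ x_t (1 - x_t); with μ = 4 its orbits stay in [0, 1] and behave chaotically. The sketch below shows only that map and a hypothetical way to turn its values into one particle's initial position on an interval [low, high]; the paper's encoding of alignments as particle positions is not reproduced. Starting values such as 0.25, 0.5, or 0.75 should be avoided because they fall onto fixed points or short cycles.

```python
def logistic_sequence(x0: float, length: int, mu: float = 4.0) -> list[float]:
    """Iterate the logistic map x_{t+1} = mu * x_t * (1 - x_t), returning `length` values."""
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    xs, x = [], x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        xs.append(x)
    return xs


def chaotic_position(dim: int, x0: float, low: float, high: float) -> list[float]:
    """Hypothetical initializer: map chaotic values in [0, 1] onto [low, high]."""
    return [low + (high - low) * x for x in logistic_sequence(x0, dim)]
```

    Using a different x0 per particle gives a deterministic but well-spread initial population, which is the property the abstract attributes to chaotic initialization.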

  19. Automated Testing with Targeted Event Sequence Generation

    Jensen, Casper Svenning; Prasad, Mukul R.; Møller, Anders

    2013-01-01

    Automated software testing aims to detect errors by producing test inputs that cover as much of the application source code as possible. Applications for mobile devices are typically event-driven, which raises the challenge of automatically producing event sequences that result in high coverage....... Some existing approaches use random or model-based testing that largely treats the application as a black box. Other approaches use symbolic execution, either starting from the entry points of the applications or on specific event sequences. A common limitation of the existing approaches is that they...

  20. Population genetic inference from genomic sequence variation

    Pool, John E.; Hellmann, Ines; Jeffrey D. Jensen; Nielsen, Rasmus

    2010-01-01

    Population genetics has evolved from a theory-driven field with little empirical data into a data-driven discipline in which genome-scale data sets test the limits of available models and computational analysis methods. In humans and a few model organisms, analyses of whole-genome sequence polymorphism data are currently under way. And in light of the falling costs of next-generation sequencing technologies, such studies will soon become common in many other organisms as well. Here, we assess...

  1. The computational linguistics of biological sequences

    Searls, D. [Univ. of Pennsylvania, Philadelphia, PA (United States)]

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology, which was held in the United Kingdom from July 16 to 19, 1995. Protein sequences are analogous to nucleic acid sequences in many respects, particularly in their folding behavior. Proteins have a much richer variety of interactions, but in theory the same linguistic principles could be brought to bear in describing dependencies between distant residues that arise by virtue of three-dimensional structure. This tutorial concentrates on nucleic acid sequences.
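
    A classic example of a linguistic view of nucleic acids is that the nested base pairs of an RNA stem-loop behave like a context-free grammar, S → x S x̄ | loop, where x̄ is the complement of x. The recognizer below is a toy illustration of that idea, not material from the tutorial: it assumes Watson-Crick pairs only (no G-U wobble), a perfectly nested stem, and hypothetical minimum stem and loop lengths.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_length(rna: str) -> int:
    """Count nested complementary pairs from both ends inward (rule S -> x S x')."""
    i, j = 0, len(rna) - 1
    while i < j and (rna[i], rna[j]) in PAIRS:
        i += 1
        j -= 1
    return i


def is_hairpin(rna: str, min_stem: int = 3, min_loop: int = 3) -> bool:
    """True if rna derives from S -> x S x' | loop with enough stem pairs and loop bases."""
    rna = rna.upper()
    # Use no more stem pairs than still leaves min_loop unpaired bases in the middle.
    usable = min(stem_length(rna), (len(rna) - min_loop) // 2)
    return usable >= min_stem
```

    The dependency between the first and last bases is exactly the kind of long-range, nested relationship that regular expressions cannot capture but a context-free grammar can.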

  2. Applications of High Throughput Nucleotide Sequencing

    Waage, Johannes Eichler

    The recent advent of high throughput sequencing of nucleic acids (RNA and DNA) has vastly expanded research into the functional and structural biology of the genome of all living organisms (and even a few dead ones). With this enormous and exponential growth in biological data generation come......, focusing on often encountered problems in data processing, such as quality assurance, mapping, normalization, visualization, and interpretation. Presented in the second part are scientific endeavors representing solutions to problems of two sub-genres of next generation sequencing. For the first flavor, RNA...

  3. Dimensions of Copeland-Erdos Sequences

    Gu, Xiaoyang; Lutz, Jack H.; Moser, Philippe

    2005-01-01

    The base-$k$ Copeland-Erdős sequence given by an infinite set $A$ of positive integers is the infinite sequence $\mathrm{CE}_k(A)$ formed by concatenating the base-$k$ representations of the elements of $A$ in numerical order. This paper concerns the following four quantities. The finite-state dimension $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A))$, a finite-state version of classical Hausdorff dimension introduced in 2001. The finite-state strong dimension $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A))$, a finite-state version of cla...
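
    The construction of the sequence itself is easy to make concrete for a finite prefix of $A$. The sketch below assumes digits 0-9 then a-z (so 2 ≤ k ≤ 36); the function names are our own. With $A$ the primes and k = 10 it reproduces the familiar digits 2357111317192329... of the Copeland-Erdős constant.

```python
def to_base(n: int, k: int) -> str:
    """Base-k digits of a positive integer n (digits 0-9 then a-z, so 2 <= k <= 36)."""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    if not 2 <= k <= 36:
        raise ValueError("base must be between 2 and 36")
    if n <= 0:
        raise ValueError("elements of A must be positive")
    out = []
    while n:
        n, r = divmod(n, k)
        out.append(digits[r])
    return "".join(reversed(out))


def copeland_erdos_prefix(elements, k: int) -> str:
    """Concatenate the base-k representations of the given elements of A in numerical order."""
    return "".join(to_base(a, k) for a in sorted(elements))
```

    Only a finite prefix can be materialized, of course; the dimensions studied in the paper are asymptotic properties of the full infinite sequence.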

  4. Final sonorant sequences in the Celje dialect

    Alja Ferme

    2015-08-01

    Full Text Available In this paper I will analyse final sonorant sequences in the Celje variety of Slovene. In §2 various definitions of a consonant cluster will be discussed, and the definition needed for the further development of the article will be provided. In §3 I will present pretheoretical arguments against treating all final sonorant sequences as consonant clusters. In addition, the seemingly special behaviour of a small group of sequences will be pointed out. The government phonology framework will be introduced in §4. In §5 the […] in the given theoretical framework.

  5. On colorings of bivariate random sequences

    Matúš, František; Kupsa, Michal

    Piscataway: IEEE, 2010, pp. 1272-1276. ISBN 978-1-4244-7892-7. [IEEE International Symposium on Information Theory 2010. Austin (US), 13.06.2010-18.06.2010] R&D Projects: GA AV ČR IAA100750603; GA AV ČR KJB100750901; GA ČR GA201/08/0539 Institutional research plan: CEZ:AV0Z10750506 Keywords: colorings * ergodic sequences * entropy rate * asymptotic equipartition property Subject RIV: BD - Theory of Information http://library.utia.cas.cz/separaty/2010/MTR/matus-on colorings of bivariate random sequences.pdf

  6. Linear recurring sequences over rings and modules

    Kurakin, V.L.; Kuzmin, A.S.; Mikhalev, A.V. [and others]

    1995-10-15

    Here we present some fundamental concepts and results of the theory of linear recurring sequences over rings and modules and their applications. Of course, the authors give more detail on those results that are close to their mathematical interests. In particular, an attempt has been made to construct a general algebraic theory of k-LRS over modules, paying explicit attention to periodic k-sequences, to properties of linear recurrences over finite rings and especially over Galois rings, and also to methods of constructing codes based on such recurrences.
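
    The simplest finite-ring case is a linear recurrence over Z/mZ: s_{i+k} = c_0 s_i + ... + c_{k-1} s_{i+k-1} (mod m). Because the state (the last k terms) takes at most m^k values, the sequence is eventually periodic, and the sketch below finds its preperiod and period by recording states. It covers only this ring setting, not modules or Galois rings in general, and the function name is our own. When c_0 is invertible mod m the sequence is purely periodic (preperiod 0), as with the Fibonacci sequence mod 4, whose period is 6.

```python
def lrs_preperiod_and_period(coeffs, initial, m):
    """Run s_{i+k} = sum_j coeffs[j] * s_{i+j} (mod m) until a state repeats.

    Returns (preperiod, period). The state is the window of the last k terms,
    so a repeat is guaranteed within m**k steps.
    """
    k = len(coeffs)
    if len(initial) != k:
        raise ValueError("need exactly k initial terms")
    state = tuple(x % m for x in initial)
    seen = {}
    step = 0
    while state not in seen:
        seen[state] = step
        nxt = sum(c * s for c, s in zip(coeffs, state)) % m
        state = state[1:] + (nxt,)
        step += 1
    return seen[state], step - seen[state]
```

    For example, lrs_preperiod_and_period([1, 1], [0, 1], 10) returns (0, 60), the Pisano period for 10, while a recurrence with c_0 = 0 such as coeffs [0, 1] can have a nonzero preperiod.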

  7. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable, with options to allow alternative coloring schemes, sorting of sequences, and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript, making use of the jQuery and jQuery UI libraries. It avoids HTML5-specific features to help with browser compatibility. The code is documented using JSDoc and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836
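
    'Dotifying' is commonly understood as showing a residue as "." when it is identical to the reference sequence's residue in the same column, so only differences stand out. The sketch below assumes that meaning and uses the first row as the reference; it is a language-neutral illustration in Python, not JSAV's JavaScript implementation or API.

```python
def dotify(alignment: list[str], reference: int = 0, dot: str = ".") -> list[str]:
    """Replace residues identical to the reference row's residue in the same column."""
    ref = alignment[reference]
    if any(len(row) != len(ref) for row in alignment):
        raise ValueError("all rows must have the same aligned length")
    out = []
    for idx, row in enumerate(alignment):
        if idx == reference:
            out.append(row)  # the reference row itself is shown in full
            continue
        out.append("".join(dot if r == q else r for r, q in zip(row, ref)))
    return out
```

    For example, dotify(["MKVLA", "MKILA", "MRVLG"]) yields ["MKVLA", "..I..", ".R..G"], making the substitutions at columns 2, 3, and 5 easy to spot.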

  8. Characterization of minisatellites in Arabidopsis thaliana with sequence similarity to the human minisatellite core sequence.

    Tourmente, S; Deragon, J M; Lafleuriel, J; Tutois, S; Pélissier, T; Cuvillier, C; Espagnol, M C; Picard, G

    1994-08-25

    A strategy based on random PCR amplification was used to isolate new repetitive elements of Arabidopsis thaliana. One of the random PCR products analyzed by this approach contained a tandemly repeated minisatellite sequence composed of 33 bp repeat units. The genomic locus corresponding to this PCR product was isolated by screening a lambda genomic library. New related loci were also isolated from the genomic library by screening with a 14-mer oligonucleotide representing a region conserved among the different repeat units. Alignment of the consensus sequence for each minisatellite locus allowed the definition of an Arabidopsis thaliana core sequence that shows strong sequence similarities with the human core sequence and with the generalized recombination signal Chi of Escherichia coli. The minisatellites were tested for their ability to detect polymorphism, and their chromosomal position was established. PMID:8078766

  9. The Equilibrium Thermodynamics of Various Peptide Sequences

    Yaşar, Fatih

    The equilibrium thermodynamic properties of two peptide sequences of β-casein in the α-helix regions were studied by three-dimensional molecular modeling in vacuum. All the three-dimensional conformations of each peptide sequence were obtained by multicanonical simulations using the ECEPP/2 force field, and each simulation was started from a completely random initial conformation. No a priori information about the ground state is used in the simulations. In the present study, we calculated the average values of total energy, specific heat, and fourth-order cumulant for two peptide sequences of β-casein as a function of temperature. We observed that the specific heat shows two peaks as a function of temperature for both peptides. Because our sequences have highly helical structure and two peaks in the specific heat, we have also studied the helix-coil transitions to explain these peaks. Our data indeed show that these peptides have highly helical structure, in good agreement with the results of spectroscopic techniques and other prediction methods.
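
    The specific heat follows from energy fluctuations, C = (⟨E²⟩ - ⟨E⟩²)/(k_B T²). The sketch below computes it, together with a fourth-order energy cumulant, from energy samples drawn at a single temperature. Two assumptions are labeled: the cumulant is taken to be the Binder-type form V = 1 - ⟨E⁴⟩/(3⟨E²⟩²), which the abstract does not define, and the multicanonical reweighting the authors used to obtain canonical averages at every temperature is omitted. k_b defaults to 1 (reduced units); for ECEPP/2 energies in kcal/mol one would pass k_B ≈ 0.001987 kcal/(mol K).

```python
def energy_moments(energies: list[float]) -> tuple[float, float, float]:
    """Return (<E>, <E^2>, <E^4>) as plain sample means."""
    n = len(energies)
    e1 = sum(energies) / n
    e2 = sum(e * e for e in energies) / n
    e4 = sum(e ** 4 for e in energies) / n
    return e1, e2, e4


def specific_heat(energies: list[float], temperature: float, k_b: float = 1.0) -> float:
    """C = (<E^2> - <E>^2) / (k_B T^2) from samples at a single temperature."""
    e1, e2, _ = energy_moments(energies)
    return (e2 - e1 * e1) / (k_b * temperature ** 2)


def binder_energy_cumulant(energies: list[float]) -> float:
    """Assumed Binder-type fourth-order cumulant V = 1 - <E^4> / (3 <E^2>^2)."""
    _, e2, e4 = energy_moments(energies)
    return 1.0 - e4 / (3.0 * e2 * e2)
```

    Peaks in C versus T mark temperatures where energy fluctuations are largest, which is why two peaks can signal two distinct structural transitions such as the helix-coil transition.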

  10. Crop Sequence Economics in Dynamic Cropping Systems

    No-till production systems allow more intensified and diversified production in the northern Great Plains; however, this has increased the need for information on improving economic returns through crop sequence selection. Field research was conducted 6 km southwest of Mandan ND to determine the inf...

  11. Sequence Classification: 892357 [TMBETA-GENOME] [Archive]

    Full Text Available Non-TMB Non-TMH Non-TMB Non-TMB Non-TMB Non-TMB >gi|6322971|ref|NP_013043.1| Fe(II)-dependent su ... dioxygenase, involved in sulfonate catabolism for use ... as a sulfur source, contains sequence that closely ...

  12. Asymptotic behaviour of firmly non expansive sequences

    We introduce the notion of firmly nonexpansive sequences in a Banach space and present several results concerning their asymptotic behaviour, extending previous results and giving an affirmative answer to an open question raised by S. Reich and I. Shafrir. Applications to averaged mappings are also given. (author). 16 refs

  13. Symmetric and Transitive Registration of Image Sequences

    Oskar Škrinjar

    2008-01-01

    Full Text Available This paper presents a method for constructing symmetric and transitive algorithms for registration of image sequences from image registration algorithms that do not have these two properties. The method is applicable to both rigid and nonrigid registration, and it can be used with linear or periodic image sequences. The symmetry and transitivity properties are satisfied exactly (up to machine precision), that is, they always hold regardless of the image type, quality, and the registration algorithm as long as the computed transformations are invertible. These two properties are especially important in motion tracking applications, since physically incorrect deformations might be obtained if the registration algorithm is not symmetric and transitive. The method was tested on two sequences of cardiac magnetic resonance images using two different nonrigid image registration algorithms. It was demonstrated that the transitivity and symmetry errors of the symmetric and transitive modification of the algorithms could be made arbitrarily small when the computed transformations are invertible, whereas the corresponding errors for the nonmodified algorithms were on the order of the pixel size. Furthermore, the symmetric and transitive modification of the algorithms had higher registration accuracy than the nonmodified algorithms for both image sequences.
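
    What "symmetric and transitive" means can be shown in the simplest possible setting, pure translations represented as numbers. The sketch below is a generic construction under that assumption, not the method of this paper (which handles rigid and nonrigid transformations): it first antisymmetrizes raw pairwise estimates, then assigns each image a position and defines every pairwise shift as a difference of positions, which makes both properties hold by construction. raw[i][j] is a hypothetical estimated shift from image i to image j.

```python
def consistent_translations(raw: list[list[float]]) -> list[list[float]]:
    """Turn raw pairwise shift estimates raw[i][j] (image i -> image j) into an
    exactly symmetric (t[j][i] = -t[i][j]) and transitive
    (t[i][j] + t[j][k] = t[i][k]) set of shifts.
    """
    n = len(raw)
    # Symmetry: average the forward estimate with the negated backward estimate.
    a = [[(raw[i][j] - raw[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    # Transitivity: give each image a position, then use position differences.
    p = [sum(a[k][i] for k in range(n)) / n for i in range(n)]
    return [[p[j] - p[i] for j in range(n)] for i in range(n)]
```

    If the raw estimates are already consistent (raw[i][j] = q_j - q_i for some positions q), the output equals the input, so the correction only redistributes inconsistencies rather than biasing exact results.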

  14. Novel overlapping coding sequences in Chlamydia trachomatis

    Jensen, Klaus Thorleif; Petersen, Lise; Falk, Søren;

    2006-01-01

    Chlamydia trachomatis is the aetiological agent of trachoma and sexually transmitted infections. The C. trachomatis genome sequence revealed an organism adapted to the intracellular habitat, with a high coding ratio and a small genome of 1,042 kilobases (kb) with 895 annotated protein...

  15. Zero-Till Crop Sequence Economics

    No-till production systems allow more intensified and diversified production in the northern Great Plains; however, this has increased the need for information on improving economic returns through crop sequence selection. Field research was conducted near Mandan ND to determine the influences of pr...

  16. Time sequence photography of Roosters Comb

    Understanding natural landscape changes is key to properly determining rangeland ecology. Time sequence photography allows a landscape snapshot to be documented and makes it possible to compare natural changes over time. Photographs of Roosters Comb were taken from the same vantag...

  17. Chromosome number 9 specific repetitive DNA sequence

    Human repetitive DNA libraries have been constructed and various recombinant DNA clones isolated that are likely candidates for chromosome-specific sequences. The first clone tested (pHuR 98; plasmid human repeat 98) was biotinylated and hybridized to human chromosomes in situ. The hybridized recombinant probe was detected with fluoresceinated avidin, and chromosomes were counter-stained with either propidium iodide or distamycin-DAPI. Specific hybridization to chromosome band 9q1 was obtained. The localization was confirmed by hybridizing radiolabeled pHuR 98 DNA to human chromosomes sorted by flow cytometry. Various methods, including orthogonal field pulsed gel electrophoresis analysis, indicate that 75-kilobase blocks of this sequence are interspersed with other repetitive DNA sequences in this chromosome band. This study is the first to report a human repetitive DNA sequence uniquely localized to a specific chromosome. This clone provides an easily detected and highly specific chromosomal marker for molecular cytogenetic analyses in numerous basic research and clinical studies.

  18. On VC-density over indiscernible sequences

    Guingona, Vincent; Hill, Cameron Donnay

    2011-01-01

    In this paper, we study VC-density over indiscernible sequences (denoted VC_ind-density). We answer an open question in [1], showing that VC_ind-density is always integer valued. We also show that VC_ind-density and dp-rank coincide in the natural way.

  19. Output-Sensitive Pattern Extraction in Sequences

    Grossi, Roberto; Menconi, Giulia; Pisanti, Nadia;

    2014-01-01

    Genomic analysis, plagiarism detection, data mining, intrusion detection, spam fighting, and time series analysis are just some examples of applications in which the extraction of recurring patterns in sequences of objects is one of the main computational challenges. Several notions of patterns exist, and...

  20. Flexible, Mastery-Oriented Astrophysics Sequence.

    Zeilik, Michael, II

    1981-01-01

    Describes the implementation and impact of a two-semester mastery-oriented astrophysics sequence for upper-level physics/astrophysics majors designed to handle flexibly a wide range of student backgrounds. A Personalized System of Instruction (PSI) format was used fostering frequent student-instructor interaction and role-modeling behavior in…

  1. Ab initio gene identification in metagenomic sequences.

    Zhu, Wenhan; Lomsadze, Alexandre; Borodovsky, Mark

    2010-07-01

    We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from dependencies, formed in evolution, between frequencies of oligonucleotides in protein-coding regions and genome nucleotide composition. The original version of the method was proposed in 1999 and has since been used for (i) reconstructing the codon frequency vector needed for gene finding in viral genomes and (ii) initializing parameters of self-training gene finding algorithms. With the advent of new prokaryotic genomes en masse, it became possible to enhance the original approach by using direct polynomial and logistic approximations of oligonucleotide frequencies, as well as by separating models for bacteria and archaea. These advances have increased the accuracy of model reconstruction and, subsequently, of gene prediction. We describe the refined method and assess its accuracy on known prokaryotic genomes split into short sequences. We also show that, as a result of applying the new method, several thousand new genes could be added to existing annotations of several human and mouse gut metagenomes. PMID:20403810
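
    The core idea, predicting an oligonucleotide frequency in a short fragment from the fragment's nucleotide composition, can be sketched with a degree-1 polynomial fitted across training genomes. This is a hypothetical simplification, not the published model: the refined method uses polynomial and logistic approximations with separate bacterial and archaeal models, while here one codon's frequency is regressed on GC content by ordinary least squares. The training values in any call would have to come from annotated genomes; the numbers in the test are made up.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def gc_fraction(seq: str) -> float:
    """Fraction of G and C characters in the sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def predict_frequency(model: tuple[float, float], fragment: str) -> float:
    """Hypothetical use: predict one codon's frequency from a fragment's GC content."""
    slope, intercept = model
    return slope * gc_fraction(fragment) + intercept
```

    Repeating the fit for every codon gives a full codon-usage vector as a function of GC content, which is the kind of parameter set a gene finder needs when the fragment is too short to train on directly.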

  2. Sorghum genome sequencing by methylation filtration.

    Joseph A Bedell

    2005-01-01

    Full Text Available Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  3. Linearly recursive sequences and Dynkin diagrams

    Reutenauer, Christophe

    2012-01-01

    Motivated by a construction in the theory of cluster algebras (Fomin and Zelevinsky), one associates to each acyclic directed graph a family of sequences of natural numbers, one for each vertex; this construction is called a frieze. These sequences are given by nonlinear recursions (with division), and the fact that they are integers is a consequence of the Laurent phenomenon of Fomin and Zelevinsky. If the sequences satisfy a linear recursion with constant coefficients, then the graph must be a Dynkin diagram or an extended Dynkin diagram, with an acyclic orientation. The converse also holds: the sequences of the frieze associated to an oriented Dynkin or Euclidean diagram satisfy linear recursions, and are even $\mathbb{N}$-rational. The proof uses objects called $\mathrm{SL}_2$-tilings of the plane, which are fillings of the discrete plane such that each adjacent 2 by 2 minor is equal to 1. These objects, which have applications in the theory of cluster algebras, are of independent interest. S...
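
    The defining condition of an SL_2-tiling already produces a recursion with division: for each adjacent block [[a, b], [c, d]], ad - bc = 1 forces d = (bc + 1)/a. The sketch below fills a finite rectangle from a given top row and left column using that rule, with exact Fraction arithmetic, and checks the minor condition. It is an illustration of the tiling condition only, not the frieze construction for a given graph; entries are assumed nonzero so the division is defined.

```python
from fractions import Fraction


def fill_tiling(top: list[int], left: list[int]) -> list[list[Fraction]]:
    """Fill a rectangle so that every adjacent 2x2 minor equals 1.

    For a block [[a, b], [c, d]], ad - bc = 1 gives d = (b * c + 1) / a.
    """
    if top[0] != left[0]:
        raise ValueError("top row and left column must share the corner entry")
    rows, cols = len(left), len(top)
    g = [[Fraction(0)] * cols for _ in range(rows)]
    for j in range(cols):
        g[0][j] = Fraction(top[j])
    for i in range(rows):
        g[i][0] = Fraction(left[i])
    for i in range(1, rows):
        for j in range(1, cols):
            a, b, c = g[i - 1][j - 1], g[i - 1][j], g[i][j - 1]
            g[i][j] = (b * c + 1) / a  # raises ZeroDivisionError if a == 0
    return g


def is_sl2_tiling(g) -> bool:
    """Check that every adjacent 2x2 minor of the finite array equals 1."""
    return all(g[i][j] * g[i + 1][j + 1] - g[i][j + 1] * g[i + 1][j] == 1
               for i in range(len(g) - 1) for j in range(len(g[0]) - 1))
```

    Starting from a border of 1s, the filled rows are 1 2 3 4, 1 3 5 7, 1 4 7 10: every division happens to be exact, a small-scale echo of the integrality the abstract attributes to the Laurent phenomenon.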

  4. Galaxy LIMS for next-generation sequencing

    Scholtalbers, J.; Rossler, J.; Sorn, P.; Graaf, J. de; Boisguerin, V.; Castle, J.; Sahin, U.

    2013-01-01

    SUMMARY: We have developed a laboratory information management system (LIMS) for a next-generation sequencing (NGS) laboratory within the existing Galaxy platform. The system provides lab technicians standard and customizable sample information forms, barcoded submission forms, tracking of input sam

  5. Artificial Intelligence Controls Tape-Recording Sequence

    Schwuttke, Ursula M.; Otamura, Roy M.; Zottarelli, Lawrence J.

    1989-01-01

    Developmental expert-system computer program intended to schedule recording of large amounts of data on limited amount of magnetic tape. Schedules recording using two sets of rules. First set incorporates knowledge of locations for recording of new data. Second set incorporates knowledge about issuing commands to recorder. Designed primarily for use on Voyager Spacecraft, also applicable to planning and sequencing in industry.
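    The split into a placement rule set and a command rule set can be illustrated with a toy scheduler. This is a hypothetical sketch, not the Voyager program: the tape model (free segments measured in abstract position units), the first-fit placement rule, and the command names FORWARD, REWIND, RECORD and REJECT are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int   # tape position of the free segment (hypothetical units)
    length: int  # free space remaining in the segment


def choose_location(free, size):
    """Rule set 1 (where to record): first free segment, by position, that fits."""
    for seg in sorted(free, key=lambda s: s.start):
        if seg.length >= size:
            return seg
    return None


def commands_for(head, start, size):
    """Rule set 2 (recorder commands): move the head to `start`, then record."""
    cmds = []
    if start > head:
        cmds.append(("FORWARD", start - head))
    elif start < head:
        cmds.append(("REWIND", head - start))
    cmds.append(("RECORD", size))
    return cmds


def schedule(requests, free, head=0):
    """Plan (name, size) recording requests; returns the plan and final head position."""
    plan = []
    for name, size in requests:
        seg = choose_location(free, size)
        if seg is None:
            plan.append((name, [("REJECT", size)]))
            continue
        plan.append((name, commands_for(head, seg.start, size)))
        head = seg.start + size
        seg.start += size   # the recorded span is no longer free
        seg.length -= size
    return plan, head


free = [Segment(0, 100), Segment(200, 50)]
plan, head = schedule([("a", 60), ("b", 50), ("c", 30)], free)
for name, cmds in plan:
    print(name, cmds)
```

    Keeping placement and command generation separate mirrors the two rule sets in the description: either can be changed (for example, a best-fit placement rule) without touching the other.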

  6. Draft Genome Sequence of Lactobacillus rhamnosus 2166.

    Karlyshev, Andrey V.; Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2014-01-01

    In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains.

  7. Sublinear Time Motif Discovery from Multiple Sequences

    Yunhui Fu

    2013-10-01

    In this paper, a natural probabilistic model for motif discovery is used to test experimentally the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet, Σ. A motif G = g1g2 ... gm is a string of m characters. Into each background sequence is implanted a probabilistically generated approximate copy of G. For a probabilistically generated approximate copy b1b2 ... bm of G, every character bi is probabilistically generated such that the probability that bi ≠ gi is at most α. We develop two new randomized algorithms and one new deterministic algorithm. They make advances in the following aspects: (1) The algorithms are much faster than previous ones; they can even run in sublinear time. (2) They can handle any motif pattern. (3) The only restriction on the alphabet is that its size be at least four. This gives them potential applications in practical problems, since gene sequences have an alphabet size of four. (4) All algorithms have rigorous proofs of their performance. The methods developed in this paper have been used in a software implementation, and we observed encouraging results showing improved motif detection compared with other software.
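    The planted-motif model described above is easy to simulate. The sketch below assumes a DNA alphabet and uniformly random backgrounds; `plant_motif` and `brute_force_motif` are hypothetical names, and the search is a simple exhaustive baseline, not one of the paper's sublinear algorithms.

```python
import random


def plant_motif(k, n, motif, alpha, alphabet="ACGT", seed=None):
    """Return k random sequences of length n, each containing an approximate
    copy of `motif` at a random position, plus the list of positions.
    Each motif character is replaced by a different random character with
    probability alpha, so P[b_i != g_i] = alpha (hence at most alpha)."""
    rng = random.Random(seed)
    m = len(motif)
    seqs, positions = [], []
    for _ in range(k):
        background = [rng.choice(alphabet) for _ in range(n)]
        copy = [
            rng.choice([c for c in alphabet if c != g]) if rng.random() < alpha else g
            for g in motif
        ]
        pos = rng.randrange(n - m + 1)
        background[pos:pos + m] = copy  # overwrite m characters with the copy
        seqs.append("".join(background))
        positions.append(pos)
    return seqs, positions


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def best_distance(seq, cand):
    """Smallest Hamming distance between `cand` and any window of `seq`."""
    m = len(cand)
    return min(hamming(seq[i:i + m], cand) for i in range(len(seq) - m + 1))


def brute_force_motif(seqs, m):
    """Exhaustive baseline: try every m-mer of seqs[0] as the motif and keep
    the candidate with the smallest total distance over all sequences."""
    best, best_score = None, None
    for i in range(len(seqs[0]) - m + 1):
        cand = seqs[0][i:i + m]
        score = sum(best_distance(s, cand) for s in seqs)
        if best_score is None or score < best_score:
            best, best_score = cand, score
    return best, best_score


seqs, pos = plant_motif(k=5, n=100, motif="ACGTTGCA", alpha=0.1, seed=1)
print(brute_force_motif(seqs, 8))
```

    This baseline inspects every window of every sequence for each candidate, which is exactly the cost that sublinear-time algorithms aim to avoid.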

  8. Hidden ribozymes in eukaryotic genome sequence

    Sean P Ryder

    2010-01-01

    The small self-cleaving ribozymes fold into complex tertiary structures to promote autocatalytic cleavage or ligation at a precise position within their sequence. Until recently, relatively few examples had been identified. Two papers now reveal that self-cleaving ribozymes are prevalent in eukaryotic genomes and, in some cases, might play a role in regulating gene expression.

  9. Deciphering the RNA landscape by RNAome sequencing

    Derks, Kasper W.J.; Misovic, Branislav; van den Hout, Mirjam C.G.N.; Kockx, Christel; Gomez, Cesar Payan; Brouwer, R.W.W.; Vrieling, Harry; Hoeijmakers, Jan H.J.; van IJcken, Wilfred F.J.; Pothof, Joris

    2015-01-01

    Current RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species in an unperturbed manner. We report strand-specific RNAome sequencing that determines expression of small and large RNAs from rRNA-depleted total RNA in a sin

  10. Relations between Sequences and Selection Properties

    D. Djurčić

    2007-10-01

    We consider the set 𝕊 of sequences of positive real numbers and show that some subclasses of 𝕊 have certain nice selection and game theoretic properties.

  11. Sorghum genome sequencing by methylation filtration.

    Bedell, Joseph A; Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rholfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis. PMID:15660154
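    A back-of-the-envelope comparison (ours, not from the article) shows why the 96% figure is notable. Less than 300 Mb of raw sequence against a 735-Mb genome is under 0.41x coverage. Under the Lander-Waterman model of uniformly random shotgun sampling, that coverage would be expected to touch only about 1 - e^{-0.41}, roughly a third, of the genome. The sketch assumes uniform sampling and ignores read-length and overlap effects.

```python
import math

genome_mb = 735.0
raw_mb = 300.0  # upper bound quoted in the abstract ("less than 300 megabases")

c = raw_mb / genome_mb                # raw coverage, about 0.41x
random_fraction = 1 - math.exp(-c)    # Lander-Waterman expected fraction covered

print(f"coverage = {c:.2f}x, expected fraction hit by random reads = {random_fraction:.0%}")
```

    Tagging 96% of genes from such shallow sequencing is therefore consistent with methylation filtration concentrating reads in the gene-rich, hypomethylated fraction rather than spreading them uniformly.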

  12. What Makes a Protein Sequence a Prion?

    Sabate, Raimon; Rousseau, Frederic; Schymkowitz, Joost; Ventura, Salvador

    2015-01-01

    Typical amyloid diseases such as Alzheimer's and Parkinson's were thought to result exclusively from de novo aggregation, but recently it was shown that amyloids formed in one cell can cross-seed aggregation in other cells, following a prion-like mechanism. Despite the large experimental effort devoted to understanding the phenomenon of prion transmissibility, it is still poorly understood how this property is encoded in the primary sequence. In many cases, prion structural conversion is driven by the presence of relatively large glutamine/asparagine (Q/N) enriched segments. Several studies suggest that it is the amino acid composition of these regions rather than their specific sequence that accounts for their prionogenicity. However, our analysis indicates that it is instead the presence and potency of specific short amyloid-prone sequences that occur within intrinsically disordered Q/N-rich regions that determine their prion behaviour, modulated by the structural and compositional context. This provides a basis for the accurate identification and evaluation of prion candidate sequences in proteomes in the context of a unified framework for amyloid formation and prion propagation. PMID:25569335
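    The two-level idea (first locate Q/N-rich regions, then look inside them for short amyloid-prone segments) can be sketched as a simple scan. This is a hypothetical illustration, not the authors' method: the window size, the Q/N fraction threshold, the segment length k and the per-residue propensity `scale` are all placeholders that the caller must supply or tune; no published amyloid-propensity values are included.

```python
def qn_rich_regions(seq, window=40, min_fraction=0.3):
    """Merged (start, end) spans covering every window whose Q+N fraction
    is at least min_fraction. Thresholds are placeholders."""
    spans = []
    for i in range(len(seq) - window + 1):
        w = seq[i:i + window]
        if (w.count("Q") + w.count("N")) / window >= min_fraction:
            if spans and i <= spans[-1][1]:
                spans[-1] = (spans[-1][0], i + window)  # extend overlapping span
            else:
                spans.append((i, i + window))
    return spans


def amyloid_prone_segments(seq, spans, scale, k=6, threshold=0.0):
    """Score every k-mer lying inside the spans by its mean per-residue
    propensity (from a caller-supplied `scale` dict) and keep those above
    `threshold`. Returns (start, peptide, score) tuples."""
    hits = []
    for start, end in spans:
        for i in range(start, end - k + 1):
            pep = seq[i:i + k]
            score = sum(scale.get(aa, 0.0) for aa in pep) / k
            if score > threshold:
                hits.append((i, pep, score))
    return hits


# Toy example: a Q/N-rich stretch containing a hydrophobic VIVI core.
seq = "A" * 10 + "QNQNVIVIQN" + "A" * 10
spans = qn_rich_regions(seq, window=10, min_fraction=0.5)
toy_scale = {"V": 1.0, "I": 1.0}  # placeholder values, not a real scale
hits = amyloid_prone_segments(seq, spans, toy_scale, k=6, threshold=0.5)
print(spans, [(i, p) for i, p, _ in hits])
```

    Restricting the segment search to Q/N-rich spans reflects the abstract's point that composition sets the context while specific short segments drive the behaviour; a composition-only predictor would stop after the first function.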

  13. The Ultramafites and layered Gabbro Sequences

    Oosterom, M.G.

    1963-01-01

    On Stjernöy, Seiland and the neighbouring peninsulas of Öksfjord and Bergsfjord ultramafic bodies of peridotite and pyroxenite with associated layered gabbro sequences occur within a complex of highly metamorphic gabbro gneisses, rocks akin to pyroxene-granulites and mafic charnockites. As is shown

  14. 32 CFR 179.7 - Sequencing.

    2010-07-01

    ... Title 32 (National Defense), Volume 1, 2010-07-01. Section 179.7, Sequencing. Department of Defense, Office of the Secretary of Defense, Closures and Realignment, Munitions Response Site... evaluations based on MRS-specific data. (5) Reasonably anticipated future land use, especially when...

  15. Gamma and Related Functions Generalized for Sequences

    Ollerton, R. L.

    2008-01-01

    Given a sequence $g_k > 0$, the "g-factorial" product $\prod_{i=1}^{k} g_i$ is extended from integer $k$ to real $x$ by generalizing properties of the gamma function $\Gamma(x)$. The Euler-Mascheroni constant $\gamma$ and the beta and zeta functions are also generalized. Specific examples include…
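    The classical case $g_i = i$ shows what "extending from integer $k$ to real $x$" means: the product is $k!$, and Gauss's limit $x! = \lim_{n\to\infty} n!\, n^x / \prod_{i=1}^{n}(x+i) = \Gamma(x+1)$ interpolates it at non-integer $x$. The sketch below only checks this standard special case numerically; it does not reproduce the article's construction for general sequences, and the function names and the cutoff `n` are illustrative assumptions.

```python
import math


def g_factorial(g, k):
    """The g-factorial g(1) * g(2) * ... * g(k) for a positive sequence g."""
    return math.prod(g(i) for i in range(1, k + 1))


def gauss_factorial(x, n=100_000):
    """Truncated Gauss limit n! * n**x / prod_{i=1..n} (x + i), which tends to
    Gamma(x + 1) as n grows. Accumulated in log space to avoid overflow."""
    log_val = x * math.log(n)
    for i in range(1, n + 1):
        log_val += math.log(i) - math.log(x + i)
    return math.exp(log_val)


print(g_factorial(lambda i: i, 5))            # 120
print(gauss_factorial(0.5), math.gamma(1.5))  # both close to sqrt(pi)/2
```

    The truncation error of the Gauss limit shrinks roughly like 1/n, so a modest cutoff already agrees with `math.gamma` to several digits.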

  16. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR RLK) genetic…

  17. Zero-Till Crop Sequence Economics

    No-till production systems allow more intensified and diversified production in the northern Great Plains; however, this has increased the need for information on improving economic returns through crop sequence selection. Field research was conducted near Mandan ND to determine the influences of pr...

  18. The Pizza Problem: A Solution with Sequences

    Shafer, Kathryn G.; Mast, Caleb J.

    2008-01-01

    This article addresses the issues of coaching and assessing. A preservice middle school teacher's unique solution to the Pizza problem was not what the professor expected. The student's solution strategy, based on sequences and a reinvention of Pascal's triangle, is explained in detail. (Contains 8 figures.)

  19. Gnathostome phylogenomics utilizing lungfish EST sequences.

    Hallström, Björn M; Janke, Axel

    2009-02-01

    The relationship between the Chondrichthyes (cartilaginous fishes), the Actinopterygii (ray-finned fishes), and the piscine Sarcopterygii (lobe-finned fishes) and how the Tetrapoda (four-limbed terrestrial vertebrates) are related to these has been a contentious issue for more than a century. A general consensus about the relationship of these vertebrate clades has gradually emerged among morphologists, but no molecular study has yet provided conclusive evidence for any specific hypothesis. In order to examine these relationships on the basis of more extensive sequence data, we have produced almost 1,000,000 bp of expressed sequence tags (ESTs) from the African marbled lungfish, Protopterus aethiopicus. This new data set yielded 771 transcribed nuclear sequences that had not been previously described. The lungfish EST sequences were combined with EST data from two cartilaginous fishes and whole genome data from an agnathan, four ray-finned fishes, and four tetrapods. Phylogenomic analysis of these data yielded, for the first time, significant maximum likelihood support for a traditional gnathostome tree with a split between the Chondrichthyes and remaining (bony) gnathostomes. Also, the sister group relationship between Dipnoi (lungfishes) and Tetrapoda received conclusive support. Previously proposed hypotheses, such as the monophyly of fishes, could be rejected significantly. The divergence time between lungfishes and tetrapods was estimated at 382-388 Ma from the current data set and six calibration points. PMID:19029191

  20. Whole genome sequences of four Brucella strains.

    Ding, Jiabo; Pan, Yuanlong; Jiang, Hai; Cheng, Junsheng; Liu, Taotao; Qin, Nan; Yang, Yi; Cui, Buyun; Chen, Chen; Liu, Cuihua; Mao, Kairong; Zhu, Baoli

    2011-07-01

    Brucella melitensis and Brucella suis are intracellular pathogens of livestock and humans. Here we report four genome sequences, those of the virulent strain B. melitensis M28-12 and the vaccine strains B. melitensis M5 and M111 and B. suis S2, which differ in virulence and pathogenicity. These sequences will help in designing a more effective brucellosis vaccine. PMID:21602346