WorldWideScience

Sample records for abrf edman sequencing

  1. High-throughput sequencing of peptoids and peptide-peptoid hybrids by partial edman degradation and mass spectrometry.

    Science.gov (United States)

    Thakkar, Amit; Cohen, Allison S; Connolly, Michael D; Zuckermann, Ronald N; Pei, Dehua

    2009-03-09

    A method for the rapid sequence determination of peptoids [oligo(N-substituted glycines)] and peptide-peptoid hybrids selected from one-bead-one-compound combinatorial libraries has been developed. In this method, beads carrying unique peptoid (or peptide-peptoid) sequences were subjected to multiple cycles of partial Edman degradation (PED) by treatment with a 1:3 (mol/mol) mixture of phenyl isothiocyanate (PITC) and 9-fluorenylmethyl chloroformate (Fmoc-Cl) to generate a series of N-terminal truncation products for each resin-bound peptoid. After PED, the Fmoc group was removed from the N-terminus and any reacted side chains via piperidine treatment. The resulting mixture of the full-length peptoid and its truncation products was analyzed by matrix-assisted laser desorption ionization (MALDI) mass spectrometry, to reveal the sequence of the full-length peptoid. With a slight modification, the method was also effective in the sequence determination of peptide-peptoid hybrids. This rapid, high-throughput, sensitive, and inexpensive sequencing method should greatly expand the utility of combinatorial peptoid libraries in biomedical and materials research.
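
The ladder readout in the abstract above (mass differences between successive truncation products identify the residues) can be sketched in Python. This is only an illustration, not the authors' software. It assumes standard monoisotopic amino acid residue masses, so it fits the peptide case; peptoid monomers would need their own mass table. The helper names `read_ladder` and `make_ladder`, the tolerance, and the arbitrary core mass are all hypothetical.

```python
# Monoisotopic residue masses (Da) for a few standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "F": 147.06841, "Y": 163.06333,
}


def read_ladder(masses, tol=0.02):
    """Read a sequence from the masses of a full-length compound and its
    N-terminal truncation products (hypothetical helper)."""
    ladder = sorted(masses, reverse=True)  # full-length species first
    sequence = []
    for heavier, lighter in zip(ladder, ladder[1:]):
        delta = heavier - lighter  # mass of the residue removed in this cycle
        matches = [r for r, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        # Report "?" when the mass difference is not uniquely assignable.
        sequence.append(matches[0] if len(matches) == 1 else "?")
    return "".join(sequence)


def make_ladder(seq, core_mass=500.0):
    """Build synthetic ladder masses for testing; the core mass is arbitrary."""
    total = core_mass + sum(RESIDUE_MASS[r] for r in seq)
    ladder = [total]
    for r in seq:
        total -= RESIDUE_MASS[r]
        ladder.append(total)
    return ladder
```

For example, `read_ladder(make_ladder("GAFYS"))` recovers the N-to-C order of the synthetic sequence, and the peak list may be given in any order because it is sorted first.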

  2. Detection of DBD-carbamoyl amino acids in amino acid sequence and D/L configuration determination of peptides with fluorogenic Edman reagent 7-[(N,N-dimethylamino)sulfonyl]-2,1,3-benzoxadiazol-4-yl isothiocyanate.

    Science.gov (United States)

    Huang, Y; Matsunaga, H; Toriba, A; Santa, T; Fukushima, T; Imai, K

    1999-06-01

    A method for amino acid sequence and D/L configuration identification of peptides using the fluorogenic Edman reagent 7-[(N,N-dimethylamino)sulfonyl]-2,1,3-benzoxadiazol-4-yl isothiocyanate (DBD-NCS) has been developed. The method is based on the Edman degradation principle with some modifications. A peptide or protein was coupled with DBD-NCS under basic conditions and then cyclized/cleaved to produce the DBD-thiazolinone (TZ) derivative using BF3, a Lewis acid, which could significantly suppress amino acid racemization. The liberated DBD-TZ amino acid was hydrolyzed to the DBD-thiocarbamoyl (TC) amino acid under weakly acidic conditions and then oxidized by NaNO2/H+ to the DBD-carbamoyl (CA) amino acid, which was stable and strongly fluorescent. The individual DBD-CA amino acids were separated by reversed-phase high-performance liquid chromatography (RP-HPLC) for amino acid sequencing, and their enantiomers were resolved by chiral stationary-phase HPLC to identify their D/L configurations. By combining the two HPLC systems, the amino acid sequence and D/L configuration of peptides could be determined. This method will be useful for searching for D-amino-acid-containing peptides in animals.

  3. A photothermally responsive nanoprobe for bioimaging based on Edman degradation

    Science.gov (United States)

    Liu, Yi; Wang, Zhantong; Zhang, Huimin; Lang, Lixin; Ma, Ying; He, Qianjun; Lu, Nan; Huang, Peng; Liu, Yijing; Song, Jibin; Liu, Zhibo; Gao, Shi; Ma, Qingjie; Kiesewetter, Dale O.; Chen, Xiaoyuan

    2016-05-01

    A new type of photothermally responsive nanoprobe based on Edman degradation has been synthesized and characterized. Under irradiation by an 808 nm laser, the heat generated by the gold nanorod core breaks the thiocarbamide structure and releases the fluorescent dye Cy5.5 with increased near-infrared (NIR) fluorescence under mild acidic conditions. This RGD modified nanoprobe is capable of fluorescence imaging of αvβ3 over-expressing U87MG cells in vitro and in vivo. This Edman degradation-based nanoprobe provides a novel strategy to design activatable probes for biomedical imaging and drug/gene delivery. Electronic supplementary information (ESI) available: HPLC, MS and 1H NMR spectrum. See DOI: 10.1039/c6nr01400c

  4. Interlaboratory Study on Differential Analysis of Protein Glycosylation by Mass Spectrometry: The ABRF Glycoprotein Research Multi-Institutional Study 2012

    Science.gov (United States)

    Leymarie, Nancy; Griffin, Paula J.; Jonscher, Karen; Kolarich, Daniel; Orlando, Ron; McComb, Mark; Zaia, Joseph; Aguilan, Jennifer; Alley, William R.; Altmann, Friederich; Ball, Lauren E.; Basumallick, Lipika; Bazemore-Walker, Carthene R.; Behnken, Henning; Blank, Michael A.; Brown, Kristy J.; Bunz, Svenja-Catharina; Cairo, Christopher W.; Cipollo, John F.; Daneshfar, Rambod; Desaire, Heather; Drake, Richard R.; Go, Eden P.; Goldman, Radoslav; Gruber, Clemens; Halim, Adnan; Hathout, Yetrib; Hensbergen, Paul J.; Horn, David M.; Hurum, Deanna; Jabs, Wolfgang; Larson, Göran; Ly, Mellisa; Mann, Benjamin F.; Marx, Kristina; Mechref, Yehia; Meyer, Bernd; Möginger, Uwe; Neusüβ, Christian; Nilsson, Jonas; Novotny, Milos V.; Nyalwidhe, Julius O.; Packer, Nicolle H.; Pompach, Petr; Reiz, Bela; Resemann, Anja; Rohrer, Jeffrey S.; Ruthenbeck, Alexandra; Sanda, Miloslav; Schulz, Jan Mirco; Schweiger-Hufnagel, Ulrike; Sihlbom, Carina; Song, Ehwang; Staples, Gregory O.; Suckau, Detlev; Tang, Haixu; Thaysen-Andersen, Morten; Viner, Rosa I.; An, Yanming; Valmu, Leena; Wada, Yoshinao; Watson, Megan; Windwarder, Markus; Whittal, Randy; Wuhrer, Manfred; Zhu, Yiying; Zou, Chunxia

    2013-01-01

    One of the principal goals of glycoprotein research is to correlate glycan structure and function. Such correlation is necessary in order for one to understand the mechanisms whereby glycoprotein structure elaborates the functions of myriad proteins. The accurate comparison of glycoforms and quantification of glycosites are essential steps in this direction. Mass spectrometry has emerged as a powerful analytical technique in the field of glycoprotein characterization. Its sensitivity, high dynamic range, and mass accuracy provide both quantitative and sequence/structural information. As part of the 2012 ABRF Glycoprotein Research Group study, we explored the use of mass spectrometry and ancillary methodologies to characterize the glycoforms of two sources of human prostate specific antigen (PSA). PSA is used as a tumor marker for prostate cancer, with increasing blood levels used to distinguish between normal and cancer states. The glycans on PSA are believed to be biantennary N-linked, and it has been observed that prostate cancer tissues and cell lines contain more antennae than their benign counterparts. Thus, the ability to quantify differences in glycosylation associated with cancer has the potential to positively impact the use of PSA as a biomarker. We studied standard peptide-based proteomics/glycomics methodologies, including LC-MS/MS for peptide/glycopeptide sequencing and label-free approaches for differential quantification. We performed an interlaboratory study to determine the ability of different laboratories to correctly characterize the differences between glycoforms from two different sources using mass spectrometry methods. We used clustering analysis and ancillary statistical data treatment on the data sets submitted by participating laboratories to obtain a consensus of the glycoforms and abundances. The results demonstrate the relative strengths and weaknesses of top-down glycoproteomics, bottom-up glycoproteomics, and glycomics methods. 

  5. The 2012/2013 ABRF Proteomic Research Group Study: Assessing Longitudinal Intralaboratory Variability in Routine Peptide Liquid Chromatography Tandem Mass Spectrometry Analyses.

    Science.gov (United States)

    Bennett, Keiryn L; Wang, Xia; Bystrom, Cory E; Chambers, Matthew C; Andacht, Tracy M; Dangott, Larry J; Elortza, Félix; Leszyk, John; Molina, Henrik; Moritz, Robert L; Phinney, Brett S; Thompson, J Will; Bunger, Maureen K; Tabb, David L

    2015-12-01

    Questions concerning longitudinal data quality and reproducibility of proteomic laboratories spurred the Protein Research Group of the Association of Biomolecular Resource Facilities (ABRF-PRG) to design a study to systematically assess the reproducibility of proteomic laboratories over an extended period of time. Developed as an open study, initially 64 participants were recruited from the broader mass spectrometry community to analyze provided aliquots of a six bovine protein tryptic digest mixture every month for a period of nine months. Data were uploaded to a central repository, and the operators answered an accompanying survey. Ultimately, 45 laboratories submitted a minimum of eight LC-MSMS raw data files collected in data-dependent acquisition (DDA) mode. No standard operating procedures were enforced; rather the participants were encouraged to analyze the samples according to usual practices in the laboratory. Unlike previous studies, this investigation was not designed to compare laboratories or instrument configuration, but rather to assess the temporal intralaboratory reproducibility. The outcome of the study was reassuring with 80% of the participating laboratories performing analyses at a medium to high level of reproducibility and quality over the 9-month period. For the groups that had one or more outlying experiments, the major contributing factor that correlated to the survey data was the performance of preventative maintenance prior to the LC-MSMS analyses. Thus, the Protein Research Group of the Association of Biomolecular Resource Facilities recommends that laboratories closely scrutinize the quality control data following such events. Additionally, improved quality control recording is imperative. This longitudinal study provides evidence that mass spectrometry-based proteomics is reproducible. When quality control measures are strictly adhered to, such reproducibility is comparable among many disparate groups. Data from the study are

  6. Localization of an O-glycosylated site in the recombinant barley alpha-amylase 1 produced in yeast and correction of the amino acid sequence using matrix-assisted laser desorption/ionization mass spectrometry of peptide mixtures

    DEFF Research Database (Denmark)

    Andersen, Jens S.; Søgaard, M; Svensson, B

    1994-01-01

    , and analyzed directly by MALDI-MS. Based on the three mass spectrometric peptide maps, an error in the sequence deduced from cDNA, resulting in a mass difference of 28 Da, was located to a sequence stretch of 5 amino acid residues; furthermore, a dihexose substituent was identified on Thr410. Subsequent Edman...

  7. Amino acid sequence of versutoxin, a lethal neurotoxin from the venom of the funnel-web spider Atrax versutus.

    Science.gov (United States)

    Brown, M R; Sheumack, D D; Tyler, M I; Howden, M E

    1988-03-01

    The complete amino acid sequence of versutoxin, a lethal neurotoxic polypeptide isolated from the venom of male and female funnel-web spiders of the species Atrax versutus, was determined. Sequencing was performed in a gas-phase protein sequencer by automated Edman degradation of the S-carboxymethylated toxin and fragments of it produced by reaction with CNBr. Versutoxin consisted of a single chain of 42 amino acid residues. It was found to have a high proportion of basic residues and of cystine. The primary structure showed marked homology with that of robustoxin, a novel neurotoxin recently isolated from the venom of another funnel-web-spider species, Atrax robustus.

  8. Amino acid sequences of neuropeptides in the sinus gland of the land crab Cardisoma carnifex: a novel neuropeptide proteolysis site.

    Science.gov (United States)

    Newcomb, R W

    1987-08-01

    The sinus gland is a major neurosecretory structure in Crustacea. Five peptides, labeled C, D, E, F, and I, isolated from the sinus gland of the land crab have been hypothesized to arise from the incomplete proteolysis at two internal sites on a single biosynthetic intermediate peptide "H", based on amino acid composition additivities and pulse-chase radiolabeling studies. The presence of only a single major precursor for the sinus gland peptides implies that peptide H may be synthesized on a common precursor with crustacean hyperglycemic hormone forms, "J" and "L," and a peptide, "K," similar to peptides with molt inhibiting activity. Here I report amino acid sequences of these peptides. The amino terminal sequence of the parent peptide, H, (and the homologous fragments) proved refractory to Edman degradation. Data from amino acid analysis and carboxypeptidase digestion of the naturally occurring fragments and of fragments produced by endopeptidase digestion were used together with Edman degradation to obtain the sequences. Amino acid analysis of fragments of the naturally occurring "overlap" peptides (those produced by internal cleavage at one site on H) was used to obtain the sequences across the cleavage sites. The amino acid sequence of the land crab peptide H is Arg-Ser-Ala-Asp-Gly-Phe-Gly-Arg-Met-Glu-Ser-Leu-Leu-Thr-Ser-Leu-Arg-Gly-Ser-Ala-Glu-Ser-Pro-Ala-Ala-Leu-Gly-Glu-Ala-Ser-Ala-Ala-His-Pro-Leu-Glu. In vivo cleavage at one site involves excision of arginine from the sequence Leu-Arg-Gly, whereas cleavage at the other site involves excision of serine from the sequence Glu-Ser-Leu. Proteolysis at the latter sequence has not been previously reported in intact secretory granules. The aspartate at position 4 is possibly covalently modified.
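
The incomplete proteolysis described in the abstract above can be explored with a short Python sketch. It takes the peptide H sequence and the two excision motifs exactly as stated in the abstract, then lists every fragment that cleavage at one or both sites would produce. The function names are hypothetical, and the sketch does not assign fragments to the labels C, D, E, F, and I, which the abstract does not spell out.

```python
from itertools import combinations

# Peptide H as given in the abstract, split into three-letter residues.
H = (
    "Arg-Ser-Ala-Asp-Gly-Phe-Gly-Arg-Met-Glu-Ser-Leu-Leu-Thr-Ser-Leu-Arg-Gly-"
    "Ser-Ala-Glu-Ser-Pro-Ala-Ala-Leu-Gly-Glu-Ala-Ser-Ala-Ala-His-Pro-Leu-Glu"
).split("-")


def find_excised(seq, motif):
    """0-based indices of the middle residue of each three-residue motif match."""
    return [i + 1 for i in range(len(seq) - 2) if seq[i:i + 3] == motif]


def fragments(seq, excised):
    """Split seq at the excised positions, dropping the excised residues."""
    out, start = [], 0
    for pos in sorted(excised):
        out.append(seq[start:pos])
        start = pos + 1
    out.append(seq[start:])
    return out


def partial_products(seq, sites):
    """All distinct fragments from cleavage at any non-empty subset of sites."""
    products = set()
    for k in range(1, len(sites) + 1):
        for subset in combinations(sites, k):
            for frag in fragments(seq, subset):
                products.add("-".join(frag))
    return products


sites = find_excised(H, ["Glu", "Ser", "Leu"]) + find_excised(H, ["Leu", "Arg", "Gly"])
products = partial_products(H, sites)
```

Under these assumptions each motif occurs once in the 36-residue sequence, and the enumeration gives five distinct fragments, which is consistent with the five peptides mentioned in the abstract.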

  9. Amino acid sequence of myoglobin from emu (Dromaius novaehollandiae) skeletal muscle.

    Science.gov (United States)

    Suman, S P; Joseph, P; Li, S; Beach, C M; Fontaine, M; Steinke, L

    2010-11-01

    The objective of the present study was to characterize the primary structure of emu myoglobin (Mb). Emu Mb was isolated from the iliofibularis muscle by gel-filtration chromatography. Matrix-assisted laser desorption ionization-time of flight mass spectrometry was employed to determine the exact molecular mass of emu Mb in comparison with horse Mb, and Edman degradation was utilized to characterize the amino acid sequence. The molecular mass of emu Mb was 17,380 Da and was close to those reported for ratite and poultry myoglobins. Similar to myoglobins from meat-producing livestock and birds, emu Mb has 153 amino acids. Emu Mb contains 9 histidines. The proximal and distal histidines, responsible for the oxygen-binding property of Mb, are conserved in emu. Emu Mb shared more than 90% homology with ratite and chicken myoglobins, whereas it showed less than 70% sequence similarity with ruminant myoglobins.

  10. The amino acid sequence of Escherichia coli cyanase.

    Science.gov (United States)

    Chin, C C; Anderson, P M; Wold, F

    1983-01-10

    The amino acid sequence of the enzyme cyanase (cyanate hydrolase) from Escherichia coli has been determined by automatic Edman degradation of the intact protein and of its component peptides. The primary peptides used in the sequencing were produced by cyanogen bromide cleavage at the methionine residues, yielding 4 peptides plus free homoserine from the NH2-terminal methionine, and by trypsin cleavage at the 7 arginine residues after acetylation of the lysines. Secondary peptides required for overlaps and COOH-terminal sequences were produced by chymotrypsin or clostripain cleavage of some of the larger peptides. The complete sequence of the cyanase subunit consists of 156 amino acid residues (Mr 16,350). Based on the observation that the cysteine-containing peptide is obtained as a disulfide-linked dimer, it is proposed that the covalent structure of cyanase is made up of two subunits linked by a disulfide bond between the single cystine residue in each subunit. The native enzyme (Mr 150,000) then appears to be a complex of four or five such subunit dimers.

  11. Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

    Science.gov (United States)

    Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

    2007-12-01

    Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct Camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a

  12. The primary structure of the hemoglobin of Malayan sun bear (Helarctos malayanus, Carnivora) and structural comparison to other hemoglobin sequences.

    Science.gov (United States)

    Hofmann, O; Braunitzer, G; Göltenboth, R

    1987-05-01

    The complete primary structure of the alpha- and beta-chains of the hemoglobin of Malayan Sun Bear (Helarctos malayanus) is presented. After cleavage of the heme-protein link and chain separation by RP-HPLC, amino-acid sequences were determined by Edman degradation in liquid- and gas-phase sequenators. An interesting result of this work is the demonstration that the hemoglobin of Malayan Sun Bear is identical to the hemoglobins of Polar Bear (Ursus maritimus) and Asiatic Black Bear (Ursus tibetanus). The paper gives an updated table of identical hemoglobin chains from different species. This paper may be considered as a compilation of work on the genetic relationship of Pandas.

  13. Purification and N-terminal sequence of a serine proteinase-like protein (BMK-CBP) from the venom of the Chinese scorpion (Buthus martensii Karsch).

    Science.gov (United States)

    Gao, Rong; Zhang, Yong; Gopalakrishnakone, Ponnampalam

    2008-08-01

    A serine proteinase-like protein was isolated from the venom of the Chinese red scorpion (Buthus martensii Karsch) by a combination of gel filtration, ion-exchange and reverse-phase chromatography and named BMK-CBP. The apparent molecular weight of BMK-CBP was determined to be 33 kDa by SDS-PAGE under non-reducing conditions. The sequence of the N-terminal 40 amino acids was obtained by Edman degradation. The sequence shows the highest similarity to proteinases from insect sources. When tested with commonly used proteinase substrates, no significant hydrolytic activity was observed for BMK-CBP. The purified BMK-CBP was found to bind to the cancer cell line MCF-7, and the cell binding ability was dose-dependent.

  14. Automatic sequences

    CERN Document Server

    Haeseler, Friedrich

    2003-01-01

    Automatic sequences are sequences which are produced by a finite automaton. Although they are not random, they may look random. They are complicated, in the sense of not being ultimately periodic, and they may look rather complicated, in the sense that it may not be easy to name the rule by which the sequence is generated; however, there exists a rule which generates the sequence. The concept of automatic sequences has special applications in algebra, number theory, finite automata and formal languages, and combinatorics on words. The text deals with different aspects of automatic sequences, in particular: a general introduction to automatic sequences; the basic (combinatorial) properties of automatic sequences; the algebraic approach to automatic sequences; and geometric objects related to automatic sequences.
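
As a concrete illustration of a sequence produced by a finite automaton, the Python sketch below generates the Thue-Morse sequence, a standard example of a 2-automatic sequence. The abstract does not name this example; it is chosen here for illustration, and the function and variable names are hypothetical. The n-th term is obtained by feeding the base-2 digits of n to a two-state automaton and reading the output attached to the final state.

```python
def automaton_term(n, transitions, outputs, start=0, base=2):
    """n-th term of an automatic sequence: run a finite automaton on the
    base-`base` digits of n and return the output of the final state."""
    digits = []
    while n:
        digits.append(n % base)
        n //= base
    state = start
    for d in reversed(digits):  # most significant digit first
        state = transitions[state][d]
    return outputs[state]


# Thue-Morse automaton: the state tracks the parity of 1-digits read so far.
TM_TRANSITIONS = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
TM_OUTPUTS = {0: 0, 1: 1}

thue_morse = [automaton_term(n, TM_TRANSITIONS, TM_OUTPUTS) for n in range(16)]
# thue_morse == [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
```

The rule is simple (parity of the binary digit sum), yet the resulting sequence is not ultimately periodic, which matches the sense of "complicated" used in the abstract.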

  15. Sequence analysis and location of capsid proteins within RNA 2 of strawberry latent ringspot virus.

    Science.gov (United States)

    Kreiah, S; Strunk, G; Cooper, J I

    1994-09-01

    The nucleotide sequence of the RNA 2 of a strawberry isolate (H) of strawberry latent ringspot virus (SLRSV) comprised 3824 nucleotides and contained one long open reading frame with a theoretical coding capacity of 890 amino acids equivalent to a protein of 98.8K. The N-terminal amino acid sequences of virion-derived proteins were determined by Edman degradation allowing the capsid coding regions to be located and serine/glycine cleavage sites to be identified within the polyprotein. The amino acid sequence in the capsid coding region of an isolate of SLRSV from flowering cherry in New Zealand was 97% identical to that of SLRSV-H. Except in the 3' and 5' terminal non-coding sequences, computer-based alignment and comparison algorithms did not reveal any substantial homologies between RNA 2 of SLRSV-H and the equivalent genomic segments in the nepoviruses arabis mosaic, cherry leaf roll, grapevine fanleaf, raspberry ringspot, grapevine hungarian chrome mosaic, tomato blackring, tomato ringspot, tobacco ringspot, or in the comoviruses cowpea mosaic and red clover mottle. Despite the similarities in overall genome organization, data from RNA 2 remain insufficient for unambiguous positioning of SLRSV in relation to species/genera in the Comoviridae.

  16. The complete amino acid sequence of a trypsin inhibitor from Bauhinia variegata var. candida seeds.

    Science.gov (United States)

    Di Ciero, L; Oliva, M L; Torquato, R; Köhler, P; Weder, J K; Camillo Novello, J; Sampaio, C A; Oliveira, B; Marangoni, S

    1998-11-01

    Trypsin inhibitors of two varieties of Bauhinia variegata seeds have been isolated and characterized. Bauhinia variegata candida trypsin inhibitor (BvcTI) and B. variegata lilac trypsin inhibitor (BvlTI) are proteins with Mr of about 20,000 without free sulfhydryl groups. Amino acid analysis shows a high content of aspartic acid, glutamic acid, serine, and glycine, and a low content of histidine, tyrosine, methionine, and lysine in both inhibitors. Isoelectric focusing for both varieties detected three isoforms (pI 4.85, 5.00, and 5.15), which were resolved by HPLC procedure. The trypsin inhibitors show Ki values of 6.9 and 1.2 nM for BvcTI and BvlTI, respectively. The N-terminal sequences of the three trypsin inhibitor isoforms from both varieties of Bauhinia variegata and the complete amino acid sequence of B. variegata var. candida L. trypsin inhibitor isoform 3 (BvcTI-3) are presented. The sequences have been determined by automated Edman degradation of the reduced and carboxymethylated proteins of the peptides resulting from Staphylococcus aureus protease and trypsin digestion. BvcTI-3 is composed of 167 residues and has a calculated molecular mass of 18,529. Homology studies with other trypsin inhibitors show that BvcTI-3 belongs to the Kunitz family. The putative active site encompasses Arg (63)-Ile (64).

  17. Complete amino acid sequence of the myoglobin from the Atlantic bottlenosed dolphin, Tursiops truncatus.

    Science.gov (United States)

    Jones, B N; Vigna, R A; Dwulet, F E; Bogardt, R A; Lehman, L D; Gurd, F R

    1976-10-05

    The complete amino acid sequence of the major component myoglobin from the Atlantic bottlenosed dolphin, Tursiops truncatus, was determined by specific cleavage of the protein to obtain large peptides that are readily degraded by the automatic sequencer. Three easily separable peptides were obtained by cleaving the protein with cyanogen bromide at the 2 methionine residues and 4 peptides were obtained by cleaving the methyl acetimidated protein with trypsin at the 3 arginine residues. By subjecting 4 of these peptides and the apomyoglobin to automatic Edman degradation, over 80% of the covalent structure of the protein was obtained. The remainder of the primary structure was determined by further digestion of the central cyanogen bromide peptide with trypsin and staphylococcal protease. This myoglobin differs from that of the sperm whale, Physeter catodon, at 15 positions, from that of the California gray whale, Eschrichtius gibbosus, at 14 positions, from that of the common porpoise, Phocoena phocoena, at 6 positions, and from the myoglobin of the Black Sea dolphin, Delphinus delphis and the Amazon River dolphin, Inia geoffrensis, at 5 and 7 positions, respectively. All substitutions observed in this sequence fit easily into the tertiary structure of sperm whale myoglobin.
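
The peptide counts in the abstract above follow from a simple rule: cleavage after each of n internal sites gives n + 1 peptides. The Python sketch below shows this with a made-up toy sequence containing 2 methionines and 3 arginines; it is not the dolphin myoglobin sequence, which the abstract does not give. The helper `cleave_after` is hypothetical and ignores real-world complications such as incomplete cleavage. Cyanogen bromide is modeled as cleaving after Met, and trypsin as cleaving after Arg only, since the abstract states that the protein was methyl acetimidated.

```python
def cleave_after(seq, residue):
    """Split a one-letter sequence after every internal occurrence of residue."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa == residue and i < len(seq) - 1:
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])
    return peptides


# Hypothetical toy sequence with 2 Met (M) and 3 Arg (R) residues.
TOY = "AGMKLRSTVMEERWQKRLA"

cnbr_peptides = cleave_after(TOY, "M")     # cyanogen bromide: after Met
tryptic_peptides = cleave_after(TOY, "R")  # trypsin with Lys blocked: after Arg
```

Here `cnbr_peptides` holds three peptides and `tryptic_peptides` holds four, mirroring the counts reported for the real protein.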

  18. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data remain unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies and...... in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html....

  19. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  20. Isolation, characterization and cDNA sequencing of a Kazal family proteinase inhibitor from seminal plasma of turkey (Meleagris gallopavo).

    Science.gov (United States)

    Słowińska, Mariola; Olczak, Mariusz; Wojtczak, Mariola; Glogowski, Jan; Jankowski, Jan; Watorek, Wiesław; Amarowicz, Ryszard; Ciereszko, Andrzej

    2008-06-01

    The turkey reproductive tract and seminal plasma contain a serine proteinase inhibitor that seems to be unique to the reproductive tract. Our experimental objective was to isolate and characterize the Kazal family proteinase inhibitor from turkey seminal plasma and testis and to determine its cDNA sequence. Seminal plasma contains two forms of a Kazal family inhibitor: virgin (Ia), represented by an inhibitor of moderate electrophoretic migration rate (present also in the testis), and modified (Ib, a split peptide bond), represented by an inhibitor with a fast migration rate. The inhibitor from the seminal plasma was purified by affinity, ion-exchange and reverse-phase chromatography. The testis inhibitor was purified by affinity and ion-exchange chromatography. The N-terminal Edman sequences of the two seminal plasma inhibitors and the testis inhibitor were identical. This sequence was used to construct primers and obtain a cDNA sequence from the testis. Analysis of the cDNA sequence indicated that the turkey proteinase inhibitor belongs to the Kazal family inhibitors (pancreatic secretory trypsin inhibitors, mammalian acrosin inhibitors) and caltrin. The turkey seminal plasma Kazal inhibitor belongs to the low molecular mass inhibitors and is characterized by a high value of the equilibrium association constant for inhibitor/trypsin complexes.

  1. DNA Sequencing

    Science.gov (United States)

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.

  2. Complete amino acid sequence of a Lolium perenne (perennial rye grass) pollen allergen, Lol p II.

    Science.gov (United States)

    Ansari, A A; Shenbagamurthi, P; Marsh, D G

    1989-07-05

    The complete amino acid sequence of a Lolium perenne (rye grass) pollen allergen, Lol p II was determined by automated Edman degradation of the protein and selected fragments. Cleavage of the protein by enzymatic and chemical techniques established an unambiguous sequence for the protein. Lol p II contains 97 amino acid residues, with a calculated molecular weight of 10,882. The protein lacks cysteine and glutamine and shows no evidence of glycosylation. Theoretical predictions by Fraga's (Fraga, S. (1982) Can. J. Chem. 60, 2606-2610) and Hopp and Woods' (Hopp, T. P., and Woods, K. R. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 3824-3828) methods indicate the presence of four hydrophilic regions, which may contribute to sequential or parts of conformational B-cell epitopes. Analysis of amphipathic regions by Berzofsky's method indicates the presence of a highly amphipathic region, which may contain, or contribute to, an Ia/T-cell epitope. This latter segment of Lol p II was found to be highly homologous with an antibody-binding segment of the major rye allergen Lol p I and may explain why immune responsiveness to both the allergens is associated with HLA-DR3.

  3. De novo sequencing of two novel peptides homologous to calcitonin-like peptides, from skin secretion of the Chinese Frog, Odorrana schmackeri

    Directory of Open Access Journals (Sweden)

    Geisa P.C. Evaristo

    2015-09-01

    An MS/MS based analytical strategy was followed to solve the complete sequence of two new peptides from frog (Odorrana schmackeri) skin secretion. This involved reduction and alkylation with two different alkylating agents followed by high resolution tandem mass spectrometry. De novo sequencing was achieved by complementary CID and ETD fragmentations of full-length peptides and of selected tryptic fragments. Heavy and light isotope dimethyl labeling assisted with annotation of sequence ion series. The identified primary structures are GCD[I/L]STCATHN[I/L]VNE[I/L]NKFDKSKPSSGGVGPESP-NH2 and SCNLSTCATHNLVNELNKFDKSKPSSGGVGPESF-NH2, i.e. two carboxyamidated 34 residue peptides with an aminoterminal intramolecular ring structure formed by a disulfide bridge between Cys2 and Cys7. Edman degradation analysis of the second peptide positively confirmed the exact sequence, resolving I/L discriminations. Both peptide sequences are novel and share homology with calcitonin, calcitonin gene related peptide (CGRP) and adrenomedullin from other vertebrates. Detailed sequence analysis as well as the 34 residue length of both O. schmackeri peptides, suggest they do not fully qualify as either calcitonins (32 residues) or CGRPs (37 amino acids) and may justify their classification in a novel peptide family within the calcitonin gene related peptide superfamily. Smooth muscle contractility assays with synthetic replicas of the S–S linked peptides on rat tail artery, uterus, bladder and ileum did not reveal myotropic activity.
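
The `[I/L]` notation in the first sequence above reflects the fact that leucine and isoleucine are isomers with the same mass, so mass data alone leave those positions open. The Python sketch below expands the notation into every candidate sequence. The function name `expand_ambiguous` is hypothetical, and the sketch only enumerates possibilities; it does not model the fragmentation experiments themselves.

```python
import re
from itertools import product


def expand_ambiguous(pattern):
    """Expand bracketed alternatives such as [I/L] into all candidate sequences."""
    # The capture group keeps the bracketed tokens in the split result.
    parts = re.split(r"(\[[^\]]+\])", pattern)
    choices = [p[1:-1].split("/") if p.startswith("[") else [p] for p in parts]
    return ["".join(combo) for combo in product(*choices)]


# First peptide as written in the abstract (C-terminal amide omitted).
PEPTIDE_1 = "GCD[I/L]STCATHN[I/L]VNE[I/L]NKFDKSKPSSGGVGPESP"
candidates = expand_ambiguous(PEPTIDE_1)
```

With three ambiguous positions there are 2^3 = 8 candidates, each 34 residues long with Cys at positions 2 and 7. A residue-by-residue method such as the Edman degradation mentioned in the abstract is one way to choose among them.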

  4. Amino acid sequence and disulfide bond assignment of myotoxin a isolated from the venom of prairie rattlesnake (Crotalus viridis viridis)

    Energy Technology Data Exchange (ETDEWEB)

    Fox, J.W.; Elzinga, M.; Tu, A.T.

    1979-02-20

    The primary structure of myotoxin a, a myotoxin protein from the venom of the North American rattlesnake Crotalus viridis viridis, was determined and the position of the disulfide bonds assigned. The toxin was isolated, carboxymethylated, and cleaved by cyanogen bromide, and the resultant peptides were isolated. The cyanogen bromide peptides were subjected to amino acid sequence analysis. In order to assign the positions of the three disulfide bonds, the native toxin was cleaved sequentially with cyanogen bromide and trypsin. A two-peptide unit connected by one disulfide bond was isolated and characterized, and a three-peptide unit connected by two disulfide bonds was isolated. One peptide in the three-peptide unit was identified as Cys-Cys-Lys. In order to establish the linkages between the peptides and Cys-Cys-Lys, one cycle of Edman degradation was carried out such that the Cys-Cys bond was cleaved. Upon isolation and analysis of the cleavage products, the disulfide bonds connecting the three peptides were determined. The positions of the disulfide bridges of myotoxin a were determined to be totally different from those of neurotoxins isolated from snake venoms. The sequence of myotoxin a was compared with the sequences of other snake venom toxins using the computer program RELATE to determine whether myotoxin a is similar to any other types of toxins. From the computer analysis, myotoxin a did not show any close relationship to other toxins except crotamine from the South American rattlesnake Crotalus durissus terrificus.

  5. Main: Sequences [KOME]

    Lifescience Database Archive (English)

    Full Text Available Sequences Nucleotide Sequence Nucleotide sequence of full length cDNA (trimmed sequence) kome_ine_full_seq...uence_db.fasta.zip kome_ine_full_sequence_db.zip kome_ine_full_sequence_db ...

  6. Rapid 'de novo' peptide sequencing by a combination of nanoelectrospray, isotopic labeling and a quadrupole/time-of-flight mass spectrometer.

    Science.gov (United States)

    Shevchenko, A; Chernushevich, I; Ens, W; Standing, K G; Thomson, B; Wilm, M; Mann, M

    1997-01-01

    Protein microanalysis usually involves the sequencing of gel-separated proteins available in very small amounts. While mass spectrometry has become the method of choice for identifying proteins in databases, in almost all laboratories 'de novo' protein sequencing is still performed by Edman degradation. Here we show that a combination of the nanoelectrospray ion source, isotopic end labeling of peptides and a quadrupole/time-of-flight instrument allows facile read-out of the sequences of tryptic peptides. Isotopic labeling was performed by enzymatic digestion of proteins in 1:1 16O/18O water, eliminating the need for peptide derivatization. A quadrupole/time-of-flight mass spectrometer was constructed from a triple quadrupole and an electrospray time-of-flight instrument. Tandem mass spectra of peptides were obtained with better than 50 ppm mass accuracy and resolution routinely in excess of 5000. Unique and error tolerant identification of yeast proteins as well as the sequencing of a novel protein illustrate the potential of the approach. The high data quality in tandem mass spectra and the additional information provided by the isotopic end labeling of peptides enabled automated interpretation of the spectra via simple software algorithms. The technique demonstrated here removes one of the last obstacles to routine and high throughput protein sequencing by mass spectrometry.

  7. Complete primary structure of a Lolium perenne (perennial rye grass) pollen allergen, Lol p III: comparison with known Lol p I and II sequences.

    Science.gov (United States)

    Ansari, A A; Shenbagamurthi, P; Marsh, D G

    1989-10-17

    The complete amino acid sequence of a Lolium perenne (rye grass) pollen allergen, Lol p III, determined by the automated Edman degradation of the protein and its selected fragments, is reported in this paper. Cleavage by enzymatic and chemical techniques established unambiguously the sequence for this 97-residue protein (Mr = 10,909), which lacks cysteine and shows no evidence of glycosylation. The sequence of Lol p III is very similar to that of another L. perenne allergen, Lol p II, which was sequenced recently; of the 97 positions in the two proteins, 57 are occupied by identical amino acids (59% identity). In addition, both allergens share a similar structure with an antibody-binding fragment of a third L. perenne allergen, Lol p I. Since human antibody responsiveness to all three of these allergens is associated with HLA-DR3, and since the structure common to the three molecules shows high degrees of amphipathicity in Lol p II and III, we speculate that this common segment in the three molecules might contain or contribute to the respective Ia/T-cell sites.
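
    The identity figure quoted above (57 of 97 positions, 59%) is a plain ratio over aligned positions. A minimal sketch of that arithmetic follows; the function name `percent_identity` and the toy peptides are hypothetical illustrations, not the Lol p II/III sequences, and the sketch assumes the two sequences are already aligned to equal length.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions occupied by identical residues.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Toy aligned peptides (hypothetical): 3 of 5 positions match.
print(percent_identity("ACDEF", "ACDQW"))  # 60.0

# The figure quoted in the abstract: 57 identical positions out of 97.
print(round(100 * 57 / 97))  # 59
```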

  8. Novel proline-hydroxyproline glycopeptides from the dandelion (Taraxacum officinale Wigg.) flowers: de novo sequencing and biological activity.

    Science.gov (United States)

    Astafieva, Alexandra A; Enyenihi, Atim A; Rogozhin, Eugene A; Kozlov, Sergey A; Grishin, Eugene V; Odintsova, Tatyana I; Zubarev, Roman A; Egorov, Tsezi A

    2015-09-01

    Two novel homologous peptides named ToHyp1 and ToHyp2 that show no similarity to any known proteins were isolated from Taraxacum officinale Wigg. flowers by multidimensional liquid chromatography. Amino acid and mass spectrometry analyses demonstrated that the peptides have an unusual structure: they are cysteine-free, proline-hydroxyproline-rich and post-translationally glycosylated by pentoses, with 5 carbohydrates in ToHyp2 and 10 in ToHyp1. The ToHyp2 peptide with a monoisotopic molecular mass of 4350.3 Da was completely sequenced by a combination of Edman degradation and de novo sequencing via top down multistage collision induced dissociation (CID) and higher energy dissociation (HCD) tandem mass spectrometry (MS(n)). ToHyp2 consists of 35 amino acids and contains 18 proline residues, 8 of which are hydroxylated. The peptide displays antifungal activity and inhibits growth of Gram-positive and Gram-negative bacteria. We further showed that carbohydrate moieties have no significant impact on the peptide structure, but are important for antifungal activity although not absolutely necessary. The deglycosylated ToHyp2 peptide was less active against the susceptible fungus Bipolaris sorokiniana than the native peptide. Unique structural features of the ToHyp2 peptide place it into a new family of plant defense peptides. The discovery of ToHyp peptides in T. officinale flowers expands the repertoire of molecules of plant origin with practical applications.

  9. Classifying Genomic Sequences by Sequence Feature Analysis

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that the functional regions of the genome can be classified based on sequence features and discriminant analysis.
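
    One of the features named above, dinucleotide relative abundance, is commonly computed as the odds ratio rho(xy) = f(xy) / (f(x) f(y)), where f denotes observed frequency. The sketch below assumes that definition; the abstract does not state the exact formula the authors used, and the function name is hypothetical.

```python
from collections import Counter


def dinucleotide_relative_abundance(seq: str) -> dict[str, float]:
    """Odds ratio f(xy) / (f(x) * f(y)) for every dinucleotide observed in seq.

    Assumed definition; overlapping dinucleotides are counted on one strand only.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = len(seq)
    n_di = len(seq) - 1
    return {
        pair: (count / n_di) / ((mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono))
        for pair, count in di.items()
    }


# Toy input (hypothetical): values above 1 mean the pair is over-represented
# relative to what its base composition alone would predict.
for pair, rho in sorted(dinucleotide_relative_abundance("ACGCGCGTTA").items()):
    print(pair, round(rho, 2))
```

    A vector of these ratios, one per dinucleotide, is the kind of per-region feature that could then be passed to principal component or discriminant analysis.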

  10. Main: Sequences [KOME]

    Lifescience Database Archive (English)

    Full Text Available Sequences Amino Acid Sequence Amino Acid sequence of full length cDNA (Longest ORF) kome_ine_full_seq...uence_amino_db.fasta.zip kome_ine_full_sequence_amino_db.zip kome_ine_full_sequence_amino_db ...

  11. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.
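
    The idea of rebuilding a protein from overlapping peptides can be illustrated with a greedy suffix/prefix merge. This is only a sketch of the general overlap principle, not the algorithm of the SAND report; the function names, the minimum overlap of 3 residues, and the toy peptides are all assumptions made for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    # Discard fragments fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; return the contigs found so far
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical peptides from two different digests of one toy protein.
digest_1 = ["MKTAYIAK", "QRQISFVK"]
digest_2 = ["MKTAY", "YIAKQRQI", "SFVK"]
print(greedy_assemble(digest_1 + digest_2))  # ['MKTAYIAKQRQISFVK']
```

    Greedy merging can misassemble repeats or short overlaps, so it should be read as a demonstration of why digests with different cleavage sites are complementary, not as a production assembler.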

  12. Isolation, amino acid sequence and biological activities of novel long-chain polyamine-associated peptide toxins from the sponge Axinyssa aculeata.

    Science.gov (United States)

    Matsunaga, Satoko; Jimbo, Mitsuru; Gill, Martin B; Wyhe, L Leanne Lash-Van; Murata, Michio; Nonomura, Ken'ichi; Swanson, Geoffrey T; Sakai, Ryuichi

    2011-09-19

    A novel family of functionalized peptide toxins, aculeines (ACUs), was isolated from the marine sponge Axinyssa aculeata. ACUs are polypeptides with N-terminal residues that are modified by the addition of long-chain polyamines (LCPA). Aculeines were present in the sponge extract as a complex mixture with differing polyamine chain lengths and peptide structures. ACU-A and B, which were purified in this study, share a common polypeptide chain but differ in their N-terminal residue modifications. The amino acid sequence of the polypeptide portion of ACU-A and B was deduced from 3' and 5' RACE, and supported by Edman degradation and mass spectral analysis of peptide fragments. ACU induced convulsions upon intracerebroventricular (i.c.v.) injection in mice, and disrupted neuronal membrane integrity in electrophysiological assays. ACU also lysed erythrocytes with a potency that differed between animal species. Here we describe the isolation, amino acid sequence, and biological activity of this new group of cytotoxic sponge peptides.

  13. Sequence Read Archive (SRA)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome...

  14. Multimodal sequence learning.

    Science.gov (United States)

    Kemény, Ferenc; Meier, Beat

    2016-02-01

    While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence.

  15. Coordinate cytokine regulatory sequences

    Science.gov (United States)

    Frazer, Kelly A.; Rubin, Edward M.; Loots, Gabriela G.

    2005-05-10

    The present invention provides CNS sequences that regulate the cytokine gene expression, expression cassettes and vectors comprising or lacking the CNS sequences, host cells and non-human transgenic animals comprising the CNS sequences or lacking the CNS sequences. The present invention also provides methods for identifying compounds that modulate the functions of CNS sequences as well as methods for diagnosing defects in the CNS sequences of patients.

  16. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  17. Production, purification, sequencing and activity spectra of mutacins D-123.1 and F-59.1

    Directory of Open Access Journals (Sweden)

    LaPointe Gisèle

    2011-04-01

    Full Text Available Abstract. Background: The increase in bacterial resistance to antibiotics impels the development of new anti-bacterial substances. Mutacins (bacteriocins) are small antibacterial peptides produced by Streptococcus mutans showing activity against bacterial pathogens. The objective of the study was to produce and characterise additional mutacins in order to find new useful antibacterial substances. Results: Mutacin F-59.1 was produced in liquid media by S. mutans 59.1 while production of mutacin D-123.1 by S. mutans 123.1 was obtained in semi-solid media. Mutacins were purified by hydrophobic chromatography. The amino acid sequences of the mutacins were obtained by Edman degradation and their molecular mass was determined by mass spectrometry. Mutacin F-59.1 consists of 25 amino acids, containing the YGNGV consensus sequence of pediocin-like bacteriocins with a molecular mass calculated at 2719 Da. Mutacin D-123.1 has an identical molecular mass (2364 Da) with the same first 9 amino acids as mutacin I. Mutacins D-123.1 and F-59.1 have wide activity spectra inhibiting human and food-borne pathogens. The lantibiotic mutacin D-123.1 possesses a broader activity spectrum than mutacin F-59.1 against the bacterial strains tested. Conclusion: Mutacin F-59.1 is the first pediocin-like bacteriocin identified and characterised that is produced by Streptococcus mutans. Mutacin D-123.1 appears to be identical to mutacin I previously identified in different strains of S. mutans.

  18. Contamination of sequence databases with adaptor sequences

    Energy Technology Data Exchange (ETDEWEB)

    Yoshikawa, Takeo; Sanders, A.R.; Detera-Wadleigh, S.D. [National Institute of Mental Health, Bethesda, MD (United States)

    1997-02-01

    Because of the exponential increase in the amount of DNA sequences being added to the public databases on a daily basis, it has become imperative to identify sources of contamination rapidly. Previously, contaminations of sequence databases have been reported to alert the scientific community to the problem. These contaminations can be divided into two categories. The first category comprises host sequences that have been difficult for submitters to manage or control. Examples include anomalous sequences derived from Escherichia coli, which are inserted into the chromosomes (and plasmids) of the bacterial hosts. Insertion sequences are highly mobile and are capable of transposing themselves into plasmids during cloning manipulation. Another example of the first category is the infection with yeast genomic DNA or with bacterial DNA of some commercially available cDNA libraries from Clontech. The second category of database contamination is due to the inadvertent inclusion of nonhost sequences. This category includes incorporation of cloning-vector sequences and multicloning sites in the database submission. M13-derived artifacts have been common, since M13-based vectors have been widely used for subcloning DNA fragments. Recognizing this problem, the National Center for Biotechnology Information (NCBI) started to screen, in April 1994, all sequences directly submitted to GenBank, against a set of vector data retrieved from GenBank by use of key-word searches, such as "vector." In this report, we present evidence for another sequence artifact that is widespread but that, to our knowledge, has not yet been reported. 11 refs., 1 tab.

  19. Automated DNA Sequencing System

    Energy Technology Data Exchange (ETDEWEB)

    Armstrong, G.A.; Ekkebus, C.P.; Hauser, L.J.; Kress, R.L.; Mural, R.J.

    1999-04-25

    Oak Ridge National Laboratory (ORNL) is developing a core DNA sequencing facility to support biological research endeavors at ORNL and to conduct basic sequencing automation research. This facility is novel because its development is based on existing standard biology laboratory equipment; thus, the development process is of interest to the many small laboratories trying to use automation to control costs and increase throughput. Before automation, biology laboratory personnel purified DNA, completed cycle sequencing, and prepared 96-well sample plates with commercially available hardware designed specifically for each step in the process. Following purification and thermal cycling, an automated sequencing machine was used for the sequencing. A technician handled all movement of the 96-well sample plates between machines. To automate the process, ORNL is adding a CRS Robotics A-465 arm, ABI 377 sequencing machine, automated centrifuge, automated refrigerator, and possibly an automated SpeedVac. The entire system will be integrated with one central controller that will direct each machine and the robot. The goal of this system is to completely automate the sequencing procedure from bacterial cell samples through ready-to-be-sequenced DNA and ultimately to completed sequence. The system will be flexible and will accommodate different chemistries than existing automated sequencing lines. The system will be expanded in the future to include colony picking and/or actual sequencing. This discrete event, DNA sequencing system will demonstrate that smaller sequencing labs can achieve cost-effective the laboratory grow.

  20. sequenceMiner algorithm

    Data.gov (United States)

    National Aeronautics and Space Administration — Detecting and describing anomalies in large repositories of discrete symbol sequences. sequenceMiner has been open-sourced! Download the file below to try it out....

  1. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  2. Anomaly Detection in Sequences

    Data.gov (United States)

    National Aeronautics and Space Administration — We present a set of novel algorithms which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that...

  3. Roles of repetitive sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological function, but also because they provide information on genome organization, evolution and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences will have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences, notably minisatellite sequences and telomere sequences as examples of tandem repeats, without and with respectively known function, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  4. Enhanced virome sequencing using targeted sequence capture.

    Science.gov (United States)

    Wylie, Todd N; Wylie, Kristine M; Herter, Brandi N; Storch, Gregory A

    2015-12-01

    Metagenomic shotgun sequencing (MSS) is an important tool for characterizing viral populations. It is culture independent, requires no a priori knowledge of the viruses in the sample, and may provide useful genomic information. However, MSS can lack sensitivity and may yield insufficient data for detailed analysis. We have created a targeted sequence capture panel, ViroCap, designed to enrich nucleic acid from DNA and RNA viruses from 34 families that infect vertebrate hosts. A computational approach condensed ∼1 billion bp of viral reference sequence into <200 million bp of unique, representative sequence suitable for targeted sequence capture. We compared the effectiveness of detecting viruses in standard MSS versus MSS following targeted sequence capture. First, we analyzed two sets of samples, one derived from samples submitted to a diagnostic virology laboratory and one derived from samples collected in a study of fever in children. We detected 14 and 18 viruses in the two sets, comprising 19 genera from 10 families, with dramatic enhancement of genome representation following capture enrichment. The median fold-increases in percentage viral reads post-capture were 674 and 296. Median breadth of coverage increased from 2.1% to 83.2% post-capture in the first set and from 2.0% to 75.6% in the second set. Next, we analyzed samples containing a set of diverse anellovirus sequences and demonstrated that ViroCap could be used to detect viral sequences with up to 58% variation from the references used to select capture probes. ViroCap substantially enhances MSS for a comprehensive set of viruses and has utility for research and clinical applications.

  5. DNA sequences encoding erythropoietin

    Energy Technology Data Exchange (ETDEWEB)

    Lin, F.K.

    1987-10-27

    A purified and isolated DNA sequence is described consisting essentially of a DNA sequence encoding a polypeptide having an amino acid sequence sufficiently duplicative of that of erythropoietin to allow possession of the biological property of causing bone marrow cells to increase production of reticulocytes and red blood cells, and to increase hemoglobin synthesis or iron uptake.

  6. Low autocorrelation binary sequences

    Science.gov (United States)

    Packebusch, Tom; Mertens, Stephan

    2016-04-01

    Binary sequences with minimal autocorrelations have applications in communication engineering, mathematics and computer science. In statistical physics they appear as groundstates of the Bernasconi model. Finding these sequences is a notoriously hard problem, that so far can be solved only by exhaustive search. We review recent algorithms and present a new algorithm that finds optimal sequences of length N in time O(N · 1.73^N). We computed all optimal sequences for N ≤ 66 and all optimal skewsymmetric sequences for N ≤ 119.
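
    For readers unfamiliar with the problem, the quantity being minimized is usually written E(s) = sum over k = 1..N-1 of C_k^2, where C_k = sum over i of s_i s_(i+k) for a sequence of +1/-1 entries. The sketch below evaluates that energy and runs a naive 2^N exhaustive search for very small N; it illustrates the problem statement only and is not the authors' faster algorithm. All function names are hypothetical.

```python
from itertools import product


def autocorrelations(s):
    """Aperiodic autocorrelations C_k = sum_i s[i] * s[i + k] for k = 1 .. N-1."""
    n = len(s)
    return [sum(s[i] * s[i + k] for i in range(n - k)) for k in range(1, n)]


def energy(s):
    """Sum of squared off-peak autocorrelations (lower is better)."""
    return sum(c * c for c in autocorrelations(s))


def brute_force_optimum(n):
    """Try all 2^n sequences of +1/-1 entries; only practical for small n."""
    best_energy, best_seq = None, None
    for s in product((1, -1), repeat=n):
        e = energy(s)
        if best_energy is None or e < best_energy:
            best_energy, best_seq = e, s
    return best_energy, best_seq


# The length-13 Barker sequence, a classic low-autocorrelation example.
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
print(energy(barker13))  # 6

print(brute_force_optimum(7)[0])  # 3
```

    The 2^N cost of this naive search is what makes the reported O(N · 1.73^N) bound, and results up to N = 66, notable.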

  7. Repdigits in k-Lucas Sequences

    Indian Academy of Sciences (India)

    Jhon J J Bravo; Florian Luca

    2014-05-01

    For an integer $k \geq 2$, let $(L_n^{(k)})_n$ be the $k$-Lucas sequence which starts with $0,\ldots,0,2,1$ ($k$ terms) and each term afterwards is the sum of the $k$ preceding terms. In 2000, Luca (Port. Math. 57(2) 2000 243-254) proved that 11 is the largest number with only one distinct digit (the so-called repdigit) in the sequence $(L_n^{(2)})_n$. In this paper, we address a similar problem in the family of $k$-Lucas sequences. We also show that the $k$-Lucas sequences have similar properties to those of $k$-Fibonacci sequences and occur in formulae simultaneously with the latter.
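
    The recurrence described above is easy to experiment with numerically. The following is a minimal sketch assuming the definition as stated (k initial terms 0, ..., 0, 2, 1, then each term is the sum of the previous k); the function names are hypothetical, and the list positions here need not match the paper's index convention for n.

```python
def k_lucas(k: int, count: int) -> list[int]:
    """First `count` terms of the sequence starting 0, ..., 0, 2, 1 (k terms),
    where every later term is the sum of the k preceding terms."""
    if k < 2:
        raise ValueError("k must be at least 2")
    terms = [0] * (k - 2) + [2, 1]
    while len(terms) < count:
        terms.append(sum(terms[-k:]))
    return terms[:count]


def is_repdigit(n: int) -> bool:
    """True if the decimal expansion of n uses a single distinct digit."""
    return n >= 0 and len(set(str(n))) == 1


print(k_lucas(2, 10))  # [2, 1, 3, 4, 7, 11, 18, 29, 47, 76]
print(k_lucas(3, 8))   # [0, 2, 1, 3, 6, 10, 19, 35]

# Multi-digit repdigits among the first 60 classical Lucas numbers (k = 2).
print([x for x in k_lucas(2, 60) if x >= 10 and is_repdigit(x)])  # [11]
```

    A finite scan like this cannot prove anything about the whole sequence, but it is consistent with the cited result that 11 is the largest repdigit for k = 2.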

  8. On Maximal Green Sequences

    CERN Document Server

    Brüstle, Thomas; Pérotin, Matthieu

    2012-01-01

    Maximal green sequences are particular sequences of quiver mutations which were introduced by Keller in the context of quantum dilogarithm identities and independently by Cecotti-Cordova-Vafa in the context of supersymmetric gauge theory. Our aim is to initiate a systematic study of these sequences from a combinatorial point of view. Interpreting maximal green sequences as paths in various natural posets arising in representation theory, we prove the finiteness of the number of maximal green sequences for cluster finite quivers, affine quivers and acyclic quivers with at most three vertices. We also give results concerning the possible numbers and lengths of these maximal green sequences. Finally we describe an algorithm for computing maximal green sequences for arbitrary valued quivers which we used to obtain numerous explicit examples that we present.

  9. Cosmetology: Scope and Sequence.

    Science.gov (United States)

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  10. DNA sequencing by CE.

    Science.gov (United States)

    Karger, Barry L; Guttman, András

    2009-06-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA-sequencing methods have evolved from the labor-intensive slab gel electrophoresis, through automated multiCE systems using fluorophore labeling with multispectral imaging, to the "next-generation" technologies of cyclic-array, hybridization based, nanopore and single molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes were only possible with the advent of modern sequencing technologies that were a result of step-by-step advances with a contribution of academics, medical personnel and instrument companies. While next-generation sequencing is moving ahead at breakneck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this prospective, we wish to overview the role of CE in DNA sequencing based in part on several of our articles in this journal.

  11. Hardware bitstream sequence recognizer

    OpenAIRE

    Karpin, Oleksandr; Sokil, Volodymyr

    2009-01-01

    This paper describes how to implement in hardware a bitstream sequence recognizer using the PSoC Pseudo Random Sequence Generator (PRS) User Module. The PRS can be used in digital communication systems with the serial data interface for automatic preamble detection and extraction, control words selection, etc.

  12. Sequencing the maize genome.

    Science.gov (United States)

    Martienssen, Robert A; Rabinowicz, Pablo D; O'Shaughnessy, Andrew; McCombie, W Richard

    2004-04-01

    Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.

  13. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    OpenAIRE

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; AKIYAMA, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto seque...

  14. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  15. Phylogenetic Trees From Sequences

    Science.gov (United States)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data. We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.
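
    As an illustration of what "measuring sequence divergence" can look like in a distance-based setting, the sketch below computes the proportion of differing sites (p-distance) between two aligned sequences and applies the standard Jukes-Cantor correction. It is not taken from the chapter; the function names and the example sequences are hypothetical, and the chapter may use other substitution models.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal in length, and non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the correction is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned sequences that differ at one site out of eight.
p = p_distance("ACGTACGT", "ACGTACGA")
d = jukes_cantor(p)
```

    The corrected distance is never smaller than the raw p-distance, because it accounts for multiple substitutions at the same site. A matrix of such pairwise distances is the usual input to distance-based tree-building methods.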

  16. A calculation of all possible oligosaccharide isomers both branched and linear yields 1.05 x 10(12) structures for a reducing hexasaccharide: the Isomer Barrier to development of single-method saccharide sequencing or synthesis systems.

    Science.gov (United States)

    Laine, R A

    1994-12-01

    The number of all possible linear and branched isomers of a hexasaccharide was calculated and found to be > 1.05 x 10(12). This large number defines the Isomer Barrier, a persistent technological barrier to the development of a single analytical method for the absolute characterization of carbohydrates, regardless of sample quantity. Because of this isomer barrier, no single method can be employed to determine complete oligosaccharide structure in 100 nmol amounts with the same assurance that can be achieved for 100 pmol amounts with single-procedure Edman peptide or Sanger DNA sequencing methods. Difficulties in the development of facile synthetic schemes for oligosaccharides are also explained by this large number. No current method of chemical or physical analysis has the resolution necessary to distinguish among 10(12) structures having the same mass. Therefore the 'characterization' of a middle-weight oligosaccharide solely by NMR or mass spectrometry necessarily contains a very large margin of error. Greater uncertainty accompanies results performed solely by sequential enzyme degradation followed by gel-permeation chromatography or electrophoresis, as touted by some commercial advertisements. Much of the literature which uses these single methods to 'characterize' complex carbohydrates is, therefore, in question, and journals should beware of publishing structural characterizations unless the authors reveal all alternate possible structures which could result from their analysis.(ABSTRACT TRUNCATED AT 250 WORDS)

  17. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  18. In Favor of Sequencing?

    NARCIS (Netherlands)

    van der Borgh, G.J.C.

    2014-01-01

    This short article is a contribution to an online discussion about political sequencing and stability. It argues that despite all the risks of democratization in fragile states, a more gradual approach should be preferred.

  19. Scope and Sequence.

    Science.gov (United States)

    Callison, Daniel

    2002-01-01

    Discusses scope and sequence plans for curriculum coordination in elementary and secondary education related to school libraries. Highlights include library skills; levels of learning objectives; technology skills; media literacy skills; and information inquiry skills across disciplines by grade level. (LRW)

  20. Pierre Robin sequence

    Science.gov (United States)

    Pierre Robin syndrome; Pierre Robin complex; Pierre Robin anomaly ... The exact causes of Pierre Robin sequence are unknown. It may be part of many genetic syndromes. The lower jaw develops slowly before birth, but may grow ...

  1. HIV Sequence Compendium 2010

    Energy Technology Data Exchange (ETDEWEB)

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)]; Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)]; Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)]; Apetrei, Christian [Univ. of Pittsburgh, PA (United States)]; Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States)]; Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States)]; Mullins, James [Univ. of Washington, Seattle, WA (United States)]; Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom)]; Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States)]; Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)]

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  2. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  3. Text Mining: (Asynchronous Sequences)

    Directory of Open Access Journals (Sweden)

    Sheema Khan

    2014-12-01

    In this paper we try to correlate text sequences that share common topics, which serve as semantic clues. We propose a two-step method for asynchronous text mining. Step one checks for the common topics in the sequences and isolates these with their timestamps. Step two takes the topic and tries to assign a timestamp to the text document. After multiple repetitions of step two, the method gives an optimum result.

  4. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed Affan

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
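
    The length-based split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed embodiment: the splitting ratio is interpreted here as the fraction of database sequences sent to the CPU, and `split_by_length` and `cpu_fraction` are hypothetical names. The actual CPU and GPU alignment kernels are not modelled.

```python
from typing import List, Tuple


def split_by_length(database: List[str], cpu_fraction: float) -> Tuple[List[str], List[str]]:
    """Return (cpu_part, gpu_part), where cpu_part holds the shortest sequences.

    cpu_fraction is an assumed interpretation of the "predetermined splitting
    ratio": the share of database sequences assigned to the CPU.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    ordered = sorted(database, key=len)          # shortest sequences first
    cut = round(len(ordered) * cpu_fraction)     # number of sequences for the CPU
    return ordered[:cut], ordered[cut:]


# Hypothetical database of five sequences; 40% of them go to the CPU.
database = ["ACGT", "AC", "ACGTACGTAC", "ACGTAC", "A"]
cpu_part, gpu_part = split_by_length(database, cpu_fraction=0.4)
```

    Sorting before cutting guarantees that every sequence in the CPU portion is no longer than any sequence in the GPU portion, which matches the stated property that the first portion contains the shorter sequences.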

  5. Controlled processing during sequencing

    Directory of Open Access Journals (Sweden)

    Malathi Thothathiri

    2015-10-01

    Longstanding evidence has identified a role for the frontal cortex in sequencing within both linguistic and non-linguistic domains. More recently, neuropsychological studies have suggested a specific role for the left premotor-prefrontal junction (BA 44/6) in selection between competing alternatives during sequencing. In this study, we used neuroimaging with healthy adults to confirm and extend knowledge about the neural correlates of sequencing. Participants reproduced visually presented sequences of syllables and words using manual button presses. Items in the sequence were presented either consecutively or concurrently. Concurrent presentation is known to trigger the planning of multiple responses, which might compete with one another. Therefore, we hypothesized that regions involved in controlled processing would show greater recruitment during the concurrent than the consecutive condition. Whole-brain analysis showed concurrent > consecutive activation in sensory, motor and somatosensory cortices and notably also in rostral-dorsal anterior cingulate cortex (ACC). Region of interest analyses showed increased activation within left BA 44/6 and correlation between this region's activation and behavioral response times. Functional connectivity analysis revealed increased connectivity between left BA 44/6 and the posterior lobe of the cerebellum during the concurrent than the consecutive condition. These results corroborate recent evidence and demonstrate the involvement of BA 44/6 and other control regions when ordering co-activated representations.

  6. Controlled processing during sequencing.

    Science.gov (United States)

    Thothathiri, Malathi; Rattinger, Michelle

    2015-01-01

    Longstanding evidence has identified a role for the frontal cortex in sequencing within both linguistic and non-linguistic domains. More recently, neuropsychological studies have suggested a specific role for the left premotor-prefrontal junction (BA 44/6) in selection between competing alternatives during sequencing. In this study, we used neuroimaging with healthy adults to confirm and extend knowledge about the neural correlates of sequencing. Participants reproduced visually presented sequences of syllables and words using manual button presses. Items in the sequence were presented either consecutively or concurrently. Concurrent presentation is known to trigger the planning of multiple responses, which might compete with one another. Therefore, we hypothesized that regions involved in controlled processing would show greater recruitment during the concurrent than the consecutive condition. Whole-brain analysis showed concurrent > consecutive activation in sensory, motor and somatosensory cortices and notably also in rostral-dorsal anterior cingulate cortex. Region of interest analyses showed increased activation within left BA 44/6 and correlation between this region's activation and behavioral response times. Functional connectivity analysis revealed increased connectivity between left BA 44/6 and the posterior lobe of the cerebellum during the concurrent than the consecutive condition. These results corroborate recent evidence and demonstrate the involvement of BA 44/6 and other control regions when ordering co-activated representations.

  7. Program Synthesizes UML Sequence Diagrams

    Science.gov (United States)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas the construction of sequence diagrams was previously a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.

  8. Sequencing BPS Spectra

    CERN Document Server

    Gukov, Sergei; Saberi, Ingmar; Stosic, Marko; Sulkowski, Piotr

    2015-01-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identifi...

  9. Twin anemia polycythemia sequence

    NARCIS (Netherlands)

    Slaghekke, Femke

    2014-01-01

    In this thesis we describe that Twin Anemia Polycythemia Sequence (TAPS) is a form of chronic feto-fetal transfusion in monochorionic (identical) twins based on a small amount of blood transfusion through very small anastomoses. For the antenatal diagnosis of TAPS, Middle Cerebral Artery – Peak Syst

  10. Family Sequencing and Cooperation

    NARCIS (Netherlands)

    Grundel, S.; Ciftci, B.B.; Borm, P.E.M.; Hamers, H.J.M.

    2012-01-01

    To analyze the allocation problem of the maximal cost savings of the whole group of jobs, we define and analyze a so-called corresponding cooperative family sequencing game which explicitly takes into account the maximal cost savings for any coalition of jobs. Using nonstandard techniques we prove t

  11. Next-generation sequencing

    DEFF Research Database (Denmark)

    Rieneck, Klaus; Bak, Mads; Jønson, Lars

    2013-01-01

    BACKGROUND: Maternal immunization against KEL1 of the Kell blood group system can have serious adverse consequences for the fetus as well as the newborn baby. Therefore, it is important to determine the phenotype of the fetus to predict whether it is at risk. We present data that show... , Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence... information obtained allows well for statistical analysis of the data. This general approach can be integrated into current laboratory practice and has numerous applications. Besides DNA-based predictions of blood group phenotypes, platelet phenotypes, or sickle cell anemia, and the determination of zygosity...

  12. Rapid-Sequence Intubation

    Directory of Open Access Journals (Sweden)

    Evangelina Dávila Cabo de Villa

    2015-09-01

    In medical practice there are several situations that require immediate intervention of the airway in some patients, in order to ensure proper entrance and exit of gases into and out of the lungs and prevent aspiration. Rapid-sequence intubation has been considered as the administration of a hypnotic agent and a neuromuscular relaxant consecutively (virtually simultaneously) to facilitate orotracheal intubation in critically ill patients and minimize the risk of aspiration. This paper aims to collect elements that promote a successful medical management according to the situation presented, since there is no single way of proceeding in case of rapid-sequence intubation. The elements to consider include: knowing the anatomy of the upper respiratory tract, having a group of drugs to choose from, receiving adequate training and having an alternative plan for the difficulties that may arise.

  13. Sequencing of aromatase inhibitors

    OpenAIRE

    2005-01-01

    Since the development of the third-generation aromatase inhibitors (AIs), anastrozole, letrozole and exemestane, these agents have been the subject of intensive research to determine their optimal use in advanced breast cancer. Not only have they replaced progestins in second-line therapy and challenged the role of tamoxifen in first-line, but there is also evidence for a lack of cross-resistance between the steroidal and nonsteroidal AIs, meaning that they may be used in sequence to obtain p...

  14. Learning Sequence Neighbourhood Metrics

    CERN Document Server

    Bayer, Justin; van der Smagt, Patrick

    2011-01-01

    Recurrent neural networks (RNNs) in combination with a pooling operator and the neighbourhood components analysis (NCA) objective function are able to detect the characterizing dynamics of sequences and embed them into a fixed-length vector space of arbitrary dimensionality. Subsequently, the resulting features are meaningful and can be used for visualization or nearest neighbour classification in linear time. This kind of metric learning for sequential data enables the use of algorithms tailored towards fixed length vector spaces such as R^n.

  15. Sequence Classification: 885394 [

    Lifescience Database Archive (English)

    Non-TMB TMH Non-TMB Non-TMB Non-TMB Non-TMB >gi|23619146|ref|NP_705108.1| Slight difference exist when compared to the published sequence of EBL-1 from Dd2 strain of P. falciparum (PMID:10613703); The expression pattern of this gene is described in PMID:12000842; possible frameshift detected when compare...

  16. Properties of Semijoin Sequences

    Institute of Scientific and Technical Information of China (English)

    Beng C. Ooi; B. Srinivasan

    1989-01-01

    The problem of finding an optimum semijoin sequence for an arbitrary query under a linear cost function for the transmission cost is NP-hard. Hence heuristic algorithms with desirable properties are explored. In this paper four properties of semijoin programs for distributed query processing are identified. The use of these properties in constructing semijoin sequences is justified. An existing algorithm is modified to incorporate these properties. Empirical comparison with existing algorithms shows the superiority of the proposed algorithm.

  17. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  18. Sequencing BPS spectra

    Science.gov (United States)

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; Stošić, Marko; Sułkowski, Piotr

    2016-03-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  19. The Galaxy End Sequence

    Science.gov (United States)

    Eales, Stephen; de Vis, Pieter; Smith, Matthew W. L.; Appah, Kiran; Ciesla, Laure; Duffield, Chris; Schofield, Simon

    2017-03-01

    A common assumption is that galaxies fall in two distinct regions of a plot of specific star formation rate (SSFR) versus galaxy stellar mass: a star-forming galaxy main sequence (GMS) and a separate region of 'passive' or 'red and dead galaxies'. Starting from a volume-limited sample of nearby galaxies designed to contain most of the stellar mass in this volume, and thus representing the end-point of ≃12 billion years of galaxy evolution, we investigate the distribution of galaxies in this diagram today. We show that galaxies follow a strongly curved extended GMS with a steep negative slope at high galaxy stellar masses. There is a gradual change in the morphologies of the galaxies along this distribution, but there is no clear break between early-type and late-type galaxies. Examining the other evidence that there are two distinct populations, we argue that the 'red sequence' is the result of the colours of galaxies changing very little below a critical value of the SSFR, rather than implying a distinct population of galaxies. Herschel observations, which show at least half of early-type galaxies contain a cool interstellar medium, also imply continuity between early-type and late-type galaxies. This picture of a unitary population of galaxies requires more gradual evolutionary processes than the rapid quenching process needed to explain two distinct populations. We challenge theorists to predict quantitatively the properties of this 'Galaxy End Sequence'.

  20. Information Theory of DNA Sequencing

    CERN Document Server

    Motahari, Abolfazl; Tse, David

    2012-01-01

    DNA sequencing is the basic workhorse of modern day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and provides a fundamental limit to the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process. Using this framework, we also study the impact of noise in the read process on the sequencing capacity.
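
    The read process described above (many short fragments drawn from random locations) can be illustrated with a small simulation. This is only a sketch under assumptions not stated in the abstract: reads are noise-free, have a fixed length, and start positions are uniform. The function names and the toy sequence are hypothetical, and the sketch does not compute the paper's sequencing capacity.

```python
import random
from typing import List, Tuple


def sample_reads(sequence: str, read_length: int, num_reads: int, seed: int = 0) -> List[Tuple[int, str]]:
    """Draw fixed-length reads at uniformly random start positions."""
    if not 0 < read_length <= len(sequence):
        raise ValueError("read_length must be between 1 and len(sequence)")
    rng = random.Random(seed)                      # seeded for reproducibility
    last_start = len(sequence) - read_length
    starts = [rng.randint(0, last_start) for _ in range(num_reads)]
    return [(start, sequence[start:start + read_length]) for start in starts]


def covered_fraction(sequence_length: int, reads: List[Tuple[int, str]]) -> float:
    """Fraction of sequence positions covered by at least one read."""
    covered = [False] * sequence_length
    for start, read in reads:
        for position in range(start, start + len(read)):
            covered[position] = True
    return sum(covered) / sequence_length


# Toy 40-base sequence sampled with 30 reads of length 8.
genome = "ACGTTGCAAGCTTAGGCTAACCGTTAGCATCGGATCCGTA"
reads = sample_reads(genome, read_length=8, num_reads=30)
coverage = covered_fraction(len(genome), reads)
```

    An assembly algorithm can only hope to reconstruct the original sequence when the reads cover it and overlap sufficiently, which is why the number and length of reads enter any limit on achievable performance.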

  1. A vision for ubiquitous sequencing.

    Science.gov (United States)

    Erlich, Yaniv

    2015-10-01

    Genomics has recently celebrated reaching the $1000 genome milestone, making affordable DNA sequencing a reality. With this goal successfully completed, the next goal of the sequencing revolution can be sequencing sensors--miniaturized sequencing devices that are manufactured for real-time applications and deployed in large quantities at low costs. The first part of this manuscript envisions applications that will benefit from moving the sequencers to the samples in a range of domains. In the second part, the manuscript outlines the critical barriers that need to be addressed in order to reach the goal of ubiquitous sequencing sensors.

  2. Psychoacoustic Properties of Fibonacci Sequences

    Directory of Open Access Journals (Sweden)

    J. Sokoll

    2008-01-01

    In 1202, Fibonacci set up one of the most interesting sequences in number theory. This sequence can be represented by so-called Fibonacci Numbers, and by a binary sequence of zeros and ones. If such a binary Fibonacci Sequence is played back as an audio file, a very dissonant sound results. This is caused by the "almost-periodic", "self-similar" property of the binary sequence. The ratio of zeros and ones converges to the golden ratio, as do the primary and secondary spectral components in their frequencies and amplitudes. These Fibonacci Sequences will be characterized using listening tests and psychoacoustic analyses.
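
    The convergence of the zero-to-one ratio can be checked numerically. The sketch below builds a binary Fibonacci sequence with the common substitution rule 0 -> 01, 1 -> 0; the abstract does not say which construction the authors use, so this rule and the function name are assumptions.

```python
def fibonacci_word(iterations: int) -> str:
    """Apply the substitution 0 -> 01, 1 -> 0 repeatedly, starting from "0"."""
    word = "0"
    for _ in range(iterations):
        word = "".join("01" if symbol == "0" else "0" for symbol in word)
    return word


GOLDEN_RATIO = (1 + 5 ** 0.5) / 2

# After n iterations the counts of zeros and ones are consecutive Fibonacci numbers.
word = fibonacci_word(20)
ratio = word.count("0") / word.count("1")
error = abs(ratio - GOLDEN_RATIO)
```

    Under this rule, each iteration turns every zero into one zero and one one, and every one into one zero, so the two counts follow the Fibonacci recurrence and their ratio approaches the golden ratio.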

  3. Infinite sequences and series

    CERN Document Server

    Knopp, Konrad

    1956-01-01

    One of the finest expositors in the field of modern mathematics, Dr. Konrad Knopp here concentrates on a topic that is of particular interest to 20th-century mathematicians and students. He develops the theory of infinite sequences and series from its beginnings to a point where the reader will be in a position to investigate more advanced stages on his own. The foundations of the theory are therefore presented with special care, while the developmental aspects are limited by the scope and purpose of the book. All definitions are clearly stated; all theorems are proved with enough detail to ma

  4. Allele Re-sequencing Technologies

    DEFF Research Database (Denmark)

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    2013-01-01

    The development of next-generation sequencing technologies has made sequencing an affordable approach for detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing still remains too high to be feasible for many plant species with large and com... alternative to whole genome re-sequencing to identify causative genetic variations in plants. One challenge, however, will be efficient bioinformatics strategies for data handling and analysis from the increasing amount of sequence information.

  5. Spaces of Ideal Convergent Sequences

    Directory of Open Access Journals (Sweden)

    M. Mursaleen

    2014-01-01

    In the present paper, we introduce some sequence spaces using ideal convergence and a Musielak-Orlicz function ℳ = (M_k). We also examine some topological properties of the resulting sequence spaces.

  6. Sequence Handling by Sequence Analysis Toolbox v1.0

    DEFF Research Database (Denmark)

    Ingrell, Christian Ravnsborg; Matthiesen, Rune; Jensen, Ole Nørregaard

    2006-01-01

    The fact that mass spectrometry has become a high-throughput method calls for bioinformatic tools for automated sequence handling and prediction. For efficient use of bioinformatic tools, it is important that these tools are integrated or interfaced with each other. The purpose of sequence... analysis toolbox v1.0 was to have a general purpose sequence analyzing tool that can import sequences obtained by high-throughput sequencing methods. The program includes algorithms for calculation or prediction of isoelectric point, hydropathicity index, transmembrane segments, and glycosylphosphatidyl...

  7. Rapid Polymer Sequencer

    Science.gov (United States)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.
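
    The identification step (comparing each measured current change with a database of reference signals) can be sketched as a nearest-reference lookup. The reference values below are invented placeholders in arbitrary units, and nearest-value matching is an assumed comparison rule; the record does not specify how the comparison is performed.

```python
from typing import Dict, Iterable

# Hypothetical reference current changes (arbitrary units), one per component.
REFERENCE_SIGNALS: Dict[str, float] = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def identify_component(measured_change: float, references: Dict[str, float] = REFERENCE_SIGNALS) -> str:
    """Return the component whose reference signal is closest to the measurement."""
    return min(references, key=lambda name: abs(references[name] - measured_change))


def call_sequence(measured_changes: Iterable[float]) -> str:
    """Identify each measured current change in order and join the results."""
    return "".join(identify_component(change) for change in measured_changes)


# Four hypothetical measurements, one per polymer component passing the tip.
called = call_sequence([1.1, 3.9, 2.2, 2.8])
```

    A practical system would also need a rejection threshold for measurements that are far from every reference, which this sketch omits.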

  8. The Galaxy End Sequence

    CERN Document Server

    Eales, Stephen; Smith, Matthew; Appah, Kiran; Ciesla, Laure; Duffield, Chris; Schofield, Simon

    2016-01-01

    A common assumption is that galaxies fall in two distinct regions on a plot of specific star-formation rate (SSFR) versus galaxy stellar mass: a star-forming Galaxy Main Sequence (GMS) and a separate region of 'passive' or 'red and dead galaxies'. Starting from a volume-limited sample of nearby galaxies designed to contain most of the stellar mass in this volume, and thus being a fair representation of the Universe at the end of 12 billion years of galaxy evolution, we investigate the distribution of galaxies in this diagram today. We show that galaxies follow a strongly curved extended GMS with a steep negative slope at high galaxy stellar masses. There is a gradual change in the morphologies of the galaxies along this distribution, but there is no clear break between early-type and late-type galaxies. Examining the other evidence that there are two distinct populations, we argue that the 'red sequence' is the result of the colours of galaxies changing very little below a critical value of the SSFR, rather t...

  9. Novel sequences propel familiar folds.

    Science.gov (United States)

    Jawad, Zahra; Paoli, Massimo

    2002-04-01

    Recent structure determinations have made new additions to a set of strikingly different sequences that give rise to the same topology. Proteins with a beta propeller fold are characterized by extreme sequence diversity despite the similarity in their three-dimensional structures. Several fold predictions, based in part on sequence repeats thought to match modular beta sheets, have been proved correct.

  10. RIKEN integrated sequence analysis (RISA) system--384-format sequencing pipeline with 384 multicapillary sequencer.

    Science.gov (United States)

    Shibata, K; Itoh, M; Aizawa, K; Nagaoka, S; Sasaki, N; Carninci, P; Konno, H; Akiyama, J; Nishi, K; Kitsunai, T; Tashiro, H; Itoh, M; Sumi, N; Ishii, Y; Nakamura, S; Hazama, M; Nishine, T; Harada, A; Yamamoto, R; Matsumoto, H; Sakaguchi, S; Ikegami, T; Kashiwagi, K; Fujiwake, S; Inoue, K; Togawa, Y

    2000-11-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3' end and 5' end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be

  11. Solid phase sequencing of biopolymers

    Energy Technology Data Exchange (ETDEWEB)

    Cantor, Charles (Del Mar, CA); Koster, Hubert (La Jolla, CA)

    2010-09-28

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  12. Graphene nanodevices for DNA sequencing

    Science.gov (United States)

    Heerema, Stephanie J.; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology.

  13. Solid phase sequencing of biopolymers

    Science.gov (United States)

    Cantor, Charles R.; Koster, Hubert

    2014-06-24

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Probes may be affixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  14. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by

  15. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  16. Nonlinear analysis of biological sequences

    Energy Technology Data Exchange (ETDEWEB)

    Torney, D.C.; Bruno, W.; Detours, V. [and others]

    1998-11-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The main objectives of this project involved deriving new capabilities for analyzing biological sequences. The authors focused on tabulating the statistical properties exhibited by Human coding DNA sequences and on techniques of inferring the phylogenetic relationships among protein sequences related by descent.

  17. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered to be a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known alignment algorithms FASTA (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 76% enhancement in alignment score when it is compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.
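
    For orientation, the optimal baseline named in this abstract, Needleman-Wunsch global alignment, can be sketched as a small dynamic program. This is only an illustrative sketch, not the ABS algorithm and not the authors' implementation; the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Scoring values are illustrative assumptions, not taken from the record.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b costs gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diagonal,          # align a[i-1] with b[j-1]
                          prev[j] + gap,     # gap in b
                          curr[j - 1] + gap) # gap in a
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
```

    A heuristic aligner can then be judged by how close its alignment score comes to this optimal value on the same pair of sequences.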

  18. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics databases are growing exponentially in size. Processing these large amounts of data may take hours even if supercomputers are used. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with the well-known sequence alignment algorithms 'GAP' (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 51% enhancement in alignment score when it is compared with the GAP algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  19. Assembly sequencing with toleranced parts

    Energy Technology Data Exchange (ETDEWEB)

    Latombe, J.C. [Stanford Univ., CA (United States). Robotics Lab.; Wilson, R.H. [Sandia National Labs., Albuquerque, NM (United States). Intelligent Systems and Robotics Center

    1995-02-21

    The goal of assembly sequencing is to plan a feasible series of operations to construct a product from its individual parts. Previous research has thoroughly investigated assembly sequencing under the assumption that parts have nominal geometry. This paper considers the case where parts have toleranced geometry. Its main contribution is an efficient procedure that decides if a product admits an assembly sequence with infinite translations that is feasible for all possible instances of the components within the specified tolerances. If the product admits one such sequence, the procedure can also generate it. For the cases where there exists no such assembly sequence, another procedure is proposed which generates assembly sequences that are feasible only for some values of the toleranced dimensions. If this procedure produces no such sequence, then no instance of the product is assemblable. Finally, this paper analyzes the relation between assembly and disassembly sequences in the presence of toleranced parts. This work assumes a simple, but non-trivial tolerance language that falls short of capturing all imperfections of a manufacturing process. Hence, it is only one step toward assembly sequencing with toleranced parts.

  20. SNMR pulse sequence phase cycling

    Science.gov (United States)

    Walsh, David O; Grunewald, Elliot D

    2013-11-12

    Technologies applicable to SNMR pulse sequence phase cycling are disclosed, including SNMR acquisition apparatus and methods, SNMR processing apparatus and methods, and combinations thereof. SNMR acquisition may include transmitting two or more SNMR pulse sequences and applying a phase shift to a pulse in at least one of the pulse sequences, according to any of a variety of cycling techniques. SNMR processing may include combining SNMR signals from a plurality of pulse sequences comprising pulses of different phases, so that desired signals are preserved and undesired signals are canceled.

  1. Blazar Sequence in Fermi Era

    Indian Academy of Sciences (India)

    Liang Chen

    2014-09-01

    In this paper, we review the latest research results on the topic of the blazar sequence. It seems that the blazar sequence is phenomenologically ruled out, while the theoretical blazar sequence still holds. We point out that black hole mass is a dominant parameter accounting for high-power-high-synchrotron-peaked and low-power-low-synchrotron-peaked blazars. Because most blazars have a similar size of emission region, the theoretical blazar sequence implies that the break of the Spectral Energy Distribution (SED) is a cooling break in nature.

  2. Sequence Algebra, Sequence Decision Diagrams and Dynamic Fault Trees

    Energy Technology Data Exchange (ETDEWEB)

    Rauzy, Antoine B., E-mail: Antoine.Rauzy@lix.polytechnique.f [LIX-CNRS, Computer Science, Ecole Polytechnique, 91128 Palaiseau Cedex (France)

    2011-07-15

    Considerable attention has been focused on Dynamic Fault Trees in the past few years. By adding new gates to static (regular) Fault Trees, Dynamic Fault Trees aim to take into account dependencies among events. Merle et al. recently proposed an algebraic framework to give a formal interpretation to these gates. In this article, we extend Merle et al.'s work by adopting a slightly different perspective. We introduce Sequence Algebras that can be seen as Algebras of Basic Events, representing failures of non-repairable components. We show how to interpret Dynamic Fault Trees within this framework. Finally, we propose a new data structure to encode sets of sequences of Basic Events: Sequence Decision Diagrams. Sequence Decision Diagrams are very much inspired by Minato's Zero-Suppressed Binary Decision Diagrams. We show that all operations of Sequence Algebras can be performed on this data structure.

  3. Bayesian analysis of binary sequences

    Science.gov (United States)

    Torney, David C.

    2005-03-01

    This manuscript details Bayesian methodology for "learning by example", with binary n-sequences encoding the objects under consideration. Priors prove influential; conformable priors are described. Laplace approximation of Bayes integrals yields posterior likelihoods for all n-sequences. This involves the optimization of a definite function over a convex domain--efficiently effectuated by the sequential application of the quadratic program.

  4. Gambling strategies for random sequences

    OpenAIRE

    George Davie

    2010-01-01

    There is a general consensus that it is not possible to gamble successfully against a random sequence. This consensus is based on results from probability theory that all gambling systems are in some sense futile and the idea that at any stage of the sequence, the next outcome is entirely unpredictable.

  5. DNA Sequencing Sensors: An Overview

    Directory of Open Access Journals (Sweden)

    Jose Antonio Garrido-Cardenas

    2017-03-01

    The first sequencing of a complete genome was published forty years ago by the double Nobel Prize in Chemistry winner Frederick Sanger. That corresponded to the small-sized genome of a bacteriophage, but since then there have been many complex organisms whose DNA has been sequenced. This was possible thanks to continuous advances in the fields of biochemistry and molecular genetics, but also in other areas such as nanotechnology and computing. Nowadays, sequencing sensors based on genetic material have little to do with those used by Sanger. The emergence of mass sequencing sensors, or new generation sequencing (NGS), meant a quantitative leap both in the volume of genetic material that was able to be sequenced in each trial, as well as in the time per run and its cost. One can envisage that incoming technologies, already known as fourth generation sequencing, will continue to cheapen the trials by increasing DNA reading lengths in each run. All of this would be impossible without sensors and detection systems becoming smaller and more precise. This article provides a comprehensive overview on sensors for DNA sequencing developed within the last 40 years.

  6. PERIODIC COMPLEMENTARY BINARY SEQUENCE PAIRS

    Institute of Scientific and Technical Information of China (English)

    Xu Chengqian; Zhao Xiaoqun

    2002-01-01

    A new set of binary sequences, Periodic Complementary Binary Sequence Pair (PCSP), is proposed. A new class of block design, Difference Family Pair (DFP), is also proposed. The relationship between PCSP and DFP, the properties and existing conditions of PCSP and the recursive constructions for PCSP are given.

  7. PERIODIC COMPLEMENTARY BINARY SEQUENCE PAIRS

    Institute of Scientific and Technical Information of China (English)

    Xu Chengqian; Zhao Xiaoqun

    2002-01-01

    A new set of binary sequences, Periodic Complementary Binary Sequence Pair (PCSP), is proposed. A new class of block design, Difference Family Pair (DFP), is also proposed. The relationship between PCSP and DFP, the properties and existing conditions of PCSP and the recursive constructions for PCSP are given.

  8. Sequence conserved for subcellular localization

    Science.gov (United States)

    Nair, Rajesh; Rost, Burkhard

    2002-01-01

    The more proteins diverged in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we presented the first large-scale analysis of the relation between sequence similarity and identity in subcellular localization. Three results stood out: (1) The subcellular compartment is generally more conserved than what might have been expected given that short sequence motifs like nuclear localization signals can alter the native compartment; (2) the sequence conservation of localization is similar between different compartments; and (3) it is similar to the conservation of structure and enzymatic activity. In particular, we found the transition between the regions of conserved and nonconserved localization to be very sharp, although the thresholds for conservation were less well defined than for structure and enzymatic activity. We found that a simple measure for sequence similarity accounting for pairwise sequence identity and alignment length, the HSSP distance, distinguished accurately between protein pairs of identical and different localizations. In fact, BLAST expectation values outperformed the HSSP distance only for alignments in the subtwilight zone. We succeeded in slightly improving the accuracy of inferring localization through homology by fine tuning the thresholds. Finally, we applied our results to the entire SWISS-PROT database and five entirely sequenced eukaryotes. PMID:12441382

  9. Rapid Diagnostics of Onboard Sequences

    Science.gov (United States)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database, while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command

  10. Quantum Exchangeable Sequences of Algebras

    CERN Document Server

    Curran, Stephen

    2008-01-01

    We extend the notion of quantum exchangeability, introduced by Köstler and Speicher in arXiv:0807.0677, to sequences $(\rho_1,\rho_2,\ldots)$ of homomorphisms from an algebra $C$ into a noncommutative probability space $(A,\phi)$, and prove a free de Finetti theorem: an infinite quantum exchangeable sequence $(\rho_1,\rho_2,\ldots)$ is freely independent and identically distributed with respect to a conditional expectation. As a corollary we obtain a free analogue of the Hewitt-Savage zero-one law. As in the classical case, the theorem fails for finite sequences. We give a characterization of finite quantum exchangeable sequences, which can be viewed as a noncommutative analogue of sampling without replacement. We then give an approximation to how far a finite quantum exchangeable sequence is from being freely independent with amalgamation.

  11. Spatiotemporal correlations of aftershock sequences

    CERN Document Server

    Peixoto, Tiago P; Davidsen, Jörn

    2010-01-01

    Aftershock sequences are of particular interest in seismic research since they may condition seismic activity in a given region over long time spans. While they are typically identified with periods of enhanced seismic activity after a large earthquake as characterized by the Omori law, our knowledge of the spatiotemporal correlations between events in an aftershock sequence is limited. Here, we study the spatiotemporal correlations of two aftershock sequences from California (Parkfield and Hector Mine) using the recently introduced concept of "recurrent" events. We find that both sequences have very similar properties and that most of them are captured by the space-time epidemic-type aftershock sequence (ETAS) model if one takes into account catalog incompleteness. However, the stochastic model does not capture the spatiotemporal correlations leading to the observed structure of seismicity on small spatial scales.

  12. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from...

  13. Ossification sequence heterochrony among amphibians.

    Science.gov (United States)

    Harrington, Sean M; Harrison, Luke B; Sheil, Christopher A

    2013-01-01

    Heterochrony is an important mechanism in the evolution of amphibians. Although studies have centered on the relationship between size and shape and the rates of development, ossification sequence heterochrony also may have been important. Rigorous, phylogenetic methods for assessing sequence heterochrony are relatively new, and a comprehensive study of the relative timing of ossification of skeletal elements has not been used to identify instances of sequence heterochrony across Amphibia. In this study, a new version of the program Parsimov-based genetic inference (PGi) was used to identify shifts in ossification sequences across all extant orders of amphibians, for all major structural units of the skeleton. PGi identified a number of heterochronic sequence shifts in all analyses, the most interesting of which seem to be tied to differences in metamorphic patterns among major clades. Early ossification of the vomer, premaxilla, and dentary is retained by Apateon caducus and members of Gymnophiona and Urodela, which lack the strongly biphasic development seen in anurans. In contrast, bones associated with the jaws and face were identified as shifting late in the ancestor of Anura. The bones that do not shift late, and thereby occupy the earliest positions in the anuran cranial sequence, are those in regions of the skull that undergo the least restructuring throughout anuran metamorphosis. Additionally, within Anura, bones of the hind limb and pelvic girdle were also identified as shifting early in the sequence of ossification, which may be a result of functional constraints imposed by the drastic metamorphosis of most anurans.

  14. A Criterion for Regular Sequences

    Indian Academy of Sciences (India)

    D P Patil; U Storch; J Stückrad

    2004-05-01

    Let $R$ be a commutative noetherian ring and $f_1,\ldots,f_r \in R$. In this article we give (cf. the Theorem in §2) a criterion for $f_1,\ldots,f_r$ to be a regular sequence for a finitely generated module over $R$ which strengthens and generalises a result in [2]. As an immediate consequence we deduce that if $V(g_1,\ldots,g_r) \subseteq V(f_1,\ldots,f_r)$ in Spec $R$ and if $f_1,\ldots,f_r$ is a regular sequence in $R$, then $g_1,\ldots,g_r$ is also a regular sequence in $R$.

  15. Weak disorder in Fibonacci sequences

    Energy Technology Data Exchange (ETDEWEB)

    Ben-Naim, E [Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545 (United States); Krapivsky, P L [Department of Physics and Center for Molecular Cybernetics, Boston University, Boston, MA 02215 (United States)

    2006-05-19

    We study how weak disorder affects the growth of the Fibonacci series. We introduce a family of stochastic sequences that grow by the normal Fibonacci recursion with probability $1 - \epsilon$, but follow a different recursion rule with a small probability $\epsilon$. We focus on the weak disorder limit and obtain the Lyapunov exponent that characterizes the typical growth of the sequence elements, using perturbation theory. The limiting distribution for the ratio of consecutive sequence elements is obtained as well. A number of variations to the basic Fibonacci recursion including shift, doubling and copying are considered. (letter to the editor)
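
    The kind of stochastic recursion described in this abstract can be explored numerically. The sketch below is a rough illustration under stated assumptions, not the authors' perturbative calculation: the function name is hypothetical, and the alternative rule used with probability epsilon is assumed here to be a simple "copying" step, x(n+1) = x(n), which may differ from the rules defined in the paper. The Lyapunov exponent is estimated as the average logarithm of the ratio of consecutive elements.

```python
import math
import random


def lyapunov_estimate(epsilon, steps=10_000, seed=0):
    """Estimate the growth rate (1/N) * ln(x_N) of a disordered Fibonacci sequence.

    With probability 1 - epsilon the usual rule x_{n+1} = x_n + x_{n-1} is used;
    with probability epsilon an ASSUMED alternative rule x_{n+1} = x_n ("copying").
    """
    rng = random.Random(seed)
    prev, curr = 1.0, 1.0
    log_growth = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            nxt = curr            # assumed "copying" rule (hypothetical choice)
        else:
            nxt = prev + curr     # normal Fibonacci recursion
        # Renormalise so the current element is 1; this avoids float overflow
        # while accumulating the logarithm of the growth factor.
        log_growth += math.log(nxt / curr)
        prev, curr = curr / nxt, 1.0
    return log_growth / steps


if __name__ == "__main__":
    golden = (1 + math.sqrt(5)) / 2
    print("epsilon = 0   :", lyapunov_estimate(0.0), "ln(phi) =", math.log(golden))
    print("epsilon = 0.1 :", lyapunov_estimate(0.1))
```

    With epsilon = 0 the estimate approaches the logarithm of the golden ratio, which gives a quick sanity check before turning on the disorder.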

  16. DNA Sequencing Using capillary Electrophoresis

    Energy Technology Data Exchange (ETDEWEB)

    Dr. Barry Karger

    2011-05-09

    The overall goal of this program was to develop capillary electrophoresis as the tool to be used to sequence for the first time the Human Genome. Our program was part of the Human Genome Project. In this work, we were highly successful and the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBACE multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and success. We began our work by separating by capillary electrophoresis double-stranded oligonucleotides using cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, preparation of such cross-linked gel capillaries was difficult with poor reproducibility, and even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g. sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of the linear polyacrylamide so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, while testing many linear polymers, we selected linear polyacrylamide as the best matrix as it was the most hydrophilic polymer available. Under our DOE program, we demonstrated initially the success of the linear polyacrylamide to separate double-stranded DNA. We note that the method is used even today to assay purity of double-stranded DNA fragments. Our focus, of course, was on the separation of single-stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other

  17. Classification of Base Sequences BS(n+1, n)

    Directory of Open Access Journals (Sweden)

    Dragomir Ž. Ðoković

    2010-01-01

    Base sequences $BS(n+1,n)$ are quadruples of $\{\pm 1\}$-sequences $(A;B;C;D)$, with $A$ and $B$ of length $n+1$ and $C$ and $D$ of length $n$, such that the sum of their nonperiodic autocorrelation functions is a $\delta$-function. The base sequence conjecture, asserting that $BS(n+1,n)$ exist for all $n$, is stronger than the famous Hadamard matrix conjecture. We introduce a new definition of equivalence for base sequences $BS(n+1,n)$ and construct a canonical form. By using this canonical form, we have enumerated the equivalence classes of $BS(n+1,n)$ for $n \le 30$. As the number of equivalence classes grows rapidly (but not monotonically) with $n$, the tables in the paper cover only the cases $n \le 13$.
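
    The defining property of base sequences can be checked mechanically. The sketch below assumes the standard definition: four sequences with entries +1 or -1, the first two of length n+1 and the last two of length n, whose nonperiodic autocorrelation functions sum to zero at every nonzero shift. The function names are hypothetical and the example quadruple for n = 1 is a hand-checked toy case, not taken from the paper's tables.

```python
def npaf(seq, shift):
    """Nonperiodic autocorrelation of seq at the given positive shift."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def is_base_sequence(a, b, c, d):
    """Check whether (a; b; c; d) satisfies the base-sequence condition.

    a and b must have length n + 1, c and d length n, all entries +1 or -1,
    and the four autocorrelations must cancel at every shift 1..n.
    """
    n = len(c)
    if len(a) != n + 1 or len(b) != n + 1 or len(d) != n:
        return False
    if any(x not in (1, -1) for seq in (a, b, c, d) for x in seq):
        return False
    return all(
        sum(npaf(seq, shift) for seq in (a, b, c, d)) == 0
        for shift in range(1, n + 1)
    )


if __name__ == "__main__":
    # Toy case n = 1: the shift-1 autocorrelations of A and B are +1 and -1.
    print(is_base_sequence([1, 1], [1, -1], [1], [1]))
```

    An exhaustive search over all quadruples grows as 2^(4n+2), which is one reason a canonical form for equivalence classes is useful in enumerations of this kind.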

  18. Molecular beacon sequence design algorithm.

    Science.gov (United States)

    Monroe, W Todd; Haselton, Frederick R

    2003-01-01

    A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.

  19. Pythagorean Triples from Harmonic Sequences.

    Science.gov (United States)

    DiDomenico, Angelo S.; Tanner, Randy J.

    2001-01-01

    Shows how all primitive Pythagorean triples can be generated from harmonic sequences. Uses inductive and deductive reasoning to explore how Pythagorean triples are connected with another area of mathematics. (KHR)

  20. ISIS Individualized Support In Sequencing

    NARCIS (Netherlands)

    Drachsler, Hendrik; Hummel, Hans

    2007-01-01

    Drachsler, H., & Hummel, H. G. K. (2007). ISIS Individualized Support In Sequencing. Presentation given during the PIP meeting on March 22, 2007. Open University of the Netherlands: Heerlen, The Netherlands.

  1. Overview of Sequence Data Formats.

    Science.gov (United States)

    Zhang, Hongen

    2016-01-01

    A next-generation sequencing experiment can generate billions of short reads for each sample, and processing of the raw reads adds more information. Various file formats have been developed to store and manipulate this information. This chapter presents an overview of the file formats commonly used in the analysis of next-generation sequencing data, including FASTQ, FASTA, SAM/BAM, GFF/GTF, BED, and VCF.
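
    As a small illustration of one of the listed formats, the sketch below reads FASTQ records. It assumes the common four-line layout (header, sequence, `+` separator, qualities) and Phred+33 quality encoding; the function name `read_fastq` and the sample read are made up for the example and are not taken from the chapter.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, qualities) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        # Assumption: Phred+33 encoding, so the score is ord(char) - 33.
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
print(records)
```

    Multi-line FASTQ variants and other quality encodings would need a more careful parser than this.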

  2. Genome Sequence of Canine Herpesvirus.

    Directory of Open Access Journals (Sweden)

    Konstantinos V Papageorgiou

    Full Text Available Canine herpesvirus is a widespread alphaherpesvirus that causes a fatal haemorrhagic disease of neonatal puppies. We have used high-throughput methods to determine the genome sequences of three viral strains (0194, V777 and V1154) isolated in the United Kingdom between 1985 and 2000. The sequences are very closely related to each other. The canine herpesvirus genome is estimated to be 125 kbp in size and consists of a unique long sequence (97.5 kbp) and a unique short sequence (7.7 kbp) that are each flanked by terminal and internal inverted repeats (38 bp and 10.0 kbp, respectively). The overall nucleotide composition is 31.6% G+C, which is the lowest among the completely sequenced alphaherpesviruses. The genome contains 76 open reading frames predicted to encode functional proteins, all of which have counterparts in other alphaherpesviruses. The availability of the sequences will facilitate future research on the diagnosis and treatment of canine herpesvirus-associated disease.

  3. Long-range barcode labeling-sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Feng; Zhang, Tao; Singh, Kanwar K.; Pennacchio, Len A.; Froula, Jeff L.; Eng, Kevin S.

    2016-10-18

    Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.

  4. Pig genome sequence - analysis and publication strategy

    NARCIS (Netherlands)

    Archibald, A.L.; Bolund, L.; Churcher, C.; Fredholm, M.; Groenen, M.A.M.; Harlizius, B.

    2010-01-01

    Background - The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. Results - Assemblies of the B

  5. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    Science.gov (United States)

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  6. ARC Code TI: sequenceMiner

    Data.gov (United States)

    National Aeronautics and Space Administration — The sequenceMiner was developed to address the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. sequenceMiner works...

  7. Sequence-dependent nucleosome positioning.

    Science.gov (United States)

    Chung, Ho-Ryun; Vingron, Martin

    2009-03-13

    Eukaryotic DNA is organized into a macromolecular structure called chromatin. The basic repeating unit of chromatin is the nucleosome, which consists of two copies of each of the four core histones and DNA. The nucleosomal organization and the positions of nucleosomes have profound effects on all DNA-dependent processes. Understanding the factors that influence nucleosome positioning is therefore of general interest. Among the many determinants of nucleosome positioning, the DNA sequence has been proposed to have a major role. Here, we analyzed more than 860,000 nucleosomal DNA sequences to identify sequence features that guide the formation of nucleosomes in vivo. We found that both a periodic enrichment of AT base pairs and an out-of-phase oscillating enrichment of GC base pairs as well as the overall preference for GC base pairs are determinants of nucleosome positioning. The preference for GC pairs can be related to a lower energetic cost required for deformation of the DNA to wrap around the histones. In line with this idea, we found that only incorporation of both signal components into a sequence model for nucleosome formation results in maximal predictive performance on a genome-wide scale. In this manner, one achieves greater predictive power than published approaches. Our results confirm the hypothesis that the DNA sequence has a major role in nucleosome positioning in vivo.

  8. Sequencing Needs for Viral Diagnostics

    Energy Technology Data Exchange (ETDEWEB)

    Gardner, S N; Lam, M; Mulakken, N J; Torres, C L; Smith, J R; Slezak, T

    2004-01-26

    We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives ("near neighbors") that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. SARS and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near neighbor sequences are urgently needed. Our results also indicate that double-stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.
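
    The two selection criteria (conserved among target strains, unique relative to other species) can be illustrated with a toy k-mer filter. This is a deliberately simplified sketch, not the authors' pipeline: exact k-mer matching stands in for their conservation and uniqueness analysis, and the function names and sequences are invented for the example.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_candidates(targets, near_neighbors, k):
    """k-mers present in every target genome and absent from all near neighbors."""
    # Conserved: shared by all target strains (assumes at least one target).
    conserved = set.intersection(*(kmers(s, k) for s in targets))
    # Background: anything seen in a near-neighbor genome.
    background = set().union(*(kmers(s, k) for s in near_neighbors))
    return conserved - background


targets = ["ACGTACGGA", "TTACGGATC"]   # hypothetical strain sequences
near_neighbors = ["ACGGTTT"]           # hypothetical close relative
print(sorted(signature_candidates(targets, near_neighbors, 4)))
```

    Adding target genomes can only shrink the conserved set and adding near neighbors can only grow the background, which mirrors the trade-off between conservation and uniqueness that the abstract describes.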

  9. Sequence Patterns of Identity Authentication Protocols

    Institute of Scientific and Technical Information of China (English)

    Tao Hongcai; He Dake

    2006-01-01

    From the viewpoint of protocol sequence, the sequence patterns of possible identity authentication protocols are analyzed under two cases: with or without a trusted third party (TTP). Ten feasible sequence patterns of authentication protocols with a TTP and 5 sequence patterns without a TTP are obtained. These sequence patterns meet the requirements for identity authentication, and cover almost all current authentication protocols with and without a TTP. All of the sequence patterns obtained are classified into unilateral or bilateral authentication. Then, according to the sequence symmetry, several good sequence patterns with a TTP are evaluated. The results can provide a reference for the design of new identity authentication protocols.

  10. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    . All but one sequence mapped to the MCP gene while the last sequence mapped to the Neurofilament gene. Approx. half of the sequences contained no errors while the rest differed with 88-99 percent similarity with most having 99% similarity. One sequence, when BLASTed, showed most similarity to European...... Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...

  11. On the base sequence conjecture

    CERN Document Server

    Djokovic, Dragomir Z

    2010-01-01

    Let BS(m,n) denote the set of base sequences (A;B;C;D), with A and B of length m and C and D of length n. The base sequence conjecture (BSC) asserts that BS(n+1,n) exist (i.e., are non-empty) for all n. This is known to be true for n <= 36 and when n is a Golay number. We show that it is also true for n=37 and n=38. It is worth pointing out that BSC is stronger than the famous Hadamard matrix conjecture. In order to demonstrate the abundance of base sequences, we have previously attached to BS(n+1,n) a graph Gamma_n and computed the Gamma_n for n <= 27. We now extend these computations and determine the Gamma_n for n=28,...,35. We also propose a conjecture describing these graphs in general.

  12. Explaining the harmonic sequence paradox.

    Science.gov (United States)

    Schmidt, Ulrich; Zimper, Alexander

    2012-05-01

    According to the harmonic sequence paradox, an expected utility decision maker's willingness to pay for a gamble whose expected payoffs evolve according to the harmonic series is finite if and only if his marginal utility of additional income becomes zero for rather low payoff levels. Since the assumption of zero marginal utility is implausible for finite payoff levels, expected utility theory - as well as its standard generalizations such as cumulative prospect theory - are apparently unable to explain a finite willingness to pay. This paper presents first an experimental study of the harmonic sequence paradox. Additionally, it demonstrates that the theoretical argument of the harmonic sequence paradox only applies to time-patient decision makers, whereas the paradox is easily avoided if time-impatience is introduced.

  13. Transgressive Surface as Sequence Boundary

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    Analysis of the four cases of the sequence boundary (SB)-transgressive surface (TS) relation in nature shows that applying transgressive surfaces as sequence boundaries has the following merits: it improves the methodology of stratigraphic subdivision; the position of the transgressive surface in a sea level curve is relatively fixed; the transgressive surface is a transforming surface of the stratal structure; in platforms or ramps, the transgressive surface is the only choice for determining the sequence boundary; and the transgressive surface is a readily recognized physical surface reflected by seismic records in seismostratigraphy. The paper concludes that delineating an SB in terms of the TS is theoretically and practically better than delineating it between highstand and lowstand sediments, as has been done traditionally.

  14. Sequences and series involving the sequence of composite numbers

    Directory of Open Access Journals (Sweden)

    Panayiotis Vlamos

    2002-01-01

    Full Text Available Denoting by $p_n$ and $c_n$ the nth prime number and the nth composite number, respectively, we prove that both the sequence $(x_n)_{n \ge 1}$, defined by $x_n = \sum_{k=1}^{n} \frac{c_{k+1} - c_k}{k} - \frac{p_n}{n}$, and the series $\sum_{n=1}^{\infty} \frac{p_{c_n} - c_{p_n}}{n p_n}$ are convergent.

  15. KERNEL WORDS AND GAP SEQUENCE OF THE TRIBONACCI SEQUENCE

    Institute of Scientific and Technical Information of China (English)

    Yuke HUANG; Zhiying WEN

    2016-01-01

    In this paper, we investigate the factor properties and gap sequence of the Tribonacci sequence, the fixed point of the substitution σ(a, b, c) = (ab, ac, a). Let ωp be the p-th occurrence of ω and Gp(ω) be the gap between ωp and ωp+1. We introduce a notion of kernel for each factor ω, and then give the decomposition of the factor ω with respect to its kernel. Using the kernel and the decomposition, we prove the main result of this paper: for each factor ω, the gap sequence {Gp(ω)}p≥1 is the Tribonacci sequence over the alphabet {G1(ω), G2(ω), G4(ω)}, and the expressions of gaps are determined completely. As an application, for each factor ω and p∈N, we determine the position of ωp. Finally, we introduce a notion of spectrum for studying some typical combinatorial properties, such as power, overlap and separate of factors.
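
    The substitution and the gap sequence are easy to experiment with. The sketch below iterates σ from the letter a and lists the text between consecutive occurrences of a factor. Treating a gap as the (possibly empty) substring between two occurrences is an assumption made for this illustration; the paper's own definition, in particular for overlapping occurrences, may differ. The names `SIGMA`, `tribonacci_word` and `gaps` are hypothetical.

```python
# The substitution sigma(a, b, c) = (ab, ac, a) from the abstract.
SIGMA = {"a": "ab", "b": "ac", "c": "a"}


def tribonacci_word(iterations):
    """Apply sigma repeatedly, starting from the single letter 'a'."""
    word = "a"
    for _ in range(iterations):
        word = "".join(SIGMA[ch] for ch in word)
    return word


def gaps(word, factor):
    """Substrings lying between consecutive occurrences of factor in word."""
    positions = [
        i for i in range(len(word) - len(factor) + 1)
        if word.startswith(factor, i)
    ]
    return [word[p + len(factor):q] for p, q in zip(positions, positions[1:])]


w = tribonacci_word(4)
print(w)
print(gaps(w, "a"))
```

    For the factor a, the gaps in this prefix read b, c, b, empty, b, c, which is the Tribonacci word abacab with a, b, c replaced by the first, second and fourth gaps, in line with the stated main result.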

  16. Convergence of Fuzzy Set Sequences

    Institute of Scientific and Technical Information of China (English)

    FENG Yu-hu

    2002-01-01

    There is more than one mode of convergence for fuzzy set sequences. In this paper, six common modes of convergence and their relationships are discussed. These six modes are convergence in uniform metric D, convergence in separable metric Dp or D*p, 1 ≤ p < ∞, convergence in level set, strong convergence in level set and weak convergence. Suitable counterexamples are given. The necessary and sufficient conditions of convergence in uniform metric D are described. Some comments on the convergence of LR fuzzy number sequences are presented.

  17. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subproject 3 "integrated sequence analysis" (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term "methodology" denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, a discussion about the project follows and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  18. DNA Sequencing Using capillary Electrophoresis

    Energy Technology Data Exchange (ETDEWEB)

    Dr. Barry Karger

    2011-05-09

    The overall goal of this program was to develop capillary electrophoresis as the tool to be used to sequence the Human Genome for the first time. Our program was part of the Human Genome Project. In this work, we were highly successful, and the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBACE multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and success. We began our work by separating double-stranded oligonucleotides by capillary electrophoresis using cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, preparation of such cross-linked gel capillaries was difficult, with poor reproducibility, and even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g. sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of the linear polyacrylamide so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, while testing many linear polymers, we selected linear polyacrylamide as the best matrix as it was the most hydrophilic polymer available. Under our DOE program, we initially demonstrated the success of linear polyacrylamide in separating double-stranded DNA. We note that the method is used even today to assay the purity of double-stranded DNA fragments. Our focus, of course, was on the separation of single-stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other

  19. Network motifs in music sequences

    CERN Document Server

    Zanette, Damian H

    2010-01-01

    In this note, I summarize ongoing research on motif distribution in networks built up out of symbolic sequences of Western musical origin. Their motif significance profiles exhibit remarkable consistency over different styles and periods, and define a class that cannot be identified with any of the four "superfamilies" to which most real networks seem to belong. Networks from music sequences possess an unusual abundance of bidirectional connections, due to the inherent reversibility of short musical note patterns. This property contributes to motif significance from both local and large-scale features of musical structure.

  20. The origin of biased sequence depth in sequence-independent nucleic acid amplification and optimization for efficient massive parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Toon Rosseel

    Full Text Available Sequence Independent Single Primer Amplification (SISPA) is one of the most widely used random amplification approaches in virology for sequencing template preparation. This technique relies on oligonucleotides consisting of a 3' random part used to prime complementary DNA synthesis and a 5' defined tag sequence for subsequent amplification. Recently, this amplification method was combined with next generation sequencing to obtain viral sequences. However, these studies showed a biased distribution of the resulting sequence reads over the analyzed genomes. The aim of this study was to elucidate the mechanisms that lead to biased sequence depth when using random amplification. Avian paramyxovirus type 8 was used as a model RNA virus to investigate these mechanisms. We showed, based on in silico analysis of the sequence depth in relation to GC-content, predicted RNA secondary structure and sequence complementarity to the 3' part of the tag sequence, that the tag sequence has the main contribution to the observed bias in sequence depth. We confirmed this finding experimentally using both fragmented and non-fragmented viral RNAs as well as primers differing in random oligomer length (6 or 12 nucleotides) and in the sequence of the amplification tag. The observed oligonucleotide annealing bias can be reduced by extending the random oligomer sequence and by in silico combining sequence data from SISPA experiments using different 5' defined tag sequences. These findings contribute to the optimization of random nucleic acid amplification protocols that are currently required for downstream applications such as viral metagenomics and microarray analysis.

  1. Sequences in language and text

    CERN Document Server

    Mikros, George K

    2015-01-01

    The aim of this volume is to present the diverse but highly interesting area of the quantitative analysis of the sequence of various linguistic structures. The collected articles present a wide spectrum of quantitative analyses of linguistic syntagmatic structures and explore novel sequential linguistic entities. This volume will be interesting to all researchers studying linguistics using quantitative methods.

  2. Farey Sequences and Resistor Networks

    Indian Academy of Sciences (India)

    Sameen Ahmed Khan

    2012-05-01

    In this article, we employ the Farey sequence and Fibonacci numbers to establish strict upper and lower bounds for the order of the set of equivalent resistances for a circuit constructed from equal resistors combined in series and in parallel. The method is applicable for networks involving bridge and non-planar circuits.
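
    For readers unfamiliar with the Farey sequence used here, the sketch below generates F_n, the ascending list of reduced fractions in [0, 1] with denominator at most n, using the standard next-term recurrence. It only illustrates the sequence itself; how the article turns it into bounds on the number of equivalent resistances is not reproduced, and the function name `farey` is hypothetical.

```python
from fractions import Fraction


def farey(n):
    """Farey sequence F_n as a list of Fractions, in ascending order."""
    a, b, c, d = 0, 1, 1, n          # first two terms: 0/1 and 1/n
    terms = [Fraction(a, b)]
    while c <= n:
        terms.append(Fraction(c, d))
        # Standard recurrence giving the term after a/b, c/d.
        k = (n + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
    return terms


print([str(f) for f in farey(5)])
```

    The recurrence avoids sorting or deduplicating fractions, so the cost is linear in the length of the output.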

  3. Multifractal analyses of music sequences

    Science.gov (United States)

    Su, Zhi-Yuan; Wu, Tzuyin

    2006-09-01

    Multifractal analysis is applied to study the fractal property of music. In this paper, a method is proposed to transform both the melody and rhythm of a music piece into individual sets of distributed points along a one-dimensional line. The structure of the musical composition is thus manifested and characterized by the local clustering pattern of these sequences of points. Specifically, the local Hölder exponent and the multifractal spectrum are calculated for the transformed music sequences according to the multifractal formalism. The observed fluctuations of the Hölder exponent along the music sequences confirm the non-uniformity feature in the structures of melodic and rhythmic motions of music. Our present result suggests that the shape and opening width of the multifractal spectrum plot can be used to distinguish different styles of music. In addition, a characteristic curve is constructed by mapping the point sequences converted from the melody and rhythm of a musical work into a two-dimensional graph. Each piece of music has its own unique characteristic curve. This characteristic curve, which also exhibits a fractal trait, unveils the intrinsic structure of music.

  4. Single-primer fluorescent sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Ruth, J.L.; Morgan, C.A.; Middendorf, L.R.; Grone, D.L.; Brumbaugh, J.A.

    1987-05-01

    Modified linker arm oligonucleotides complementary to standard M13 priming sites were synthesized, labelled with either one, two, or three fluoresceins, and purified by reverse-phase HPLC. When used as primers in standard dideoxy M13 sequencing with ³²P-dNTPs, normal autoradiographic patterns were obtained. To eliminate the radioactivity, direct on-line fluorescence detection was achieved by the use of a scanning 10 mW Argon laser emitting 488 nm light. Fluorescent bands were detected directly in standard 0.2 or 0.35 mm thick polyacrylamide gels at a distance of 24 cm from the loading wells by a photomultiplier tube filtered at 520 nm. Horizontal and temporal location of each band was displayed by computer as a band in real time, providing visual appearance similar to normal 4-lane autoradiograms. Using a single primer labelled with two fluoresceins, sequences of between 500 and 600 bases have been read in a single loading with better than 98% accuracy; up to 400 bases can be read reproducibly with no errors. More than 50 sequences have been determined by this method. This approach requires only 1-2 µg of cloned template, and produces continuous sequence data at about one band per minute.

  5. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance, and even in biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how to distinguish coding and noncoding sequences, and how to classify organisms and determine their evolutionary relationships, are key problems in bioinformatics. Although much research has been carried out by taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other people, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view) to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  6. The Toothpick Sequence and Other Sequences from Cellular Automata

    CERN Document Server

    Applegate, David; Sloane, N J A

    2010-01-01

    A two-dimensional arrangement of toothpicks is constructed by the following iterative procedure. At stage 1, place a single toothpick of length 1 on a square grid, aligned with the y-axis. At each subsequent stage, for every exposed toothpick end, place an orthogonal toothpick centered at that end. The resulting structure has a fractal-like appearance. We will analyze the toothpick sequence, which gives the total number of toothpicks after n steps. We also study several related sequences that arise from enumerating active cells in cellular automata. Some unusual recurrences appear: a typical example is that instead of the Fibonacci recurrence, which we may write as a(2+i) = a(i) + a(i+1), we set n = 2^k+i (0 <= i < 2^k), and then a(n) = a(2^k+i) = 2a(i) + a(i+1). The corresponding generating functions look like Prod{k >= 0} (1+x^{2^k-1}+2x^{2^k}) and variations thereof.
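
    The iterative procedure can be simulated directly. In the sketch below, toothpicks are given length 2 on an integer grid (a rescaling chosen here so that ends and midpoints are lattice points), and an end counts as exposed when no other toothpick touches that point. The function name `toothpick_counts` and the data layout are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def toothpick_counts(stages):
    """Total number of toothpicks after each of the first `stages` stages."""
    touched = Counter()  # lattice point -> number of toothpicks touching it

    def points(cx, cy, vertical):
        # End, midpoint, end of a length-2 toothpick centered at (cx, cy).
        if vertical:
            return [(cx, cy - 1), (cx, cy), (cx, cy + 1)]
        return [(cx - 1, cy), (cx, cy), (cx + 1, cy)]

    new = [(0, 0, True)]  # stage 1: one toothpick along the y-axis
    total, counts = 0, []
    for _ in range(stages):
        for pick in new:
            for p in points(*pick):
                touched[p] += 1
        total += len(new)
        counts.append(total)
        # Only ends of the newest toothpicks can still be exposed.
        nxt = []
        for cx, cy, vertical in new:
            first, _, last = points(cx, cy, vertical)
            for end in (first, last):
                if touched[end] == 1:
                    nxt.append((end[0], end[1], not vertical))
        new = nxt
    return counts


print(toothpick_counts(7))
```

    Restricting the search to ends of the most recent toothpicks works because every older exposed end receives a toothpick at the very next stage and is covered from then on.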

  7. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol;

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  8. Stream cipher based on GSS sequences

    Institute of Scientific and Technical Information of China (English)

    HU Yupu; XIAO Guozhen

    2004-01-01

    Generalized self-shrinking sequences, simply named the GSS sequences, are novel periodic sequences that have many advantages in cryptography. In this paper, we give several results about the application of GSS sequences to cryptography. First, we give a simple method for selecting those GSS sequences whose least periods reach the maximum. Second, we give a method for describing and computing the auto-correlation coefficients of GSS sequences. Finally, we point out that some GSS sequences, when used as stream ciphers, have a security weakness.

  9. Sequence-structure relations of biopolymers

    CERN Document Server

    Barrett, Christopher; Reidys, Christian M

    2015-01-01

    Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we pose the question to what extent sequence- and structure-information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded "patterns" in DNA and RNA sequences. Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence-structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for the three PDB-structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold ...

  10. Expressing stochastic filters via number sequences

    OpenAIRE

    Capponi, A.; Farina, A; Pilotto, C.

    2010-01-01

    We generalize the results presented in [1] regarding the relation between the Kalman filter and the Fibonacci sequence. We consider more general filtering models and relate the finite dimensional Kalman and Benes filters to the Fibonacci sequence and to the Golden Section. We also prove that Fibonacci numbers may be expressed as the convolution of the Fibonacci and Padovan sequence, thus extending the connection between stochastic filtering and Fibonacci sequence to the Padovan sequence.
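
    The link between the Kalman filter and the Fibonacci sequence can be seen in the simplest scalar case. The sketch below is an illustration only, not the authors' derivation: it assumes a random-walk state observed in noise, unit process and measurement variances, and an initial posterior variance of 1 (all of these values are assumptions), and iterates the standard variance recursion with exact fractions.

```python
from fractions import Fraction


def kalman_gains(steps, p0=Fraction(1)):
    """Gains of a scalar Kalman filter for x_k = x_{k-1} + w, y_k = x_k + v.

    Var(w) = Var(v) = 1 and the initial variance p0 = 1 are assumed values
    chosen for illustration; they are not taken from the paper.
    """
    p = p0
    gains = []
    for _ in range(steps):
        p_pred = p + 1                 # predict: add process noise variance
        k = p_pred / (p_pred + 1)      # gain: measurement noise variance is 1
        p = (1 - k) * p_pred           # update: posterior variance
        gains.append(k)
    return gains


def fib(n):
    """n-th Fibonacci number with fib(0) = 0, fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    for j, k in enumerate(kalman_gains(5), start=1):
        print(j, k, Fraction(fib(2 * j + 1), fib(2 * j + 2)))
```

    Under these assumptions the j-th gain is the ratio of consecutive Fibonacci numbers F(2j+1)/F(2j+2), so the gains tend to the reciprocal of the golden ratio, about 0.618.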

  11. Nonparametric Inference for Periodic Sequences

    KAUST Repository

    Sun, Ying

    2012-02-01

    This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
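
    The leave-out-one-cycle idea can be sketched as follows. This is a minimal reading of the abstract, not the article's estimator: the function names, the squared-error criterion, and the tie-breaking rule (smallest period wins) are assumptions made for illustration.

```python
def cv_score(y, p):
    """Leave-out-one-cycle CV error for candidate period p.

    Each observation is predicted by the mean of the observations at the
    same phase (index mod p) in all other cycles.
    """
    n = len(y)
    if p < 1 or 2 * p > n:
        raise ValueError("need at least two cycles of length p")
    total = 0.0
    for t in range(n):
        cycle = t // p
        others = [y[u] for u in range(t % p, n, p) if u // p != cycle]
        pred = sum(others) / len(others)
        total += (y[t] - pred) ** 2
    return total / n


def estimate_period(y, max_period):
    """Candidate period with the smallest CV error (ties go to the smallest)."""
    return min(range(1, max_period + 1), key=lambda p: cv_score(y, p))


if __name__ == "__main__":
    data = [0, 1, 3] * 8          # noiseless sequence with period 3
    print(estimate_period(data, 8))
```

    With noisy data, a multiple of the true period averages over fewer cycles per phase, so its prediction error tends to be larger; this is one way to read the abstract's remark that the method implicitly penalizes multiples of the smallest period.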

  12. Genome Sequence of Mycobacteriophage Momo.

    Science.gov (United States)

    Pope, Welkin H; Bina, Elizabeth A; Brahme, Indraneel S; Hill, Amy B; Himmelstein, Philip H; Hunsicker, Sara M; Ish, Amanda R; Le, Tinh S; Martin, Mary M; Moscinski, Catherine N; Shetty, Sameer A; Swierzewski, Tomasz; Iyengar, Varun B; Kim, Hannah; Schafer, Claire E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-06-18

    Momo is a newly discovered phage of Mycobacterium smegmatis mc(2)155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages.

  13. Multiplicative LSTM for sequence modelling

    OpenAIRE

    Krause, Ben; Lu, Liang; Murray, Iain; Renals, Steve

    2016-01-01

    This paper introduces multiplicative LSTM, a novel hybrid recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures. Multiplicative LSTM is motivated by its flexibility to have very different recurrent transition functions for each possible input, which we argue helps make it more expressive in autoregressive density estimation. We show empirically that multiplicative LSTM outperforms ...

  14. Multineuronal Spike Sequences Repeat with Millisecond Precision

    Directory of Open Access Journals (Sweden)

    Koki Matsumoto

    2013-06-01

    Cortical microcircuits are nonrandomly wired by neurons. As a natural consequence, spikes emitted by microcircuits are also nonrandomly patterned in time and space. One of the prominent spike organizations is a repetition of fixed patterns of spike series across multiple neurons. However, several questions remain unsolved, including how precisely spike sequences repeat, how the sequences are spatially organized, how many neurons participate in sequences, and how different sequences are functionally linked. To address these questions, we monitored spontaneous spikes of hippocampal CA3 neurons ex vivo using a high-speed functional multineuron calcium imaging technique that allowed us to monitor spikes with millisecond resolution and to record the location of spiking and nonspiking neurons. Multineuronal spike sequences were overrepresented in spontaneous activity compared to the statistical chance level. Approximately 75% of neurons participated in at least one sequence during our observation period. The participants were sparsely dispersed and did not show specific spatial organization. The number of sequences relative to the chance level decreased when larger time frames were used to detect sequences. Thus, sequences were precise at the millisecond level. Sequences often shared common spikes with other sequences; parts of sequences were subsequently relayed by following sequences, generating complex chains of multiple sequences.

  15. Method and apparatus for biological sequence comparison

    Science.gov (United States)

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  16. Static multiplicities in heterogeneous azeotropic distillation sequences

    DEFF Research Database (Denmark)

    Esbjerg, Klavs; Andersen, Torben Ravn; Jørgensen, Sten Bay;

    1998-01-01

    In this paper the results of a bifurcation analysis on heterogeneous azeotropic distillation sequences are given. Two sequences suitable for ethanol dehydration are compared: The 'direct' and the 'indirect' sequence. It is shown, that the two sequences, despite their similarities, exhibit very...... performances are compared for minimal impurities in both products, where the direct sequence exhibits output multiplicity, while the indirect sequence exhibits state multiplicity. The latter multiplicity may be avoided by accepting a slightly increased impurity in the ethanol product. Copyright (C) 1998 IFAC....

  17. On Inclusion Relations between Some Sequence Spaces

    Directory of Open Access Journals (Sweden)

    R. Çolak

    2016-01-01

    We determine the relations between the classes $\hat{S}_\lambda$ of almost $\lambda$-statistically convergent sequences and the relations between the classes $[\hat{V},\lambda]$ of strongly almost $(V,\lambda)$-summable sequences for various sequences $\lambda$, $\mu$ in the class $\Lambda$. Furthermore, we also give the relations between the classes $\hat{S}_\lambda$ of almost $\lambda$-statistically convergent sequences and the classes $[\hat{V},\lambda]$ of strongly almost $(V,\lambda)$-summable sequences for various sequences $\lambda, \mu \in \Lambda$.

  18. Bernoulli measure of complex admissible kneading sequences

    CERN Document Server

    Bruin, Henk

    2012-01-01

    Iterated quadratic polynomials give rise to a rich collection of different dynamical systems that are parametrized by a simple complex parameter $c$. The different dynamical features are encoded by the kneading sequence, which is an infinite sequence over $\{0,1\}$. Not every such sequence actually occurs in complex dynamics. The set of admissible kneading sequences was described by Milnor and Thurston for real quadratic polynomials, and by the authors in the complex case. We prove that the set of admissible kneading sequences has positive Bernoulli measure within the set of sequences over $\{0,1\}$.

  19. Modified Genetic Algorithm for DNA Sequence Assembly by Shotgun and Hybridization Sequencing Techniques

    Directory of Open Access Journals (Sweden)

    Prof. Narayan Kumar Sahu

    2012-09-01

    Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly [1].

  20. Biomolecule Sequencer: Nanopore Sequencing Technology for In-Situ Environmental Monitoring and Astrobiology

    Science.gov (United States)

    John, K. K.; Botkin, D. J.; Burton, A. S.; Castro-Wallace, S. L.; Chaput, J. D.; Dworkin, J. P.; Lupisella, M. L.; Mason, C. E.; Rubins, K. H.; Smith, D. J.; Stahl, S.; Switzer, C.

    2016-10-01

    Biomolecule Sequencer will demonstrate, for the first time, that DNA sequencing is feasible as a tool for in-situ environmental monitoring and astrobiology. A space-based sequencer could identify microbes, diseases, and help detect DNA-based life.

  1. Integration of retinal image sequences

    Science.gov (United States)

    Ballerini, Lucia

    1998-10-01

    In this paper a method for noise reduction in ocular fundus image sequences is described. The eye is the only part of the human body where the capillary network can be observed along with the arterial and venous circulation using a non-invasive technique. The study of the retinal vessels is very important both for the study of local pathology (retinal disease) and for the large amount of information it offers on systemic haemodynamics, such as hypertension, arteriosclerosis, and diabetes. The procedure for integrating ocular fundus image sequences can be divided into two steps: registration and fusion. First we describe an automatic alignment algorithm for registration of ocular fundus images. In order to enhance vessel structures, we used a spatially oriented bank of filters designed to match the properties of the objects of interest. To evaluate interframe misalignment we adopted a fast cross-correlation algorithm. The performance of the alignment method has been estimated by simulating shifts between image pairs and by using a cross-validation approach. Then we propose a temporal integration technique for image sequences so as to compute enhanced pictures of the overall capillary network. Image registration is combined with image enhancement by fusing subsequent frames of the same region. To evaluate the attainable results, the signal-to-noise ratio was estimated before and after integration. Experimental results on synthetic images of vessel-like structures with different kinds of Gaussian additive noise as well as on real fundus images are reported.

  2. Discrete low-discrepancy sequences

    CERN Document Server

    Angel, Omer; Martin, James B; Propp, James

    2009-01-01

    Holroyd and Propp used Hall's marriage theorem to show that, given a probability distribution pi on a finite set S, there exists an infinite sequence s_1,s_2,... in S such that for all integers k >= 1 and all s in S, the number of i in [1,k] with s_i = s differs from k pi(s) by at most 1. We prove a generalization of this result using a simple explicit algorithm. A special case of this algorithm yields an extension of Holroyd and Propp's result to the case of discrete probability distributions on infinite sets.

  3. Infinite matrices and sequence spaces

    CERN Document Server

    Cooke, Richard G

    2014-01-01

    This clear and correct summation of basic results from a specialized field focuses on the behavior of infinite matrices in general, rather than on properties of special matrices. Three introductory chapters guide students to the manipulation of infinite matrices, covering definitions and preliminary ideas, reciprocals of infinite matrices, and linear equations involving infinite matrices. From the fourth chapter onward, the author treats the application of infinite matrices to the summability of divergent sequences and series from various points of view. Topics include consistency, mutual consi

  4. Asymptotics of Lagged Fibonacci Sequences

    CERN Document Server

    Mertens, Stephan

    2009-01-01

    Consider "lagged" Fibonacci sequences $a(n) = a(n-1)+a(\lfloor n/k\rfloor)$ for $k > 1$. We show that $\lim_{n\to\infty} a(kn)/a(n)\cdot\ln n/n = k\ln k$ and we demonstrate the slow numerical convergence to this limit and how to deal with this slow convergence. We also discuss the connection between two classical results of N.G. de Bruijn and K. Mahler on the asymptotics of $a(n)$.
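
    The recurrence is easy to tabulate numerically. The sketch below is a minimal illustration; the initial value a(0) = 1 is an assumption, since the abstract does not state one, and the helper names are hypothetical.

```python
import math


def lagged_fibonacci(n_max, k=2, a0=1):
    """Return [a(0), ..., a(n_max)] with a(n) = a(n-1) + a(floor(n/k)).

    The starting value a(0) = a0 is an assumed initial condition.
    """
    a = [a0]
    for n in range(1, n_max + 1):
        a.append(a[n - 1] + a[n // k])
    return a


def scaled_ratio(a, n, k):
    """a(kn)/a(n) * ln(n)/n, the quantity whose limit the abstract gives as k ln k."""
    return a[k * n] / a[n] * math.log(n) / n


if __name__ == "__main__":
    k = 2
    a = lagged_fibonacci(k * 10**5, k)
    for n in (10**2, 10**3, 10**4, 10**5):
        print(n, scaled_ratio(a, n, k), k * math.log(k))
```

    Printing the scaled ratio next to k ln k for increasing n is one way to observe how slowly the quantity approaches the stated limit.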

  5. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria

    OpenAIRE

    Brown, Pamela J.B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-01-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  6. Genome sequences of eight morphologically diverse Alphaproteobacteria.

    Science.gov (United States)

    Brown, Pamela J B; Kysela, David T; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-09-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  7. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria

    Science.gov (United States)

    Brown, Pamela J. B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V.

    2011-01-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium. PMID:21705585

  8. On topological spaces possessing uniformly distributed sequences

    CERN Document Server

    Bogachev, V I

    2007-01-01

    Two classes of topological spaces are introduced on which every probability Radon measure possesses a uniformly distributed sequence or a uniformly tight uniformly distributed sequence. It is shown that these classes are stable under multiplication by completely regular Souslin spaces.

  9. The Art of Gymnastics: Creating Sequences.

    Science.gov (United States)

    Rovegno, Inez

    1988-01-01

    Offering students opportunities for creating movement sequences in gymnastics allows them to understand the essence of gymnastics, have creative experiences, and learn about themselves. The process of creating sequences is described. (MT)

  10. FOGSAA: Fast Optimal Global Sequence Alignment Algorithm

    Science.gov (United States)

    Chakraborty, Angana; Bandyopadhyay, Sanghamitra

    2013-04-01

    In this article we propose a Fast Optimal Global Sequence Alignment Algorithm, FOGSAA, which aligns a pair of nucleotide/protein sequences faster than any optimal global alignment method including the widely used Needleman-Wunsch (NW) algorithm. FOGSAA is applicable for all types of sequences, with any scoring scheme, and with or without affine gap penalty. Compared to NW, FOGSAA achieves a time gain of (70-90)% for highly similar nucleotide sequences (> 80% similarity), and (54-70)% for sequences having (30-80)% similarity. For other sequences, it terminates with an approximate score. For protein sequences, the average time gain is between (25-40)%. Compared to three heuristic global alignment methods, the quality of alignment is improved by about 23%-53%. FOGSAA is, in general, suitable for aligning any two sequences defined over a finite alphabet set, where the quality of the global alignment is of supreme importance.
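
    For reference, the Needleman-Wunsch baseline named in the abstract can be sketched as a dynamic program. This is the classical NW recurrence with a linear gap penalty, not FOGSAA itself; the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def needleman_wunsch_score(s, t, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of s and t with a linear gap penalty.

    Only two rows of the dynamic-programming table are kept, so memory use
    is proportional to len(t).
    """
    prev = [j * gap for j in range(len(t) + 1)]   # aligning "" with t[:j]
    for i in range(1, len(s) + 1):
        cur = [i * gap]                           # aligning s[:i] with ""
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = prev[j] + gap                    # gap in t
            left = cur[j - 1] + gap               # gap in s
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

    The table has len(s) x len(t) cells, so the running time grows with the product of the two lengths; that full-table cost is the baseline against which the abstract reports its time gains.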

  11. On Paranorm Zweier $I$-Convergent Sequence Spaces

    Directory of Open Access Journals (Sweden)

    Vakeel A. Khan

    2013-01-01

    In this paper, we introduce the paranorm Zweier $I$-convergent sequence spaces $\mathcal{Z}^I(q)$, $\mathcal{Z}^I_0(q)$, and $\mathcal{Z}^I_\infty(q)$ for $q = (q_k)$, a sequence of positive real numbers. We study some topological properties, prove the decomposition theorem, and study some inclusion relations on these spaces.

  12. On General Fibonacci Sequences in Groups

    OpenAIRE

    Özkan, Engin

    2003-01-01

    In this paper, we have constituted 3-step general Fibonacci sequences in a nilpotent group with exponent p (p is a prime number) and nilpotency class 4 and given formulas to find the a term of the sequence.

  13. Cross-correlation properties of cyclotomic sequences

    CERN Document Server

    Cai, Kai; Zheng, Zhiming

    2009-01-01

    Sequences with good correlation properties are widely used in engineering applications, especially in the area of communications. Among the known sequences, cyclotomic families have the optimal autocorrelation property. In this paper, we decide the cross-correlation function of the known cyclotomic sequences completely. Moreover, to get our results, the relations between the multiplier group and the decimations of the characteristic sequence are also established for an arbitrary difference set.

  14. Information Analysis of DNA Sequences

    CERN Document Server

    Mohammed, Riyazuddin

    2010-01-01

    The problem of differentiating the informational content of coding (exons) and non-coding (introns) regions of a DNA sequence is one of the central problems of genomics. The introns are estimated to be nearly 95% of the DNA and since they do not seem to participate in the process of transcription of amino-acids, they have been termed "junk DNA." Although it is believed that the non-coding regions in genomes have no role in cell growth and evolution, demonstration that these regions carry useful information would tend to falsify this belief. In this paper, we consider entropy as a measure of information by modifying the entropy expression to take into account the varying length of these sequences. Exons are usually much shorter in length than introns; therefore the comparison of the entropy values needs to be normalized. A length correction strategy was employed using randomly generated nucleonic base strings built out of the alphabet of the same size as the exons under question. Our analysis shows that intron...
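
    The idea of comparing entropies across sequences of different lengths can be sketched as follows. This is a simplified illustration, not the paper's modified entropy expression: it uses the Shannon entropy of single-base frequencies and divides by the mean entropy of random strings of the same length. The function names, the four-letter alphabet, and the number of random trials are assumptions.

```python
import math
import random
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per symbol) of the symbol frequencies in seq."""
    n = len(seq)
    counts = Counter(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def length_corrected_entropy(seq, alphabet="ACGT", trials=200, seed=0):
    """Entropy of seq divided by the mean entropy of random strings of equal length.

    The random baseline stands in for a length correction; it is an
    illustrative choice, not the correction used in the paper.
    """
    if len(seq) < 2:
        raise ValueError("sequence too short for a meaningful baseline")
    rng = random.Random(seed)
    baseline = sum(
        shannon_entropy(rng.choices(alphabet, k=len(seq))) for _ in range(trials)
    ) / trials
    return shannon_entropy(seq) / baseline


if __name__ == "__main__":
    print(shannon_entropy("ACGT"))
    print(length_corrected_entropy("ACGT" * 10))
```

    Short random strings have a lower expected entropy than long ones, so dividing by a same-length baseline removes part of the length effect when short and long sequences are compared.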

  15. Strong sequences and independent sets

    Directory of Open Access Journals (Sweden)

    Joanna Jureczko

    2016-05-01

    A family $\mathcal{S} \subseteq \mathcal{P}(\omega)$ is an independent family if for each pair $\mathcal{A}, \mathcal{B}$ of disjoint finite subsets of $\mathcal{S}$ the set $\bigcap \mathcal{A} \cap (\omega \setminus \bigcup \mathcal{B})$ is nonempty. The fact that there is an independent family on $\omega$ of size continuum was proved by Fichtenholz and Kantorowicz. If we replace $\mathcal{P}(\omega)$ by a set $(X, r)$ with an arbitrary relation $r$, it is natural to ask about the existence and length of an independent set on $(X, r)$. In this paper special assumptions for such existence are considered. On the other hand, in the 1960s the strong sequences method was introduced by Efimov. He used it to prove some famous theorems in dyadic spaces, such as the Marczewski theorem on cellularity, the Shanin theorem on a calibre, the Esenin-Volpin theorem, and others. This paper considers the length of strong sequences, the length of independent sets, and other well-known cardinal invariants, and examines inequalities among them.

  16. Sequence Analysis in Demographic Research

    Directory of Open Access Journals (Sweden)

    Billari, Francesco C.

    2001-01-01

    This paper examines the salient features of sequence analysis in demographic research. The new approach allows a holistic perspective on life course analysis and is based on a representation of lives as sequences of states. Some of the methods for analyzing such data are sketched, from complex description to optimal matching to monothetic divisive algorithms. After a short illustration of a demographically relevant example, the needs in terms of data collection and the opportunities of applying the same approach to synthetic data are discussed.

  17. Hardware Acceleration of Bioinformatics Sequence Alignment Applications

    NARCIS (Netherlands)

    Hasan, L.

    2011-01-01

    Biological sequence alignment is an important and challenging task in bioinformatics. Alignment may be defined as an arrangement of two or more DNA or protein sequences to highlight the regions of their similarity. Sequence alignment is used to infer the evolutionary relationship between a set of pr

  18. Incidental Sequence Learning across the Lifespan

    Science.gov (United States)

    Weiermann, Brigitte; Meier, Beat

    2012-01-01

    The purpose of the present study was to investigate incidental sequence learning across the lifespan. We tested 50 children (aged 7-16), 50 young adults (aged 20-30), and 50 older adults (aged >65) with a sequence learning paradigm that involved both a task and a response sequence. After several blocks of practice, all age groups slowed down…

  19. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.;

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  20. PacBio Sequencing and Its Applications

    Institute of Scientific and Technical Information of China (English)

    Anthony Rhoads; Kin Fai Au

    2015-01-01

    Single-molecule, real-time sequencing developed by Pacific BioSciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio’s sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable, especially for small-size laboratories, than using PacBio sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone.

  1. The recurrence sequence via the Fibonacci groups

    Science.gov (United States)

    Aküzüm, Yeşim; Deveci, Ömür

    2016-04-01

    This work develops properties of the recurrence sequence defined with the aid of the relation matrix of the Fibonacci groups. The study of this sequence modulo m yields cyclic groups and semigroups from the generating matrix. Finally, we extend the sequence defined to groups and then obtain its period in the Fibonacci groups.
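
    The notion of the period of a recurrence sequence modulo m can be illustrated with the ordinary Fibonacci sequence, whose period modulo m is known as the Pisano period. The sketch below is a stand-in: it does not use the relation matrix of the Fibonacci groups studied in the paper, and the function name is hypothetical.

```python
def pisano_period(m):
    """Period of the ordinary Fibonacci sequence taken modulo m (m >= 2).

    The pair of consecutive residues is advanced until the starting pair
    (0, 1) reappears; the period is known to be at most 6 * m.
    """
    if m < 2:
        raise ValueError("m must be at least 2")
    a, b = 0, 1
    for i in range(1, 6 * m + 1):
        a, b = b, (a + b) % m
        if (a, b) == (0, 1):
            return i
    raise RuntimeError("no period found within the known bound")


if __name__ == "__main__":
    for m in (2, 3, 5, 10):
        print(m, pisano_period(m))
```

    Because each term depends only on the previous pair of residues and the recurrence is invertible modulo m, the sequence of pairs must return to its starting pair, which is why a period always exists.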

  2. RNAome sequencing delineates the complete RNA landscape

    NARCIS (Netherlands)

    K.W.J. Derks (Kasper); J. Pothof (Joris)

    2015-01-01

    Standard RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species. For example, small and large RNAs from the same sample cannot be sequenced in a single sequence run. We designed RNAome sequencing, which is a strand-specif

  3. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi.

    Directory of Open Access Journals (Sweden)

    Leho Tedersoo

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source was supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  4. Exome sequencing: what clinicians need to know

    Directory of Open Access Journals (Sweden)

    Sastre L

    2014-03-01

    Leandro Sastre. Instituto de Investigaciones Biomédicas, CSIC/UAM, C/Arturo Duperier 4, Madrid, Spain; Terapias Experimentales y Biomarcadores en Cáncer, IdiPaz, Madrid, Spain; CIBER de Enfermedades Raras, CIBERER, Valencia, Spain. Abstract: The recent development of high throughput methods of deoxyribonucleic acid (DNA) sequencing has made it possible to determine individual genome sequences and their specific variations. A region of particular interest is the protein-coding part of the genome, or exome, which is composed of gene exons. The principles of exome purification and sequencing will be described in this review, as well as analyses of the data generated. Results will be discussed in terms of their possible functional and clinical significance. The advantages and limitations of exome sequencing will be compared to those of other massive sequencing approaches such as whole-genome sequencing, ribonucleic acid sequencing or selected DNA sequencing. Exome sequencing has been used recently in the study of various diseases. Monogenic diseases with Mendelian inheritance are among these, but studies have also been carried out on genetic variations that represent risk factors for complex diseases. Cancer is another intensive area for exome sequencing studies. Several examples of the use of exome sequencing in the diagnosis, prognosis, and treatment of these diseases will be described. Finally, remaining challenges and some practical and ethical considerations for the clinical application of exome sequencing will be discussed. Keywords: massively parallel sequencing, RNA sequencing, whole-genome sequencing, genetic variants, molecular diagnosis, pharmacogenomics, personalized medicine, NGS, SGS, SNP, SNV

  5. Value of a newly sequenced bacterial genome

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Aburjaile, Flavia F; Ramos, Rommel Tj;

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also...... in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome...

  6. MatrixPlot: visualizing sequence constraints

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Stærfeldt, Hans Henrik; Lund, Ole

    1999-01-01

    Summary: MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information...... about the sequences (e.g. a sequence logo profile) along the edges of the plot, as well as zoom in on any region in the plot. Availability: MatrixPlot can be obtained on request, and can also be accessed online at http://www.cbs.dtu.dk/services/MatrixPlot. Contact: gorodkin@cbs.dtu.dk...

  7. Chip-based sequencing nucleic acids

    Science.gov (United States)

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  8. Permutation Entropy for Random Binary Sequences

    Directory of Open Access Journals (Sweden)

    Lingfeng Liu

    2015-12-01

    In this paper, we generalize the permutation entropy (PE) measure, which is based on Shannon’s entropy, to binary sequences and theoretically analyze this measure for random binary sequences. We deduce the theoretical value of PE for random binary sequences, which can be used to measure the randomness of binary sequences. We also reveal the relationship between this PE measure and other randomness measures, such as Shannon’s entropy and Lempel–Ziv complexity. The results show that PE is consistent with these two measures. Furthermore, we use PE as one of the randomness measures to evaluate the randomness of chaotic binary sequences.
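
    For orientation, the following is a minimal sketch of the classical ordinal-pattern permutation entropy, not the binary generalization proposed in this paper, whose exact definition is not given in the abstract. The function name `permutation_entropy`, the default `order=3`, and the tie-breaking rule (equal values keep their original order) are assumptions made for illustration.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3):
    """Shannon entropy (in bits) of the ordinal patterns of length `order`."""
    if order < 2 or len(series) < order:
        raise ValueError("need order >= 2 and len(series) >= order")
    patterns = Counter()
    for i in range(len(series) - order + 1):
        window = series[i:i + order]
        # Ordinal pattern: the indices that sort the window.
        # sorted() is stable, so ties keep their original order (an assumption).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1
    total = sum(patterns.values())
    return -sum((c / total) * math.log2(c / total) for c in patterns.values())


if __name__ == "__main__":
    # A strictly increasing series has a single ordinal pattern.
    print(permutation_entropy([1, 2, 3, 4, 5, 6]))
    # A short series with three distinct patterns among five windows.
    print(permutation_entropy([4, 7, 9, 10, 6, 11, 3]))
```

    Because binary sequences contain many ties, the tie-breaking rule strongly affects the result, which is presumably why a dedicated generalization is needed.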

  9. Computing with Hereditarily Finite Sequences

    CERN Document Server

    Tarau, Paul

    2011-01-01

    We use Prolog as a flexible meta-language to provide executable specifications of some fundamental mathematical objects and their transformations. In the process, isomorphisms are unraveled between natural numbers and combinatorial objects (rooted ordered trees representing hereditarily finite sequences and rooted ordered binary trees representing Gödel's System T types). This paper focuses on an application that can be seen as an unexpected "paradigm shift": we provide recursive definitions showing that the resulting representations are directly usable to perform symbolically arbitrary-length integer computations. Besides the theoretically interesting fact of "breaking the arithmetic/symbolic barrier", the arithmetic operations performed with symbolic objects like trees or types turn out to be genuinely efficient -- we derive implementations with asymptotic performance comparable to ordinary bitstring implementations of arbitrary-length integer arithmetic. The source code of the paper, organized as a ...
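
    To make the idea of an isomorphism between natural numbers and hereditarily finite sequences concrete, here is a small sketch of one well-known bijection, in which a positive n is written as 2^x(2y + 1) and mapped to the list with head x and tail given by y. This is an illustrative choice and not necessarily the encoding used in the paper, whose code is written in Prolog; all function names below are hypothetical.

```python
def nat_to_list(n):
    """Bijection N -> finite lists of N: 0 -> [], 2**x * (2*y + 1) -> [x] + list(y)."""
    if n < 0:
        raise ValueError("n must be a natural number")
    if n == 0:
        return []
    x = 0
    while n % 2 == 0:  # strip factors of two to find the head x
        n //= 2
        x += 1
    return [x] + nat_to_list(n // 2)  # n is odd here, so n // 2 == y


def list_to_nat(xs):
    """Inverse of nat_to_list."""
    if not xs:
        return 0
    return 2 ** xs[0] * (2 * list_to_nat(xs[1:]) + 1)


def nat_to_hfs(n):
    """Apply the list bijection recursively: every element becomes a nested list."""
    return [nat_to_hfs(x) for x in nat_to_list(n)]


def hfs_to_nat(tree):
    """Inverse of nat_to_hfs."""
    return list_to_nat([hfs_to_nat(child) for child in tree])


if __name__ == "__main__":
    for n in range(6):
        print(n, nat_to_hfs(n))
```

    Under this encoding the trees are built from empty lists only, so a natural number and its tree carry the same information; arithmetic defined directly on such trees is the subject of the paper.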

  10. [DNA sequencing technology and automatization of it].

    Science.gov (United States)

    Kraev, A S

    1991-01-01

    Precise manipulation of genetic material, typical of modern experiments in molecular biology and in new biotechnology, requires the ability to determine DNA base sequences. This ability now makes it possible to exploit specific genetic knowledge for the dissection of complex cell processes and for the modulation of cell metabolism in transgenic organisms. The review focuses on DNA sequencing technologies that are widespread in general laboratory practice. With the availability of commercial reagents, they can safely be called industrial techniques. Modern DNA sequencing requires the repeated breakdown of large genomic DNA into smaller pieces, which are then amplified and sequenced, and the initial long stretch is reconstructed via the overlap of small pieces. The DNA sequencing process has several steps: a DNA fragment is obtained in sufficient quantity and purity; it is converted to a form suitable for a particular sequencing method; a sequencing reaction is performed and its products are fractionated; and finally the resulting data are interpreted (i.e. an autoradiograph is read into computer memory) and a long sequence is reconstructed via the overlap of short stretches. These steps are considered in separate parts, with emphasis on sequencing strategies with respect to their biological task. In the last part, possibilities for automating the sequencing experiment are considered, followed by a discussion of domestic problems in DNA sequencing.
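
    The final step, reconstructing a long sequence from the overlap of short stretches, can be sketched as a greedy merge. This is a toy illustration under simplifying assumptions (error-free reads, a single strand, no repeats), not the procedure of any particular sequencing software; the function names and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]  # made-up example reads
    print(greedy_assemble(reads))
```

    Greedy merging can misassemble repetitive regions, which is one reason real projects rely on more careful strategies.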

  11. Design of Digital Hybrid Chaotic Sequence Generator

    Institute of Scientific and Technical Information of China (English)

    RAO Nini; ZENG Dong

    2004-01-01

    The feasibility of hybrid chaotic sequences as spreading codes in code division multiple access (CDMA) systems is analyzed. The design and realization of a digital hybrid chaotic sequence generator in the very high speed integrated circuit hardware description language (VHDL) are described. A valid hazard-cancellation method is presented. Computer simulations show that stable digital sequence waveforms can be produced. The correlations of the digital hybrid chaotic sequences are compared with those of m-sequences. The results show that the correlations of the digital hybrid chaotic sequences are almost as good as those of m-sequences. The work in this paper explores a route toward practical applications of chaos.
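
    As a point of reference for the comparison with m-sequences, the sketch below generates a length-15 m-sequence from a 4-stage linear recurrence and computes its periodic autocorrelation, which is the benchmark a spreading code is usually measured against. The hybrid chaotic generator itself is not specified in the abstract and is not reproduced here; the choice of the polynomial x^4 + x + 1, the seed, and the function names are assumptions for illustration.

```python
def m_sequence(length=15):
    """Bits from the recurrence a[n+4] = a[n+1] XOR a[n] (primitive polynomial x^4 + x + 1)."""
    bits = [1, 0, 0, 0]  # any nonzero seed gives a shift of the same sequence
    while len(bits) < length:
        n = len(bits) - 4
        bits.append(bits[n] ^ bits[n + 1])
    return bits[:length]


def periodic_autocorrelation(bits, shift):
    """Periodic autocorrelation after mapping bit 0 -> +1 and bit 1 -> -1."""
    chips = [1 - 2 * b for b in bits]
    n = len(chips)
    return sum(chips[i] * chips[(i + shift) % n] for i in range(n))


if __name__ == "__main__":
    seq = m_sequence()
    print(seq)
    print([periodic_autocorrelation(seq, s) for s in range(len(seq))])
```

    A candidate spreading code can be passed through the same `periodic_autocorrelation` function to see how close its off-peak values come to the two-valued profile of the m-sequence.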

  12. Assembly Sequence Planning for Mechanical Products

    Institute of Scientific and Technical Information of China (English)

    1999-01-01

    A method for assembly sequence planning is proposed in this paper. First, two approaches to assembly sequence planning, the indirect method and the direct method, are compared. Then, the limits of previous assembly planning systems are pointed out. On the basis of the indirect method, an improved method for assembly sequence planning is put forward. This method is composed of four parts: assembly modeling for products, assembly sequence representation, assembly sequence planning, and evaluation and optimization. The assembly model is established by human-machine interaction and contains the components' information and the assembly relations among the components. The assembly sequence planning is based on the breaking up of the assembly model. An and/or graph is used to represent the assembly sequence set. Every component that satisfies the disassembly condition is recorded as a node of the and/or graph. After the disassembly sequence and/or graph is generated, a heuristic algorithm, the AO* algorithm, is used to search it, and the optimum assembly sequence planning is realized. This method is proved to be effective in a prototype system which is a sub-project of a state 863/CIMS research project of China, ‘Concurrent Engineering’.

  13. Randomness in Sequence Evolution Increases over Time.

    Directory of Open Access Journals (Sweden)

    Guangyu Wang

    The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, here we detect sequence randomness based on a collection of eight statistical random tests and investigate the randomness variation of coding sequences with an application to Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results clearly show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness indeed increases over time, consistent with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study presents an important finding, for the first time, that sequence randomness increases over time, which may provide profound insights for unveiling the underlying mechanisms of molecular sequence evolution.

  14. Comparison of sequence reads obtained from three next-generation sequencing platforms.

    Directory of Open Access Journals (Sweden)

    Shingo Suzuki

    Next-generation sequencing technologies enable the rapid cost-effective production of sequence data. To evaluate the performance of these sequencing technologies, investigation of the quality of sequence reads obtained from these methods is important. In this study, we analyzed the quality of sequence reads and SNP detection performance using three commercially available next-generation sequencers, i.e., Roche Genome Sequencer FLX System (FLX), Illumina Genome Analyzer (GA), and Applied Biosystems SOLiD system (SOLiD). A common genomic DNA sample obtained from Escherichia coli strain DH1 was applied to these sequencers. The obtained sequence reads were aligned to the complete genome sequence of E. coli DH1, to evaluate the accuracy and sequence bias of these sequence methods. We found that the fraction of "junk" data, which could not be aligned to the reference genome, was largest in the data set of SOLiD, in which about half of reads could not be aligned. Among data sets after alignment to the reference, sequence accuracy was poorest in GA data sets, suggesting relatively low fidelity of the elongation reaction in the GA method. Furthermore, by aligning the sequence reads to the E. coli strain W3110, we screened sequence differences between two E. coli strains using data sets of three different next-generation platforms. The results revealed that the detected sequence differences were similar among these three methods, while the sequence coverage required for the detection was significantly smaller in the FLX data set. These results provide valuable information on the quality of short sequence reads and the performance of SNP detection in three next-generation sequencing platforms.

  15. Pareto optimal pairwise sequence alignment.

    Science.gov (United States)

    DeRonne, Kevin W; Karypis, George

    2013-01-01

    Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
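
    The notion of a Pareto frontier used above can be illustrated with a brute-force filter over candidate score vectors. This is only a sketch of the dominance test, assuming that higher is better for every objective; it is not the dynamic-programming algorithm developed in the paper, and the function names and example scores are hypothetical.

```python
def dominates(a, b):
    """True if score vector `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates (quadratic-time filter)."""
    return [c for c in candidates if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    # Each tuple holds the scores of one candidate alignment under two objective functions.
    scores = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
    print(pareto_front(scores))
```

    Every point on the resulting front is a defensible trade-off between the objectives, which is why picking a single best alignment from it remains difficult.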

  16. Continued fractions and heavy sequences

    CERN Document Server

    Boshernitzan, Michael

    2009-01-01

    We initiate the study of the sets $H(c)$, $0=x-[x]$ stands for the fractional part of $x\in \mathbb R$. We prove that, for rational $c$, the sets $H(c)$ are of positive Hausdorff dimension and, in particular, are uncountable. For integers $m\geq1$, we obtain a surprising characterization of the numbers $\alpha\in H_m= H(\frac1m)$ in terms of their continued fraction expansions: The odd entries (partial quotients) of these expansions are divisible by $m$. The characterization implies that $x\in H_m$ if and only if $\frac 1{mx} \in H_m$, for $x>0$. We are unaware of a direct proof of this equivalence, without making use of the mentioned characterization of the sets $H_m$. We also introduce the dual sets $\hat H_m$ of reals $y$ for which the sequence of integers $\big([ky]\big)_{k\geq1}$ consistently hits the set $m\mathbb Z$ with at least the expected frequency $\frac1m$ and establish the connection with the sets $H_m$: If $xy=m$ for $x,y>0$, then $x\in H_m$ if and only if $y\in \hat H_m$. The motivatio...

  17. Predicting Contextual Sequences via Submodular Function Maximization

    CERN Document Server

    Dey, Debadeepta; Hebert, Martial; Bagnell, J Andrew

    2012-01-01

    Sequence optimization, where the items in a list are ordered to maximize some reward has many applications such as web advertisement placement, search, and control libraries in robotics. Previous work in sequence optimization produces a static ordering that does not take any features of the item or context of the problem into account. In this work, we propose a general approach to order the items within the sequence based on the context (e.g., perceptual information, environment description, and goals). We take a simple, efficient, reduction-based approach where the choice and order of the items is established by repeatedly learning simple classifiers or regressors for each "slot" in the sequence. Our approach leverages recent work on submodular function maximization to provide a formal regret reduction from submodular sequence optimization to simple cost-sensitive prediction. We apply our contextual sequence prediction algorithm to optimize control libraries and demonstrate results on two robotics problems: ...

  18. Evolutionarily conserved sequences on human chromosome 21

    Energy Technology Data Exchange (ETDEWEB)

    Frazer, Kelly A.; Sheehan, John B.; Stokowski, Renee P.; Chen, Xiyin; Hosseini, Roya; Cheng, Jan-Fang; Fodor, Stephen P.A.; Cox, David R.; Patil, Nila

    2001-09-01

    Comparison of human sequences with the DNA of other mammals is an excellent means of identifying functional elements in the human genome. Here we describe the utility of high-density oligonucleotide arrays as a rapid approach for comparing human sequences with the DNA of multiple species whose sequences are not presently available. High-density arrays representing approximately 22.5 Mb of nonrepetitive human chromosome 21 sequence were synthesized and then hybridized with mouse and dog DNA to identify sequences conserved between humans and mice (human-mouse elements) and between humans and dogs (human-dog elements). Our data show that sequence comparison of multiple species provides a powerful empiric method for identifying actively conserved elements in the human genome. A large fraction of these evolutionarily conserved elements are present in regions on chromosome 21 that do not encode known genes.

  19. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  20. Some fundamental problems in outcrop sequence stratigraphy

    Institute of Scientific and Technical Information of China (English)

    王训练

    1999-01-01

    Some fundamental problems in outcrop sequence stratigraphy are discussed, and the following ideas are obtained: (i) Detailed sedimentary facies analysis and study on stacking pattern of parasequences, careful and accurate study of biostratigraphy, and stratigraphical correlation of different facies areas are the essential conditions for proper identification of sequences. (ii) The first flooding surface may be an ideal sequence boundary in outcrop sequence stratigraphy, where the most distinct palaeontological and sedimentary changes take place and make the surface readily recognizable in outcrop. (iii) The distribution in space, especially in different facies belts, is regarded as an important criterion for defining and recognizing the various orders of sequences. The third-order sequence is probably global in nature, which may be discerned in various depositional facies belts at least on one continental margin, and can be correlated over long distances, sometimes worldwide. (iv) The first flooding surf

  1. Application of ecostratigraphy to sequence stratigraphy

    Institute of Scientific and Technical Information of China (English)

    殷鸿福; 童金南; 张克信; 吴顺宝

    1997-01-01

    The results of ecostratigraphy can directly serve sequence stratigraphy. The habitat type curve is useful not only in the analysis of sequences and parasequences, but also in demonstration of the process of regional sea level change. The various biological surfaces usually coincide with or relate to the boundaries of sequences or system tracts. The ecostratigraphic framework composed of coenozones, community sequences and ecotracts with good timing completely corresponds to the sequence stratigraphic framework of the sedimentary basin. Therefore, through establishment of the habitat type curve in individual section, recognition of the various biological surfaces, regional ecostratigraphic correlation and the formation of an ecostratigraphic framework of the sedimentary basin, ecostratigraphy plays an important role in the study of sequence stratigraphy and the reconstruction of regional and even global sea level changes.

  2. Sequencing intractable DNA to close microbial genomes.

    Science.gov (United States)

    Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  3. Comparison of next-generation sequencing systems.

    Science.gov (United States)

    Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

    2012-01-01

    With the fast development and wide application of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve quality of life. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience is summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. Finally, applications of NGS are summarized.

  4. On the activation of bovine plasma factor XIII. Amino acid sequence of the peptide released by thrombin and the terminal residues of the subunit polypeptides.

    Science.gov (United States)

    Nakamura, S; Iwanaga, S; Suzuki, T

    1975-12-01

    A blood coagulation factor, Factor XIII, was highly purified from bovine fresh plasma by a method similar to those used for human plasma Factor XIII. The isolated Factor XIII consisted of two subunit polypeptides, a and b chains, with molecular weights of 79,000 +/- 2,000 and 75,000 +/- 2,000, respectively. In the conversion of Factor XIII to the active enzyme, Factor XIIIa, by bovine thrombin [EC 3.4.21.5], a peptide was liberated. This peptide, designated tentatively as "activation peptide," was isolated by gel-filtration on a Sephadex G-75 column. It contained a total of 37 amino acid residues with a masked N-terminal residue and C-terminal arginine. The whole amino acid sequence of "Activation peptide" was established by the dansyl-Edman method and standard enzymatic techniques, and the masked N-terminal residue was identified as N-acetylserine by using a rat liver acylamino acid-releasing enzyme. This enzyme specifically cleaved the N-acetylserylglutamyl peptide bond serine and the remaining peptide, which was now reactive to 1-dimethylamino-naphthalene-5-sulfonyl chloride. A comparison of the sequences of human and bovine "Activation peptide" revealed five amino acid replacements, Ser-3 to Thr; Gly-5 to Arg; Ile-14 to Val; Thr-18 to Asn, and Pro-26 to Leu. Another difference was the deletion of Leu-34 in the human peptide. Adsorption chromatography on a hydroxylapatite column in the presence of 0.1% sodium dodecyl sulfate was developed as a preparative procedure for the resolution of the two subunit polypeptides, a or a' chain and b chain, constituting the protein molecule of Factor XIII or Factor XIIIa. End group analyses on the isolated pure chains revealed that the structural change of Factor XIII during activation with thrombin occurs only in the N-terminal portion of the a chain, not in the N-terminal end of the b chain or in the C-terminal ends of the a and b chains. From these results, it was concluded that the activation of bovine plasma Factor XIII

  5. Preparing DNA libraries for multiplexed paired-end deep sequencing for Illumina GA sequencers.

    Science.gov (United States)

    Son, Mike S; Taylor, Ronald K

    2011-02-01

    Whole-genome sequencing, also known as deep sequencing, is becoming a more affordable and efficient way to identify SNP mutations, deletions, and insertions in DNA sequences across several different strains. Two major obstacles preventing the widespread use of deep sequencers are the costs involved in services used to prepare DNA libraries for sequencing and the overall accuracy of the sequencing data. This unit describes the preparation of DNA libraries for multiplexed paired-end sequencing using the Illumina GA series sequencer. Self-preparation of DNA libraries can help reduce overall expenses, especially if optimization is required for the different samples, and use of the Illumina GA Sequencer can improve the quality of the data.

  6. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    DEFF Research Database (Denmark)

    de Souza, S J; Camargo, A A; Briones, M R;

    2000-01-01

    by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48......Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central...... coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes...

  7. cis sequence effects on gene expression

    Directory of Open Access Journals (Sweden)

    Jacobs Kevin

    2007-08-01

    Background: Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects) in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature. Results: We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning) to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p cis sequence effects in our study, respectively. Conclusion: Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.

  8. Exome sequencing and genetic testing for MODY.

    Directory of Open Access Journals (Sweden)

    Stefan Johansson

    CONTEXT: Genetic testing for monogenic diabetes is important for patient care. Given the extensive genetic and clinical heterogeneity of diabetes, exome sequencing might provide additional diagnostic potential when standard Sanger sequencing-based diagnostics is inconclusive. OBJECTIVE: The aim of the study was to examine the performance of exome sequencing for a molecular diagnosis of MODY in patients who have undergone conventional diagnostic sequencing of candidate genes with negative results. RESEARCH DESIGN AND METHODS: We performed exome enrichment followed by high-throughput sequencing in nine patients with suspected MODY. They were Sanger sequencing-negative for mutations in the HNF1A, HNF4A, GCK, HNF1B and INS genes. We excluded common, non-coding and synonymous gene variants, and performed in-depth analysis on filtered sequence variants in a pre-defined set of 111 genes implicated in glucose metabolism. RESULTS: On average, we obtained 45 X median coverage of the entire targeted exome and found 199 rare coding variants per individual. We identified 0-4 rare non-synonymous and nonsense variants per individual in our a priori list of 111 candidate genes. Three of the variants were considered pathogenic (in ABCC8, HNF4A and PPARG, respectively); thus exome sequencing led to a genetic diagnosis in at least three of the nine patients. Approximately 91% of known heterozygous SNPs in the target exomes were detected, but we also found low coverage in some key diabetes genes using our current exome sequencing approach. Novel variants in the genes ARAP1, GLIS3, MADD, NOTCH2 and WFS1 need further investigation to reveal their possible role in diabetes. CONCLUSION: Our results demonstrate that exome sequencing can improve molecular diagnostics of MODY when used as a complement to Sanger sequencing. However, improvements will be needed, especially concerning coverage, before the full potential of exome sequencing can be realized.

  9. WEAK CONVERGENCE OF HENSTOCK INTEGRABLE SEQUENCES

    Institute of Scientific and Technical Information of China (English)

    LuisaDiPiazza

    1994-01-01

    Some relationships between pointwise and weak convergence of a sequence of Henstock integrable functions are studied. In particular, an example is provided of a sequence of Henstock integrable functions whose pointwise limit is different from the weak one. By introducing an asymptotic version of the notion of Henstock equiintegrability, a necessary and sufficient condition is given for a pointwise convergent sequence of Henstock integrable functions to be weakly convergent to its pointwise limit.

  10. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Background: The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows the generation of sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results: The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions: We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm allows the generation of greater sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.

  11. Some properties of generalized Fibonacci sequence

    Science.gov (United States)

    Chong, Chin-Yoon; Ho, C. K.

    2015-12-01

    For all non-negative integers $n$ and real constants $a$, $b$, $p$ and $q$, the generalized Fibonacci sequence $\{U_n\}$ is defined by $U_{n+2} = pU_{n+1} + qU_n$ with the initial values $U_0 = a$ and $U_1 = b$. Throughout the paper, we study some properties of the generalized Fibonacci sequence. Our results will motivate some new research problems concerning the contribution of the generalized sequence.
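
    The recurrence above translates directly into a short iterative routine. The sketch below is only an illustration of the definition; the function name `generalized_fibonacci` and the argument order are assumptions, and the example parameter choices (Fibonacci, Lucas and Pell numbers) are standard special cases rather than results from the paper.

```python
def generalized_fibonacci(n, a, b, p, q):
    """Return U_n where U_0 = a, U_1 = b and U_{k+2} = p*U_{k+1} + q*U_k."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    current, following = a, b  # U_0 and U_1
    for _ in range(n):
        # Slide the window one step: (U_k, U_{k+1}) -> (U_{k+1}, U_{k+2}).
        current, following = following, p * following + q * current
    return current


if __name__ == "__main__":
    # a=0, b=1, p=q=1 gives the ordinary Fibonacci numbers.
    print([generalized_fibonacci(n, 0, 1, 1, 1) for n in range(11)])
    # a=2, b=1, p=q=1 gives the Lucas numbers.
    print([generalized_fibonacci(n, 2, 1, 1, 1) for n in range(6)])
```

    The loop uses constant memory and works unchanged for real-valued constants, since it relies only on multiplication and addition.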

  12. Researches on Sequence of Plant Cystatin: Phytocystatin

    Institute of Scientific and Technical Information of China (English)

    QINQingfeng; HEWei; LIANGJun; ZHANGXingyao

    2005-01-01

    Plant cystatins, or phytocystatins, are cysteine proteinase inhibitors that exist widely in different plant species. Because they can kill insects by inhibiting the digestive function of cysteine proteinases in the gut, they are believed to play an important role in plant defense against pests. Phytocystatins contain the conserved QXVXG motif and show some sequence features that differ from those of animal cystatins. A large number of plant cystatins have been characterized by direct protein sequencing and by sequencing of cDNA clones. A multiple alignment with BLAST software and a detailed analysis of 38 phytocystatins show that phytocystatins possess a specific conserved amino acid sequence, [LRVI]-[AGT]-[RQKE]-[FY]-[AS]-[VI]-X-[EGHDQV]-[HYFQ]-N, which differs from the conserved sequence demonstrated by Margis in 1998. This conserved sequence is sufficient to detect phytocystatin sequences exclusively in protein data banks. A classification of these phytocystatins is performed: they can be divided into 3 groups according to the features of their amino acid sequences, and group I can be further divided into 3 subgroups based on the features of their amino acid and genomic sequences. Using the CLUSTALX software, the most conserved nucleotide sequences of phytocystatins were found, which could be used to design degenerate primers to search for new phytocystatins by PCR.
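
    A motif written in this bracketed notation can be turned into a regular expression for scanning protein sequences. The sketch below assumes the motif uses hyphen-separated positions with `X` meaning any residue; the helper name `motif_to_regex` and the two test peptides are made up for illustration and are not sequences from the paper.

```python
import re

# Conserved phytocystatin sequence as quoted in the abstract.
MOTIF = "[LRVI]-[AGT]-[RQKE]-[FY]-[AS]-[VI]-X-[EGHDQV]-[HYFQ]-N"


def motif_to_regex(motif):
    """Convert a hyphen-separated motif into a compiled regular expression."""
    parts = []
    for element in motif.split("-"):
        # 'X' stands for any residue; bracketed sets and single letters carry over as-is.
        parts.append("." if element.upper() == "X" else element)
    return re.compile("".join(parts))


if __name__ == "__main__":
    pattern = motif_to_regex(MOTIF)
    for peptide in ["MKLARFAVDEHNKK", "MKLAAFAVDEHN"]:  # hypothetical test peptides
        match = pattern.search(peptide)
        print(peptide, match.group() if match else None)
```

    Scanning a databank would apply `pattern.search` to each record; whether the motif is truly exclusive to phytocystatins is a claim of the abstract that this sketch does not verify.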

  13. Multiplexed microsatellite recovery using massively parallel sequencing.

    Science.gov (United States)

    Jennings, T N; Knaus, B J; Mullins, T D; Haig, S M; Cronn, R C

    2011-11-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356,958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5 M (USD).
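
    The screening step for di- and trinucleotide microsatellites can be sketched with a back-referencing regular expression. The function name and the threshold of five repeats are assumptions chosen for illustration; the abstract does not state the criteria the authors used.

```python
import re


def find_microsatellites(read, min_repeats=5):
    """Return (start, unit, repeat_count) for di- and trinucleotide tandem repeats."""
    # A 2- or 3-base unit followed by at least (min_repeats - 1) further copies of itself
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(read.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


toy_read = "TTG" + "AC" * 6 + "GGC" + "GAT" * 5 + "C"  # toy read, not real data
print(find_microsatellites(toy_read))
```

    Note that the reported unit depends on where the repeat starts in the read, so rotations of the same motif (for example GAT and TGA) would need to be normalized before counting unique loci.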

  14. Locomotor sequence learning in visually guided walking.

    Science.gov (United States)

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-04-01

    Voluntary limb modifications must be integrated with basic walking patterns during visually guided walking. In this study we tested whether voluntary gait modifications can become more automatic with practice. We challenged walking control by presenting visual stepping targets that instructed subjects to modify step length from one trial to the next. Our sequence learning paradigm is derived from the serial reaction-time (SRT) task that has been used in upper limb studies. Both random and ordered sequences of step lengths were used to measure sequence-specific and sequence-nonspecific learning during walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 yr, n = 20) could learn a specific sequence of step lengths over 300 training steps. Younger children (age 6-10 yr, n = 8) had lower baseline performance, but their magnitude and rate of sequence learning were the same compared with those of older children (11-16 yr, n = 10) and healthy adults. In addition, learning capacity may be more limited at faster walking speeds. To our knowledge, this is the first study to demonstrate that spatial sequence learning can be integrated with a highly automatic task such as walking. These findings suggest that adults and children use implicit knowledge about the sequence to plan and execute leg movement during visually guided walking.

  15. Detecting Emotions from Connected Action Sequences

    Science.gov (United States)

    Bernhardt, Daniel; Robinson, Peter

    In this paper we deal with the problem of detecting emotions from the body movements produced by naturally connected action sequences. Although action sequences are one of the most common forms of body motions in everyday scenarios their potential for emotion recognition has not been explored in the past. We show that there are fundamental differences between actions recorded in isolation and in natural sequences and demonstrate a number of techniques which allow us to correctly label action sequences with one of four emotions up to 86% of the time. Our results bring us an important step closer to recognizing emotions from body movements in natural scenarios.

  16. Recursive sequences in first-year calculus

    Science.gov (United States)

    Krainer, Thomas

    2016-02-01

    This article provides ready-to-use supplementary material on recursive sequences for a second-semester calculus class. It equips first-year calculus students with a basic methodical procedure based on which they can conduct a rigorous convergence or divergence analysis of many simple recursive sequences on their own without the need to invoke inductive arguments as is typically required in calculus textbooks. The sequences that are accessible to this kind of analysis are predominantly (eventually) monotonic, but also certain recursive sequences that alternate around their limit point as they converge can be considered.

  17. Visible periodicity of strong nucleosome DNA sequences.

    Science.gov (United States)

    Salih, Bilal; Tripathi, Vijay; Trifonov, Edward N

    2015-01-01

    Fifteen years ago, Lowary and Widom assembled nucleosomes on synthetic random sequence DNA molecules, selected the strongest nucleosomes and discovered that the TA dinucleotides in these strong nucleosome sequences often appear at 10-11 bases from one another or at distances which are multiples of this period. We repeated this experiment computationally, on large ensembles of natural genomic sequences, by selecting the strongest nucleosomes--i.e. those with such distances between like-named dinucleotides, multiples of 10.4 bases, the structural and sequence period of nucleosome DNA. The analysis confirmed the periodicity of TA dinucleotides in the strong nucleosomes, and revealed as well other periodic sequence elements, notably classical AA and TT dinucleotides. The matrices of DNA bendability and their simple linear forms--nucleosome positioning motifs--are calculated from the strong nucleosome DNA sequences. The motifs are in full accord with nucleosome positioning sequences derived earlier, thus confirming that the new technique, indeed, detects strong nucleosomes. Species- and isochore-specific variations of the matrices and of the positioning motifs are demonstrated. The strong nucleosome DNA sequences manifest the highest hitherto nucleosome positioning sequence signals, showing the dinucleotide periodicities in directly observable rather than in hidden form.

  18. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  19. Maize genome sequencing by methylation filtration.

    Science.gov (United States)

    Palmer, Lance E; Rabinowicz, Pablo D; O'Shaughnessy, Andrew L; Balija, Vivekanand S; Nascimento, Lidia U; Dike, Sujit; de la Bastide, Melissa; Martienssen, Robert A; McCombie, W Richard

    2003-12-19

    Gene enrichment strategies offer an alternative to sequencing large and repetitive genomes such as that of maize. We report the generation and analysis of nearly 100,000 undermethylated (or methylation filtration) maize sequences. Comparison with the rice genome reveals that methylation filtration results in a more comprehensive representation of maize genes than those that result from expressed sequence tags or transposon insertion sites sequences. About 7% of the repetitive DNA is unmethylated and thus selected in our libraries, but potentially active transposons and unmethylated organelle genomes can be identified. Reverse transcription polymerase chain reaction can be used to finish the maize transcriptome.

  20. Hardware Accelerated Sequence Alignment with Traceback

    Directory of Open Access Journals (Sweden)

    Scott Lloyd

    2009-01-01

    in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop computer is demonstrated on sequence lengths of 16000. For greater performance, the architecture is scalable to more processing elements.
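
    For orientation, the forward scan and traceback phases mentioned above can be shown with the textbook Needleman-Wunsch global alignment in software. This sketch stores the full score matrix and uses assumed scoring values (match +1, mismatch -1, gap -1); it is not the space-efficient hardware architecture the paper describes.

```python
def global_align(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # Forward scan: fill the score matrix row by row
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback: walk from the bottom-right corner back to the origin
    aligned_a, aligned_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


print(global_align("ACGT", "AGT"))
```

    The full matrix costs memory proportional to the product of the two sequence lengths, which is the kind of limitation a space-efficient design has to work around.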

  1. A measurement of disorder in binary sequences

    Science.gov (United States)

    Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

    2015-03-01

    We propose a complex quantity, A_L, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we apply it to typical random and deterministic sequences. One kind of random sequence is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of A_L, denoted by |A_L|, is (statistically) equivalent to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can serve as a useful index to discern which sequence is more disordered. Moreover, there is one and only one value of |A_L| for the overall disorder characteristics. It has extremely low computational cost and can be easily realized experimentally. For all these reasons, we believe that the proposed measure of disorder is a valuable complement to existing ones for symbolic sequences.

  2. Strebel differentials and Hamilton sequences

    Institute of Scientific and Technical Information of China (English)

    LI Zhong

    2001-01-01


  3. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
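
    The baseline the abstract compares against, gzip compression of sequence text, is easy to measure with Python's standard library. The sketch below uses synthetic strings as stand-ins for database content and a hypothetical helper name; it does not implement coil or edit-tree coding.

```python
import gzip
import random


def gzip_ratio(text):
    """Uncompressed size divided by gzip-compressed size for ASCII text."""
    raw = text.encode("ascii")
    return len(raw) / len(gzip.compress(raw))


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
random_dna = "".join(rng.choice("ACGT") for _ in range(100_000))
repetitive_dna = "ACGT" * 25_000

print(round(gzip_ratio(random_dna), 2), round(gzip_ratio(repetitive_dna), 2))
```

    Uniformly random bases carry 2 bits of information each, so no lossless compressor can shrink their 8-bit ASCII encoding by much more than a factor of 4. Larger gains have to come from redundancy between sequences, which is the kind of structure a database-level method can exploit.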

  4. Timing-Sequence Testing of Parallel Programs

    Institute of Scientific and Technical Information of China (English)

    LIANG Yu; LI Shu; ZHANG Hui; HAN Chengde

    2000-01-01

    Testing of parallel programs involves two parts: testing of control flow within the processes and testing of timing-sequence. This paper focuses on the latter, particularly on the timing-sequence of message-passing paradigms. First, the coarse-grained SYN-sequence model is built to describe the execution of distributed programs; all of the topics discussed in this paper are based on it. The most direct way to test a program is to run it. A fault-free parallel program should have both correct computing results and a proper SYN-sequence. In order to analyze the validity of an observed SYN-sequence, this paper presents the formal specification (Backus Normal Form) of the valid SYN-sequence. Until now there has been little work on testing coverage for distributed programs. Calculating the number of valid SYN-sequences is the key to the coverage problem, but that number is extremely large and it is very hard to obtain the combination law among SYN-events. To resolve this problem, this paper proposes an efficient testing strategy, atomic SYN-event testing, which first linearizes the SYN-sequence (so that it consists only of serial atomic SYN-events) and then tests each atomic SYN-event independently. In particular, this paper provides the formula for the number of valid SYN-sequences for tree-topology atomic SYN-events (broadcast and combine). Furthermore, the number of valid SYN-sequences also, to some degree, mirrors the testability of parallel programs. Taking the tree-topology atomic SYN-event as an example, this paper demonstrates the testability and communication speed of the tree-topology atomic SYN-event under different numbers of branches in order to achieve a more satisfactory tradeoff between testability and communication efficiency.

  5. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Directory of Open Access Journals (Sweden)

    Marta Brozynska

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technologies) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  6. Generalized Identities of Companion Fibonacci-Like Sequences

    OpenAIRE

    Shikha Bhatnagar; Bijendra Singh; Omprakash Sikhwal

    2013-01-01

    The Fibonacci sequence, Lucas sequence, Pell sequence, Pell-Lucas sequence, Jacobsthal sequence and Jacobsthal-Lucas sequence are the most prominent examples of second-order recursive sequences. In this paper, we deal with two companion Fibonacci-Like sequences, which are generalizations of the Fibonacci-Like sequence. Further, we obtain some generalized identities among the terms of companion Fibonacci-Like sequences, Jacobsthal and Jacobsthal-Lucas sequences through Binet's formulae.

  7. Sequencing for the cream of the crop

    Science.gov (United States)

    In this invited commentary, we discuss how next-generation sequencing methods are beginning to find their way into plant genetics, promising substantial improvements in crop yields over the coming decades. Next-generation sequencing facilitates the construction of high-resolution variation maps, whi...

  8. Massively parallel sequencing of forensic STRs

    DEFF Research Database (Denmark)

    Parson, Walther; Ballard, David; Budowle, Bruce

    2016-01-01

    accessible genome assembly, and in place before the uptake of MPS by the general forensic community starts to generate sequence data on a large scale. While the established nomenclature for CE-based STR analysis will remain unchanged in the future, the nomenclature of sequence-based STR genotypes will need...

  9. SPARSE SEQUENCE CONSTRUCTION OF LDPC CODES

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    This letter proposes a novel and simple construction of regular Low-Density Parity-Check (LDPC) codes using sparse binary sequences. It utilizes the cyclic cross correlation function of sparse sequences to generate codes with girth 8. The new codes perform well using sum-product decoding. Low encoding complexity can also be achieved due to the inherent quasi-cyclic structure of the codes.

  10. Archaebacterial rhodopsin sequences: Implications for evolution

    Science.gov (United States)

    Lanyi, J. K.

    1991-01-01

    It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.

  11. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  12. Controlling monomer-sequence using supramolecular templates

    OpenAIRE

    ten Brummelhuis, Niels

    2014-01-01

    The transcription and translation of information contained in nucleic acids, which has been perfected by nature, serves as inspiration for chemists to devise strategies for the creation of polymers with well-defined monomer sequences. In this review the various approaches in which templates (either biopolymers or synthetic ones) are used to influence the monomer sequence are discussed.

  13. Bonobos extract meaning from call sequences.

    Directory of Open Access Journals (Sweden)

    Zanna Clay

    Studies on language-trained bonobos have revealed their remarkable abilities in representational and communication tasks. Surprisingly, however, corresponding research into their natural communication has largely been neglected. We address this issue with a first playback study on the natural vocal behaviour of bonobos. Bonobos produce five acoustically distinct call types when finding food, which they regularly mix together into longer call sequences. We found that individual call types were relatively poor indicators of food quality, while context specificity was much greater at the call sequence level. We therefore investigated whether receivers could extract meaning about the quality of food encountered by the caller by integrating across different call sequences. We first trained four captive individuals to find two types of foods, kiwi (preferred) and apples (less preferred), at two different locations. We then conducted naturalistic playback experiments during which we broadcasted sequences of four calls, originally produced by a familiar individual responding to either kiwi or apples. All sequences contained the same number of calls but varied in the composition of call types. Following playbacks, we found that subjects devoted significantly more search effort to the field indicated by the call sequence. Rather than attending to individual calls, bonobos attended to the entire sequences to make inferences about the food encountered by a caller. These results provide the first empirical evidence that bonobos are able to extract information about external events by attending to vocal sequences of other individuals and highlight the importance of call combinations in their natural communication system.

  14. Sequencing Events: Exploring Art and Art Jobs.

    Science.gov (United States)

    Stephens, Pamela Geiger; Shaddix, Robin K.

    2000-01-01

    Presents an activity for upper-elementary students that correlates the actions of archaeologists, patrons, and artists with the sequencing of events in a logical order. Features ancient Egyptian art images. Discusses the preparation of materials, motivation, a pre-writing activity, and writing a story in sequence. (CMK)

  15. Sequencing the Cotton Genomes-Gossypium spp.

    Institute of Scientific and Technical Information of China (English)

    PATERSON Andrew H

    2008-01-01

    The genomes of most major crops, including cotton, will be fully sequenced in the next few years. Cotton is unusual, although not unique, in that we will need to sequence not only cultivated (tetraploid) genotypes but also their diploid progenitors, to understand how elite cottons have surpassed the productivity and quality of their progenitors.

  16. Convergence of a Linear Recursive Sequence

    Science.gov (United States)

    Tay, E. G.; Toh, T. L.; Dong, F. M.; Lee, T. Y.

    2004-01-01

    A necessary and sufficient condition is found for a linear recursive sequence to be convergent, no matter what initial values are given. Its limit is also obtained when the sequence is convergent. Methods from various areas of mathematics are used to obtain the results.

  17. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  18. Using Conventional Sequences in L2 French

    Science.gov (United States)

    Forsberg, Fanny

    2010-01-01

    By means of a phraseological identification method, this study provides a general description of the use of conventional sequences (CSs) in interviews at four different levels of spoken L2 French as well as in interviews with native speakers. Use of conventional sequences is studied with regard to overall quantity, category distribution and type…

  19. Discrepancy of LS-sequences of partitions

    CERN Document Server

    Carbone, Ingrid

    2010-01-01

    In this paper we give a precise estimate of the discrepancy of a class of uniformly distributed sequences of partitions. Among them we found a large class having low discrepancy (which means of order 1/N). One of them is the Kakutani-Fibonacci sequence.

  20. Some identities of generalized Fibonacci sequence

    Science.gov (United States)

    Chong, Chin-Yoon; Cheah, C. L.; Ho, C. K.

    2014-07-01

    We introduced the generalized Fibonacci sequence {U_n} defined by U_0 = 0, U_1 = 1, and U_{n+2} = pU_{n+1} + qU_n for all p, q ∈ Z+ and for all non-negative integers n. In this paper, we obtained some recursive formulas of the sequence.

  1. On the sum of generalized Fibonacci sequence

    Science.gov (United States)

    Chong, Chin-Yoon; Ho, C. K.

    2014-06-01

    We consider the generalized Fibonacci sequence {U_n} defined by U_0 = 0, U_1 = 1, and U_{n+2} = pU_{n+1} + qU_n for all n ∈ Z0+ and p, q ∈ Z+. In this paper, we derived various sums of the generalized Fibonacci sequence from their recursive relations.
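
    One example of a sum that follows from the recursive relation: summing U_{k+2} = pU_{k+1} + qU_k over k = 0..n and collecting terms gives U_0 + U_1 + ... + U_n = (U_{n+1} + qU_n - 1) / (p + q - 1). This identity is derived here from the definition and may or may not be among those in the paper; the Python sketch below, with hypothetical function names, checks it numerically.

```python
def generalized_fibonacci_terms(p, q, count):
    """Return U_0 .. U_{count-1} with U_0 = 0, U_1 = 1, U_{n+2} = p*U_{n+1} + q*U_n."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms[:count]


def partial_sum_closed_form(p, q, n):
    """Evaluate (U_{n+1} + q*U_n - 1) / (p + q - 1) for positive integers p, q."""
    u = generalized_fibonacci_terms(p, q, n + 2)
    numerator = u[n + 1] + q * u[n] - 1
    # p, q >= 1 guarantees p + q - 1 >= 1, so the division is well defined
    return numerator // (p + q - 1)


# Compare the closed form with a direct summation for a few parameter choices
for p, q in [(1, 1), (2, 1), (1, 2), (3, 4)]:
    for n in range(8):
        direct = sum(generalized_fibonacci_terms(p, q, n + 1))
        assert direct == partial_sum_closed_form(p, q, n)
```

    For p = q = 1 the formula reduces to the familiar Fibonacci identity F_0 + ... + F_n = F_{n+2} - 1.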

  2. Regular Pentagons and the Fibonacci Sequence.

    Science.gov (United States)

    French, Doug

    1989-01-01

    Illustrates how to draw a regular pentagon. Shows the sequence of a succession of regular pentagons formed by extending the sides. Calculates the general formula of the Lucas and Fibonacci sequences. Presents a regular icosahedron as an example of the golden ratio. (YP)

  3. Wolbachia Sequence Typing in Butterflies Using Pyrosequencing.

    Science.gov (United States)

    Choi, Sungmi; Shin, Su-Kyoung; Jeong, Gilsang; Yi, Hana

    2015-09-01

    Wolbachia is an obligate symbiotic bacterium that is ubiquitous in arthropods, with 25-70% of insect species estimated to be infected. Wolbachia species can interact with their insect hosts in a mutualistic or parasitic manner. Sequence types (ST) of Wolbachia are determined by multilocus sequence typing (MLST) of housekeeping genes. However, there are some limitations to MLST with respect to the generation of clone libraries and the Sanger sequencing method when a host is infected with multiple STs of Wolbachia. To assess the feasibility of massive parallel sequencing, also known as next-generation sequencing, we used pyrosequencing for sequence typing of Wolbachia in butterflies. We collected three species of butterflies (Eurema hecabe, Eurema laeta, and Tongeia fischeri) common to Korea and screened them for Wolbachia STs. We found that T. fischeri was infected with a single ST of Wolbachia, ST41. In contrast, E. hecabe and E. laeta were each infected with two STs of Wolbachia, ST41 and ST40. Our results clearly demonstrate that pyrosequencing-based MLST has a higher sensitivity than cloning and Sanger sequencing methods for the detection of minor alleles. Considering the high prevalence of infection with multiple Wolbachia STs, next-generation sequencing with improved analysis would assist with scaling up approaches to Wolbachia MLST.

  4. What's Next? Judging Sequences of Binary Events

    Science.gov (United States)

    Oskarsson, An T.; Van Boven, Leaf; McClelland, Gary H.; Hastie, Reid

    2009-01-01

    The authors review research on judgments of random and nonrandom sequences involving binary events with a focus on studies documenting gambler's fallacy and hot hand beliefs. The domains of judgment include random devices, births, lotteries, sports performances, stock prices, and others. After discussing existing theories of sequence judgments,…

  5. Concept For Generation Of Long Pseudorandom Sequences

    Science.gov (United States)

    Wang, C. C.

    1990-01-01

    Conceptual very-large-scale integrated (VLSI) digital circuit performs exponentiation in finite field. Algorithm that generates unusually long sequences of pseudorandom numbers executed by digital processor that includes such circuits. Concepts particularly advantageous for such applications as spread-spectrum communications, cryptography, and generation of ranging codes, synthetic noise, and test data, where usually desirable to make pseudorandom sequences as long as possible.

  6. Fibonacci-triple sequences and some fundamental properties

    Directory of Open Access Journals (Sweden)

    Bijendra Singh

    2010-12-01

    Fibonacci sequence stands as a kind of super sequence with fabulous properties. This note presents Fibonacci-Triple sequences that may also be called 3-F sequences. This is the explosive development in the region of Fibonacci sequence. Our purpose of this paper is to demonstrate fundamental properties of Fibonacci-Triple sequence.

  7. Fibonacci-triple sequences and some fundamental properties

    OpenAIRE

    Bijendra Singh; Omprakash Sikhwal

    2010-01-01

    Fibonacci sequence stands as a kind of super sequence with fabulous properties. This note presents Fibonacci-Triple sequences that may also be called 3-F sequences. This is the explosive development in the region of Fibonacci sequence. Our purpose of this paper is to demonstrate fundamental properties of Fibonacci-Triple sequence.

  8. NGS-based deep bisulfite sequencing.

    Science.gov (United States)

    Lee, Suman; Kim, Joomyeong

    2016-01-01

    We have developed an NGS-based deep bisulfite sequencing protocol for the DNA methylation analysis of genomes. This approach allows the rapid and efficient construction of NGS-ready libraries with a large number of PCR products that have been individually amplified from bisulfite-converted DNA. This approach also employs a bioinformatics strategy to sort the raw sequence reads generated from NGS platforms and subsequently to derive DNA methylation levels for individual loci. The results demonstrated that this NGS-based deep bisulfite sequencing approach provides not only DNA methylation levels but also informative DNA methylation patterns that have not been seen through other existing methods. • This protocol provides an efficient method for generating NGS-ready libraries from individually amplified PCR products. • This protocol provides a bioinformatics strategy for sorting NGS-derived raw sequence reads. • This protocol provides deep bisulfite sequencing results that can measure DNA methylation levels and patterns of individual loci.

  9. Locomotor sequence learning in visually guided walking

    DEFF Research Database (Denmark)

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-01-01

    ... walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 years, N = 20) could learn a specific sequence of step lengths over 300 training steps. Younger children (age 6-10 years, N = 8) have lower baseline performance, but their magnitude and rate of sequence learning was the same compared to older children (11-16 years, N = 10) and healthy adults. In addition, learning capacity may be more limited at faster walking speeds. To our knowledge, this is the first study to demonstrate that spatial sequence learning can be integrated with a highly automatic task like walking. These findings suggest that adults and children use implicit knowledge about the sequence to plan and execute leg movement during...

  10. Sequencing and comparing whole mitochondrial genomes of animals

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  11. Metagenomics using next-generation sequencing.

    Science.gov (United States)

    Bragg, Lauren; Tyson, Gene W

    2014-01-01

    Traditionally, microbial genome sequencing has been restricted to the small number of species that can be grown in pure culture. The progressive development of culture-independent methods over the last 15 years now allows researchers to sequence microbial communities directly from environmental samples. This approach is commonly referred to as "metagenomics" or "community genomics". However, the term metagenomics is applied liberally in the literature to describe any culture-independent analysis of microbial communities. Here, we define metagenomics as shotgun ("random") sequencing of the genomic DNA of a sample taken directly from the environment. The metagenome can be thought of as a sampling of the collective genome of the microbial community. We outline the considerations and analyses that should be undertaken to ensure the success of a metagenomic sequencing project, including the choice of sequencing platform and methods for assembly, binning, annotation, and comparative analysis.

  12. Strategies for complete plastid genome sequencing.

    Science.gov (United States)

    Twyford, Alex D; Ness, Rob W

    2016-10-28

    Plastid sequencing is an essential tool in the study of plant evolution. This high-copy organelle is one of the most technically accessible regions of the genome, and its sequence conservation makes it a valuable region for comparative genome evolution, phylogenetic analysis and population studies. Here, we discuss recent innovations and approaches for de novo plastid assembly that harness genomic tools. We focus on technical developments including low-cost sequence library preparation approaches for genome skimming, enrichment via hybrid baits and methylation-sensitive capture, sequence platforms with higher read outputs and longer read lengths, and automated tools for assembly. These developments allow for a much more streamlined assembly than via conventional short-range PCR. Although newer methods make complete plastid sequencing possible for any land plant or green alga, there are still challenges for producing finished plastomes particularly from herbarium material or from structurally divergent plastids such as those of parasitic plants.

  13. Reading biological processes from nucleotide sequences

    Science.gov (United States)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. 
    Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical

  14. Finding Common Sequence and Structure Motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, Laurie J.; Stormo, Gary D.

    1997-01-01

    We present a computational scheme to search for the most common motif, composed of a combination of sequence and structure constraints, among a collection of RNA sequences. The method uses a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences...

  15. Targeted next-generation sequencing can replace Sanger sequencing in clinical diagnostics

    NARCIS (Netherlands)

    Sikkema-Raddatz, B.; Johansson, L.F.; de Boer, E.N.; Almomani, R.; Boven, L.G.; van den Berg, M.P.; van Spaendonck-Zwarts, K.Y.; van Tintelen, J.P.; Sijmons, R.H.; Jongbloed, J.D.H.; Sinke, R.J.

    2013-01-01

    Mutation detection through exome sequencing allows simultaneous analysis of all coding sequences of genes. However, it cannot yet replace Sanger sequencing (SS) in diagnostics because of incomplete representation and coverage of exons leading to missing clinically relevant mutations. Targeted next-g

  16. SOME GEOMETRIC PROPERTIES OF A NEW DIFFERENCE SEQUENCE SPACE INVOLVING LACUNARY SEQUENCES

    Institute of Scientific and Technical Information of China (English)

    Murat KARAKAŞ; Mikail ET; Vatan KARAKAYA

    2013-01-01

    In this paper, we define a new generalized difference sequence space involving lacunary sequence. Then, we examine k-NUC property and property (β) for this space and also show that it is not rotund where p=(pr) is a bounded sequence of positive real numbers with pr ≥1 for all r∈N.

  17. On some difference sequence spaces defined by a sequence of Orlicz functions

    Institute of Scientific and Technical Information of China (English)

    ASMA BEKTAŞ Çiğdem

    2006-01-01

    The idea of difference sequence spaces was introduced in (Kizmaz, 1981) and this concept was generalized in (Et and Colak, 1995). In this paper we define some difference sequence spaces by a sequence of Orlicz functions and establish some inclusion relations.

  18. Sequencing and Analysis of a Genomic Fragment Provide an Insight into the Dunaliella viridis Genomic Sequence

    Institute of Scientific and Technical Information of China (English)

    Xiao-Ming SUN; Yuan-Ping TANG; Xiang-Zong MENG; Wen-Wen ZHANG; Shan LI; Zhi-Rui DENG; Zheng-Kai XU; Ren-Tao SONG

    2006-01-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistances to salt and various other stresses have made it an ideal model for stress tolerance study. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large amounts of simple sequence repeats (microsatellites) were found, with strong bias towards (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation should be made to reveal the biological significance of these unique sequence features.

  19. Value of a newly sequenced bacterial genome

    Institute of Scientific and Technical Information of China (English)

    Eudes GV Barbosa; Flavia F Aburjaile; Rommel TJ Ramos; Adriana R Carneiro; Yves Le Loir; Jan Baumbach; Anderson Miyoshi; Artur Silva; Vasco Azevedo

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information.

  20. Fungal genome sequencing: basic biology to biotechnology.

    Science.gov (United States)

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research.

  1. CATEGORIZATION OF EVENT SEQUENCES FOR LICENSE APPLICATION

    Energy Technology Data Exchange (ETDEWEB)

    G.E. Ragan; P. Mecheret; D. Dexheimer

    2005-04-14

    The purposes of this analysis are: (1) Categorize (as Category 1, Category 2, or Beyond Category 2) internal event sequences that may occur before permanent closure of the repository at Yucca Mountain. (2) Categorize external event sequences that may occur before permanent closure of the repository at Yucca Mountain. This includes examining DBGM-1 seismic classifications and upgrading to DBGM-2, if appropriate, to ensure Beyond Category 2 categorization. (3) State the design and operational requirements that are invoked to make the categorization assignments valid. (4) Indicate the amount of material put at risk by Category 1 and Category 2 event sequences. (5) Estimate frequencies of Category 1 event sequences at the maximum capacity and receipt rate of the repository. (6) Distinguish occurrences associated with normal operations from event sequences. It is beyond the scope of the analysis to propose design requirements that may be required to control radiological exposure associated with normal operations. (7) Provide a convenient compilation of the results of the analysis in tabular form. The results of this analysis are used as inputs to the consequence analyses in an iterative design process that is depicted in Figure 1. Categorization of event sequences for permanent retrieval of waste from the repository is beyond the scope of this analysis. Cleanup activities that take place after an event sequence and other responses to abnormal events are also beyond the scope of the analysis.

  2. Gelada vocal sequences follow Menzerath's linguistic law.

    Science.gov (United States)

    Gustison, Morgan L; Semple, Stuart; Ferrer-I-Cancho, Ramon; Bergman, Thore J

    2016-05-10

    Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath's law is a linguistic law that states that, the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath's law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath's law reflects compression, the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language.

  3. Exploration of noncoding sequences in metagenomes.

    Directory of Open Access Journals (Sweden)

    Fabián Tobar-Tosse

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories is the most common method for associating functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.
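
    Two of the sequence patterns named in the abstract above, (G+C) content and trinucleotide usage, are simple to compute. The sketch below shows one plausible way to do so; the function names, the overlapping-window counting, and the handling of ambiguous bases are assumptions made for illustration and may differ from what the authors actually did.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total


def trinucleotide_usage(seq: str) -> Counter:
    """Count overlapping trinucleotides, skipping windows with ambiguous bases."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if set(tri) <= set("ACGT"):
            counts[tri] += 1
    return counts


if __name__ == "__main__":
    fragment = "CAGCAGCAGNNACGT"  # hypothetical toy sequence
    print(gc_content(fragment))
    print(trinucleotide_usage(fragment).most_common(3))
```

    Overlapping windows are used here because noncoding regions have no reading frame; a codon-usage count would instead step through a known frame three bases at a time.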

  4. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    Directory of Open Access Journals (Sweden)

    Arabi E. keshk

    2014-05-01

    The merging of biology and computer science has created a new field called computational biology, which explores the capacity of computers to gain knowledge from biological data (bioinformatics). Computational biology is rooted in the life sciences as well as in computer and information sciences and technologies. A central problem in computational biology is sequence alignment, a way of arranging sequences of DNA, RNA or protein to identify regions of similarity and relationships between sequences. This paper introduces an enhancement of the dynamic algorithm for genome sequence alignment, called EDAGSA. It fills only the three main diagonals rather than filling the entire matrix with unused data. It obtains the optimal solution while decreasing the execution time, thereby increasing performance. To illustrate the effectiveness of the proposed algorithm, it is compared with traditional methods such as the Needleman-Wunsch, Smith-Waterman and longest common subsequence algorithms. In addition, a database is implemented so that the algorithm can be used in multi-sequence alignments to search for the optimal sequence matching a given sequence.
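
    The abstract above describes filling only the three main diagonals of the dynamic-programming matrix. The record does not give the details of EDAGSA, so the following is only a generic banded Needleman-Wunsch sketch that restricts computation to cells with |i - j| <= band (band = 1 gives three diagonals). The function name and the scoring values are assumptions chosen for illustration.

```python
def banded_nw_score(a: str, b: str, band: int = 1,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score computed only inside a diagonal band.

    Cells (i, j) with |i - j| > band are never stored, so the work is
    O(len(a) * band) instead of O(len(a) * len(b)). The result is the best
    score among alignments that stay inside the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("length difference exceeds the band width")
    score = {(0, 0): 0}
    for i in range(n + 1):
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:  # diagonal move: match or mismatch
                step = match if a[i - 1] == b[j - 1] else mismatch
                candidates.append(score[(i - 1, j - 1)] + step)
            if (i - 1, j) in score:  # gap in b
                candidates.append(score[(i - 1, j)] + gap)
            if (i, j - 1) in score:  # gap in a
                candidates.append(score[(i, j - 1)] + gap)
            score[(i, j)] = max(candidates)
    return score[(n, m)]


if __name__ == "__main__":
    print(banded_nw_score("GATTACA", "GATACA"))
```

    A band this narrow matches the full-matrix optimum only when the best alignment never drifts more than one position off the main diagonal, which is the trade-off any banded scheme makes for speed.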

  5. Identification of 10 882 porcine microsatellite sequences and virtual mapping of 4528 of these sequences

    DEFF Research Database (Denmark)

    Karlskov-Mortensen, Peter; Hu, Z.L.; Gorodkin, Jan

    2007-01-01

    A total of 10 882 porcine microsatellite repeats were identified in genomic shotgun sequences from the Sino-Danish Pig Genome Sequencing Consortium ( http://piggenome.dk ). Of these, 4528 microsatellites were placed on a pig-human comparative map by BLAST analysis of porcine sequences against the human genome (BLAST cut-off threshold = 1 x 10^-5). All microsatellite sequences placed on the comparative map are accessible at http://www.animalgenome.org/QTLdb/pig.html . These sequences increase the number of identified microsatellites in the porcine genome by several orders of magnitude...

  6. Large Zero Autocorrelation Zone of Golay Sequences and $4^q$-QAM Golay Complementary Sequences

    CERN Document Server

    Gong, Guang; Yang, Yang

    2011-01-01

    Sequences with good correlation properties have been widely adopted in modern communications, radar and sonar applications. In this paper, we present our new findings on some constructions of single $H$-ary Golay sequence and $4^q$-QAM Golay complementary sequence with a large zero autocorrelation zone, where $H\ge 2$ is an arbitrary even integer and $q\ge 2$ is an arbitrary integer. Those new results on Golay sequences and QAM Golay complementary sequences can be explored during synchronization and detection at the receiver end and thus improve the performance of the communication system.
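
    For readers unfamiliar with the defining property of Golay complementary sequences, the sketch below builds a binary pair with the standard concatenation recursion and checks that the two aperiodic autocorrelations cancel at every nonzero shift. It covers only the binary (+1/-1) case as an illustration; the $H$-ary and $4^q$-QAM constructions of the paper are not reproduced, and the function names are assumptions.

```python
def aperiodic_autocorr(seq: list[int], shift: int) -> int:
    """Aperiodic autocorrelation of a real-valued sequence at a given shift."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def golay_pair(m: int) -> tuple[list[int], list[int]]:
    """Binary Golay complementary pair of length 2**m.

    Uses the standard recursion (a, b) -> (a | b, a | -b), where | denotes
    concatenation, starting from a = b = [1].
    """
    a, b = [1], [1]
    for _ in range(m):
        a, b = a + b, a + [-x for x in b]
    return a, b


if __name__ == "__main__":
    a, b = golay_pair(3)
    for shift in range(len(a)):
        total = aperiodic_autocorr(a, shift) + aperiodic_autocorr(b, shift)
        print(shift, total)
```

    The sum equals twice the sequence length at shift 0 and zero elsewhere, which is the complementary property; a zero autocorrelation zone for a single sequence, as studied in the paper, is a different and weaker-looking requirement on one sequence alone.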

  7. Robot Sequencing and Visualization Program (RSVP)

    Science.gov (United States)

    Cooper, Brian K.; Maxwell, Scott A.; Hartman, Frank R.; Wright, John R.; Yen, Jeng; Toole, Nicholas T.; Gorjian, Zareh; Morrison, Jack C

    2013-01-01

    The Robot Sequencing and Visualization Program (RSVP) is being used in the Mars Science Laboratory (MSL) mission for downlink data visualization and command sequence generation. RSVP reads and writes downlink data products from the operations data server (ODS) and writes uplink data products to the ODS. The primary users of RSVP are members of the Rover Planner team (part of the Integrated Planning and Execution Team (IPE)), who use it to perform traversability/articulation analyses, take activity plan input from the Science and Mission Planning teams, and create a set of rover sequences to be sent to the rover every sol. The primary inputs to RSVP are downlink data products and activity plans in the ODS database. The primary outputs are command sequences to be placed in the ODS for further processing prior to uplink to each rover. RSVP is composed of two main subsystems. The first, called the Robot Sequence Editor (RoSE), understands the MSL activity and command dictionaries and takes care of converting incoming activity level inputs into command sequences. The Rover Planners use the RoSE component of RSVP to put together command sequences and to view and manage command level resources like time, power, temperature, etc. (via a transparent realtime connection to SEQGEN). The second component of RSVP is called HyperDrive, a set of high-fidelity computer graphics displays of the Martian surface in 3D and in stereo. The Rover Planners can explore the environment around the rover, create commands related to motion of all kinds, and see the simulated result of those commands via its underlying tight coupling with flight navigation, motor, and arm software. This software is the evolutionary replacement for the Rover Sequencing and Visualization software used to create command sequences (and visualize the Martian surface) for the Mars Exploration Rover mission.

  8. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    Directory of Open Access Journals (Sweden)

    Abdulaziz M Al-Swailem

    Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, clustering and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast-evolving gene sequences do not show any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full-length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% of hits extending to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the possibility to add sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism.

  9. Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality.

    Science.gov (United States)

    Churchill, Jennifer D; King, Jonathan L; Chakraborty, Ranajit; Budowle, Bruce

    2016-09-01

    Massively parallel sequencing (MPS) offers substantial improvements over current forensic DNA typing methodologies such as increased resolution, scalability, and throughput. The Ion PGM™ is a promising MPS platform for analysis of forensic biological evidence. The system employs a sequencing-by-synthesis chemistry on a semiconductor chip that measures a pH change due to the release of hydrogen ions as nucleotides are incorporated into the growing DNA strands. However, implementation of MPS into forensic laboratories requires a robust chemistry. Ion Torrent's Hi-Q™ Sequencing Chemistry was evaluated to determine if it could improve on the quality of the generated sequence data in association with selected genetic marker targets. The whole mitochondrial genome and the HID-Ion STR 10-plex panel were sequenced on the Ion PGM™ system with the Ion PGM™ Sequencing 400 Kit and the Ion PGM™ Hi-Q™ Sequencing Kit. Concordance, coverage, strand balance, noise, and deletion ratios were assessed in evaluating the performance of the Ion PGM™ Hi-Q™ Sequencing Kit. The results indicate that reliable, accurate data are generated and that sequencing through homopolymeric regions can be improved with the use of Ion Torrent's Hi-Q™ Sequencing Chemistry. Overall, the quality of the generated sequencing data supports the potential for use of the Ion PGM™ in forensic genetic laboratories.

  10. The double main sequence of Omega Centauri

    CERN Document Server

    Bedin, L R

    2004-01-01

    Recent, high precision photometry of Omega Centauri, the biggest Galactic globular cluster, has been obtained with Hubble Space Telescope. The color magnitude diagram reveals an unexpected bifurcation of colors in the main sequence (MS). The newly found double MS, the multiple turnoffs and subgiant branches, and other sequences discovered in the past along the red giant branch of this cluster add up to a fascinating but frustrating puzzle. Among the possible explanations for the blue main sequence an anomalous overabundance of helium is suggested. The hypothesis will be tested with a set of FLAMES@VLT data we have recently obtained (ESO DDT program), and with forthcoming ACS@HST images.

  11. Initial retrieval sequence and blending strategy

    Energy Technology Data Exchange (ETDEWEB)

    Pemwell, D.L.; Grenard, C.E.

    1996-09-01

    This report documents the initial retrieval sequence and the methodology used to select it. Waste retrieval, storage, pretreatment and vitrification were modeled for candidate single-shell tank retrieval sequences. Performance of the sequences was measured by a set of metrics (for example, high-level waste glass volume, relative risk and schedule). Computer models were used to evaluate estimated glass volumes, process rates, retrieval dates, and blending strategy effects. The models were based on estimates of component inventories and concentrations, sludge wash factors and timing, retrieval annex limitations, etc.

  12. Persistence and NIP in the characteristic sequence

    CERN Document Server

    Malliaris, M E

    2009-01-01

    For a first-order formula $\phi(x;y)$ we introduce and study the characteristic sequence $\langle P_n : n < \omega \rangle$ of hypergraphs defined by $P_n(y_1,...,y_n) := (\exists x) \bigwedge_{i \leq n} \phi(x;y_i)$. We show that combinatorial and classification theoretic properties of the characteristic sequence reflect classification theoretic properties of $\phi$ and vice versa. Specifically, we show that some tree properties are detected by the presence of certain combinatorial configurations in the characteristic sequence while other properties such as instability and the independence property manifest themselves in the persistence of complicated configurations under localization.

  13. Scale-PC shielding analysis sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bowman, S.M.

    1996-05-01

    The SCALE computational system is a modular code system for analyses of nuclear fuel facility and package designs. With the release of SCALE-PC Version 4.3, the radiation shielding analysis community now has the capability to execute the SCALE shielding analysis sequences contained in the control modules SAS1, SAS2, SAS3, and SAS4 on an MS-DOS personal computer (PC). In addition, SCALE-PC includes two new sequences, QADS and ORIGEN-ARP. The capabilities of each sequence are presented, along with example applications.

  14. Nanopore-CMOS Interfaces for DNA Sequencing.

    Science.gov (United States)

    Magierowski, Sebastian; Huang, Yiyun; Wang, Chengjie; Ghafar-Zadeh, Ebrahim

    2016-08-06

    DNA sequencers based on nanopore sensors present an opportunity for a significant break from the template-based incumbents of the last forty years. Key advantages ushered by nanopore technology include a simplified chemistry and the ability to interface to CMOS technology. The latter opportunity offers substantial promise for improvement in sequencing speed, size and cost. This paper reviews existing and emerging means of interfacing nanopores to CMOS technology with an emphasis on massively-arrayed structures. It presents this in the context of incumbent DNA sequencing techniques, reviews and quantifies nanopore characteristics and models and presents CMOS circuit methods for the amplification of low-current nanopore signals in such interfaces.

  15. How Long is an Aftershock Sequence?

    Science.gov (United States)

    Godano, Cataldo; Tramelli, Anna

    2016-07-01

    The occurrence of a mainshock is always followed by aftershocks spatially distributed within the fault area. The decay of the aftershock rate with time is described by the empirical Omori law, which was inferred from the analysis of catalogues. Discriminating sequences within catalogues is not a straightforward operation, especially for low-magnitude mainshocks. Here, we describe the rate decay of the Omori law obtained using different sequence discrimination tools, and we find that, when the background seismicity is excluded, the sequences tend to last for the temporal extension of the catalogue.
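
    As an illustration of the rate decay mentioned above, the following minimal sketch evaluates the modified (Omori-Utsu) form of the law, n(t) = K / (t + c)^p. The abstract does not say which form or parameter values the authors used, so the function names and the defaults for K, c and p are assumptions chosen only for demonstration.

```python
import math


def omori_rate(t, K=100.0, c=0.1, p=1.1):
    """Aftershock rate n(t) = K / (t + c)**p at time t after the mainshock.

    K, c and p are illustrative values, not fitted to any catalogue.
    """
    return K / (t + c) ** p


def expected_count(t1, t2, K=100.0, c=0.1, p=1.1):
    """Expected number of aftershocks in [t1, t2]: the integral of omori_rate."""
    if abs(p - 1.0) < 1e-12:
        # For p = 1 the integral is logarithmic.
        return K * (math.log(t2 + c) - math.log(t1 + c))
    return K * ((t2 + c) ** (1 - p) - (t1 + c) ** (1 - p)) / (1 - p)


if __name__ == "__main__":
    for day in (1, 10, 100):
        print(day, round(omori_rate(day), 3))
```

    With p close to 1 the rate falls off slowly, which is one way to see why separating a long-lived sequence from background seismicity can be difficult.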

  16. Microbial genomics: from sequence to function.

    OpenAIRE

    Schwartz, I

    2000-01-01

    The era of genomics (the study of genes and their function) began a scant dozen years ago with a suggestion by James Watson that the complete DNA sequence of the human genome be determined. Since that time, the human genome project has attracted a great deal of attention in the scientific world and the general media; the scope of the sequencing effort, and the extraordinary value that it will provide, has served to mask the enormous progress in sequencing other genomes. Microbial genome seque...

  17. DNA Sequence Determination by Hybridization: A Strategy for Efficient Large-Scale Sequencing

    Science.gov (United States)

    Drmanac, R.; Drmanac, S.; Strezoska, Z.; Paunesku, T.; Labat, I.; Zeremski, M.; Snoddy, J.; Funkhouser, W. K.; Koop, B.; Hood, L.; Crkvenjakov, R.

    1993-06-01

    The concept of sequencing by hybridization (SBH) makes use of an array of all possible n-nucleotide oligomers (n-mers) to identify n-mers present in an unknown DNA sequence. Computational approaches can then be used to assemble the complete sequence. As a validation of this concept, the sequences of three DNA fragments, 343 base pairs in length, were determined with octamer oligonucleotides. Possible applications of SBH include physical mapping (ordering) of overlapping DNA clones, sequence checking, DNA fingerprinting comparisons of normal and disease-causing genes, and the identification of DNA fragments with particular sequence motifs in complementary DNA and genomic libraries. The SBH techniques may accelerate the mapping and sequencing phases of the human genome project.
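
    The two steps described above (reading out the n-mers present, then assembling them computationally) can be sketched as follows. This is a toy illustration, not the authors' procedure: it assumes an error-free spectrum, uses 4-mers rather than octamers, and relies on the example sequence having no repeated 3-mers so that the reconstruction is unique. The function names are hypothetical.

```python
from collections import defaultdict


def spectrum(seq, k):
    """Set of all k-mers occurring in seq (what an ideal array would report)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(kmers):
    """Rebuild a sequence by walking an Eulerian path in the (k-1)-mer overlap graph."""
    graph = defaultdict(list)
    indegree = defaultdict(int)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
        indegree[kmer[1:]] += 1
    # Start at the node whose out-degree exceeds its in-degree, if there is one.
    nodes = list(graph)
    start = next((n for n in nodes if len(graph[n]) > indegree[n]), nodes[0])
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    target = "ATGCGTACCA"
    print(reconstruct(spectrum(target, 4)) == target)
```

    When a sequence contains repeated (k-1)-mers, several Eulerian paths can exist and the spectrum alone no longer determines the sequence, which is one reason probe length matters in practice.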

  18. Optimization of a sequence of reactors

    DEFF Research Database (Denmark)

    Vidal, Rene Victor Valqui

    1991-01-01

    Concerns the optimal production of sulphuric acid in a sequence of reactors. Using a suitable approximation to the objective function, this problem can easily be solved using the maximum principle. A numerical example documents the applicability of the suggested approach...

  19. Characterizing leader sequences of CRISPR loci

    DEFF Research Database (Denmark)

    Alkhnbashi, Omer; Shah, Shiraz Ali; Garrett, Roger Antony

    2016-01-01

    The CRISPR-Cas system is an adaptive immune system in many archaea and bacteria, which provides resistance against invading genetic elements. The first phase of CRISPR-Cas immunity is called adaptation, in which small DNA fragments are excised from genetic elements and are inserted into a CRISPR...... array generally adjacent to its so called leader sequence at one end of the array. It has been shown that transcription initiation and adaptation signals of the CRISPR array are located within the leader. However, apart from promoters, there is very little knowledge of sequence or structural motifs...... sequences by focusing on the consensus repeat of the adjacent CRISPR array and weak upstream conservation signals. We applied our tool to the analysis of a comprehensive genomic database and identified several characteristic properties of leader sequences specific to archaea and bacteria, ranging from...

  20. Sequencing Information Management System (SIMS). Final report

    Energy Technology Data Exchange (ETDEWEB)

    Fields, C.

    1996-02-15

    A feasibility study to develop a requirements analysis and functional specification for a data management system for large-scale DNA sequencing laboratories resulted in a functional specification for a Sequencing Information Management System (SIMS). This document reports the results of this feasibility study, and includes a functional specification for a SIMS relational schema. The SIMS is an integrated information management system that supports data acquisition, management, analysis, and distribution for DNA sequencing laboratories. The SIMS provides ad hoc query access to information on the sequencing process and its results, and partially automates the transfer of data between laboratory instruments, analysis programs, technical personnel, and managers. The SIMS user interfaces are designed for use by laboratory technicians, laboratory managers, and scientists. The SIMS is designed to run in a heterogeneous, multiplatform environment in a client/server mode. The SIMS communicates with external computational and data resources via the internet.

  1. Extracting biological knowledge from DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    De La Vega, F.M. [CINVESTAV-IPN (Mexico); Thieffry, D. [Universite Libre de Bruxelles, Rhode-Saint-Genese (Belgium)]|[Universidad Nacional Autonoma de Mexico, Morelos (Mexico); Collado-Vides, J. [Universidad Nacional Autonoma de Mexico, Morelos (Mexico)

    1996-12-31

    This session describes the elucidation of information from DNA sequences and the challenges computational biologists face in summarizing and deciphering the human genome. Techniques discussed include methods from statistics, information theory, artificial intelligence and linguistics. 1 ref.

  2. Fathead minnow genome sequencing and assembly

    Data.gov (United States)

    U.S. Environmental Protection Agency — The dataset provides the URLs for accessing the genome sequence data and two draft assemblies as well as fathead minnow genotyping data associated with estimating...

  3. Glycome mapping on DNA sequencing equipment.

    Science.gov (United States)

    Laroy, Wouter; Contreras, Roland; Callewaert, Nico

    2006-01-01

    Here we provide a detailed protocol for the analysis of protein-linked glycans on DNA sequencing equipment. This protocol satisfies the glyco-analytical needs of many projects and can form the basis of 'glycomics' studies, in which robustness, high throughput, high sensitivity and reliable quantification are of paramount importance. The protocol routinely resolves isobaric glycan stereoisomers, which is much more difficult by mass spectrometry (MS). Earlier methods made use of polyacrylamide gel-based sequencers, but we have now adapted the technique to multicapillary DNA sequencers, which represent the state of the art today. In addition, we have integrated an option for HPLC-based fractionation of highly anionic 8-amino-1,3,6-pyrenetrisulfonic acid (APTS)-labeled glycans before rapid capillary electrophoretic profiling. This option facilitates either two-dimensional profiling of complex glycan mixtures and exoglycosidase sequencing, or MS analysis of particular compounds of interest rather than of the total pool of glycans in a sample.

  4. Long range correlations in DNA sequences

    CERN Document Server

    Mohanty, A K

    2002-01-01

    The so-called long-range correlation properties of DNA sequences are studied using variance analyses of the density distribution of a single nucleotide or a group of nucleotides in a model-independent way. This new method, which was suggested earlier, has been applied to extract slope parameters that characterize the correlation properties for several intron-containing and intron-less DNA sequences. An important aspect of all the DNA sequences is the property of complementarity, by virtue of which any two complementary distributions (like GA is complementary to TC or G is complementary to ATC) have identical fluctuations at all scales although their distribution functions need not be identical. Due to this complementarity, the famous DNA walk representation, whose statistical interpretation is still unresolved, is shown to be a special case of the present formalism with a density distribution corresponding to a purine or a pyrimidine group. Another interesting aspect of most of the DNA sequences is that the factorial m...

  5. Identifying driver mutations in sequenced cancer genomes

    DEFF Research Database (Denmark)

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise......, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA...... sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein...

  6. Sequence finishing and mapping of Drosophila melanogaster heterochromatin

    Energy Technology Data Exchange (ETDEWEB)

    Hoskins, Roger A.; Carlson, Joseph W.; Kennedy, Cameron; Acevedo, David; Evans-Holm, Martha; Frise, Erwin; Wan, Kenneth H.; Park, Soo; Mendez-Lago, Maria; Rossi, Fabrizio; Villasante, Alfredo; Dimitri, Patrizio; Karpen, Gary H.; Celniker, Susan E.

    2007-06-15

    Genome sequences for most metazoans are incomplete due to the presence of repeated DNA in the pericentromeric heterochromatin. The heterochromatic regions of D. melanogaster contain 20 Mb of sequence amenable to mapping, sequence assembly and finishing. Here we describe the generation of 15 Mb of finished or improved heterochromatic sequence using available clone resources and assembly and mapping methods. We also constructed a BAC-based physical map that spans approximately 13 Mb of the pericentromeric heterochromatin, and a cytogenetic map that positions approximately 11 Mb of BAC contigs and sequence scaffolds in specific chromosomal locations. The integrated sequence assembly and maps greatly improve our understanding of the structure and composition of this poorly understood fraction of a metazoan genome and provide a framework for functional analyses.

  7. Improved polynomial remainder sequences for Ore polynomials.

    Science.gov (United States)

    Jaroschek, Maximilian

    2013-11-01

    Polynomial remainder sequences contain the intermediate results of the Euclidean algorithm when applied to (non-)commutative polynomials. The running time of the algorithm is dependent on the size of the coefficients of the remainders. Different ways have been studied to make these as small as possible. The subresultant sequence of two polynomials is a polynomial remainder sequence in which the size of the coefficients is optimal in the generic case, but when taking the input from applications, the coefficients are often larger than necessary. We generalize two improvements of the subresultant sequence to Ore polynomials and derive a new bound for the minimal coefficient size. Our approach also yields a new proof for the results in the commutative case, providing a new point of view on the origin of the extraneous factors of the coefficients.
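
    To make the coefficient-growth issue concrete, here is a minimal sketch of the plain Euclidean remainder sequence for ordinary commutative polynomials over the rationals. It does not implement the subresultant sequence or the Ore-polynomial generalization the abstract describes; the coefficient-list representation and function names are assumptions, and the input pair is Knuth's well-known textbook example.

```python
from fractions import Fraction


def poly_rem(a, b):
    """Remainder of a divided by b; coefficient lists, highest degree first."""
    a = [Fraction(x) for x in a]
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)  # the leading coefficient is now zero
    while a and a[0] == 0:  # strip any remaining leading zeros
        a.pop(0)
    return a


def remainder_sequence(f, g):
    """Euclidean remainder sequence f, g, rem(f, g), ... up to the last nonzero term."""
    seq = [[Fraction(x) for x in f], [Fraction(x) for x in g]]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append(r)


if __name__ == "__main__":
    f = [1, 0, 1, 0, -3, -3, 8, 2, -5]   # x^8 + x^6 - 3x^4 - 3x^3 + 8x^2 + 2x - 5
    g = [3, 0, 5, 0, -4, -9, 21]         # 3x^6 + 5x^4 - 4x^2 - 9x + 21
    for poly in remainder_sequence(f, g):
        print([str(c) for c in poly])
```

    Printing the sequence shows numerators and denominators growing at each step even though the inputs have small integer coefficients, which is the behaviour that subresultant-style sequences are designed to control.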

  8. The DWPF Melter proposed heat up sequence

    Energy Technology Data Exchange (ETDEWEB)

    Smith, M.E.

    1989-08-11

    Per the request of DWPT supervision, a proposed heatup sequence for the DWPF Melter has been documented in this report. DWPF personnel will use this report as a guide to write the detailed DWPF Melter startup plan. 6 refs.

  9. Supervised Sequence Labelling with Recurrent Neural Networks

    CERN Document Server

    Graves, Alex

    2012-01-01

    Supervised sequence labelling is a vital area of machine learning, encompassing tasks such as speech, handwriting and gesture recognition, protein secondary structure prediction and part-of-speech tagging. Recurrent neural networks are powerful sequence learning tools (robust to input noise and distortion, able to exploit long-range contextual information) that would seem ideally suited to such problems. However, their role in large-scale sequence labelling systems has so far been auxiliary. The goal of this book is a complete framework for classifying and transcribing sequential data with recurrent neural networks only. Three main innovations are introduced in order to realise this goal. Firstly, the connectionist temporal classification output layer allows the framework to be trained with unsegmented target sequences, such as phoneme-level speech transcriptions; this is in contrast to previous connectionist approaches, which were dependent on error-prone prior segmentation. Secondly, multidimensional...

  10. Nanopore DNA sequencing using kinetic proofreading

    Science.gov (United States)

    Ling, Xinsheng

    We propose a method of DNA sequencing by combining the physical method of nanopore electrical measurements and Southern's sequencing-by-hybridization. The new key ingredient, essential to both lowering the costs and increasing the precision, is an asymmetric nanopore sandwich device capable of measuring the DNA hybridization probe twice, separated by a designed waiting time. Incorrect probes, which appear only once in nanopore ionic current traces, are discriminated from the correct ones, which appear twice. This method of discrimination is similar to the principle of kinetic proofreading proposed by Hopfield and Ninio in gene transcription and translation processes. An error analysis of this nanopore kinetic proofreading (nKP) technique for DNA sequencing is carried out in comparison with the most precise 3' dideoxy termination method developed by Sanger.

  11. Network of tRNA Gene Sequences

    Institute of Scientific and Technical Information of China (English)

    WEI Fang-ping; LI Sheng; MA Hong-ru

    2008-01-01

    A network of 3719 tRNA gene sequences was constructed using simplest alignment. Its topology, degree distribution and clustering coefficient were studied. The behaviors of the network shift from fluctuated distribution to scale-free distribution when the similarity degree of the tRNA gene sequences increases. The tRNA gene sequences with the same anticodon identity are more self-organized than those with different anticodon identities and form local clusters in the network. Some vertices of the local cluster have a high connection with other local clusters, and the probable reason was given. Moreover, a network constructed by the same number of random tRNA sequences was used to make comparisons. The relationships between the properties of the tRNA similarity network and the characters of tRNA evolutionary history were discussed.

  12. New stopping criteria for segmenting DNA sequences

    CERN Document Server

    Li, W

    2001-01-01

    We propose a solution on the stopping criterion in segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on Bayesian Information Criterion (BIC) in the model selection framework. When this stopping criterion is applied to a left telomere sequence of yeast Saccharomyces cerevisiae and the complete genome sequence of bacterium Escherichia coli, borders of biologically meaningful units were identified (e.g. subtelomeric units, replication origin, and replication terminus), and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.

  13. Simultaneous sensorimotor adaptation and sequence learning.

    Science.gov (United States)

    Overduin, Simon A; Richardson, Andrew G; Bizzi, Emilio; Press, Daniel Z

    2008-01-01

    Sensorimotor adaptation and sequence learning have often been treated as distinct forms of motor learning. But frequently the motor system must acquire both types of experience simultaneously. Here, we investigated the interaction of these two forms of motor learning by having subjects adapt to predictable forces imposed by a robotic manipulandum while simultaneously reaching to an implicit sequence of targets. We show that adaptation to novel dynamics and learning of a sequence of movements can occur simultaneously and without significant interference or facilitation. When both conditions were presented simultaneously to subjects, their trajectory error and reaction time decreased to the same extent as those of subjects who experienced the force field or sequence independently.

  14. Genetics Home Reference: isolated lissencephaly sequence

    Science.gov (United States)


  15. Female-specific DNA sequences in geese.

    Science.gov (United States)

    Huang, M C; Lin, W C; Horng, Y M; Rouvier, R; Huang, C W

    2003-07-01

    1. The OPAE random primers (Operon Technologies, Inc., CA) were used for random amplified polymorphic DNA (RAPD) fingerprinting in Chinese, White Roman and Landaise geese. One of these primers, OPAE-06, produced a 938-bp sex-specific fragment in all females and in no males of Chinese geese only. 2. A novel female-specific DNA sequence in Chinese goose was cloned and sequenced. Two primers, CGSex-F and CGSex-R, were designed in order to amplify a 912-bp sex-specific polymerase chain reaction (PCR) fragment on genomic DNA from female geese. 3. It was shown that a simple and effective PCR-based sexing technique could be used in the three goose breeds studied. 4. Nucleotide sequencing of the sex-specific fragments in White Roman and Landaise geese was performed and sequence differences were observed among these three breeds.

  16. "X"-tending the Fibonacci Sequence.

    Science.gov (United States)

    Moran, Glenn T.

    2002-01-01

    Outlines a lesson on the Fibonacci and Lucas sequences that captures student interest by presenting the opportunity for computation practice, mental mathematics, and proof for algebra students. Discusses an extension for solving simultaneous equations. (YDS)

  17. Fibonacci Sequence and Supramolecular Structure of DNA.

    Science.gov (United States)

    Shabalkin, I P; Grigor'eva, E Yu; Gudkova, M V; Shabalkin, P I

    2016-05-01

    We propose a new model of supramolecular DNA structure. As in the model of primary DNA structure that we developed previously [11-15], the 3D structure of the DNA molecule is assembled in accordance with a mathematical rule known as the Fibonacci sequence. Unlike the primary DNA structure, the supramolecular 3D structure is assembled from complex moieties including a regular tetrahedron and a regular octahedron consisting of monomers, the elements of the primary DNA structure. The moieties of the supramolecular DNA structure, which form fragments of a regular spatial lattice, are bound via linker (joint) sequences of the DNA chain. The lattice perceives and transmits information signals over a considerable distance without acoustic aberrations. Linker sequences expand the conformational space between lattice segments, allowing them to slide relative to each other under the action of external forces. In this case, sliding is provided by stretching of the stacked linker sequences.

  18. The complete DNA sequence of vaccinia virus.

    Science.gov (United States)

    Goebel, S J; Johnson, G P; Perkus, M E; Davis, S W; Winslow, J P; Paoletti, E

    1990-11-01

    The complete DNA sequence of the genome of vaccinia virus has been determined. The genome consisted of 191,636 bp with a base composition of 66.6% A + T. We have identified 198 "major" protein-coding regions and 65 overlapping "minor" regions, for a total of 263 potential genes. Genes encoded by the virus were located by examination of DNA sequence characteristics and compared with existing vaccinia virus mapping analyses, sequence data, and transcription data. These genes were found to be compactly organized along the genome with relatively few regions of noncoding sequences. Whereas several similarities to proteins of known function were discerned, the function of the majority of proteins encoded by these open reading frames is as yet undetermined.

  19. ASAP: Amplification, sequencing & annotation of plastomes

    Directory of Open Access Journals (Sweden)

    Folta Kevin M

    2005-12-01

    Background: Availability of DNA sequence information is vital for pursuing structural, functional and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome requires sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures like chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA) is currently used to amplify the chloroplast genome from purified chloroplast DNA, and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal, high-throughput, rapid PCR-based technique to amplify, sequence and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1-1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera. Results: 100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine and Equisetum. Sequence from the inverted repeat region of strawberry and peach plastome was obtained, annotated and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to tobacco plastome, thus exhibiting the utility of this method for structural and

  20. Sequencing and Analysis of Neanderthal Genomic DNA

    OpenAIRE

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K; Rubin, Edward M.

    2006-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library a...

  1. Inconsistencies in Neanderthal genomic DNA sequences.

    Directory of Open Access Journals (Sweden)

    Jeffrey D Wall

    2007-10-01

    Two recently published papers describe nuclear DNA sequences that were obtained from the same Neanderthal fossil. Our reanalyses of the data from these studies show that they are not consistent with each other and point to serious problems with the data quality in one of the studies, possibly due to modern human DNA contaminants and/or a high rate of sequencing errors.

  2. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to much existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate, and combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer.

  3. A Method to Construct Generalized Fibonacci Sequences

    Directory of Open Access Journals (Sweden)

    Adalberto García-Máynez

    2016-01-01

    The main purpose of this paper is to study the convergence properties of Generalized Fibonacci Sequences and the series of partial sums associated with them. When the proper values of an $s \times s$ real matrix $A$ are real and different, we give a necessary and sufficient condition for the convergence of the matrix sequence $A, A^2, A^3, \dots$ to a matrix $B$.

  4. A Method to Construct Generalized Fibonacci Sequences

    OpenAIRE

    Adalberto García-Máynez; Adolfo Pimienta Acosta

    2016-01-01

    The main purpose of this paper is to study the convergence properties of Generalized Fibonacci Sequences and the series of partial sums associated with them. When the proper values of an $s \times s$ real matrix $A$ are real and different, we give a necessary and sufficient condition for the convergence of the matrix sequence $A, A^2, A^3, \dots$ to a matrix $B$.
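
    The abstract does not state the condition itself. As a hedged illustration, the sketch below uses the standard linear-algebra fact that a matrix with real, distinct eigenvalues (hence diagonalizable) has convergent powers exactly when every eigenvalue lies in the interval (-1, 1]; whether the paper phrases its criterion this way is an assumption. The code is restricted to the 2 x 2 case so that it needs no external libraries, and all function names are hypothetical.

```python
import math


def eigenvalues_2x2(A):
    """Real, distinct eigenvalues of a 2x2 matrix, from its trace and determinant."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc <= 0:
        raise ValueError("eigenvalues are not real and distinct")
    root = math.sqrt(disc)
    return ((trace - root) / 2, (trace + root) / 2)


def powers_converge(A):
    """True if A, A^2, A^3, ... converges (every eigenvalue lies in (-1, 1])."""
    return all(-1 < lam <= 1 for lam in eigenvalues_2x2(A))


def mat_mul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(A, n):
    """A**n by repeated multiplication (fine for small n)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = mat_mul(result, A)
    return result


if __name__ == "__main__":
    print(powers_converge([[1, 1], [1, 0]]))        # classical Fibonacci matrix
    print(powers_converge([[1, 0.5], [0, 0.5]]))
    print(mat_pow([[1, 0.5], [0, 0.5]], 60))
```

    The classical Fibonacci matrix has an eigenvalue equal to the golden ratio, so its powers diverge, whereas the second matrix has eigenvalues 1 and 0.5 and its powers approach a limit matrix.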

  5. Recursive Polynomial Remainder Sequence and its Subresultants

    OpenAIRE

    Terui, Akira

    2008-01-01

    We introduce concepts of "recursive polynomial remainder sequence (PRS)" and "recursive subresultant," along with investigation of their properties. A recursive PRS is defined as, if there exists the GCD (greatest common divisor) of initial polynomials, a sequence of PRSs calculated "recursively" for the GCD and its derivative until a constant is derived, and recursive subresultants are defined by determinants representing the coefficients in recursive PRS as functions of coefficients of init...

  6. Fibonacci difference sequence spaces for modulus functions

    Directory of Open Access Journals (Sweden)

    Kuldip Raj

    2015-05-01

    In the present paper we introduce Fibonacci difference sequence spaces l(F, Ƒ, p, u) and l_∞(F, Ƒ, p, u) by using a sequence of modulus functions and a new band matrix F. We also study some inclusion relations and topological and geometric properties of these spaces. Furthermore, the alpha, beta and gamma duals and matrix transformations of the space l(F, Ƒ, p, u) are determined.

  7. Determinant Representations of Sequences: A Survey

    Directory of Open Access Journals (Sweden)

    Moghaddamfar A. R.

    2014-01-01

    This is a survey of recent results concerning (integer) matrices whose leading principal minors are well-known sequences such as Fibonacci, Lucas, Jacobsthal and Pell (sub)sequences. There are different ways of constructing such matrices. Some of these matrices are constructed by homogeneous or nonhomogeneous recurrence relations, and others are constructed by convolution of two sequences. In this article, we illustrate the idea of these methods by constructing some integer matrices of this type.
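
    A small example of the kind of matrix described above: the tridiagonal matrix with 1 on the diagonal, 1 on the superdiagonal and -1 on the subdiagonal has leading principal minors d_k satisfying d_k = d_{k-1} + d_{k-2}, so they run through Fibonacci numbers. This is a standard textbook construction chosen for illustration; it is not claimed to be one of the survey's own matrices, and the function names are hypothetical.

```python
from fractions import Fraction


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result  # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return result


def fibonacci_matrix(n):
    """Tridiagonal n x n matrix: 1 on the diagonal and above it, -1 below it."""
    return [[1 if j in (i, i + 1) else -1 if j == i - 1 else 0
             for j in range(n)] for i in range(n)]


def leading_principal_minors(M):
    """Determinants of the top-left k x k blocks for k = 1..n."""
    return [det([row[:k] for row in M[:k]]) for k in range(1, len(M) + 1)]


if __name__ == "__main__":
    print([int(d) for d in leading_principal_minors(fibonacci_matrix(6))])
```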

  8. Cactus: Algorithms for genome multiple sequence alignment

    OpenAIRE

    Paten, Benedict; Earl, Dent; Nguyen, Ngan; Diekhans, Mark; Zerbino, Daniel; Haussler, David

    2011-01-01

    Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms...

  9. Probabilistic motor sequence yields greater offline and less online learning than fixed sequence

    Directory of Open Access Journals (Sweden)

    Yue Du

    2016-03-01

    It is well acknowledged that motor sequences can be learned quickly through online learning. Subsequently, the initial acquisition of a motor sequence is boosted or consolidated by offline learning. However, little is known about whether offline learning can drive the fast learning of motor sequences (i.e., initial sequence learning in the first training session). To examine offline learning in the fast learning stage, we asked four groups of young adults to perform the serial reaction time (SRT) task with either a fixed or probabilistic sequence and with or without preliminary knowledge of the presence of a sequence. The sequence and instruction types were manipulated to emphasize either procedural (probabilistic sequence; no preliminary knowledge) or declarative (fixed sequence; with preliminary knowledge) memory, which were found to either facilitate or inhibit offline learning. In the SRT task, there were six learning blocks with a two-minute break between each consecutive block. Throughout the session, stimuli followed the same fixed or probabilistic pattern except in Block 5, in which stimuli appeared in a random order. We found that preliminary knowledge facilitated the learning of a fixed sequence, but not a probabilistic sequence. In addition to overall learning measured by the mean reaction time (RT), we examined the progressive changes in RT within and between blocks (i.e., online and offline learning, respectively). It was found that the two groups who performed the fixed sequence, regardless of preliminary knowledge, showed greater online learning than the other two groups who performed the probabilistic sequence. The groups who performed the probabilistic sequence, regardless of preliminary knowledge, did not display online learning, as indicated by a decline in performance within the learning blocks. However, they did demonstrate remarkably greater offline improvement in RT, which suggests that they are learning the probabilistic sequence

  10. Sequence determinants of human microsatellite variability

    Directory of Open Access Journals (Sweden)

    Jakobsson Mattias

    2009-12-01

    Full Text Available Abstract Background Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database. Results Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetra-nucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity. Conclusions These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability.

  11. A Unified Theoretical Framework for Cognitive Sequencing

    Directory of Open Access Journals (Sweden)

    Tejas Savalia

    2016-11-01

    Full Text Available The capacity to sequence information is central to human performance. Sequencing ability forms the foundation stone for higher order cognition related to language and goal-directed planning. Information related to the order of items, their timing, chunking and hierarchical organization are important aspects in sequencing. Past research on sequencing has emphasized two distinct and independent dichotomies: implicit versus explicit and goal-directed versus habits. We propose a theoretical framework unifying these two streams. Our proposal relies on the brain's ability to implicitly extract statistical regularities from the stream of stimuli and, with attentional engagement, to organize sequences explicitly and hierarchically. Similarly, sequences that need to be assembled purposively to accomplish a goal require engagement of attentional processes. With repetition, these goal-directed plans become habits with concomitant disengagement of attention. Thus attention and awareness play a crucial role in the implicit-to-explicit transition as well as in how goal-directed plans become automatic habits. Cortico-subcortical loops (basal ganglia-frontal cortex and hippocampus-frontal cortex loops) mediate the transition process. We show how the computational principles of model-free and model-based learning paradigms, along with a pivotal role for attention and awareness, offer a unifying framework for these two dichotomies. Based on this framework, we make testable predictions related to the potential influence of response-to-stimulus interval (RSI) on developing awareness in implicit learning tasks.

  12. Sequence characteristics of T4-like bacteriophage IME08 benome termini revealed by high throughput sequencing

    Directory of Open Access Journals (Sweden)

    An Xiaoping

    2011-04-01

    Full Text Available Abstract Background T4 phage is a model species that has contributed broadly to our understanding of molecular biology. T4 DNA replication and packaging share various mechanisms with human double-stranded DNA viruses such as herpes virus. The literature indicates that T4-like phage genomes have permuted terminal sequences, and are generated by a DNA terminase in a sequence-independent manner; Methods genomic DNA of T4-like bacteriophage IME08 was subjected to high throughput sequencing, and the read sequences with extraordinarily high occurrences were analyzed; Results we demonstrate that both the 5' and 3' termini of the IME08 genome start with base G or A. The presence of a consensus sequence TTGGA|G around the breakpoint of the high frequency read sequences suggests that the terminase cuts the branched pre-genome in a sequence-preferred manner. Our analysis also shows that terminal cleavage is asymmetric, with one end cut at a consensus sequence, and the other end generated randomly. The sequence-preferred cleavage may produce sticky-ends, but with each end being packaged with different efficiencies; Conclusions this study illustrates how high throughput sequencing can be used to probe replication and packaging mechanisms in bacteriophages and/or viruses.

  13. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Directory of Open Access Journals (Sweden)

    Bob Zimmermann

    Full Text Available BACKGROUND: SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. CONCLUSIONS/SIGNIFICANCE: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

  14. Generating matrix and sums of Fibonacci and Pell sequences

    Science.gov (United States)

    Ho, C. K.; Woon, H. S.; Chong, Chin-Yoon

    2014-07-01

    In this paper, we study the Fibonacci sequence and the Pell sequence and develop generating matrices for them. First we prove two results on the even sum of the Fibonacci sequence and the Pell sequence, using the generating matrix approach. We then deduce the odd sums, some identities and recursive formulas for these two sequences.
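The generating matrix approach mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's own derivation: it assumes the standard 2x2 generating matrices [[1, 1], [1, 0]] for Fibonacci and [[2, 1], [1, 0]] for Pell, and it checks the well-known even-index sum identities numerically. The helper names (mat_mul, mat_pow, term) are hypothetical.

```python
# Assumed standard generating matrices; M**n holds the n-th term
# of the sequence in its off-diagonal entry.
FIB = [[1, 1], [1, 0]]
PELL = [[2, 1], [1, 0]]


def mat_mul(a, b):
    """Multiply two 2x2 integer matrices."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def mat_pow(m, n):
    """Raise a 2x2 matrix to a non-negative integer power by repeated squaring."""
    result = [[1, 0], [0, 1]]
    while n > 0:
        if n & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        n >>= 1
    return result


def term(m, n):
    """Return the n-th term (F_n or P_n, with F_0 = P_0 = 0)."""
    return mat_pow(m, n)[0][1]


n = 6
even_fib_sum = sum(term(FIB, 2 * k) for k in range(1, n + 1))
even_pell_sum = sum(term(PELL, 2 * k) for k in range(1, n + 1))

# Classical identities for the sums of even-indexed terms:
#   F_2 + F_4 + ... + F_2n = F_(2n+1) - 1
#   P_2 + P_4 + ... + P_2n = (P_(2n+1) - 1) / 2
print(even_fib_sum == term(FIB, 2 * n + 1) - 1)
print(2 * even_pell_sum == term(PELL, 2 * n + 1) - 1)
```

Repeated squaring keeps the number of matrix products logarithmic in n, which is one practical reason the matrix form is convenient.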

  15. Papaya (Carica papaya) lysozyme is a member of the family 19 (Basic, class II) chitinases

    NARCIS (Netherlands)

    Subroto, T; Sufiati, S; Beintema, JJ

    1999-01-01

    The most comprehensive studies on a plant lysozyme (EC 3.2.1.17) are those on the enzyme from papaya (Carica papaya) latex, published in 1967 and 1969. However, the N-terminal sequence of five amino acids of this enzyme, determined by manual Edman degradation, did not allow assign

  16. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Full Text Available Abstract Background Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. 
Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  17. Processing Aftershock Sequences Using Waveform Correlation

    Science.gov (United States)

    Resor, M. E.; Procopio, M. J.; Young, C. J.; Carr, D. B.

    2008-12-01

    For most event monitoring systems, the objective is to keep up with the flow of incoming data, producing a bulletin with some modest, relatively constant, time delay after present time, often a period of a few hours or less. Because the association problem scales exponentially and not linearly with the number of detections, a dramatic increase in seismicity due to an aftershock sequence can easily cause the bulletin delay time to increase dramatically. In some cases, the production of a bulletin may cease altogether, until the automatic system can catch up. For a nuclear monitoring system, the implications of such a delay could be dire. Given the expected similarity between a mainshock and aftershocks, it has been proposed that waveform correlation may provide a powerful means to simultaneously increase the efficiency of processing aftershock sequences, while also lowering the detection threshold and improving the quality of the event solutions. However, many questions remain unanswered. What are the key parameters for achieving the best correlations between waveforms (window length, filtering, etc.), and are they sequence-dependent? What is the overall percentage of similar events in an aftershock sequence, i.e. what is the maximum level of efficiency that a waveform correlation could be expected to achieve? Finally, how does this percentage of events vary among sequences? Using data from the aftershock sequence for the December 26, 2004 Mw 9.1 Sumatra event, we investigate these issues by building and testing a prototype waveform correlation event detection system that automatically expands its library of known events as new signatures are identified in the aftershock sequence (by traditional signal detection and event processing). Our system tests all incoming data against this dynamic library, thereby identifying any similar events before traditional processing takes place. In the region surrounding the Sumatra event, the NEIC EDR contains 4997 events in the 9

  18. Artificial duplicate reads in sequencing data of 454 Genome Sequencer FLX System

    Institute of Scientific and Technical Information of China (English)

    Hui Dong; Yangyi Chen; Yan Shen; Shengyue Wang; Guoping Zhao; Weirong Jin

    2011-01-01

    The 454 Genome Sequencer (GS) FLX System is one of the next-generation sequencing systems characterized by long reads, high accuracy, and ultra-high throughput. Based on the mechanism of emulsion PCR, a unique DNA template would only generate a unique sequence read after being amplified and sequenced on GS FLX. However, biased amplification of DNA templates might occur in the process of emulsion PCR, which results in production of artificial duplicate reads. Under the condition that each DNA template is unique to another, 3.49%-18.14% of total reads in GS FLX-sequencing data were found to be artificial duplicate reads. These duplicate reads may lead to misunderstanding of sequencing data and special attention should be paid to the potential biases they introduced to the data.
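The idea of measuring what share of reads are duplicates can be sketched in a few lines. This is a simplified illustration only: it assumes that a duplicate is a read whose sequence is identical to an earlier read, which is a cruder criterion than whatever the authors used, and the function name and toy reads are hypothetical.

```python
from collections import Counter


def duplicate_fraction(reads):
    """Return the fraction of reads that repeat an already-seen sequence.

    Assumption for illustration: duplicates are exact sequence matches.
    The first copy of each sequence is kept; every further copy counts
    as a duplicate.
    """
    if not reads:
        return 0.0
    counts = Counter(reads)
    extra_copies = sum(count - 1 for count in counts.values())
    return extra_copies / len(reads)


# Toy data: "ACGT" appears three times, so two of the five reads are duplicates.
toy_reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"]
print(duplicate_fraction(toy_reads))
```

On real data one would usually also compare start positions or allow for sequencing errors, which this sketch deliberately leaves out.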

  19. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Directory of Open Access Journals (Sweden)

    Baldwin Stephen A

    2011-03-01

    Full Text Available Abstract Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  20. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Science.gov (United States)

    2011-01-01

    Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/. PMID:21385349

  1. Inverted temperature sequences: role of deformation partitioning

    Science.gov (United States)

    Grujic, D.; Ashley, K. T.; Coble, M. A.; Coutand, I.; Kellett, D.; Whynot, N.

    2015-12-01

    The inverted metamorphism associated with the Main Central thrust zone in the Himalaya has been historically attributed to a number of tectonic processes. Here we show that there is actually a composite peak and deformation temperature sequence that formed in succession via different tectonic processes. The deformation partitioning seems to have played a key role, and the magnitude of each process has varied along strike of the orogen. To explain the formation of the inverted metamorphic sequence across the Lesser Himalayan Sequence (LHS) in eastern Bhutan, we used Raman spectroscopy of carbonaceous material (RSCM) to determine the peak metamorphic temperatures and Ti-in-quartz thermobarometry to determine the deformation temperatures combined with thermochronology including published apatite and zircon U-Th/He and fission-track data and new 40Ar/39Ar dating of muscovite. The dataset was inverted using 3D-thermal-kinematic modeling to constrain the ranges of geological parameters such as fault geometry and slip rates, location and rates of localized basal accretion, and thermal properties of the crust. RSCM results indicate that there are two peak temperature sequences separated by a major thrust within the LHS. The internal temperature sequence shows an inverted peak temperature gradient of 12 °C/km; in the external (southern) sequence, the peak temperatures are constant across the structural sequence. Thermo-kinematic modeling suggests that the thermochronologic and thermobarometric data are compatible with a two-stage scenario: an Early-Middle Miocene phase of fast overthrusting of a hot hanging wall over a downgoing footwall and inversion of the synkinematic isotherms, followed by the formation of the external duplex developed by dominant underthrusting and basal accretion.
To reconcile our observations with the experimental data, we suggest that pervasive ductile deformation within the upper LHS and along the Main Central thrust zone at its top stopped at

  2. Compilation of tRNA sequences.

    Science.gov (United States)

    Sprinzl, M; Grueter, F; Spelzhaus, A; Gauss, D H

    1980-01-11

    This compilation presents in a small space the tRNA sequences so far published. The numbering of tRNAPhe from yeast is used following the rules proposed by the participants of the Cold Spring Harbor Meeting on tRNA 1978 (1,2;Fig. 1). This numbering allows comparisons with the three dimensional structure of tRNAPhe. The secondary structure of tRNAs is indicated by specific underlining. In the primary structure a nucleoside followed by a nucleoside in brackets or a modification in brackets denotes that both types of nucleosides can occupy this position. Part of a sequence in brackets designates a piece of sequence not unambiguously analyzed. Rare nucleosides are named according to the IUPAC-IUB rules (for complicated rare nucleosides and their identification see Table 1); those with lengthy names are given with the prefix x and specified in the footnotes. Footnotes are numbered according to the coordinates of the corresponding nucleoside and are indicated in the sequence by an asterisk. The references are restricted to the citation of the latest publication in those cases where several papers deal with one sequence. For additional information the reader is referred either to the original literature or to other tRNA sequence compilations (3-7). Mutant tRNAs are dealt with in a compilation by J. Celis (8). The compilers would welcome any information by the readers regarding missing material or erroneous presentation. On the basis of this numbering system computer printed compilations of tRNA sequences in a linear form and in cloverleaf form are in preparation.

  3. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Science.gov (United States)

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  4. Sequencing of BAC pools by different next generation sequencing platforms and strategies

    Directory of Open Access Journals (Sweden)

    Scholz Uwe

    2011-10-01

    Full Text Available Abstract Background Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. Results Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably fewer misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Conclusion Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.

  5. Only correlated sequences that are actively processed contribute to implicit sequence learning.

    Science.gov (United States)

    Meier, Beat; Weiermann, Brigitte; Cock, Josephine

    2012-09-01

    The purpose of the study was to investigate how implicit sequence learning is affected by the presence of secondary information that is correlated with the primary sequence but not necessarily relevant to performance. In a previous work, we have shown that correlation plays an important role but other prerequisites may also be involved. In Experiments 1 and 2, using a task sequence learning paradigm, we found that primary sequence learning was not affected by secondary information that was sequenced but irrelevant to performance, even though the two streams of information were correlated. In contrast, in Experiment 3, we found that sensitivity to the main sequence was greater with the provision of extra sequenced information that was relevant to performance in addition to being correlated. This suggests that sequence learning was enhanced through the integration of information. We conclude that information in secondary as well as primary sequences must be actively processed if it is to have a beneficial impact. By actively processed we mean information that is selectively attended and necessary for carrying out the tasks.

  6. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  7. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Directory of Open Access Journals (Sweden)

    Chun-Tien Chang

    2012-01-01

    Full Text Available The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  8. Mixed sequence reader: a program for analyzing DNA sequences with heterozygous base calling.

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  9. Criteria for defining and recognizing the various orders of sequences in outcrop sequence stratigraphy

    Institute of Scientific and Technical Information of China (English)

    WANG Xunlian

    2004-01-01

    The regional distribution in different depositional facies belts is here regarded as an important criterion for defining and recognizing the various orders of sequences. The third-order sequence is possibly global in nature, which may be discerned in different depositional facies belts in one continental margin and can be correlated over long distances, sometimes even worldwide. Commonly, correlation of subsequence (fourth-order sequence with time interval of 0.5-1.5 Ma) is difficult in different facies belts, although some of them may also be worldwide in distribution. It should be possible to discern and correlate a subsequence within at least one facies belt. The higher-order sequences, including microsequence (fifth-order sequence) and minisequence (sixth-order sequence), are regional or local in distribution. They may reflect the longer and shorter Milankovitch cycles respectively. Sequence and subsequence are usually recognizable in different facies belts, while microsequence and minisequence may be distinguished only in shallow marine deposits, but not in slope and basin facies deposits. A brief discussion is made on the essential conditions for correct identification of sequences, useful methods of study, and problems meriting special attention in outcrop sequence stratigraphy.

  10. A survey of sequence alignment algorithms for next-generation sequencing.

    Science.gov (United States)

    Li, Heng; Homer, Nils

    2010-09-01

    Rapidly evolving sequencing technologies produce data on an unparalleled scale. A central challenge to the analysis of this data is sequence alignment, whereby sequence reads must be compared to a reference. A wide variety of alignment algorithms and software have been subsequently developed over the past two years. In this article, we will systematically review the current development of these algorithms and introduce their practical applications on different types of experimental data. We come to the conclusion that short-read alignment is no longer the bottleneck of data analyses. We also consider future development of alignment algorithms with respect to emerging long sequence reads and the prospect of cloud computing.

  11. Detection of M-Sequences from Spike Sequence in Neuronal Networks

    Directory of Open Access Journals (Sweden)

    Yoshi Nishitani

    2012-01-01

    In circuit theory, it is well known that a linear feedback shift register (LFSR) circuit generates pseudorandom bit sequences (PRBS), including an M-sequence with the maximum period length. In this study, we tried to detect M-sequences, known as pseudorandom sequences generated by an LFSR circuit, from time series patterns of stimulated action potentials. Stimulated action potentials were recorded from dissociated cultures of hippocampal neurons grown on a multielectrode array. We could find several M-sequences from a 3-stage LFSR circuit (M3). These results show the possibility of assembling LFSR circuits or their equivalents in a neuronal network. However, since the M3 pattern was composed of only four spike intervals, the possibility of an accidental detection was not zero. We therefore detected M-sequences from random spike sequences which were not generated from an LFSR circuit and compared the result with the number of M-sequences from the originally observed raster data. As a result, a significant difference was confirmed: a greater number of “0–1” reversed 3-stage M-sequences occurred than would have been detected accidentally. This result suggests that some LFSR equivalent circuits are assembled in neuronal networks.
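
    As a rough illustration of what a 3-stage LFSR M-sequence looks like, the sketch below generates one in Python. The tap positions (stages 3 and 2), the seed state and the function name are assumptions chosen for the example; the abstract does not specify the register wiring used in the study.

```python
def lfsr_sequence(taps, state, length):
    """Generate `length` output bits from a Fibonacci-style LFSR.

    taps  : 1-indexed stage numbers XORed together to form the feedback bit
            (assumed wiring, not taken from the abstract)
    state : initial register contents, stage 1 first; must not be all zeros
    """
    state = list(state)
    out = []
    for _ in range(length):
        out.append(state[-1])              # output the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]       # XOR of the tapped stages
        state = [feedback] + state[:-1]    # shift right, insert feedback
    return out


# A 3-stage register with these taps cycles through all 7 non-zero states,
# so the output repeats with the maximum period 2**3 - 1 = 7.
m3 = lfsr_sequence(taps=(3, 2), state=(1, 0, 0), length=14)
print(m3[:7])
```

    One period of such a sequence contains four ones and three zeros, which is the balance property usually expected of an M-sequence.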

  12. Improvement of SNR with Chaotic Spreading Sequences for CDMA

    CERN Document Server

    Umeno, Ken; Kitayama, Ken-ichi

    1999-01-01

    We show that chaotic spreading sequences generated by ergodic mappings of Chebyshev orthogonal polynomials have better correlation properties for CDMA (code division multiple access) than the optimal binary sequences (Gold sequences) in the sense of ensemble average.
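
    A minimal sketch of how a real-valued spreading sequence can be produced by iterating a Chebyshev polynomial map is given below. The degree, the seeds, the sequence length and the simple correlation estimate are all assumptions made for illustration; the sketch does not reproduce the ensemble-average comparison with Gold sequences reported in the abstract.

```python
import math


def chebyshev_map(x, degree=2):
    """Apply the Chebyshev polynomial T_degree(x) = cos(degree * arccos(x))."""
    x = max(-1.0, min(1.0, x))             # guard the arccos domain
    return math.cos(degree * math.acos(x))


def spreading_sequence(seed, length, degree=2):
    """Iterate the map from `seed` and collect `length` values in [-1, 1]."""
    seq = []
    x = seed
    for _ in range(length):
        x = chebyshev_map(x, degree)
        seq.append(x)
    return seq


def correlation(a, b):
    """Normalized inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


user1 = spreading_sequence(seed=0.3, length=64)   # hypothetical user seeds
user2 = spreading_sequence(seed=0.7, length=64)
print(correlation(user1, user2))
```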

  13. Fibonacci-like sequences and generalized Pascal's triangles

    Science.gov (United States)

    Vincenzi, G.; Siani, S.

    2014-05-01

    The properties pertaining to diagonals of generalized Pascal's triangles are studied. Combinatorial relationships between Fibonacci-like sequences and Fibonacci sequence itself are determined, using the sequence of diagonals of generalized Pascal's triangle.
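
    For the classical (non-generalized) Pascal's triangle, the link between diagonals and the Fibonacci sequence is the well-known identity that the entries along the n-th shallow diagonal sum to the Fibonacci number F(n+1). The sketch below checks this special case only; the generalized triangles and Fibonacci-like sequences studied in the paper are not modelled, and the function names are illustrative.

```python
from math import comb


def diagonal_sum(n):
    """Sum of the n-th shallow diagonal of Pascal's triangle: sum_k C(n-k, k)."""
    return sum(comb(n - k, k) for k in range(n // 2 + 1))


def fibonacci(n):
    """Fibonacci numbers with F(0) = 0 and F(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# The first ten diagonal sums agree with F(1), F(2), ..., F(10).
for n in range(10):
    assert diagonal_sum(n) == fibonacci(n + 1)
print([diagonal_sum(n) for n in range(10)])
```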

  14. On type sequences and Arf rings

    Directory of Open Access Journals (Sweden)

    Dilip Premchand Patil

    2007-06-01

    In this article, in Section 2 we give an explicit description to compute the type sequence $\mathrm{t}_1,\ldots,\mathrm{t}_{n}$ of a semigroup $\Gamma$ generated by an arithmetic sequence (see 2.7); we show that the $i$-th term $\mathrm{t}_i$ is equal to $1$ or to the type $\tau_\Gamma$, depending on its position. In Section 3, for an analytically irreducible ring $R$ with the branch sequence $R=R_0 \subsetneq R_1 \subsetneq \ldots \subsetneq R_{m-1} \subsetneq R_{m} =\overline{R}$, starting from a result proved in [4] we give a characterization (see 3.6) of the ``Arf'' property using the type sequence of $R$ and of the rings $R_j$, $1\leq j\leq m-1$. Further, we prove (see 3.9, 3.10) some relations among the integers $\ell^*(R)$ and $\ell^*(R_j)$, $1\leq j\leq m-1$. These relations and a result of [6] allow us to obtain a new characterization (see 3.12) of semigroup rings of minimal multiplicity with $\ell^*(R)\leq \tau(R)$ in terms of the Arf property, type sequences and relations between $\ell^*(R)$ and $\ell^*(R_j)$, $1\leq j\leq m-1$.

  15. Viral genome sequencing by random priming methods

    Directory of Open Access Journals (Sweden)

    Zhang Xinsheng

    2008-01-01

    Background: Most emerging health threats are of zoonotic origin. For the overwhelming majority, their causative agents are RNA viruses which include but are not limited to HIV, Influenza, SARS, Ebola, Dengue, and Hantavirus. Of increasing importance therefore is a better understanding of global viral diversity to enable better surveillance and prediction of pandemic threats; this will require rapid and flexible methods for complete viral genome sequencing. Results: We have adapted the SISPA methodology [1-3] to genome sequencing of RNA and DNA viruses. We have demonstrated the utility of the method on various types and sources of viruses, obtaining near complete genome sequence of viruses ranging in size from 3,000–15,000 kb with a median depth of coverage of 14.33. We used this technique to generate full viral genome sequence in the presence of host contaminants, using viral preparations from cell culture supernatant, allantoic fluid and fecal matter. Conclusion: The method described is of great utility in generating whole genome assemblies for viruses with little or no available sequence information, viruses from greatly divergent families, previously uncharacterized viruses, or to more fully describe mixed viral infections.

  16. Sequence analysis by iterated maps, a review.

    Science.gov (United States)

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
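
    The Chaos Game Representation mentioned above can be sketched in a few lines: starting from the centre of the unit square, each nucleotide moves the current point halfway towards the corner assigned to that nucleotide. The corner assignment below follows a commonly used convention and is an assumption here, as are the function and variable names.

```python
# Assumed corner layout of the unit square (a common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(sequence):
    """Return the list of CGR coordinates, one point per nucleotide."""
    x, y = 0.5, 0.5                        # start at the centre of the square
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway towards the corner
        points.append((x, y))
    return points


print(chaos_game("ACGT"))
```

    Each point encodes the whole prefix read so far, with the most recent symbols determining the coarsest position, which is where the scale-free property described in the abstract comes from.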

  17. Protein sequence classification using feature hashing.

    Science.gov (United States)

    Caragea, Cornelia; Silvescu, Adrian; Mitra, Prasenjit

    2012-06-21

    Recent advances in next-generation sequencing technologies have resulted in an exponential increase in the rate at which protein sequence data are being acquired. The k-gram feature representation, commonly used for protein sequence classification, usually results in prohibitively high dimensional input spaces, for large values of k. Applying data mining algorithms to these input spaces may be intractable due to the large number of dimensions. Hence, using dimensionality reduction techniques can be crucial for the performance and the complexity of the learning algorithms. In this paper, we study the applicability of feature hashing to protein sequence classification, where the original high-dimensional space is "reduced" by hashing the features into a low-dimensional space, using a hash function, i.e., by mapping features into hash keys, where multiple features can be mapped (at random) to the same hash key, and "aggregating" their counts. We compare feature hashing with the "bag of k-grams" approach. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
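
    The hashing step described in the abstract can be sketched as follows: every k-gram of a protein sequence is mapped by a hash function to one of a fixed number of buckets, and the counts of all k-grams landing in the same bucket are added together. The choice of SHA-256 as the hash, the bucket count and the function names are assumptions for this example, not details taken from the paper.

```python
import hashlib


def kgrams(sequence, k):
    """All overlapping k-grams of a sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def hashed_features(sequence, k, n_buckets):
    """Fixed-length count vector obtained by hashing k-grams into buckets."""
    vector = [0] * n_buckets
    for gram in kgrams(sequence, k):
        digest = hashlib.sha256(gram.encode("ascii")).hexdigest()
        bucket = int(digest, 16) % n_buckets   # hash key of this k-gram
        vector[bucket] += 1                    # colliding k-grams are aggregated
    return vector


features = hashed_features("MKVLAAGIVGLLLAQ", k=3, n_buckets=8)
print(features)
```

    With a 20-letter alphabet the bag of 3-grams already has 8,000 possible dimensions, whereas the hashed vector keeps whatever length is chosen for `n_buckets`.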

  18. SNAD: sequence name annotation-based designer

    Directory of Open Access Journals (Sweden)

    Gorbalenya Alexander E

    2009-08-01

    Background: A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually, which can be a tedious exercise prone to mistakes and omissions. Results: Here we introduce SNAD (Sequence Name Annotation-based Designer), which mediates automatic conversion of sequence UIDs (associated with a multiple alignment or phylogenetic tree, or supplied as a plain text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit the wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion: A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation, and efficiency of communication and knowledge dissemination among researchers.

  19. Probabilistic Sequence Learning in Mild Cognitive Impairment

    Directory of Open Access Journals (Sweden)

    Dezso Nemeth

    2013-07-01

    Mild Cognitive Impairment (MCI) causes slight but noticeable disruption in cognitive systems, primarily executive and memory functions. However, it is not clear if the development of sequence learning is affected by an impaired cognitive system and, if so, how. The goal of our study was to investigate the development of probabilistic sequence learning, from the initial acquisition to consolidation, in MCI and healthy elderly control groups. We used the Alternating Serial Reaction Time task (ASRT) to measure probabilistic sequence learning. Individuals with MCI showed weaker learning performance than the healthy elderly group. However, using the reaction times only from the second half of each learning block (after the reactivation phase), we found intact learning in MCI. Based on the assumption that the first part of each learning block is related to reactivation/recall processes, we suggest that these processes are affected in MCI. The 24-hour offline period showed no effect on sequence-specific learning in either group but did on general skill learning: the healthy elderly group showed offline improvement in general reaction times while individuals with MCI did not. Our findings deepen our understanding regarding the underlying mechanisms and time course of sequence acquisition and consolidation.

  20. BLEACHING EUCALYPTUS PULPS WITH SHORT SEQUENCES

    Directory of Open Access Journals (Sweden)

    Flaviana Reis Milagres

    2011-03-01

    Eucalyptus spp. kraft pulp, due to its high content of hexenuronic acids, is quite easy to bleach. Therefore, investigations have been made attempting to decrease the number of stages in the bleaching process in order to minimize capital costs. This study focused on the evaluation of short ECF (Elemental Chlorine Free) and TCF (Totally Chlorine Free) sequences for bleaching oxygen-delignified Eucalyptus spp. kraft pulp to 90% ISO brightness: PMoDP (molybdenum-catalyzed acid peroxide, chlorine dioxide and hydrogen peroxide), PMoD/P (molybdenum-catalyzed acid peroxide, chlorine dioxide and hydrogen peroxide, without washing), PMoD(PO) (molybdenum-catalyzed acid peroxide, chlorine dioxide and pressurized peroxide), D(EPO)DP (chlorine dioxide, oxidative extraction with oxygen and peroxide, chlorine dioxide and hydrogen peroxide), PMoQ(PO) (molybdenum-catalyzed acid peroxide, DTPA and pressurized peroxide), and XPMoQ(PO) (enzyme, molybdenum-catalyzed acid peroxide, DTPA and pressurized peroxide). Uncommon pulp treatments, such as molybdenum-catalyzed acid peroxide (PMo) and xylanase (X) bleaching stages, were used. Among the ECF alternatives, the two-stage PMoD/P sequence proved highly cost-effective without affecting pulp quality in relation to the traditional D(EPO)DP sequence and produced better quality effluent in relation to the reference. However, a four-stage sequence, XPMoQ(PO), was required to achieve full brightness using the TCF technology. This sequence was highly cost-effective although it only produced pulp of acceptable quality.

  1. Progressive multiple sequence alignments from triplets

    Directory of Open Access Journals (Sweden)

    Stadler Peter F

    2007-07-01

    Background: The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research: Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by step-wise replacement of triples by pairs instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion: The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments in general exceeds that of clustalw or other multiple alignment tools even though our software does not include heuristics for context-dependent (mis)match scores.
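
    The exact dynamic programming over sequence triples can be illustrated with a small sketch that fills a three-dimensional table, where each cell is reached by one of seven moves (every non-empty choice of which sequences contribute a character to the next column). The sum-of-pairs scoring with match, mismatch and gap values of +1, -1 and -1 is an assumption for the example; this is not the aln3nn implementation, and the Neighbor-Net aggregation step is not shown.

```python
from itertools import product


def column_score(a, b, c, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of one alignment column; None stands for a gap."""
    total = 0
    for x, y in ((a, b), (a, c), (b, c)):
        if x is None and y is None:
            continue                       # gap against gap costs nothing
        if x is None or y is None:
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def align3_score(s, t, u):
    """Optimal sum-of-pairs score of a three-way global alignment."""
    n, m, p = len(s), len(t), len(u)
    neg_inf = float("-inf")
    table = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    table[0][0][0] = 0
    # Seven moves: each sequence either advances (1) or receives a gap (0).
    moves = [mv for mv in product((0, 1), repeat=3) if any(mv)]
    for i, j, k in product(range(n + 1), range(m + 1), range(p + 1)):
        if i == j == k == 0:
            continue
        best = neg_inf
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            col = column_score(s[pi] if di else None,
                               t[pj] if dj else None,
                               u[pk] if dk else None)
            best = max(best, table[pi][pj][pk] + col)
        table[i][j][k] = best
    return table[n][m][p]


print(align3_score("ACGT", "AGT", "ACT"))
```

    Time and memory grow with the product of the three sequence lengths, which is why this exact approach is practical for triples but not for larger groups of sequences.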

  2. RSAT 2015: Regulatory Sequence Analysis Tools.

    Science.gov (United States)

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/.

  3. DNA sequencing by synthesis with degenerate primers

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    A degenerate primer-based sequencing-by-synthesis method (DP-SBS) was developed for high-throughput DNA sequencing, in which a set of degenerate primers is hybridized to the arrayed DNA templates and extended by DNA polymerase on microarrays. In this method, a different set of degenerate primers containing a given number (n) of degenerate nucleotides at the 3'-ends was annealed to the sequenced templates that were immobilized on the solid surface. The nucleotides (n+1) on the template sequences were determined by detecting the incorporation of fluorescently labeled nucleotides. The fluorescently labeled nucleotide was incorporated into the primer in a base-specific manner after the enzymatic primer extension reactions, and a nine-base length was read out accurately. The main advantage of DP-SBS is that the method uses only conventional biochemical reagents and avoids the complicated special chemical reagents for removing the labeled nucleotides and reactivating the primer for further extension. The present study found that the DP-SBS method is reliable, simple, and cost-effective for laboratory sequencing of a large number of short DNA fragments.

  4. End Sequencing and Finger Printing of Human & Mouse BAC Libraries

    Energy Technology Data Exchange (ETDEWEB)

    Fraser, C

    2005-09-27

    This project provided for continued end sequencing of existing and new BAC libraries constructed to support human sequencing as well as to initiate BAC end sequencing from the mouse BAC libraries constructed to support mouse sequencing. The clones, the sequences, and the fingerprints are now an available resource for the community at large. Research and development of new methodologies for BAC end sequencing have reduced costs and increased throughput.

  5. Comparison study on k-word statistical measures for protein: From sequence to 'sequence space'

    Directory of Open Access Journals (Sweden)

    Wang Tianming

    2008-09-01

    Background: Many proposed statistical measures can efficiently compare protein sequences to further infer protein structure, function and evolutionary information. They share the same idea of using k-word frequencies of protein sequences. Given a protein sequence, the information on its related protein sequences has not been used for protein sequence comparison until now. This paper proposed a scheme to construct a protein 'sequence space' which was associated with protein sequences related to the given protein, and the performances of statistical measures were compared when they explored the information on protein 'sequence space' or not. This paper also presented two statistical measures for protein: gre.k (generalized relative entropy) and gsm.k (gapped similarity measure). Results: We tested statistical measures based on protein 'sequence space' or not with three data sets. This not only offers a systematic and quantitative experimental assessment of these statistical measures, but also naturally complements the available comparison of statistical measures based on protein sequence. Moreover, we compared our statistical measures with alignment-based measures and the existing statistical measures. The experiments were grouped into two sets. The first one, performed via ROC (Receiver Operating Characteristic) analysis, aims at assessing the intrinsic ability of the statistical measures to discriminate and classify protein sequences. The second set of experiments aims at assessing how well our measure does in phylogenetic analysis. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of protein 'sequence space' and statistical measures were obtained. Conclusion: Alignment-based measures have a clear advantage when the data is highly redundant. The most efficient statistical measure is the novel gsm.k introduced by this article, followed by cos.k. When the data becomes less redundant, gre
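
    The shared idea of these measures, comparing sequences through their k-word frequencies, can be sketched with a cosine similarity between k-word count vectors. Whether this matches the exact definition of the cos.k measure named in the abstract is an assumption; the gre.k and gsm.k measures and the 'sequence space' construction are not reproduced here.

```python
from collections import Counter
from math import sqrt


def kword_counts(sequence, k):
    """Count every overlapping k-word in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_similarity(seq1, seq2, k):
    """Cosine of the angle between the k-word count vectors of two sequences."""
    c1, c2 = kword_counts(seq1, k), kword_counts(seq2, k)
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm1 = sqrt(sum(v * v for v in c1.values()))
    norm2 = sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0                         # a sequence shorter than k has no words
    return dot / (norm1 * norm2)


print(cosine_similarity("MKVLAAGIV", "MKVLSAGIV", k=2))
```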

  6. Generation of physical map contig-specific sequences useful for whole genome sequence scaffolding.

    Directory of Open Access Journals (Sweden)

    Yanliang Jiang

    Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge.

  7. Generation of Physical Map Contig-Specific Sequences Useful for Whole Genome Sequence Scaffolding

    Science.gov (United States)

    Jiang, Yanliang; Ninwichian, Parichart; Liu, Shikai; Zhang, Jiaren; Kucuktas, Huseyin; Sun, Fanyue; Kaltenboeck, Ludmilla; Sun, Luyang; Bao, Lisui; Liu, Zhanjiang

    2013-01-01

    Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge. PMID:24205335

  8. Finding Sequential Patterns from Large Sequence Data

    CERN Document Server

    Esmaeili, Mahdi

    2010-01-01

    Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining finds sets of data items that occur together frequently in some sequences. Sequential pattern mining, which extracts frequent subsequences from a sequence database, has attracted a great deal of interest in recent data mining research because it is the basis of many applications, such as web user analysis, stock trend prediction, DNA sequence analysis, finding language or linguistic patterns from natural language texts, and using the history of symptoms to predict certain kinds of disease. Given the diversity of these applications, it may not be possible to apply a single sequential pattern model to all of these problems. Each application may require a unique model and solution. A number of research projects were established in recent years to develop meaningful sequential pattern...
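
    Extracting frequent subsequences from a sequence database can be sketched as a level-wise search: count in how many sequences a candidate pattern occurs as a (not necessarily contiguous) subsequence, keep the candidates that reach a minimum support, and extend only those. The Apriori-style strategy, the toy database and the function names below are assumptions for illustration and are not taken from the paper.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in the same order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_patterns(database, min_support, max_len=3):
    """Level-wise search for all patterns with support >= min_support."""
    items = sorted({item for seq in database for item in seq})
    frequent = {}
    level = [(item,) for item in items]
    for _ in range(max_len):
        kept = []
        for candidate in level:
            count = support(candidate, database)
            if count >= min_support:
                frequent[candidate] = count
                kept.append(candidate)
        # Only frequent patterns are extended, since any pattern containing
        # an infrequent prefix cannot itself be frequent.
        level = [pattern + (item,) for pattern in kept for item in items]
    return frequent


toy_db = [list("abc"), list("acb"), list("ab")]   # hypothetical event sequences
print(frequent_patterns(toy_db, min_support=2))
```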

  9. Spectral sequences in smooth generalized cohomology

    CERN Document Server

    Grady, Daniel

    2016-01-01

    We consider spectral sequences in smooth generalized cohomology theories, including differential generalized cohomology theories. The main differential spectral sequences will be of the Atiyah-Hirzebruch (AHSS) type, where we provide a filtration by the Cech resolution of smooth manifolds. This allows for systematic study of torsion in differential cohomology. We apply this in detail to smooth Deligne cohomology, differential topological complex K-theory, and to a smooth extension of integral Morava K-theory that we introduce. In each case we explicitly identify the differentials in the corresponding spectral sequences, which exhibit an interesting and systematic interplay between (refinement of) classical cohomology operations, operations involving differential forms, and operations on cohomology with U(1) coefficients.

  10. Nager syndrome and Pierre Robin sequence.

    Science.gov (United States)

    Rosa, Rafael Fabiano Machado; Guimarães, Victória Bernardes; Beltrão, Luciana Amorim; Trombetta, Júlia Santana; Lliguin, Karen Lizeth Puma; de Mattos, Vinicius Freitas; Zen, Paulo Ricardo Gazzola

    2015-04-01

    Nager syndrome is considered a rare genetic syndrome characterized by craniofacial and radial anomalies. Pierre Robin sequence is a triad that includes micrognathia, cleft palate and glossoptosis. The present patient had typical findings of Nager syndrome and Pierre Robin sequence. He progressed to severe respiratory distress, requiring mechanical ventilation and tracheostomy. At 1 year and 11 months, he had episodes of cardiorespiratory arrest and died. In the literature review, we identified the clinical description of 44 patients with Nager syndrome. Among them, 93.1% had micrognathia, 38.6% cleft palate and 11.3% glossoptosis. Only one (2.3%) had all three features, as observed in the present patient. Therefore, despite the fact that the features of Pierre Robin sequence are common, there are few patients who have the complete triad. It is noteworthy, however, that they may be associated with respiratory distress, which may put the patient's life at risk.

  11. Dynamics and Control of DNA Sequence Amplification

    CERN Document Server

    Marimuthu, Karthikeyan

    2014-01-01

    DNA amplification is the process of replication of a specified DNA sequence in vitro through time-dependent manipulation of its external environment. A theoretical framework for determination of the optimal dynamic operating conditions of DNA amplification reactions, for any specified amplification objective, is presented based on first-principles biophysical modeling and control theory. Amplification of DNA is formulated as a problem in control theory with optimal solutions that can differ considerably from strategies typically used in practice. Using the Polymerase Chain Reaction (PCR) as an example, sequence-dependent biophysical models for DNA amplification are cast as control systems, wherein the dynamics of the reaction are controlled by a manipulated input variable. Using these control systems, we demonstrate that there exists an optimal temperature cycling strategy for geometric amplification of any DNA sequence and formulate optimal control problems that can be used to derive the optimal tempe...

  12. Bias of purine stretches in sequenced chromosomes

    DEFF Research Database (Denmark)

    Ussery, David; Soumpasis, Dikeos Mario; Brunak, Søren

    2002-01-01

    We examined more than 700 DNA sequences (full length chromosomes and plasmids) for stretches of purines (R) or pyrimidines (Y) and alternating YR stretches; such regions will likely adopt structures which are different from the canonical B-form. Since one turn of the DNA helix is roughly 10 bp, we...... measured the fraction of each genome which contains purine (or pyrimidine) tracts of lengths of 10 bp or longer (hereafter referred to as 'purine tracts'), as well as stretches of alternating pyrimidines/purine ('pyr/pur tracts') of the same length. Using these criteria, a random sequence would be expected......, in eukaryotes there is an abundance of long stretches of purines or alternating purine/pyrimidine tracts, which cannot be explained in this way; these sequences are likely to play an important role in eukaryotic chromosome organisation....
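
    The measurement described above, the fraction of a sequence covered by purine or pyrimidine tracts of 10 bp or longer, can be sketched with a regular expression. The function name, the single-strand treatment and the example sequence are assumptions; alternating pyr/pur tracts and the comparison with random sequences are not covered.

```python
import re


def tract_fraction(sequence, alphabet="AG", min_len=10):
    """Fraction of `sequence` lying in runs of letters from `alphabet`
    that are at least `min_len` long (default: purine tracts, A or G)."""
    if not sequence:
        return 0.0
    pattern = re.compile("[%s]{%d,}" % (alphabet, min_len))
    covered = sum(m.end() - m.start() for m in pattern.finditer(sequence.upper()))
    return covered / len(sequence)


example = "AGGAGAAGGA" + "CTGACTGACT" + "CCTTCTCTTCCT"    # made-up sequence
print(tract_fraction(example))                  # purine tracts
print(tract_fraction(example, alphabet="CT"))   # pyrimidine tracts
```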

  13. Degree sequences of k-multi-hypertournaments

    Institute of Scientific and Technical Information of China (English)

    Pirzada S

    2009-01-01

    Let n and k (n ≥ k > 1) be two non-negative integers. A k-multi-hypertournament on n vertices is a pair (V, A), where V is a set of vertices with |V|= n, and A is a set of k-tuples of vertices, called arcs, such that for any k-subset S of V, A contains at least one (at most k!) of the k! k-tuples whose entries belong to S. The necessary and sufficient conditions for a non-decreasing sequence of non-negative integers to be the out-degree sequence (in-degree sequence) of some k-multi-hypertournament are given.

  14. Constructing circuit codes by permuting initial sequences

    CERN Document Server

    Wynn, Ed

    2012-01-01

    Two new constructions are presented for coils and snakes in the hypercube. Improvements are made on the best known results for snake-in-the-box coils of dimensions 9, 10 and 11, and for some other circuit codes of dimensions between 8 and 13. In the first construction, circuit codes are generated from permuted copies of an initial transition sequence; the multiple copies constrain the search, so that long codes can be found relatively efficiently. In the second construction, two lower-dimensional paths are joined together with only one or two changes in the highest dimension; this requires a search for a permutation of the second sequence to fit around the first. It is possible to investigate sequences of vertices of the hypercube, including circuit codes, by connecting the corresponding vertices in an extended graph related to the hypercube. As an example of this, invertible circuit codes are briefly discussed.

  15. Developmental sequence of Cambrian embryo Markuelia

    Institute of Scientific and Technical Information of China (English)

    DONG XiPing

    2007-01-01

    Based on more exquisitely preserved specimens of Markuelia hunanensis recently recovered from the Middle and Upper Cambrian in western Hunan and in the light of synchrotron radiation X-ray tomographic microscopy, the developmental sequence from cleavage through organogenesis to the pre-hatching of the Cambrian embryo Markuelia, especially the developmental sequence during the pre-hatching stage, i.e. from the earliest period when the scalids and tail spines only took shape to the latest period (just about hatching), is established. This developmental sequence provides a pattern of embryonic development during the pre-hatching stage, which has not been established in the living scalidophorans (priapulids, loriciferans and kinorhynchs). Thus, it not only enriches our knowledge on the embryonic development of the extant descendants of Markuelia, but also opens a new window to the evolution and development of the animal.

  16. The Recursion Theorem and Infinite Sequences

    CERN Document Server

    Miller, Arnold W

    2008-01-01

    In this paper we use the Recursion Theorem to show the existence of various infinite sequences and sets. Our main result is that there is an increasing sequence e_0, e_1, e_2, ... such that W_{e_n}={e_{n+1}} for every n. Similarly, we prove that there exists an increasing sequence such that W_{e_n}={e_{n+1},e_{n+2},...} for every n. We call a nonempty computably enumerable set A self-constructing if W_e=A for every e in A. We show that every nonempty computably enumerable set which is disjoint from an infinite computable set is one-one equivalent to a self-constructing set.

  17. Microbial species delineation using whole genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.

  18. DNA sequencing by nanopores: advances and challenges

    Science.gov (United States)

    Agah, Shaghayegh; Zheng, Ming; Pasquali, Matteo; Kolomeisky, Anatoly B.

    2016-10-01

    Developing inexpensive and simple DNA sequencing methods capable of detecting entire genomes in short periods of time could revolutionize the world of medicine and technology. It will also lead to major advances in our understanding of fundamental biological processes. It has been shown that nanopores have the ability of single-molecule sensing of various biological molecules rapidly and at a low cost. This has stimulated significant experimental efforts in developing DNA sequencing techniques by utilizing biological and artificial nanopores. In this review, we discuss recent progress in the nanopore sequencing field with a focus on the nature of nanopores and on sensing mechanisms during the translocation. Current challenges and alternative methods are also discussed.

  19. Transforming clinical microbiology with bacterial genome sequencing.

    Science.gov (United States)

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  20. TRAP Sequence - An Interesting Entity in Twins

    Directory of Open Access Journals (Sweden)

    R H Srinivas Prasad

    2012-01-01

    Twin reversed arterial perfusion (TRAP) sequence is a rare malformation occurring in monozygotic multiple gestations. One well-developed normal (pump) twin and the other twin with absent cardiac structure (acardiac), who is hemodynamically dependent on the normal (pump) twin, are characteristic of this syndrome. The acardiac twin develops multiple anomalies that make survival difficult. The prognosis of the pump twin is variable, with mortality rate ranging from 50% to 70%. Complications that affect the prognosis of the pump twin include complications of congestive cardiac failure due to increased cardiac demand, prematurity secondary to preterm delivery, and polyhydramnios. Because of these complications, prompt detection, follow-up, and treatment of this condition are very important. We report two cases of TRAP sequence that emphasize the importance of gray-scale and color Doppler imaging in diagnosis, detection of poor prognostic features, follow-up, and management of TRAP sequence.

  1. A statistical study of aftershock sequences

    Directory of Open Access Journals (Sweden)

    Giorgio Ranalli

    2010-02-01

    A comprehensive statistical study of the phenomenology of aftershock sequences is made in this paper. The spatial distribution of aftershocks indicates that they are mainly crustal events; however, deeper sequences also take place. The analysis of the distribution of aftershocks in 15 sequences with respect to time and magnitude leads to the statistical confirmation of a set of phenomenological laws describing the process, namely, the time-frequency law of hyperbolic decay of aftershock activity with time, the magnitude stability law, and the exponential magnitude-frequency distribution. The hypotheses involved are checked. The grouping of data and the statistical methods employed are chosen according to some basic well-confirmed assumptions regarding the nature of the process.

  2. TWO ASPECTS OF A GENERALIZED FIBONACCI SEQUENCE

    Directory of Open Access Journals (Sweden)

    Johan Matheus Tuwankotta

    2015-05-01

    In this paper we study the so-called generalized Fibonacci sequence: $x_{n+2} = \alpha x_{n+1} + \beta x_n, n\in \mathbb{N}$. We derive an open domain around the origin of the parameter space where the sequence converges to $0$. The limiting behaviors on the boundary of this domain are: convergence to a nontrivial limit, $k$-periodic ($k\in \mathbb{N}$), or quasi-periodic. We use the ratio of two consecutive terms of the sequence to construct a rational approximation for algebraic numbers of the form: $\sqrt{r}, r\in \mathbb{Q}$. Using a similar idea, we extend this to higher dimension to construct a rational approximation for $\sqrt[3]{ a + b\sqrt{c}} + \sqrt[3]{ a - b\sqrt{c}} + d$.
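    The ratio idea in this abstract can be illustrated with a small sketch. The parameter choice below ($\alpha = 2$, $\beta = r - 1$, so that the characteristic roots are $1 \pm \sqrt{r}$) and the function names are assumptions made for illustration; the abstract does not say which parameters the paper uses.

```python
from fractions import Fraction


def generalized_fibonacci(alpha, beta, x0, x1, count):
    """Return the first `count` terms of x_{n+2} = alpha*x_{n+1} + beta*x_n."""
    terms = [x0, x1]
    while len(terms) < count:
        terms.append(alpha * terms[-1] + beta * terms[-2])
    return terms


def sqrt_approximation(r, count=12):
    """Rational approximation of sqrt(r) for r > 0 (illustrative choice of parameters).

    With alpha = 2 and beta = r - 1 the characteristic equation t^2 = 2t + (r - 1)
    has roots 1 + sqrt(r) and 1 - sqrt(r), so the ratio of consecutive terms
    tends to 1 + sqrt(r); subtracting 1 gives the approximation.
    """
    terms = generalized_fibonacci(2, r - 1, Fraction(0), Fraction(1), count)
    return terms[-1] / terms[-2] - 1


if __name__ == "__main__":
    approx = sqrt_approximation(2)
    print(approx, float(approx))
```

    Exact fractions are used so that each approximation is a genuine rational number rather than a rounded float.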

  3. Limits of zeros of polynomial sequences

    CERN Document Server

    Zhu, Xinyun

    2007-01-01

    In the present paper we consider $F_k(x)=x^{k}-\sum_{t=0}^{k-1}x^t,$ the characteristic polynomial of the $k$-th order Fibonacci sequence, the latter denoted $G(k,l).$ We determine the limits of the real roots of certain odd and even degree polynomials related to the derivatives and integrals of $F_k(x),$ that form infinite sequences of polynomials, of increasing degree. In particular, as $k \to \infty,$ the limiting values of the zeros are determined, for both odd and even cases. It is also shown, in both cases, that the convergence is monotone for sufficiently large degree. We give an upper bound for the modulus of the complex zeros of the polynomials for each sequence. This gives a general solution related to problems considered by Dubeau 1989, 1993, Miles 1960, Flores 1967, Miller 1971 and later by the second author in the present paper, and Narayan 1997.
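    As a numerical illustration of the polynomial $F_k(x)$ itself (not of the derivative and integral families the paper analyses), the sketch below locates its positive real root by bisection on $[1, 2]$, using $F_k(1) = 1 - k < 0$ and $F_k(2) = 1 > 0$ for $k \ge 2$. The function names are hypothetical.

```python
def fibonacci_char_poly(k, x):
    """Evaluate F_k(x) = x^k - sum_{t=0}^{k-1} x^t."""
    return x ** k - sum(x ** t for t in range(k))


def positive_root(k, tol=1e-12):
    """Bisection for the positive real root of F_k, assuming k >= 2."""
    lo, hi = 1.0, 2.0  # F_k(lo) < 0 < F_k(hi) for k >= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fibonacci_char_poly(k, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for k in (2, 3, 5, 10, 20):
        print(k, positive_root(k))
```

    For k = 2 this returns the golden ratio, and the printed roots increase toward 2 as k grows.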

  4. Inferring interaction partners from protein sequences

    CERN Document Server

    Bitbol, Anne-Florence; Colwell, Lucy J; Wingreen, Ned S

    2016-01-01

    Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multi-protein complexes, and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners. Hence, the sequences of interacting partners are correlated. Here we exploit these correlations to accurately identify which proteins are specific interaction partners from sequence data alone. Our general approach, which employs a pairwise maximum entropy model to infer direct couplings between residues, has been successfully used to predict the three-dimensional structures of proteins from sequences. Building on this approach, we introduce an iterative algorithm to predict specific interaction partners from among the members of two protein families. We assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. The algorithm proves successful without any a pri...

  5. Dynamic programming algorithms for biological sequence comparison.

    Science.gov (United States)

    Pearson, W R; Miller, W

    1992-01-01

    Efficient dynamic programming algorithms are available for a broad class of protein and DNA sequence comparison problems. These algorithms require computer time proportional to the product of the lengths of the two sequences being compared [O(N^2)] but require memory space proportional only to the sum of these lengths [O(N)]. Although the requirement for O(N^2) time limits use of the algorithms to the largest computers when searching protein and DNA sequence databases, many other applications of these algorithms, such as calculation of distances for evolutionary trees and comparison of a new sequence to a library of sequence profiles, are well within the capabilities of desktop computers. In particular, the results of library searches with rapid searching programs, such as FASTA or BLAST, should be confirmed by performing a rigorous optimal alignment. Whereas rapid methods do not overlook significant sequence similarities, FASTA limits the number of gaps that can be inserted into an alignment, so that a rigorous alignment may extend the alignment substantially in some cases. BLAST does not allow gaps in the local regions that it reports; a calculation that allows gaps is very likely to extend the alignment substantially. Although a Monte Carlo evaluation of the statistical significance of a similarity score with a rigorous algorithm is much slower than the heuristic approach used by the RDF2 program, the dynamic programming approach should take less than 1 hr on a 386-based PC or desktop Unix workstation. For descriptive purposes, we have limited our discussion to methods for calculating similarity scores and distances that use gap penalties of the form g = rk. Nevertheless, programs for the more general case (g = q+rk) are readily available. Versions of these programs that run either on Unix workstations, IBM-PC class computers, or the Macintosh can be obtained from either of the authors.
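    The time and memory trade-off described above can be sketched with a global similarity score that keeps only two rows of the dynamic programming matrix. This is a minimal illustration assuming a simple match/mismatch scoring scheme and a linear gap penalty g = rk; the scoring values and the function name are assumptions, not taken from the authors' programs.

```python
def alignment_score(a, b, match=1, mismatch=-1, r=2):
    """Global alignment score of a and b with gap penalty g = r*k.

    Time is proportional to len(a) * len(b); memory is proportional to
    len(b) because only the previous and current rows are kept.
    """
    prev = [-r * j for j in range(len(b) + 1)]  # row 0: b aligned to gaps
    for i, ca in enumerate(a, start=1):
        curr = [-r * i]  # column 0: a aligned to gaps
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] - r, curr[j - 1] - r))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(alignment_score("GATTACA", "GATTTACA"))
```

    Recovering the alignment itself in linear space needs a divide-and-conquer step on top of this score pass; the sketch returns the score only.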

  6. Bilateral maculopathy associated with Pierre Robin sequence.

    Science.gov (United States)

    Witmer, Matthew T; Vasan, Ryan; Levy, Richard; Davis, Jessica; Chan, R V Paul

    2012-08-01

    Pierre Robin sequence has been associated with a number of ocular complications, including myopia, strabismus, Möbius syndrome, nasolacrimal duct obstruction, glaucoma, cataract, microphthalmos, coloboma of choroid, and retinal detachment. We report a 10-day-old boy who presented with micrognathia, glossoptosis, and cleft palate as well as multiple congenital anomalies. Ophthalmic examination was notable for bilateral maculopathy, with focal areas of retinal and retinal pigment epithelial atrophy. The association of Pierre Robin sequence and maculopathy has been reported only twice previously.

  7. Limits of zeros of polynomial sequences

    OpenAIRE

    Zhu, Xinyun; Grossman, George

    2007-01-01

    In the present paper we consider $F_k(x)=x^{k}-\sum_{t=0}^{k-1}x^t,$ the characteristic polynomial of the $k$-th order Fibonacci sequence, the latter denoted $G(k,l).$ We determine the limits of the real roots of certain odd and even degree polynomials related to the derivatives and integrals of $F_k(x),$ that form infinite sequences of polynomials, of increasing degree. In particular, as $k \to \infty,$ the limiting values of the zeros are determined, for both odd and even cases. It is also ...

  8. Characterizing C6+P2-graphic Sequences

    Institute of Scientific and Technical Information of China (English)

    HU Li-li

    2014-01-01

    For a given graph H, a graphic sequence π = (d_1, d_2, ..., d_n) is said to be potentially H-graphic if π has a realization containing H as a subgraph. In this paper, we characterize the potentially C6+P2-graphic sequences where C6+P2 denotes the graph obtained from C6 by adding two adjacent edges to the three pairwise nonadjacent vertices of C6. Moreover, we use the characterization to determine the value of σ(C6+P2, n).

  9. Queen Charlotte 2001 Earthquake Aftershock Sequence

    Science.gov (United States)

    Mulder, T.; Rogers, G. C.

    2012-12-01

    On Oct 12, 2001, an Mw=6.3 earthquake occurred off the Queen Charlotte Islands, BC. It was felt throughout Haida Gwaii (Queen Charlotte Islands) and the adjoining mainland. It generated a small tsunami recorded on Vancouver Island tide gauges. Moment tensor solutions show almost pure thrust faulting. There was a significant aftershock sequence associated with this event. Relocation of the catalogue aftershock sequence with respect to a key calibration event, using various subsets of common stations, shows significant movement in the event locations. The aftershocks define an ~30 degree dipping fault plane.

  10. The computational linguistics of biological sequences

    Energy Technology Data Exchange (ETDEWEB)

    Searls, D. [Univ. of Pennsylvania, Philadelphia, PA (United States)

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Protein sequences are analogous in many respects, particularly their folding behavior. Proteins have a much richer variety of interactions, but in theory the same linguistic principles could come to bear in describing dependencies between distant residues that arise by virtue of three-dimensional structure. This tutorial will concentrate on nucleic acid sequences.

  11. Digital image sequence processing, compression, and analysis

    CERN Document Server

    Reed, Todd R

    2004-01-01

    Introduction - Todd R. Reed; CONTENT-BASED IMAGE SEQUENCE REPRESENTATION - Pedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, and Charnchai Pluempitiwiriyawej; THE COMPUTATION OF MOTION - Christoph Stiller, Sören Kammel, Jan Horn, and Thao Dang; MOTION ANALYSIS AND DISPLACEMENT ESTIMATION IN THE FREQUENCY DOMAIN - Luca Lucchese and Guido Maria Cortelazzo; QUALITY OF SERVICE ASSESSMENT IN NEW GENERATION WIRELESS VIDEO COMMUNICATIONS - Gaetano Giunta; ERROR CONCEALMENT IN DIGITAL VIDEO - Francesco G.B. De Natale; IMAGE SEQUENCE RESTORATION: A WIDER PERSPECTIVE - Anil Kokaram; VIDEO SUMMARIZATION - Cuneyt M. Taskiran and Edward

  12. Complete genome sequence of arracacha mottle virus.

    Science.gov (United States)

    Orílio, Anelise F; Lucinda, Natalia; Dusi, André N; Nagata, Tatsuya; Inoue-Nagata, Alice K

    2013-01-01

    Arracacha mottle virus (AMoV) is the only potyvirus reported to infect arracacha (Arracacia xanthorrhiza) in Brazil. Here, the complete genome sequence of an isolate of AMoV was determined to be 9,630 nucleotides in length, excluding the 3' poly-A tail, and encoding a polyprotein of 3,135 amino acids and a putative P3N-PIPO protein. Its genomic organization is typical of a member of the genus Potyvirus, containing all conserved motifs. Its full genome sequence shared 56.2 % nucleotide identity with sunflower chlorotic mottle virus and verbena virus Y, the most closely related viruses.

  13. Sequence variability of Campylobacter temperate bacteriophages

    Directory of Open Access Journals (Sweden)

    Ng Lai-King

    2008-03-01

    Background: Prophages integrated within the chromosomes of Campylobacter jejuni isolates have been demonstrated very recently. Prior work with Campylobacter temperate bacteriophages, as well as evidence from prophages in other enteric bacteria, suggests these prophages might have a role in the biology and virulence of the organism. However, very little is known about the genetic variability of Campylobacter prophages which, if present, could lead to differential phenotypes in isolates carrying the phages versus those that do not. As a first step in the characterization of C. jejuni prophages, we investigated the distribution of prophage DNA within a C. jejuni population and assessed the DNA and protein sequence variability within a subset of the putative prophages found. Results: Southern blotting of C. jejuni DNA using probes from genes within the three putative prophages of the C. jejuni sequenced strain RM 1221 demonstrated the presence of at least one prophage gene in a large proportion (27/35) of isolates tested. Of these, 15 were positive for 5 or more of the 7 Campylobacter Mu-like phage 1 (CMLP 1, also designated Campylobacter jejuni integrated element 1, or CJIE 1) genes tested. Twelve of these putative prophages were chosen for further analysis. DNA sequencing of a 9,000 to 11,000 nucleotide region of each prophage demonstrated a close homology with CMLP 1 in both gene order and nucleotide sequence. Structural and sequence variability, including short insertions, deletions, and allele replacements, were found within the prophage genomes, some of which would alter the protein products of the ORFs involved. No insertions of novel genes were detected within the sequenced regions. The 12 prophages and RM 1221 had a % G+C very similar to C. jejuni sequenced strains, as well as promoter regions characteristic of C. jejuni. 
None of the putative prophages were successfully induced and propagated, so it is not known if they were functional or

  14. Rates of Convergence of Recursively Defined Sequences

    DEFF Research Database (Denmark)

    Lambov, Branimir Zdravkov

    2005-01-01

    This paper gives a generalization of a result by Matiyasevich which gives explicit rates of convergence for monotone recursively defined sequences. The generalization is motivated by recent developments in fixed point theory and the search for applications of proof mining to the field. It relaxes...... the requirement for monotonicity to the form x_{n+1} ≤ (1+a_n)x_n + b_n where the parameter sequences have to be bounded in sum, and also provides means to treat computational errors. The paper also gives an example result, an application of proof mining to fixed point theory, that can be achieved by the means discussed...

  15. Variations on strongly lacunary quasi Cauchy sequences

    Science.gov (United States)

    Kaplan, Huseyin; Cakalli, Huseyin

    2016-08-01

    We introduce a new function space, namely the space of Nθ (p)-ward continuous functions, which turns out to be a closed subspace of the space of continuous functions for each positive integer p. Nθα(p ) -ward continuity is also introduced and investigated for any fixed 0 kr-1, kr], and θ = (kr) is a lacunary sequence, i.e. an increasing sequence of positive integers such that k0 ≠ 0, and hr: kr-kr-1 →∞.

  16. Proteomics-grade de novo sequencing approach

    DEFF Research Database (Denmark)

    Savitski, Mikhail M; Nielsen, Michael L; Kjeldsen, Frank

    2005-01-01

    known proteins, complete de novo sequencing of their peptides is desired. The main problems of conventional sequencing based on tandem mass spectrometry are incomplete backbone fragmentation and the frequent overlap of fragment masses. In this work, the first proteomics-grade de novo approach...... is presented, where the above problems are alleviated by the use of complementary fragmentation techniques CAD and ECD. Implementation of a high-current, large-area dispenser cathode as a source of low-energy electrons provided efficient ECD of doubly charged peptides, the most abundant species (65...

  17. Self-Matching Properties of Beatty Sequences

    Directory of Open Access Journals (Sweden)

    Z. Masáková

    2007-01-01

    We study the self-matching properties of Beatty sequences, in particular of the graph of the function ⌊jβ⌋ against j for every quadratic unit β ∈ (0,1). We show that translation in the argument by an element G_i of a generalized Fibonacci sequence almost always causes the translation of the value of the function by G_{i-1}. More precisely, for fixed i ∈ ℕ, we have ⌊β(j+G_i)⌋ = ⌊βj⌋ + G_{i-1}, where j ∉ U_i. We determine the set U_i of mismatches and show that it has a low frequency, namely β^i.
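    The self-matching property can be checked numerically in the simplest case. The sketch below assumes β = (√5 − 1)/2 and takes the generalized Fibonacci sequence to be the ordinary Fibonacci numbers with F_0 = 0, F_1 = 1, so that shifting the argument by F_i is compared with shifting the value by F_{i-1}; these choices and the function names are illustrative assumptions, not the paper's general setting.

```python
from math import isqrt


def floor_beta(j):
    """floor(j * (sqrt(5) - 1) / 2) for j >= 0, computed with exact integers."""
    return (isqrt(5 * j * j) - j) // 2


def fibonacci(i):
    """Ordinary Fibonacci number F_i with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a


def mismatches(i, limit):
    """Indices j < limit where floor(beta*(j + F_i)) != floor(beta*j) + F_{i-1}."""
    g_i, g_prev = fibonacci(i), fibonacci(i - 1)
    return [j for j in range(limit)
            if floor_beta(j + g_i) != floor_beta(j) + g_prev]


if __name__ == "__main__":
    beta = (5 ** 0.5 - 1) / 2
    for i in (3, 5, 8):
        frequency = len(mismatches(i, 100_000)) / 100_000
        print(i, frequency, beta ** i)
```

    In this setting the observed mismatch frequency can be compared directly with β^i, the value stated in the abstract.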

  18. Sequence embedding for fast construction of guide trees for multiple sequence alignment

    LENUS (Irish Health Repository)

    Blackshields, Gordon

    2010-05-14

    Background: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N^2 for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results: In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions: We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
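    The embedding idea can be sketched as follows: each sequence is represented by its distances to a small set of seed sequences, so only N × (number of seeds) distances are computed instead of all N^2 pairs. The k-mer distance, the choice of the first sequences as seeds, and all function names below are simplifying assumptions for illustration; they are not the published method's actual distance measure or seed selection.

```python
from collections import Counter


def kmer_profile(seq, k=2):
    """Count the overlapping k-mers of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=2):
    """1 minus the fraction of shared k-mers (a crude, alignment-free distance)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    shared = sum((pa & pb).values())
    denom = min(sum(pa.values()), sum(pb.values()))
    return 1.0 - shared / denom if denom else 1.0


def embed(sequences, num_seeds, k=2):
    """Map each sequence to its vector of distances to the seed sequences.

    Hypothetical seed choice: simply the first `num_seeds` sequences.
    """
    seeds = sequences[:num_seeds]
    return [[kmer_distance(s, seed, k) for seed in seeds] for s in sequences]


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
    for s, vec in zip(seqs, embed(seqs, num_seeds=2)):
        print(s, vec)
```

    The resulting vectors can then be clustered with any standard vector-space method to build a guide tree, without ever forming the full distance matrix.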

  19. A formula on linear complexity of highest coordinate sequences from maximal periodic sequences over Galois rings

    Institute of Scientific and Technical Information of China (English)

    HU Lei; SUN Nigang

    2006-01-01

    Using a polynomial expression of the highest coordinate map, we deduce an exact formula on the linear complexity of the highest coordinate sequence derived from a maximal periodic sequence over an arbitrary Galois ring of characteristic p^2, where p is a prime. This generalizes the known result of Udaya and Siddiqi for the case that the Galois ring is Z_4.

  20. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Miri eMichaeli

    2012-12-01

    High throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion – Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  1. From Sequence to Morphology - Long-Range Correlations in Complete Sequenced Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2004-01-01

    The largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis thali

  2. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified ve...

  3. Complete Genome Sequence of Streptococcus agalactiae CNCTC 10/84, a Hypervirulent Sequence Type 26 Strain

    OpenAIRE

    Hooven, Thomas A.; Randis, Tara M.; Daugherty, Sean C.; Narechania, Apurva; Planet, Paul J.; Tettelin, Hervé; Ratner, Adam J.

    2014-01-01

    Streptococcus agalactiae (group B Streptococcus [GBS]) is a human pathogen with a propensity to cause neonatal infections. We report the complete genome sequence of GBS strain CNCTC 10/84, a hypervirulent clinical isolate frequently used to study GBS pathogenesis. Comparative analysis of this sequence may shed light on novel pathogenic mechanisms.

  4. A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers

    Science.gov (United States)

    A blackberry (Rubus L.) expressed sequence tag (EST) library was produced for developing simple sequence repeat (SSR) markers from the tetraploid blackberry cultivar, Merton Thornless, the source of the thornless trait in commercial cultivars. RNA was extracted from young expanding leaves and used f...

  5. Discours polemique, refutation et resolution des sequences conversationnelles (Argumentative Discourse, Refutation and Outcome of Conversational Sequences).

    Science.gov (United States)

    Moeschler, Jacques

    1981-01-01

    Analyzes the strategies employed in terminating conversational exchanges, with particular attention to argumentative sequences. Examines the features that distinguish these sequences from those that have a transactional character, and discusses the patterns of verbal interaction attendant to negative responses. Societe Nouvelle Didier Erudition,…

  6. Vector sequences - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Vector sequences - Budding yeast cDNA sequencing project | LSDB Archive. Number of data entries: 7 entries.

  7. Sequence embedding for fast construction of guide trees for multiple sequence alignment

    Directory of Open Access Journals (Sweden)

    Wilm Andreas

    2010-05-01

    Background: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N^2 for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results: In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions: We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.

  8. ProteDNA: a sequence-based predictor of sequence-specific DNA-binding residues in transcription factors

    OpenAIRE

    2009-01-01

    This article presents the design of a sequence-based predictor named ProteDNA for identifying the sequence-specific binding residues in a transcription factor (TF). Concerning protein–DNA interactions, there are two types of binding mechanisms involved, namely sequence-specific binding and nonspecific binding. Sequence-specific bindings occur between protein sidechains and nucleotide bases and correspond to sequence-specific recognition of genes. Therefore, sequence-specific bindings are esse...

  9. Sequence Length Limits for Controlling False Positives in Discovering Nucleotide Sequence Motifs

    Institute of Scientific and Technical Information of China (English)

    CHEN Lei; QiAN Zi-liang

    2008-01-01

    In motif discovery, especially the discovery of transcription factor DNA binding sites, an overly long input sequence returns non-informative motifs rather than biologically functional ones. This paper gives theoretical analyses and computational experiments to suggest length limits for the input sequence. When the sequence length exceeds a certain critical point, the probability of discovering the motif decreases sharply. The work not only explains the unsatisfying results of existing motif discovery approaches, where the input sequence may be too long and exceed this point, but also provides an estimate of the input sequence length that should be accepted to get more meaningful and reliable results in motif discovery.
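    A rough sense of why long inputs hurt can be had from a back-of-the-envelope calculation: under a uniform, independent-base background, the chance of a fixed motif appearing somewhere by accident grows with sequence length. The sketch below is not the paper's analysis; the background model, the independence approximation across overlapping positions, and the function names are all assumptions.

```python
def expected_chance_hits(seq_len, motif_len, alphabet=4):
    """Expected number of exact chance matches of one fixed motif.

    Assumes uniform, independent bases (an idealized background model).
    """
    positions = max(seq_len - motif_len + 1, 0)
    return positions * alphabet ** -motif_len


def prob_at_least_one_hit(seq_len, motif_len, alphabet=4):
    """Approximate probability of at least one chance match.

    Treats start positions as independent, which is only an approximation
    because overlapping windows share bases.
    """
    positions = max(seq_len - motif_len + 1, 0)
    return 1.0 - (1.0 - alphabet ** -motif_len) ** positions


if __name__ == "__main__":
    for length in (100, 1_000, 10_000, 100_000):
        print(length, expected_chance_hits(length, 8), prob_at_least_one_hit(length, 8))
```

    Under these assumptions the false-positive probability for a short motif climbs steadily as the input grows, which is consistent with the qualitative claim of the abstract.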

  10. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

    Science.gov (United States)

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D; Adir, Noam

    2016-06-28

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel.

  11. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    Science.gov (United States)

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.

  12. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N×N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.
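
    The vector-and-operator description above can be illustrated with a small numerical sketch. It is not the authors' code. It assumes a toy library of N = 5 sequences, models the sampling operator as a random diagonal matrix whose entries are the fraction of copies kept when each copy survives independently with probability `p_keep`, and uses made-up values for the censorship matrix. The helper names `matvec`, `diag` and `sampling_operator` are hypothetical.

```python
import random

def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def diag(values):
    """Build a diagonal matrix from a list of diagonal entries."""
    size = len(values)
    return [[values[i] if i == j else 0.0 for j in range(size)] for i in range(size)]

def sampling_operator(n, p_keep, rng):
    """Random diagonal matrix Sa such that Sa applied to n is a subsample of n.

    Assumed model: each copy is kept independently with probability p_keep.
    """
    fractions = []
    for copies in n:
        kept = sum(rng.random() < p_keep for _ in range(copies))
        fractions.append(kept / copies if copies else 0.0)
    return diag(fractions)

rng = random.Random(0)
n = [1000, 200, 50, 5, 0]                 # copy-number vector of the toy library
Sa = sampling_operator(n, 0.1, rng)

# Bias-free, error-free sequencing: Seq = Sa I_N, so Seq n equals Sa n.
reads = matvec(Sa, n)

# A diagonal censorship matrix (hypothetical values) that eliminates sequence 1.
CEN = diag([1.0, 0.0, 1.0, 1.0, 1.0])
censored_reads = matvec(Sa, matvec(CEN, n))
```

    In this toy setting censorship appears as a diagonal entry below 1 placed between the sampling operator and the library vector, which is one way to read "changes I_N to a nonunity matrix".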

  13. The learning of two similar complex movement sequences: does practice insulate a sequence from interference?

    Science.gov (United States)

    Panzer, Stefan; Shea, Charles H

    2008-12-01

    Panzer et al. [Panzer, S., Wilde, H., & Shea, C. H. (2006). The learning of two similar complex movement sequences: Proactive and retroactive effects on learning. Journal of Motor Behavior, 38, 60-70] found evidence to indicate that the memory state(s) underpinning the production of a movement sequence that was practiced for one day was essentially "overwritten" when another similar sequence was subsequently practiced on the next day. An interference paradigm was used to determine if additional practice on the first sequence would insulate it from retroactive interference arising from learning a new similar sequence. Participants produced the sequences by moving a lever with their right arm/hand to sequentially presented target locations. The experimental group practiced one 16-element movement sequence (S1) for two consecutive days. A second 16-element sequence (S2) was practiced on Day 3. The sequence practiced on Day 3 was created by switching the positions of 2 of 16 elements in the sequence practiced on the first day. Control groups received either two days of practice on S1 or one day of practice on S2. Contrary to our earlier findings (Panzer, Wilde, & Shea, 2006) of strong retroactive interference when S1 was only practiced for one day, we found no evidence of retroactive interference when S1 was practiced for two days prior to the switch to S2 practice. Interestingly, but also contrary to our earlier findings, we found the learning of S2 was facilitated by the prior practice of S1. This proactive facilitation was observed in S2 acquisition and on the S2 retention test.

  14. Monitoring Error Rates In Illumina Sequencing

    Science.gov (United States)

    Manley, Leigh J.; Ma, Duanduan; Levine, Stuart S.

    2016-01-01

    Guaranteeing high-quality next-generation sequencing data in a rapidly changing environment is an ongoing challenge. The introduction of the Illumina NextSeq 500 and the deprecation of specific metrics from Illumina's Sequencing Analysis Viewer (SAV; Illumina, San Diego, CA, USA) have made it more difficult to determine directly the baseline error rate of sequencing runs. To improve our ability to measure base quality, we have created an open-source tool to construct the Percent Perfect Reads (PPR) plot, previously provided by the Illumina sequencers. The PPR program is compatible with HiSeq 2000/2500, MiSeq, and NextSeq 500 instruments and provides an alternative to Illumina's quality value (Q) scores for determining run quality. Whereas Q scores are representative of run quality, they are often overestimated and are sourced from different look-up tables for each platform. The PPR’s unique capabilities as a cross-instrument comparison device, as a troubleshooting tool, and as a tool for monitoring instrument performance can provide an increase in clarity over SAV metrics that is often crucial for maintaining instrument health. These capabilities are highlighted. PMID:27672352

  15. What Makes a Protein Sequence a Prion?

    Science.gov (United States)

    Sabate, Raimon; Rousseau, Frederic; Schymkowitz, Joost; Ventura, Salvador

    2015-01-01

    Typical amyloid diseases such as Alzheimer's and Parkinson's were thought to exclusively result from de novo aggregation, but recently it was shown that amyloids formed in one cell can cross-seed aggregation in other cells, following a prion-like mechanism. Despite the large experimental effort devoted to understanding the phenomenon of prion transmissibility, it is still poorly understood how this property is encoded in the primary sequence. In many cases, prion structural conversion is driven by the presence of relatively large glutamine/asparagine (Q/N) enriched segments. Several studies suggest that it is the amino acid composition of these regions rather than their specific sequence that accounts for their priogenicity. However, our analysis indicates that it is instead the presence and potency of specific short amyloid-prone sequences that occur within intrinsically disordered Q/N-rich regions that determine their prion behaviour, modulated by the structural and compositional context. This provides a basis for the accurate identification and evaluation of prion candidate sequences in proteomes in the context of a unified framework for amyloid formation and prion propagation. PMID:25569335

  16. Time sequence photography of Roosters Comb

    Science.gov (United States)

    Understanding natural landscape changes is key to properly determining rangeland ecology. Time sequence photography allows a landscape snapshot to be documented and makes it possible to compare natural changes over time. Photographs of Roosters Comb were taken from the same vantag...

  17. Draft Genome Sequence of Lactobacillus rhamnosus 2166.

    OpenAIRE

    Karlyshev, Andrey V.; Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2014-01-01

    In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains.

  18. Deciphering the RNA landscape by RNAome sequencing

    NARCIS (Netherlands)

    K.W.J. Derks (Kasper); B. Misovic (Branislav); M.C.G.N. van den hout (Mirjam); C. Kockx (Christel); C.P. Gomez (Cesar Payan); R.W.W. Brouwer (Rutger); H. Vrieling (Harry); J.H.J. Hoeijmakers (Jan); W.F.J. van IJcken (Wilfred); J. Pothof (Joris)

    2015-01-01

    Current RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species in an unperturbed manner. We report strand-specific RNAome sequencing that determines expression of small and large RNAs from rRNA-depleted total RNA in a sin

  19. Getting started in mapping-by-sequencing

    Institute of Scientific and Technical Information of China (English)

    Héctor Candela; Rubén Casanova-Sáez; José Luis Micol

    2015-01-01

    Next-generation sequencing (NGS) technologies allow the cost-effective sequencing of whole genomes and have expanded the scope of genomics to novel applications, such as the genome-wide characterization of intraspecific polymorphisms and the rapid mapping and identification of point mutations. Next-generation sequencing platforms, such as the Illumina HiSeq2000 platform, are now commercially available at affordable prices and routinely produce an enormous amount of sequence data, but their wide use is often hindered by a lack of knowledge on how to manipulate and process the information produced. In this review, we focus on the strategies that are available to geneticists who wish to incorporate these novel approaches into their research but who are not familiar with the necessary bioinformatic concepts and computational tools. In particular, we comprehensively summarize case studies where the use of NGS technologies has led to the identification of point mutations, a strategy that has been dubbed "mapping-by-sequencing", and review examples from plants and other model species such as Caenorhabditis elegans, Saccharomyces cerevisiae, and Drosophila melanogaster. As these technologies are becoming cheaper and more powerful, their use is also expanding to allow mutation identification in species with larger genomes, such as many crop plants.

  20. Generalized Mignotte Sequences and Software Watermarking

    Directory of Open Access Journals (Sweden)

    Hedley Morris

    2009-12-01

    Dynamic Graph Watermarking (DGW) is vulnerable to attack. We propose the use of a multiple secret sharing scheme to improve the robustness of DGW. We propose using Mignotte Sequences as the way to create the watermark shares needed by the method. The new scheme is suitable for inclusion in the SandMark test program. All necessary algorithms are identified.

  1. Revisiting the Montessori Elementary Biology Sequence.

    Science.gov (United States)

    Scott, Judy; Lanaro, Pamela

    1994-01-01

    Suggests a revised Montessori elementary biology sequence based on the new five-kingdom model. In keeping with Montessori principles that move the learner from the whole to the parts and from the simple to the complex, offers a proposed outline of instruction for the lower elementary (6-9) level and for the upper elementary (9-12) level. (TJQ)

  2. The Ultramafites and layered Gabbro Sequences

    NARCIS (Netherlands)

    Oosterom, M.G.

    1963-01-01

    On Stjernöy, Seiland and the neighbouring peninsulas of Öksfjord and Bergsfjord ultramafic bodies of peridotite and pyroxenite with associated layered gabbro sequences occur within a complex of highly metamorphic gabbro gneisses, rocks akin to pyroxene-granulites and mafic charnockites. As is shown

  3. Recalling visual serial order for verbal sequences

    NARCIS (Netherlands)

    Logie, R.H.; Saito, S.; Morita, A.; Varma, S.; Norris, D.

    2016-01-01

    We report three experiments in which participants performed written serial recall of visually presented verbal sequences with items varying in visual similarity. In Experiments 1 and 2 native speakers of Japanese recalled visually presented Japanese Kanji characters. In Experiment 3, native speakers

  4. Galaxy LIMS for next-generation sequencing

    NARCIS (Netherlands)

    Scholtalbers, J.; Rossler, J.; Sorn, P.; Graaf, J. de; Boisguerin, V.; Castle, J.; Sahin, U.

    2013-01-01

    SUMMARY: We have developed a laboratory information management system (LIMS) for a next-generation sequencing (NGS) laboratory within the existing Galaxy platform. The system provides lab technicians standard and customizable sample information forms, barcoded submission forms, tracking of input sam

  5. Artificial Intelligence Controls Tape-Recording Sequence

    Science.gov (United States)

    Schwuttke, Ursula M.; Otamura, Roy M.; Zottarelli, Lawrence J.

    1989-01-01

    Developmental expert-system computer program intended to schedule recording of large amounts of data on limited amount of magnetic tape. Schedules recording using two sets of rules. First set incorporates knowledge of locations for recording of new data. Second set incorporates knowledge about issuing commands to recorder. Designed primarily for use on Voyager Spacecraft, also applicable to planning and sequencing in industry.

  6. Inferring interaction partners from protein sequences

    Science.gov (United States)

    Bitbol, Anne-Florence; Dwyer, Robert S.; Colwell, Lucy J.; Wingreen, Ned S.

    2016-01-01

    Specific protein−protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach, which employs a pairwise maximum entropy model to infer couplings between residues, has been successfully used to predict the 3D structures of proteins from sequences. Thus inspired, we introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We first assess the algorithm’s performance on histidine kinases and response regulators from bacterial two-component signaling systems. We obtain a striking 0.93 true positive fraction on our complete dataset without any a priori knowledge of interaction partners, and we uncover the origin of this success. We then apply the algorithm to proteins from ATP-binding cassette (ABC) transporter complexes, and obtain accurate predictions in these systems as well. Finally, we present two metrics that accurately distinguish interacting protein families from noninteracting ones, using only sequence data. PMID:27663738

  7. Naming an indiscernible sequence in NIP theories

    CERN Document Server

    Chernikov, Artem

    2009-01-01

    In this short note we show that if we add a predicate for a dense complete indiscernible sequence in a dependent theory then the result is still dependent. This answers a question of Baldwin and Benedikt and implies that every unstable dependent theory has a dependent expansion interpreting linear order.

  8. Completely irreducible sequences of meromorphic functions

    Institute of Scientific and Technical Information of China (English)

    Nevo, Shahar; Zalcman, Lawrence

    2010-01-01

    Given a sequence {f_n} of meromorphic functions on a plane domain D, there exists a (possibly empty) open set U ⊂ D and a subsequence {f_{n_k}} which converges uniformly (with respect to the spherical metric on the Riemann sphere) on compact subsets of U, while no subsequence of {f_{n_k}} converges uniformly on compact subsets of any larger open subset of D.

  9. Genetic Diagnosis through Whole-Exome Sequencing

    NARCIS (Netherlands)

    van der Zwaag, Paul A.; Jongbloed, Jan D. H.; van Tintelen, J. Peter

    2014-01-01

    To the Editor: Yang et al. (Oct. 17 issue)(1) report the application of whole-exome sequencing in 250 patients with a potentially genetic disease, which resulted in a molecular diagnosis in 25% of them. A total of 30 patients had medically actionable incidental findings in a total of 16 genes; 18 of

  10. TRANSPORTATION INEQUALITIES FOR WEAKLY DEPENDENT SEQUENCES

    Institute of Scientific and Technical Information of China (English)

    Ma Yutao

    2011-01-01

    In [3], the authors gave a necessary and sufficient condition for T1C and, as an application, established T1C for weakly dependent sequences. In this note, based on the Gozlan-Léonard characterization of W1H-inequalities, we extend this result to W1H-inequalities.

  11. Estimation of evolutionary distances between nucleotide sequences.

    Science.gov (United States)

    Zharkikh, A

    1994-09-01

    A formal mathematical analysis of the substitution process in nucleotide sequence evolution was done in terms of the Markov process. By using matrix algebra theory, the theoretical foundation of Barry and Hartigan's (Stat. Sci. 2:191-210, 1987) and Lanave et al.'s (J. Mol. Evol. 20:86-93, 1984) methods was provided. Extensive computer simulation was used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. It was shown that the multiparameter methods of Lanave et al.'s (J. Mol. Evol. 20:86-93, 1984), Gojobori et al.'s (J. Mol. Evol. 18:414-422, 1982), and Barry and Hartigan's (Stat. Sci. 2:191-210, 1987) are preferable to others for the purpose of phylogenetic analysis when the sequences are long. However, when sequences are short and the evolutionary distance is large, Tajima and Nei's (Mol. Biol. Evol. 1:269-285, 1984) method is superior to others.
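
    As a concrete instance of a Markov-model distance estimate, the sketch below computes the one-parameter Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. This estimator is not one of the methods compared in the abstract; it is substituted here only because it is the simplest member of the same family. The function names and example sequences are illustrative assumptions.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)

def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor estimate of substitutions per site between two sequences."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Two toy sequences differing at 1 of 10 sites (p = 0.1).
example_distance = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
```

    The corrected distance exceeds the raw proportion of differences because it accounts for multiple substitutions at the same site; the multiparameter methods named above relax the equal-rates assumption made here.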

  12. An oligonucleotide hybridization approach to DNA sequencing.

    Science.gov (United States)

    Khrapko, K R; Lysov YuP; Khorlyn, A A; Shick, V V; Florentiev, V L; Mirzabekov, A D

    1989-10-09

    We have proposed a DNA sequencing method based on hybridization of a DNA fragment to be sequenced with the complete set of fixed-length oligonucleotides (e.g., 4^8 = 65,536 possible 8-mers) immobilized individually as dots of a 2-D matrix [(1989) Dokl. Akad. Nauk SSSR 303, 1508-1511]. It was shown that the list of hybridizing octanucleotides is sufficient for the computer-assisted reconstruction of the structures for 80% of random-sequence fragments up to 200 bases long, based on the analysis of the octanucleotide overlapping. Here a refinement of the method and some experimental data are presented. We have performed hybridizations with oligonucleotides immobilized on a glass plate, and obtained their dissociation curves down to heptanucleotides. Other approaches, e.g., an additional hybridization of short oligonucleotides which continuously extend duplexes formed between the fragment and immobilized oligonucleotides, should considerably increase either the probability of unambiguous reconstruction, or the length of reconstructed sequences, or decrease the size of immobilized oligonucleotides.
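
    The idea of reconstructing a fragment from its list of hybridizing oligonucleotides can be sketched as follows. This is a simplified illustration, not the published algorithm: it assumes an error-free list of k-mers, in which each k-mer occurs once in the fragment, and it gives up (returns `None`) whenever an overlap is ambiguous. The names `spectrum` and `reconstruct` are hypothetical, and k = 3 is used in the example only to keep it readable.

```python
def spectrum(fragment, k):
    """Set of all k-mers that occur in the fragment (the idealized hybridization list)."""
    return {fragment[i:i + k] for i in range(len(fragment) - k + 1)}

def reconstruct(kmers, k):
    """Rebuild a fragment by chaining k-mers through unique (k-1)-base overlaps.

    Returns None if the start is not unique, an overlap is ambiguous,
    a k-mer would be reused, or some k-mers are left over.
    """
    by_prefix = {}
    for kmer in kmers:
        by_prefix.setdefault(kmer[:k - 1], []).append(kmer)
    suffixes = {kmer[1:] for kmer in kmers}

    # The first k-mer is the only one whose prefix is not the suffix of another k-mer.
    starts = [kmer for kmer in kmers if kmer[:k - 1] not in suffixes]
    if len(starts) != 1:
        return None

    current = starts[0]
    sequence = current
    used = {current}
    while True:
        candidates = by_prefix.get(current[1:], [])
        if not candidates:
            break
        if len(candidates) > 1 or candidates[0] in used:
            return None
        current = candidates[0]
        used.add(current)
        sequence += current[-1]
    return sequence if len(used) == len(kmers) else None

number_of_8mers = 4 ** 8                      # size of the complete octanucleotide set
rebuilt = reconstruct(spectrum("ATGCGTAC", 3), 3)
```

    Repeats longer than k - 1 bases make the overlap chain ambiguous, which is consistent with the abstract's remark that only a fraction of random fragments can be reconstructed unambiguously.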

  13. Mitochondrial DNA sequence evolution in shorebird populations.

    NARCIS (Netherlands)

    Wenink, P.W.

    1994-01-01

    This thesis describes the global molecular population structure of two shorebird species, in particular of the dunlin, Calidris alpina, by means of comparative sequence analysis of the most variable part of the mitochondrial DNA (mtDNA) genome. There are several reasons why mtDNA is the molecule of

  14. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    Science.gov (United States)

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR…

  15. An Overview of Multiple Sequence Alignment Systems

    CERN Document Server

    Saeed, Fahad

    2009-01-01

    An overview of current multiple alignment systems is given. The useful algorithms, the procedures adopted and their limitations are presented. We also present the quality of the alignments obtained and the cases (kind of alignments, kind of sequences, etc.) in which the particular systems are useful.

  16. Linearly recursive sequences and Dynkin diagrams

    CERN Document Server

    Reutenauer, Christophe

    2012-01-01

    Motivated by a construction in the theory of cluster algebras (Fomin and Zelevinsky), one associates to each acyclic directed graph a family of sequences of natural integers, one for each vertex; this construction is called a frieze; these sequences are given by nonlinear recursions (with division), and the fact that they are integers is a consequence of the Laurent phenomenon of Fomin and Zelevinsky. If the sequences satisfy a linear recursion with constant coefficients, then the graph must be a Dynkin diagram or an extended Dynkin diagram, with an acyclic orientation. The converse also holds: the sequences of the frieze associated to an oriented Dynkin or Euclidean diagram satisfy linear recursions, and are even $\mathbb{N}$-rational. One uses in the proof objects called $SL_2$-tilings of the plane, which are fillings of the discrete plane such that each adjacent 2 by 2 minor is equal to 1. These objects, which have applications in the theory of cluster algebras, are interesting for themselves. S...
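
    A small worked example may make the frieze recursion concrete. The sketch below assumes the simplest extended Dynkin case, two vertices joined by a double arrow, and assumes the update rule u_{n+1} = (1 + v_n^2) / u_n followed by v_{n+1} = (1 + u_{n+1}^2) / v_n with all initial values equal to 1. Both the choice of graph and the exact form of the rule are assumptions made for illustration; the function name `kronecker_frieze` is hypothetical.

```python
from fractions import Fraction

def kronecker_frieze(steps):
    """Iterate the assumed two-vertex frieze recursion using exact rational arithmetic."""
    u = [Fraction(1)]
    v = [Fraction(1)]
    for _ in range(steps):
        # Nonlinear recursion with division; u is updated first, then v uses the new u.
        u.append((1 + v[-1] ** 2) / u[-1])
        v.append((1 + u[-1] ** 2) / v[-1])
    return u, v

u, v = kronecker_frieze(5)
u_values = [int(x) for x in u]   # 1, 2, 13, 89, 610, 4181
v_values = [int(x) for x in v]   # 1, 5, 34, 233, 1597, 10946
```

    Although each step divides, every term is an integer, and in this example both sequences also satisfy the linear recursion x_{n+1} = 7 x_n - x_{n-1}, in line with the statement about extended Dynkin diagrams.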

  17. Sublinear Time Motif Discovery from Multiple Sequences

    Directory of Open Access Journals (Sweden)

    Yunhui Fu

    2013-10-01

    In this paper, a natural probabilistic model for motif discovery has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet, Σ. A motif G = g1g2 ... gm is a string of m characters. In each background sequence is implanted a probabilistically-generated approximate copy of G. For a probabilistically-generated approximate copy b1b2 ... bm of G, every character, bi, is probabilistically generated, such that the probability for bi ≠ gi is at most α. We develop two new randomized algorithms and one new deterministic algorithm. They make advancements in the following aspects: (1) The algorithms are much faster than those before. Our algorithms can even run in sublinear time. (2) They can handle any motif pattern. (3) The restriction for the alphabet size is a lower bound of four. This gives them potential applications in practical problems, since gene sequences have an alphabet size of four. (4) All algorithms have rigorous proofs about their performances. The methods developed in this paper have been used in the software implementation. We observed some encouraging results that show improved performance for motif detection compared with other software.
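
    The implanted-motif model described above can be sketched as a small generator of test data. This is an illustrative sketch, not the authors' software: it assumes uniform random background characters, a uniformly chosen implant position, and a per-character mutation probability of exactly α (which satisfies the "at most α" condition). The function name `implant_motif` and all parameter values are hypothetical.

```python
import random

def implant_motif(motif, alphabet, k, length, alpha, rng):
    """Generate k background sequences, each with one approximate copy of the motif.

    Returns the sequences and the start position of the implant in each one.
    """
    sequences = []
    positions = []
    for _ in range(k):
        background = [rng.choice(alphabet) for _ in range(length)]
        copy = []
        for g in motif:
            if rng.random() < alpha:
                # Mutate: pick a different character, so b_i != g_i with probability alpha.
                copy.append(rng.choice([c for c in alphabet if c != g]))
            else:
                copy.append(g)
        start = rng.randrange(length - len(motif) + 1)
        background[start:start + len(motif)] = copy
        sequences.append("".join(background))
        positions.append(start)
    return sequences, positions

# Five sequences of length 40 over the DNA alphabet, with a lightly mutated motif.
sequences, positions = implant_motif("ACGTAC", "ACGT", 5, 40, 0.1, random.Random(1))
```

    Data generated this way comes with known implant positions, so the output of a motif discovery program can be scored against the ground truth.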

  18. Probabilistic error correction for RNA sequencing.

    Science.gov (United States)

    Le, Hai-Son; Schulz, Marcel H; McCauley, Brenna M; Hinman, Veronica F; Bar-Joseph, Ziv

    2013-05-01

    Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing. Here we present SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov Model (HMM)-based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, we show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, we generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Our corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which we have experimentally validated. Supporting website: http://sb.cs.cmu.edu/seecer/.

  19. Construction of μ-normal sequences

    OpenAIRE

    Madritsch, Manfred G.; Mance, Bill

    2012-01-01

    In the present paper we extend Champernowne's construction of normal numbers to provide sequences which are generic for a given invariant probability measure, which need not be the maximal one. We present a construction together with estimates and examples for normal numbers with respect to Lüroth series expansion, continued fractions expansion or $\beta$-expansion.

  20. Complex Convexity of Orlicz Modular Sequence Spaces

    Directory of Open Access Journals (Sweden)

    Lili Chen

    2016-01-01

    The concepts of complex extreme points, complex strongly extreme points, complex strict convexity, and complex midpoint locally uniform convexity in general modular spaces are introduced. Then we prove that any Orlicz modular sequence space l_{Φ,ρ} is complex midpoint locally uniformly convex. As a corollary, l_{Φ,ρ} is also complex strictly convex.

  1. Sequence Semantics for Dynamic Predicate Logic

    NARCIS (Netherlands)

    Vermeulen, C.F.M.

    2008-01-01

    In this paper a semantics for dynamic predicate logic is developed that uses sequence valued assignments. This semantics is compared with the usual relational semantics for dynamic predicate logic: it is shown that the most important intuitions of the usual semantics are preserved. Then it is shown

  2. On VC-density over indiscernible sequences

    OpenAIRE

    Guingona, Vincent; Hill, Cameron Donnay

    2011-01-01

    In this paper, we study VC-density over indiscernible sequences (denoted VC_ind-density). We answer an open question in [1], showing that VC_ind-density is always integer valued. We also show that VC_ind-density and dp-rank coincide in the natural way.

  3. Sorghum genome sequencing by methylation filtration.

    Directory of Open Access Journals (Sweden)

    Joseph A Bedell

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  4. Genome sequence of Lactobacillus farciminis KCTC 3681.

    Science.gov (United States)

    Nam, Seong-Hyeuk; Choi, Sang-Haeng; Kang, Aram; Kim, Dong-Wook; Kim, Ryong Nam; Kim, Aeri; Kim, Dae-Soo; Park, Hong-Seog

    2011-04-01

    Lactobacillus farciminis is one of the most prevalent lactic acid bacterial species present during the manufacturing process of kimchi, the best-known traditional Korean dish. Here, we present the draft genome sequence of the type strain Lactobacillus farciminis KCTC 3681 (2,498,309 bp, with a G+C content of 36.4%), which consists of 5 scaffolds.

  5. The Porosity of Additive Noise Sequences

    CERN Document Server

    Misra, Vinith

    2012-01-01

    Consider a binary additive noise channel with noiseless feedback. When the noise is a stationary and ergodic process $\mathbf{Z}$, the capacity is $1-\mathbb{H}(\mathbf{Z})$ ($\mathbb{H}(\cdot)$ denoting the entropy rate). It is shown analogously that when the noise is a deterministic sequence $z^\infty$, the capacity under finite-state encoding and decoding is $1-\bar{\rho}(z^\infty)$, where $\bar{\rho}(\cdot)$ is Lempel and Ziv's finite-state compressibility. This quantity is termed the porosity $\underline{\sigma}(\cdot)$ of an individual noise sequence. A sequence of schemes is presented that universally achieves porosity for any noise sequence. These converse and achievability results may be interpreted both as a channel-coding counterpart to Ziv and Lempel's work in universal source coding, as well as an extension to the work by Lomnitz and Feder and Shayevitz and Feder on communication across modulo-additive channels. Additionally, a slightly more practical architecture is suggested that draws a...

  6. A new DNA sequence assembly program.

    Science.gov (United States)

    Bonfield, J K; Smith, K F; Staden, R

    1995-01-01

    We describe the Genome Assembly Program (GAP), a new program for DNA sequence assembly. The program is suitable for large and small projects, a variety of strategies and can handle data from a range of sequencing instruments. It retains the useful components of our previous work, but includes many novel ideas and methods. Many of these methods have been made possible by the program's completely new, and highly interactive, graphical user interface. The program provides many visual clues to the current state of a sequencing project and allows users to interact in intuitive and graphical ways with their data. The program has tools to display and manipulate the various types of data that help to solve and check difficult assemblies, particularly those in repetitive genomes. We have introduced the following new displays: the Contig Selector, the Contig Comparator, the Template Display, the Restriction Enzyme Map and the Stop Codon Map. We have also made it possible to have any number of Contig Editors and Contig Joining Editors running simultaneously even on the same contig. The program also includes a new 'Directed Assembly' algorithm and routines for automatically detecting unfinished segments of sequence, to which it suggests experimental solutions. PMID:8559656

  7. Mechanisms of protein sequence divergence and incompatibility.

    Directory of Open Access Journals (Sweden)

    Alon Wellner

    Alignments of orthologous protein sequences convey a complex picture. Some positions are utterly conserved whilst others have diverged to variable degrees. Amongst the latter, many are non-exchangeable between extant sequences. How do functionally critical and highly conserved residues diverge? Why and how did these exchanges become incompatible within contemporary sequences? Our model is phosphoglycerate kinase (PGK), where lysine 219 is an essential active-site residue completely conserved throughout Eukaryota and Bacteria, and serine is found only in archaeal PGKs. Contemporary sequences tested exhibited complete loss of function upon exchanges at 219. However, a directed evolution experiment revealed that two mutations were sufficient for human PGK to become functional with serine at position 219. These two mutations made position 219 permissive not only for serine and lysine, but also to a range of other amino acids seen in archaeal PGKs. The identified trajectories that enabled exchanges at 219 show marked sign epistasis - a relatively small loss of function with respect to one amino acid (lysine) versus a large gain with another (serine, and other amino acids). Our findings support the view that, as theoretically described, the trajectories underlining the divergence of critical positions are dominated by sign epistatic interactions. Such trajectories are an outcome of rare mutational combinations. Nonetheless, as suggested by the laboratory enabled K219S exchange, given enough time and variability in selection levels, even utterly conserved and functionally essential residues may change.

  8. Resolution enhancement of color video sequences.

    Science.gov (United States)

    Shah, N R; Zakhor, A

    1999-01-01

    We propose a new multiframe algorithm to enhance the spatial resolution of frames in video sequences. Our technique specifically accounts for the possibility that motion estimation will be inaccurate and compensates for these inaccuracies. Experiments show that our multiframe enhancement algorithm yields perceptibly sharper enhanced images with significant signal-to-noise ratio (SNR) improvement over bilinear and cubic B-spline interpolation.

  9. Automated Testing with Targeted Event Sequence Generation

    DEFF Research Database (Denmark)

    Jensen, Casper Svenning; Prasad, Mukul R.; Møller, Anders

    2013-01-01

    of the individual event handlers of the application. The second phase builds event sequences backward from the target, using the summaries together with a UI model of the application. Our experiments on a collection of open source Android applications show that this technique can successfully produce event...

  10. Sequences of Closed Operators and Correctness

    Directory of Open Access Journals (Sweden)

    Sabra Ramadan

    2010-01-01

    Full Text Available In applications and in the equations of mathematical physics, it is very important for the mathematical model corresponding to a given problem to be correctly posed. In this research we study the relationship between a sequence of closed operators A_n → A and the correctness of the equation Ax = y. We also introduce a criterion for correctness.

  11. The Pizza Problem: A Solution with Sequences

    Science.gov (United States)

    Shafer, Kathryn G.; Mast, Caleb J.

    2008-01-01

    This article addresses the issues of coaching and assessing. A preservice middle school teacher's unique solution to the Pizza problem was not what the professor expected. The student's solution strategy, based on sequences and a reinvention of Pascal's triangle, is explained in detail. (Contains 8 figures.)

  12. Sorghum genome sequencing by methylation filtration.

    Science.gov (United States)

    Bedell, Joseph A; Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rholfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  13. Fetal Kidney Anomalies: Next Generation Sequencing

    DEFF Research Database (Denmark)

    Rasmussen, Maria; Sunde, Lone; Nielsen, Marlene Louise

    Aim and Introduction Identification of abnormal kidneys in the fetus may lead to termination of the pregnancy and raises questions about the underlying cause and recurrence risk in future pregnancies. In this study, we investigate the effectiveness of targeted next generation sequencing in fetuses...

  14. Local Renyi entropic profiles of DNA sequences

    Directory of Open Access Journals (Sweden)

    Vinga Susana

    2007-10-01

    Full Text Available Abstract Background In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. Conclusion The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.

  15. Properties of aftershock sequences in southern California

    Science.gov (United States)

    Kisslinger, Carl; Jones, Lucile M.

    1991-07-01

    The temporal behavior of 39 aftershock sequences in southern California, 1933-1988, was modeled by the modified Omori relation. Minimum magnitudes for completeness of each sequence catalog were determined, and the maximum likelihood estimates of the parameters K, p, and c, with the standard errors on each, were determined by the Ogata algorithm. The b value of each sequence was also calculated. Many of the active faults in the region, both strike slip and thrust, were sampled. The p values were graded in terms of the size of the standard error relative to the p value itself. Most of the sequences were modeled well by the Omori relation. Many of the sequences had p values close to the mean of the whole data set, 1.11±0.25, but values significantly different from the mean, as low as 0.7 and as high as 1.8, exist. No correlation of p with either the b value of the sequence or the mainshock magnitude was found. The results suggest a direct correlation of p values with surface heat flow, with high values in the Salton Trough (high heat flow) and one low value in the San Bernardino Mountains and on the edge of the Ventura Basin (both low heat flow). The large fraction of the sequences with p values near the mean are at locations where the heat flow is near the regional mean, 74 mW/m². If the hypothesis that aftershock decay rate is controlled by temperature at depth is valid, the effects of other factors such as heterogeneity of the fault zone properties are superimposed on the background rate determined by temperature. Surface heat flow is taken as an indicator of crustal temperature at hypocentral depths, but the effects on heat flow of convective heat transport and variations in near-surface thermal conductivity invalidate any simple association of local variations in heat flow with details of the subsurface temperature distribution.
The interpretation is that higher temperatures in the aftershock source volume caused shortened stress relaxation times in the fault
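
    For orientation, the modified Omori relation named in this record is commonly written n(t) = K / (t + c)^p, where n(t) is the aftershock rate at time t after the mainshock. The sketch below evaluates that rate and its closed-form integral. The values of K and c in the usage note are illustrative assumptions and only p = 1.11 comes from the abstract; it does not reproduce the Ogata maximum likelihood fit.

```python
import math


def omori_rate(t, K, p, c):
    """Aftershock rate n(t) = K / (t + c)**p at time t after the mainshock."""
    return K / (t + c) ** p


def omori_count(t_end, K, p, c):
    """Expected number of aftershocks in [0, t_end]: the integral of n(t)."""
    if abs(p - 1.0) < 1e-12:
        # For p = 1 the integral is logarithmic.
        return K * math.log((t_end + c) / c)
    return K * (c ** (1.0 - p) - (t_end + c) ** (1.0 - p)) / (p - 1.0)
```

    Calling omori_count(30.0, K=50.0, p=1.11, c=0.05) would give the expected number of events in the first 30 time units for those assumed parameters; a larger p makes the rate decay faster.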

  16. Evaluation measures of multiple sequence alignments.

    Science.gov (United States)

    Gonnet, G H; Korostensky, C; Benner, S

    2000-01-01

    Multiple sequence alignments (MSAs) are frequently used in the study of families of protein sequences or DNA/RNA sequences. They are a fundamental tool for the understanding of the structure, functionality and, ultimately, the evolution of proteins. A new algorithm, the Circular Sum (CS) method, is presented for formally evaluating the quality of an MSA. It is based on the use of a solution to the Traveling Salesman Problem, which identifies a circular tour through an evolutionary tree connecting the sequences in a protein family. With this approach, the calculation of an evolutionary tree and the errors that it would introduce can be avoided altogether. The algorithm gives an upper bound, the best score that can possibly be achieved by any MSA for a given set of protein sequences. Alternatively, if presented with a specific MSA, the algorithm provides a formal score for the MSA, which serves as an absolute measure of the quality of the MSA. The CS measure yields a direct connection between an MSA and the associated evolutionary tree. The measure can be used as a tool for evaluating different methods for producing MSAs. A brief example of the last application is provided. Because it weights all evolutionary events on a tree identically, but does not require the reconstruction of a tree, the CS algorithm has advantages over the frequently used sum-of-pairs measures for scoring MSAs, which weight some evolutionary events more strongly than others. Compared to other weighted sum-of-pairs measures, it has the advantage that no evolutionary tree must be constructed, because we can find a circular tour without knowing the tree.
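
    The contrast between a sum-of-pairs score and a circular-tour score can be sketched on a toy alignment. Everything below is an illustrative assumption rather than the authors' implementation: the match/mismatch/gap scores are arbitrary, the tour is supplied by the caller instead of being found by a Traveling Salesman solver, and the halving of the tour total rests on the idea that a circular tour crosses each tree edge twice.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Column-by-column score of two aligned rows (hypothetical scoring values)."""
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # a column that is a gap in both rows contributes nothing
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def sum_of_pairs(msa):
    """Sum the pairwise score over every pair of rows."""
    return sum(pair_score(a, b) for a, b in combinations(msa, 2))


def circular_sum(msa, tour):
    """Half the total score of consecutive rows along a circular tour."""
    n = len(tour)
    total = sum(
        pair_score(msa[tour[i]], msa[tour[(i + 1) % n]]) for i in range(n)
    )
    return total / 2
```

    Sum-of-pairs grows with the number of pairs, n(n-1)/2, while the circular sum uses only n pairs, which is one way to see why the two measures weight evolutionary events differently.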

  17. Sequence determinants in human polyadenylation site selection

    Directory of Open Access Journals (Sweden)

    Gautheret Daniel

    2003-02-01

    Full Text Available Abstract Background Differential polyadenylation is a widespread mechanism in higher eukaryotes producing mRNAs with different 3' ends in different contexts. This involves several alternative polyadenylation sites in the 3' UTR, each with its specific strength. Here, we analyze the vicinity of human polyadenylation signals in search of patterns that would help discriminate strong and weak polyadenylation sites, or true sites from randomly occurring signals. Results We used human genomic sequences to retrieve the region downstream of polyadenylation signals, usually absent from cDNA or mRNA databases. Analyzing 4956 EST-validated polyadenylation sites and their -300/+300 nt flanking regions, we clearly visualized the upstream (USE) and downstream (DSE) sequence elements, both characterized by U-rich (not GU-rich) segments. The presence of a USE and a DSE is the main feature distinguishing true polyadenylation sites from randomly occurring A(A/U)UAAA hexamers. While USEs are indifferently associated with strong and weak poly(A) sites, DSEs are more conspicuous near strong poly(A) sites. We then used the region encompassing the hexamer and DSE as a training set for poly(A) site identification by the ERPIN program and achieved a prediction specificity of 69 to 85% for a sensitivity of 56%. Conclusion The availability of complete genomes and large EST sequence databases now permits large-scale observation of polyadenylation sites. Both U-rich sequences flanking both sides of poly(A) signals contribute to the definition of "true" sites. However, the downstream U-rich sequences may also play an enhancing role. Based on this information, poly(A) site prediction accuracy was moderately but consistently improved compared to the best previously available algorithm.
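
    As a rough illustration of the features described in this record, the sketch below locates A(A/U)UAAA hexamers in an RNA string and measures the U content of a window next to each hit. The function names and the window boundaries are hypothetical choices made for the example. This is not the ERPIN profile method used in the study.

```python
import re

# A lookahead is used so that overlapping hexamers are all reported.
HEXAMER = re.compile(r"(?=(A[AU]UAAA))")


def find_signals(rna):
    """Return the start index of every AAUAAA or AUUAAA hexamer."""
    return [m.start() for m in HEXAMER.finditer(rna)]


def u_fraction(rna, start, end):
    """Fraction of U in rna[start:end]; assumes 0 <= start <= end."""
    window = rna[start:end]
    return window.count("U") / len(window) if window else 0.0
```

    A candidate site could then be summarized by its hexamer position together with the U fraction upstream and downstream of it, which are the two elements the abstract reports as distinguishing true sites from random hexamers.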

  18. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Johanna Hasmats

    Full Text Available Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.

  19. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Science.gov (United States)

    Hasmats, Johanna; Gréen, Henrik; Orear, Cedric; Validire, Pierre; Huss, Mikael; Käller, Max; Lundeberg, Joakim

    2014-01-01

    Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.

  20. Processing modes and parallel processors in producing familiar keying sequences

    NARCIS (Netherlands)

    Verwey, Willem B.

    2003-01-01

    Recent theorizing indicates that the acquisition of movement sequence skill involves the development of several independent sequence representations at the same time. To examine this for the discrete sequence production task, participants in Experiment 1 produced a highly practiced sequence of six k

  1. The Polytopic-k-Step Fibonacci Sequences in Finite Groups

    Directory of Open Access Journals (Sweden)

    Ömür Deveci

    2011-01-01

    Full Text Available We study the polytopic-k-step Fibonacci sequences, the polytopic-k-step Fibonacci sequences modulo m, and the polytopic-k-step Fibonacci sequences in finite groups. Also, we examine the periods of the polytopic-k-step Fibonacci sequences in semidihedral group SD2m.
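
    The notion of a period modulo m can be illustrated with the ordinary k-step Fibonacci recurrence, in which each term is the sum of the previous k terms. The sketch below covers only that simpler case, with assumed starting values 0, ..., 0, 1; it does not implement the polytopic variant or the group-theoretic version studied in the paper.

```python
def k_step_period(k, m):
    """Period of the k-step Fibonacci sequence reduced modulo m.

    The state is the last k terms. Because the recurrence can be run
    backwards modulo m, the sequence is purely periodic, so the start
    state is guaranteed to recur.
    """
    start = tuple(x % m for x in (0,) * (k - 1) + (1,))
    state = start
    steps = 0
    while True:
        # Shift the window and append the sum of the last k terms mod m.
        state = state[1:] + (sum(state) % m,)
        steps += 1
        if state == start:
            return steps
```

    For k = 2 this is the classical period of the Fibonacci numbers modulo m; for example, k_step_period(2, 10) counts how many terms pass before the last digits of the Fibonacci numbers repeat.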

  2. Quantitative biostratigraphy and species evolutionary sequence

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Introduction of species evolutionary sequence into the quantitative biostratigraphy is a significant work, either for studying biologic evolution or for making stratigraphic correlation and reconstructing geologic history. The quantitative biostratigraphy is to determine biostratigraphic event sequences by using probabilistic analysis. The evolutionary sequence systematics can efficiently ascertain species evolutionary sequences. Two methods have been proposed to determine the sequence of species-disappearance events: (1) species extinction events can be closed by last occurrence events using quantitative biostratigraphic analysis; (2) the duration of a species may be approximately replaced by the duration of its parent species. To combine these two methods for determining the sequence of species disappearance is the best way up to now. A consulting standard sequence that consists of the speciation sequence of Permian waagenophylloid corals and the biostratigraphic event sequence of other important fossils in Permian is used as an example. The group Spearman rank-correlation test is used to test the consulting standard sequence by comparing four types of calculations and two kinds of sequences and to find abnormal events. Based on the found abnormal events in the test, the consulting standard sequence is revised to deal with different conditions. Sequences of speciation and species-disappearance, and species duration are determined. Application of species evolutionary sequence to quantitative biostratigraphy can largely improve the quality of biostratigraphic event sequence. In stratigraphic correlation, furthermore, event sequences have higher precision than range biozones.

  3. The nucleotide sequences of two leghemoglobin genes from soybean

    DEFF Research Database (Denmark)

    Wiborg, O; Hyldig-Nielsen, J J; Jensen, E O

    1982-01-01

    We present the complete nucleotide sequences of two leghemoglobin genes isolated from soybean DNA. Both genes contain three intervening sequences in identical positions. Comparison of the coding sequences with known amino-acid sequences of soybean leghemoglobins suggest that the two genes...

  4. Strong Limit Theorems for Arbitrary Fuzzy Stochastic Sequences

    Institute of Scientific and Technical Information of China (English)

    FEI Wei-yin

    2008-01-01

    Based on fuzzy random variables, the concept of fuzzy stochastic sequences is defined. Strong limit theorems for fuzzy stochastic sequences are established. Some known results in non-fuzzy stochastic sequences are extended. In order to prove results of this paper, the notion of fuzzy martingale difference sequences is also introduced.

  5. On Period of the Sequence of Fibonacci Polynomials Modulo

    Directory of Open Access Journals (Sweden)

    İnci Gültekin

    2013-01-01

    Full Text Available It is shown that the sequence obtained by reducing modulo coefficient and exponent of each Fibonacci polynomials term is periodic. Also if is prime, then sequences of Fibonacci polynomial are compared with Wall numbers of Fibonacci sequences according to modulo . It is found that order of cyclic group generated with matrix is equal to the period of these sequences.

  6. Bacterial repetitive extragenic palindromic sequences are DNA targets for Insertion Sequence elements

    Directory of Open Access Journals (Sweden)

    Pareja Eduardo

    2006-03-01

    Full Text Available Abstract Background Mobile elements are involved in genomic rearrangements and virulence acquisition, and hence, are important elements in bacterial genome evolution. The insertion of some specific Insertion Sequences had been associated with repetitive extragenic palindromic (REP) elements. Considering that there are a sufficient number of available genomes with described REPs, and exploiting the advantage of the traceability of transposition events in genomes, we decided to exhaustively analyze the relationship between REP sequences and mobile elements. Results This global multigenome study highlights the importance of repetitive extragenic palindromic elements as target sequences for transposases. The study is based on the analysis of the DNA regions surrounding the 981 instances of Insertion Sequence elements with respect to the positioning of REP sequences in the 19 available annotated microbial genomes corresponding to species of bacteria with reported REP sequences. This analysis has allowed the detection of the specific insertion into REP sequences for ISPsy8 in Pseudomonas syringae DC3000, ISPa11 in P. aeruginosa PA01, ISPpu9 and ISPpu10 in P. putida KT2440, and ISRm22 and ISRm19 in Sinorhizobium meliloti 1021 genome. Preference for insertion in extragenic spaces with REP sequences has also been detected for ISPsy7 in P. syringae DC3000, ISRm5 in S. meliloti and ISNm1106 in Neisseria meningitidis MC58 and Z2491 genomes. Probably, the association with REP elements that we have detected analyzing genomes is only the tip of the iceberg, and this association could be even more frequent in natural isolates. Conclusion Our findings characterize REP elements as hot spots for transposition and reinforce the relationship between REP sequences and genomic plasticity mediated by mobile elements.
In addition, this study defines a subset of REP-recognizer transposases with high target selectivity that can be useful in the development of new tools for

  7. Analysis of expressed sequence tags from Plasmodium falciparum.

    Science.gov (United States)

    Chakrabarti, D; Reddy, G R; Dame, J B; Almira, E C; Laipis, P J; Ferl, R J; Yang, T P; Rowe, T C; Schuster, S M

    1994-07-01

    An initiative was undertaken to sequence all genes of the human malaria parasite Plasmodium falciparum in an effort to gain a better understanding at the molecular level of the parasite that inflicts much suffering in the developing world. 550 random complementary DNA clones were partially sequenced from the intraerythrocytic form of the parasite as one of the approaches to analyze the transcribed sequences of its genome. The sequences, after editing, generated 389 expressed sequence tag sites and over 105 kb of DNA sequences. About 32% of these clones showed significant homology with other genes in the database. These clones represent 340 new Plasmodium falciparum expressed sequence tags.

  8. High order coherent control sequences of fat pulses

    CERN Document Server

    Pasini, S; Uhrig, G S

    2010-01-01

    We analyze the performance of sequences of fat pulses of various lengths and shapes for dynamic decoupling and we compare it with that of sequences of ideal, instantaneous pulses. The use of second order, shaped pulses represents a significant improvement. Non-equidistant sequences characterized by pulse durations scaled proportional to the duration T of the sequence strikingly outperform the sequences with pulses of constant length for small T. Interestingly, for longer durations sequences of pulses of substantial length are found to suppress dephasing better than sequences of ideal pulses.

  9. The first determination of DNA sequence of a specific gene.

    Science.gov (United States)

    Inouye, Masayori

    2016-05-10

    How and when was the first DNA sequence of a gene determined? In 1977, F. Sanger came up with an innovative technology to sequence DNA by using chain terminators, and determined the entire DNA sequence of the 5375-base genome of bacteriophage φX 174 (Sanger et al., 1977). While Sanger's achievement has been recognized as the first DNA sequencing of genes, we had determined the DNA sequence of a gene, albeit a partial sequence, 11 years before Sanger's DNA sequence (Okada et al., 1966).

  10. Topological sequence entropy of continuous maps on topological spaces

    Directory of Open Access Journals (Sweden)

    Lei Liu

    2013-12-01

    Full Text Available In this paper we propose a new definition of topological sequence entropy for continuous maps on arbitrary topological spaces (compactness, metrizability, even axioms of separation not necessarily required), investigate fundamental properties of the new sequence entropy, and compare the new sequence entropy with the existing ones. The defined sequence entropy generalizes that of Goodman. Yet, it holds various basic properties of Goodman’s sequence entropy, e.g., the sequence entropy of a subsystem is bounded by that of the original system, topologically conjugated systems have the same sequence entropy, the sequence entropy of the induced hyperspace system is larger than or equal to that of the original system, and in particular this new sequence entropy coincides with Goodman’s sequence entropy for compact systems.

  11. Spot-Based Generations for Meta-Fibonacci Sequences

    CERN Document Server

    Dalton, Barnaby; Tanny, Stephen

    2011-01-01

    For many meta-Fibonacci sequences it is possible to identify a partition of the sequence into successive intervals (sometimes called blocks) with the property that the sequence behaves "similarly" in each block. This partition provides insights into the sequence properties. To date, for any given sequence, only ad hoc methods have been available to identify this partition. We apply a new concept - the spot-based generation sequence - to derive a general methodology for identifying this partition for a large class of meta-Fibonacci sequences. This class includes the Conolly and Conway sequences and many of their well-behaved variants, and even some highly chaotic sequences, such as Hofstadter's famous Q-sequence.
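
    Two of the sequences named in this record have short, widely quoted recursions, sketched below so the irregular behavior is easy to inspect: Hofstadter's Q(n) = Q(n - Q(n-1)) + Q(n - Q(n-2)) and the Conway sequence a(n) = a(a(n-1)) + a(n - a(n-1)), both starting from 1, 1. The function names are hypothetical, and the sketch does not implement the spot-based generation method of the paper.

```python
def hofstadter_q(n_terms):
    """First n_terms of Hofstadter's Q-sequence, with Q(1) = Q(2) = 1."""
    q = [0, 1, 1]  # index 0 is padding so that q[n] is Q(n)
    for n in range(3, n_terms + 1):
        q.append(q[n - q[n - 1]] + q[n - q[n - 2]])
    return q[1:n_terms + 1]


def conway_a(n_terms):
    """First n_terms of the Conway sequence, with a(1) = a(2) = 1."""
    a = [0, 1, 1]  # index 0 is padding so that a[n] is a(n)
    for n in range(3, n_terms + 1):
        a.append(a[a[n - 1]] + a[n - a[n - 1]])
    return a[1:n_terms + 1]
```

    Printing a few hundred terms of each side by side shows the contrast the abstract draws: the Conway sequence grows in orderly blocks, while the Q-sequence fluctuates irregularly.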

  12. SNP calling by sequencing pooled samples

    Directory of Open Access Journals (Sweden)

    Raineri Emanuele

    2012-09-01

    Full Text Available Abstract Background Performing high throughput sequencing on samples pooled from different individuals is a strategy to characterize genetic variability at a small fraction of the cost required for individual sequencing. In certain circumstances some variability estimators have even lower variance than those obtained with individual sequencing. SNP calling and estimating the frequency of the minor allele from pooled samples, though, is a subtle exercise for at least three reasons. First, sequencing errors may have a much larger relevance than in individual SNP calling: while their impact in individual sequencing can be reduced by setting a restriction on a minimum number of reads per allele, this would have a strong and undesired effect in pools because it is unlikely that alleles at low frequency in the pool will be read many times. Second, the prior allele frequency for heterozygous sites in individuals is usually 0.5 (assuming one is not analyzing sequences coming from, e.g. cancer tissues), but this is not true in pools: in fact, under the standard neutral model, singletons (i.e. alleles of minimum frequency) are the most common class of variants because P(f) ∝ 1/f and they occur more often as the sample size increases. Third, an allele appearing only once in the reads from a pool does not necessarily correspond to a singleton in the set of individuals making up the pool, and vice versa, there can be more than one read – or, more likely, none – from a true singleton. Results To improve upon existing theory and software packages, we have developed a Bayesian approach for minor allele frequency (MAF) computation and SNP calling in pools (and implemented it in a program called snape): the approach takes into account sequencing errors and allows users to choose different priors. We also set up a pipeline which can simulate the coalescence process giving rise to the SNPs, the pooling procedure and the sequencing. We used it to compare the
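
    Two of the points made in this record can be put into numbers with a small sketch: the neutral prior P(f) ∝ 1/f over allele counts, and the chance that a rare allele in the pool is read a given number of times. The sketch assumes that every chromosome contributes equally to the pool and that there are no sequencing errors, which the actual method explicitly models; the function names are hypothetical and this is not the snape implementation.

```python
import math


def neutral_prior(n):
    """Prior over allele counts k = 1..n-1 in n chromosomes, P(k) proportional to 1/k."""
    weights = [1.0 / k for k in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def prob_reads(k, n, depth, r):
    """Binomial probability of r reads out of depth for an allele carried by
    k of the n pooled chromosomes (equal contribution, no sequencing errors)."""
    f = k / n
    return math.comb(depth, r) * f ** r * (1.0 - f) ** (depth - r)
```

    Under these assumptions, a true singleton in a pool of 100 chromosomes sequenced at depth 50 goes unread with probability prob_reads(1, 100, 50, 0), roughly 0.6, which illustrates why requiring a minimum number of reads per allele is costly in pools.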

  13. Internally quenched fluorescent peptide libraries with randomized sequences designed to detect endopeptidases.

    Science.gov (United States)

    Oliveira, Lilian C G; Silva, Vinícius O; Okamoto, Debora N; Kondo, Marcia Y; Santos, Saara M B; Hirata, Isaura Y; Vallim, Marcelo A; Pascon, Renata C; Gouvea, Iuri E; Juliano, Maria A; Juliano, Luiz

    2012-02-01

    Identification of synthetic peptide substrates for novel peptidases is an essential step for their study. With this purpose we synthesized fluorescence resonance energy transfer (FRET) peptide libraries Abz (or MCA)-GXXXXXQ-EDDnp and Abz (or MCA)-GXXZXXQ-EDDnp, where X consists of an equimolar mixture of all amino acids, the Z position is fixed with one of the proteinogenic amino acids (cysteine was excluded), Abz (ortho-aminobenzoic acid) or MCA ([7-amino-4-methyl]coumarin) is the fluorescence donor and Q-EDDnp (glutamine-[N-(2,4-dinitrophenyl)-ethylenediamine]) is the fluorescence acceptor. The peptide libraries MCA-GXXX↓XXQ-EDDnp and MCA-GXXZ↓XXQ-EDDnp were cleaved as indicated (↓) by trypsin, chymotrypsin, cathepsin L, pepsin A, and Eqolisin as confirmed by Edman degradation of the products derived from the digestion of these libraries. The best hydrolyzed Abz-GXXZXXQ-EDDnp sublibraries by these proteases, including Dengue 2 virus NS2B-NS3 protease, contained amino acids at the Z position that are reported to be well accepted by their S(1) subsite. The pH profiles of the hydrolytic activities of these canonical proteases on the libraries were similar to those reported for typical substrates. The FRET peptide libraries provide an efficient and simple approach for detecting nanomolar concentrations of endopeptidases and are useful for initial specificity characterization as performed for two proteases secreted by a Bacillus subtilis.

  14. Bayesian coestimation of phylogeny and sequence alignment

    Directory of Open Access Journals (Sweden)

    Jensen Jens

    2005-04-01

    Full Text Available Abstract Background Two central problems in computational biology are the determination of the alignment and phylogeny of a set of biological sequences. The traditional approach to this problem is to first build a multiple alignment of these sequences, followed by a phylogenetic reconstruction step based on this multiple alignment. However, alignment and phylogenetic inference are fundamentally interdependent, and ignoring this fact leads to biased and overconfident estimations. Whether the main interest be in sequence alignment or phylogeny, a major goal of computational biology is the co-estimation of both. Results We developed a fully Bayesian Markov chain Monte Carlo method for coestimating phylogeny and sequence alignment, under the Thorne-Kishino-Felsenstein model of substitution and single nucleotide insertion-deletion (indel) events. In our earlier work, we introduced a novel and efficient algorithm, termed the "indel peeling algorithm", which includes indels as phylogenetically informative evolutionary events, and resembles Felsenstein's peeling algorithm for substitutions on a phylogenetic tree. For a fixed alignment, our extension analytically integrates out both substitution and indel events within a proper statistical model, without the need for data augmentation at internal tree nodes, allowing for efficient sampling of tree topologies and edge lengths. To additionally sample multiple alignments, we here introduce an efficient partial Metropolized independence sampler for alignments, and combine these two algorithms into a fully Bayesian co-estimation procedure for the alignment and phylogeny problem. Our approach results in estimates for the posterior distribution of evolutionary rate parameters, for the maximum a-posteriori (MAP) phylogenetic tree, and for the posterior decoding alignment. Estimates for the evolutionary tree and multiple alignment are augmented with confidence estimates for each node height and alignment column

  15. Pyrosequencing-An Alternative to Traditional Sanger Sequencing

    OpenAIRE

    2012-01-01

    Problem statement: Pyrosequencing has the potential to sequence DNA rapidly and reliably, with advantages over the traditional Sanger dideoxy sequencing approach. Approach: A comprehensive review of the literature on the principles, applications, challenges and prospects of pyrosequencing was performed. Results: Pyrosequencing is a DNA sequencing technology based on the sequencing-by-synthesis principle. It employs a series of four enzymes to accurately detect nucleic acid sequences during the...

  16. Sequence in intuitionistic fuzzy soft multi topological spaces

    Directory of Open Access Journals (Sweden)

    AJOY DAS

    2015-10-01

    Full Text Available In this paper, we introduce a new type of sequence of intuitionistic fuzzy soft multi sets and their basic properties are studied. The concepts of subsequence, increasing sequence, decreasing sequence and convergence sequence of intuitionistic fuzzy soft multi sets are proposed. The concepts of cluster intuitionistic fuzzy soft multi sets of sequences are also introduced. Some basic properties regarding the above concepts are explored.

  17. A symmetry-related sequence-structure relation of proteins

    Institute of Scientific and Technical Information of China (English)

    XU Ruizhen; LI Mingfen; CHEN Hanlin; HUANG Yanzhao; XIAO Yi

    2005-01-01

    Proteins have regular tertiary structures but irregular amino acid sequences, which makes it very difficult to decode the structural information in protein sequences. Here we demonstrate that many small α-protein domains have hidden sequence symmetries characteristic of their pseudo-symmetric tertiary structures. We also present a modified recurrence plot method to reveal this kind of hidden sequence symmetry. The results may enable us to understand part of the relation between protein sequences and their tertiary structures.

  18. NCBI Mass Sequence Downloader–Large dataset downloading made easy

    OpenAIRE

    Pina-Martins, F; Paulo, O. S.

    2016-01-01

    Sequence databases, such as NCBI, are a very important resource in many areas of science. Downloading small amounts of sequences to local storage can easily be performed using any recent web browser, but downloading tens of thousands of sequences is not as simple. NCBI Mass Sequence Downloader is an open source program aimed at simplifying obtaining large amounts of sequence data from NCBI databases to local storage. It is written in Python (can be run under both Python 2 and Python 3), an...

  19. Quick Trickle Permutation Based on Quick Trickle Characteristic Sequence

    Institute of Scientific and Technical Information of China (English)

    Wang Li-na; Fei Ru-chun; Liu Zhu

    2003-01-01

    The concept of the quick trickle characteristic sequence is presented, its properties and count are studied, and the mapping relationship between quick trickle characteristic sequences and quick trickle permutations is discussed. Finally, an efficient construction of quick trickle permutations based on quick trickle characteristic sequences is given, by which a quick trickle permutation can be computed after constructing its quick trickle characteristic sequence. Quick trickle permutations have good cryptographic properties.

  20. Dinoflagellate 17S rRNA sequence inferred from the gene sequence: Evolutionary implications

    Science.gov (United States)

    Herzog, Michel; Maroteaux, Luc

    1986-01-01

    We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage. PMID:16578795

  1. 0,1 distribution in the highest level sequences of primitive sequences over Z2e

    Institute of Scientific and Technical Information of China (English)

    FAN Shuqin

    2003-01-01

    [1] Ward, M., The arithmetical theory of linear recurring sequences, Trans. Amer. Math. Soc., 1933, 35(6): 600-628.
    [2] Dai Zongduo, Binary sequences derived from ML-sequences over rings I: Periods and minimal polynomials, Journal of Cryptology, 1992, 5: 193-207.
    [3] Dai, Z. D., Beth, T., Gollman, D., Lower bounds for the linear complexity of sequences over residue rings, Advances in Cryptology-Eurocrypt '90, Springer-Verlag LNCS, 1991, 473: 189-195.
    [4] Zeng Kencheng, Dai Zongduo, Huang Minqiang, Injectiveness of mappings from ring sequences to their sequences of the significant bits, Symposium on Theoretical Problems of Cryptology, State Key Laboratory of Information Security, Beijing, China, June 1995, 132-141.
    [5] Boztas, S., Hammons, A. R., Kumar, P. V., 4-phase sequences with near-optimum correlation properties, IEEE Trans. Inform. Theory, 1992, 38: 1101-1113.
    [6] Kuzmin, A. S., Nechaev, A. A., A construction of noise stable codes using linear recurrents over Galois rings, Russian Math. Surveys, 1992, 47: 189-190.
    [7] Qi Wenfeng, Zhou Jinjun, Distribution of 0 and 1 in highest level of primitive sequences over Z2e, Science in China, Ser. A, 1997, 40(6): 606-611.
    [8] Qi Wenfeng, Zhou Jinjun, Distribution of 0 and 1 in highest level of primitive sequences over Z2e (II), Chinese Science Bulletin, 1998, 43(8): 633-635.
    [9] Zhu Fengxiang, Qi Wenfeng, Distribution of 0 and 1 in the highest level of primitive sequences over Z2e, Advances in Cryptology-CHINACRYPT'2000, Beijing: Science Press, 2000, 1-5.
    [10] Kamlovski, O. V., Kuzmin, A. S., Distribution of elements on cycles of linear recurrents sequences over Galois rings, Russian Math. Surveys, 1998, 53(2): 392-393.
    [11] Kumar, P. V., Helleseth, T., Calderbank, A. R., An upper bound for Weil exponential sums over Galois rings and applications, IEEE Trans. Inform. Theory, 1995, 41: 456-468.

  2. Unbounded randomness certification using sequences of measurements

    Science.gov (United States)

    Curchod, F. J.; Johansson, M.; Augusiak, R.; Hoban, M. J.; Wittek, P.; Acín, A.

    2017-02-01

    Unpredictability, or randomness, of the outcomes of measurements made on an entangled state can be certified provided that the statistics violate a Bell inequality. In the standard Bell scenario where each party performs a single measurement on its share of the system, only a finite amount of randomness, of at most $4 \log_2 d$ bits, can be certified from a pair of entangled particles of dimension $d$. Our work shows that this fundamental limitation can be overcome using sequences of (nonprojective) measurements on the same system. More precisely, we prove that one can certify any amount of random bits from a pair of qubits in a pure state as the resource, even if it is arbitrarily weakly entangled. In addition, this certification is achieved by near-maximal violation of a particular Bell inequality for each measurement in the sequence.

  3. Accelerated large-scale multiple sequence alignment

    Directory of Open Access Journals (Sweden)

    Lloyd Scott

    2011-12-01

    Full Text Available Abstract Background Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware. Results We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. Using an FPGA accelerator, an overall speedup of up to 150 has been demonstrated on a large data set when compared to a 2.4 GHz Core2 processor. Conclusions Our parallel algorithm and architecture accelerates large-scale MSA with reconfigurable computing and allows researchers to solve the larger problems that confront biologists today. Program source is available from http://dna.cs.byu.edu/msa/.
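
    The Amdahl's Law limitation mentioned in the record above can be illustrated with a short calculation. This is only a sketch: the function name and the example fractions are hypothetical and are not taken from the paper.

```python
def amdahl_speedup(accelerated_fraction: float, stage_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated.

    accelerated_fraction: share of the original runtime spent in the
        accelerated stage (between 0 and 1); an illustrative input.
    stage_speedup: how many times faster that stage becomes.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / stage_speedup)


# If a stage accounts for half of the runtime, even a huge speedup of that
# stage cannot make the whole program more than about twice as fast.
print(amdahl_speedup(0.5, 1000.0))
```

    The calculation shows why accelerating a single stage of a multi-stage pipeline gives limited benefit unless that stage dominates the total runtime.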

  4. Cardiac nonrigid motion analysis from image sequences

    Institute of Scientific and Technical Information of China (English)

    LIU Huafeng

    2006-01-01

    Noninvasive estimation of soft tissue kinematic properties from medical image sequences has many important clinical and physiological implications, such as the diagnosis of heart diseases and the understanding of cardiac mechanics. In this paper, we present a biomechanics-based strategy, framed as a priori constraints for the ill-posed motion recovery problem, to realize estimation of the cardiac motion and deformation parameters. By constructing the heart dynamics system equations from biomechanics principles, we use the finite element method to generate smooth estimates of heart kinematics throughout the cardiac cycle. We present the application of the strategy to the estimation of displacements and strains from in vivo left ventricular magnetic resonance image sequences.

  5. The characterization of twenty sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Kimberly Pelak

    2010-09-01

    Full Text Available We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop-loss variants in genes representing a diverse set of pathways.

  6. Atypical presentation of amniotic band sequence.

    Science.gov (United States)

    Bodamer, O A; Popek, E J; Bacino, C

    2001-04-22

    Amniotic Band Sequence (ABS) is a disruption sequence that results in a variable group of abnormalities secondary to the disruption process and subsequent deformations. The incidence of ABS ranges from 1:1,200 to 1:15,000 live-born, and is even higher in still-born [Froster and Baird, 1993: Am J Med Genet 46:497-500]. The pathophysiology of ABS remains controversial, but a close look at critical periods of embryogenesis and/or organogenesis has helped in understanding pathogenetic mechanisms leading to the ABS disruption. The abnormalities are typically limited to external structures; however, associated internal malformations as seen in the case reported here may occur [Hunter and Carpenter, 1986: Am J Med Genet 24:691-700]. The prognosis depends on the severity of the abnormalities and the involvement of internal organs [Froster and Baird, 1993: Am J Med Genet 46:497-500; Levy, 1998: Ped Rev 19:249].

  7. HAMSA: Highly Accelerated Multiple Sequence Aligner

    Directory of Open Access Journals (Sweden)

    Naglaa M. Reda

    2016-06-01

    Full Text Available For biologists, the existence of an efficient tool for multiple sequence alignment is essential. This work presents a new parallel aligner called HAMSA. HAMSA is a bioinformatics application designed for highly accelerated alignment of multiple sequences of proteins and DNA/RNA on a multi-core cluster system. The design of HAMSA is based on a combination of our recently proposed optimized algorithms for vectorization, partitioning, and scheduling. It mainly operates on a distance vector instead of a distance matrix. It accomplishes similarity computations and generates the guide tree in a highly accelerated and accurate manner. HAMSA outperforms MSAProbs with a 21.9-fold speedup and ClustalW-MPI with an 11-fold speedup. It can be considered an essential tool for structure prediction, protein classification, motif finding and drug design studies.

  8. Elastic sequence correlation for human action analysis.

    Science.gov (United States)

    Wang, Li; Cheng, Li; Wang, Liang

    2011-06-01

    This paper addresses the problem of automatically analyzing and understanding human actions from video footage. An "action correlation" framework, elastic sequence correlation (ESC), is proposed to identify action subsequences from a database of (possibly long) video sequences that are similar to a given query video action clip. In particular, we show that two well-known algorithms, namely approximate pattern matching in computer and information sciences and the dynamic time warping (DTW) method in signal processing, are special cases of our ESC framework. The proposed framework is applied to two important real-world applications: action pattern retrieval, as well as action segmentation and recognition, where, on average, its run time speed (in MATLAB) is about 3.3 frames per second. In addition, compared with the state-of-the-art algorithms on a number of challenging data sets, our approach is demonstrated to perform competitively.
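
    Dynamic time warping, which the record above names as a special case of ESC, can be sketched in a few lines. This is the textbook DTW recurrence on one-dimensional sequences with an absolute-difference cost, not the ESC framework itself; the function name and the cost choice are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    items of `a` with the first j items of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again (b stretched)
                cost[i][j - 1],      # b[j-1] is matched again (a stretched)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# A sequence and a time-stretched copy of it align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

    The warping lets one sequence repeat elements to match the other, which is why a query clip can match a slower or faster performance of the same action.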

  9. The International Nucleotide Sequence Database Collaboration

    Science.gov (United States)

    Cochrane, Guy; Karsch-Mizrachi, Ilene; Takagi, Toshihisa; International Nucleotide Sequence Database Collaboration

    2016-01-01

    The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org) comprises three global partners committed to capturing, preserving and providing comprehensive public-domain nucleotide sequence information. The INSDC establishes standards, formats and protocols for data and metadata to make it easier for individuals and organisations to submit their nucleotide data reliably to public archives. This work enables the continuous, global exchange of information about living things. Here we present an update of the INSDC in 2015, including data growth and diversification, new standards and requirements by publishers for authors to submit their data to the public archives. The INSDC serves as a model for data sharing in the life sciences. PMID:26657633

  10. Sequence Matching Analysis for Curriculum Development

    Directory of Open Access Journals (Sweden)

    Liem Yenny Bendatu

    2015-06-01

    Full Text Available Many organizations apply information technologies to support their business processes. With these technologies, actual events are recorded and checked for conformance with a predefined model. Conformance checking is an approach to measuring the fitness and appropriateness between a process model and actual events. However, when multiple events share the same timestamp, the traditional approach is unable to produce such measures. This study attempts to develop a sequence matching analysis. Taking conformance checking as its basis, the proposed approach uses the current control-flow technique from the process mining domain. A case study in the field of educational processes has been conducted. This study also proposes a curriculum analysis framework to test the proposed approach. By considering the learning sequence of students, it yields some measurements for curriculum development. Finally, the results of the proposed approach have been verified by relevant instructors for further development.

  11. Tritium pellet injection sequences for TFTR

    Energy Technology Data Exchange (ETDEWEB)

    Houlberg, W.A.; Milora, S.L.; Attenberger, S.E.; Singer, C.E.; Schmidt, G.L.

    1983-01-01

    Tritium pellet injection into neutral-deuterium-beam-heated deuterium plasmas in the Tokamak Fusion Test Reactor (TFTR) is shown to be an attractive means of (1) minimizing tritium use per tritium discharge and over a sequence of tritium discharges; (2) greatly reducing the tritium load in the walls, limiters, getters, and cryopanels; (3) maintaining or improving instantaneous neutron production (Q); (4) reducing or eliminating deuterium-tritium (D-T) neutron production in non-optimized discharges; and (5) generally adding flexibility to the experimental sequences leading to optimal Q operation. Transport analyses of both compression and full-bore TFTR plasmas are used to support the above observations and to provide the basis for a proposed eight-pellet gas gun injector for the 1986 tritium experiments.

  12. Transfer in motor sequence learning: effects of practice schedule and sequence context

    OpenAIRE

    Diana Margit Müssgens; Fredrik Ullén

    2015-01-01

    Transfer (i.e., the application of a learned skill in a novel context) is an important and desirable outcome of motor skill learning. While much research has been devoted to understanding transfer of explicit skills, the mechanisms of skill transfer after incidental learning remain poorly understood. The aim of this study was to (1) examine the effect of practice schedule on transfer and (2) investigate whether sequence-specific knowledge can transfer to an unfamiliar sequence context. We traine...

  13. An evaluation of Comparative Genome Sequencing (CGS) by comparing two previously-sequenced bacterial genomes

    Directory of Open Access Journals (Sweden)

    Herring Christopher D

    2007-08-01

    Full Text Available Abstract Background With the development of new technology, it has recently become practical to resequence the genome of a bacterium after experimental manipulation. It is critical though to know the accuracy of the technique used, and to establish confidence that all of the mutations were detected. Results In order to evaluate the accuracy of genome resequencing using the microarray-based Comparative Genome Sequencing service provided by Nimblegen Systems Inc., we resequenced the E. coli strain W3110 Kohara using MG1655 as a reference, both of which have been completely sequenced using traditional sequencing methods. CGS detected 7 of 8 small sequence differences, one large deletion, and 9 of 12 IS element insertions present in W3110, but did not detect a large chromosomal inversion. In addition, we confirmed that CGS also detected 2 SNPs, one deletion and 7 IS element insertions that are not present in the genome sequence, which we attribute to changes that occurred after the creation of the W3110 lambda clone library. The false positive rate for SNPs was one per 244 Kb of genome sequence. Conclusion CGS is an effective way to detect multiple mutations present in one bacterium relative to another, and while highly cost-effective, is prone to certain errors. Mutations occurring in repeated sequences or in sequences with a high degree of secondary structure may go undetected. It is also critical to follow up on regions of interest in which SNPs were not called because they often indicate deletions or IS element insertions.

  14. Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing

    Directory of Open Access Journals (Sweden)

    Thierry-Mieg Danielle

    2009-06-01

    Full Text Available Abstract Background Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. Results We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10^-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.

  15. Image registration method for medical image sequences

    Science.gov (United States)

    Gee, Timothy F.; Goddard, James S.

    2013-03-26

    Image registration of low contrast image sequences is provided. In one aspect, a desired region of an image is automatically segmented and only the desired region is registered. Active contours and adaptive thresholding of intensity or edge information may be used to segment the desired regions. A transform function is defined to register the segmented region, and sub-pixel information may be determined using one or more interpolation methods.

  16. Exome sequencing of a multigenerational human pedigree.

    Science.gov (United States)

    Hedges, Dale J; Hedges, Dale; Burges, Dan; Powell, Eric; Almonte, Cherylyn; Huang, Jia; Young, Stuart; Boese, Benjamin; Schmidt, Mike; Pericak-Vance, Margaret A; Martin, Eden; Zhang, Xinmin; Harkins, Timothy T; Züchner, Stephan

    2009-12-14

    Over the next few years, the efficient use of next-generation sequencing (NGS) in human genetics research will depend heavily upon the effective mechanisms for the selective enrichment of genomic regions of interest. Recently, comprehensive exome capture arrays have become available for targeting approximately 33 Mb or approximately 180,000 coding exons across the human genome. Selective genomic enrichment of the human exome offers an attractive option for new experimental designs aiming to quickly identify potential disease-associated genetic variants, especially in family-based studies. We have evaluated a 2.1 M feature human exome capture array on eight individuals from a three-generation family pedigree. We were able to cover up to 98% of the targeted bases at a long-read sequence read depth of ≥ 3, 86% at a read depth of ≥ 10, and over 50% of all targets were covered with ≥ 20 reads. We identified up to 14,284 SNPs and small indels per individual exome, with up to 1,679 of these representing putative novel polymorphisms. Applying the conservative genotype calling approach HCDiff, the average rate of detection of a variant allele based on Illumina 1 M BeadChips genotypes was 95.2% at ≥ 10x sequence. Further, we propose an advantageous genotype calling strategy for low covered targets that empirically determines cut-off thresholds at a given coverage depth based on existing genotype data. Application of this method was able to detect >99% of SNPs covered ≥ 8x. Our results offer guidance for "real-world" applications in human genetics and provide further evidence that microarray-based exome capture is an efficient and reliable method to enrich for chromosomal regions of interest in next-generation sequencing experiments.

  17. Exome sequencing of a multigenerational human pedigree.

    Directory of Open Access Journals (Sweden)

    Dale J Hedges

    Full Text Available Over the next few years, the efficient use of next-generation sequencing (NGS) in human genetics research will depend heavily upon the effective mechanisms for the selective enrichment of genomic regions of interest. Recently, comprehensive exome capture arrays have become available for targeting approximately 33 Mb or approximately 180,000 coding exons across the human genome. Selective genomic enrichment of the human exome offers an attractive option for new experimental designs aiming to quickly identify potential disease-associated genetic variants, especially in family-based studies. We have evaluated a 2.1 M feature human exome capture array on eight individuals from a three-generation family pedigree. We were able to cover up to 98% of the targeted bases at a long-read sequence read depth of ≥ 3, 86% at a read depth of ≥ 10, and over 50% of all targets were covered with ≥ 20 reads. We identified up to 14,284 SNPs and small indels per individual exome, with up to 1,679 of these representing putative novel polymorphisms. Applying the conservative genotype calling approach HCDiff, the average rate of detection of a variant allele based on Illumina 1 M BeadChips genotypes was 95.2% at ≥ 10x sequence. Further, we propose an advantageous genotype calling strategy for low covered targets that empirically determines cut-off thresholds at a given coverage depth based on existing genotype data. Application of this method was able to detect >99% of SNPs covered ≥ 8x. Our results offer guidance for "real-world" applications in human genetics and provide further evidence that microarray-based exome capture is an efficient and reliable method to enrich for chromosomal regions of interest in next-generation sequencing experiments.

  18. Gao's Conjecture on Zero-Sum Sequences

    Indian Academy of Sciences (India)

    B Sury; R Thangadurai

    2002-08-01

    In this paper, we shall address three closely-related conjectures due to van Emde Boas, W D Gao and Kemnitz on zero-sum problems on $\mathbf{Z}_p \oplus \mathbf{Z}_p$. We prove a number of results including a proof of the conjecture of Gao for the prime $p = 7$ (Theorem 3.1). The conjecture of Kemnitz is also proved (Propositions 4.6, 4.9, 4.10) for many classes of sequences.

  19. Test sequence construction from SFC specification

    OpenAIRE

    Provost, Julien; Roussel, Jean-Marc; Faure, Jean-Marc

    2009-01-01

    This paper focuses on conformance testing of electronic programmable devices whose specification is given in Sequential Function Chart (SFC). More precisely, a method is proposed to obtain automatically, from this specification, one minimum length test sequence which permits the exhaustive test of the behavior of the device. This method takes advantage of previous results on construction of the state machine representation of an SFC and on test of Mealy machines; conversely,...

  20. Variance of partial sums of stationary sequences

    CERN Document Server

    Deligiannidis, George

    2012-01-01

    Let $X_1, X_2, \ldots$ be a centred sequence of weakly stationary random variables with spectral measure $F$ and partial sums $S_n = X_1 + \cdots + X_n$, and let $G(x) = \int_{-x}^{x} F(\mathrm{d}x)$. We show that $\operatorname{Var}(S_n)$ is regularly varying of index $\gamma$ at infinity, if and only if $G(x)$ is regularly varying of index $2-\gamma$ at the origin ($0<\gamma<2$).
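
    The link between the variance of the partial sums and the spectral measure near the origin can be made concrete with a standard identity. It is written here as a sketch under the usual assumption, not stated in the record, that the covariances admit the spectral representation $\operatorname{Cov}(X_j, X_k) = \int_{-\pi}^{\pi} e^{i(j-k)x}\, F(\mathrm{d}x)$:

```latex
\operatorname{Var}(S_n)
  = \sum_{j=1}^{n} \sum_{k=1}^{n} \operatorname{Cov}(X_j, X_k)
  = \int_{-\pi}^{\pi} \Bigl| \sum_{k=1}^{n} e^{ikx} \Bigr|^{2} F(\mathrm{d}x)
  = \int_{-\pi}^{\pi} \frac{\sin^{2}(nx/2)}{\sin^{2}(x/2)} \, F(\mathrm{d}x).
```

    The kernel $\sin^{2}(nx/2)/\sin^{2}(x/2)$, read as $n^2$ at $x = 0$, concentrates near the origin as $n$ grows, which is why the growth of $\operatorname{Var}(S_n)$ is governed by the mass $G(x)$ that $F$ places on small intervals $[-x, x]$.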

  1. Adenovirus sequences required for replication in vivo.

    OpenAIRE

    Wang, K.; Pearson, G D

    1985-01-01

    We have studied the in vivo replication properties of plasmids carrying deletion mutations within cloned adenovirus terminal sequences. Deletion mapping located the adenovirus DNA replication origin entirely within the first 67 bp of the adenovirus inverted terminal repeat. This region could be further subdivided into two functional domains: a minimal replication origin and an adjacent auxiliary region which boosted the efficiency of replication by more than 100-fold. The minimal origin occup...

  2. Chromatin domains and prediction of MAR sequences.

    Science.gov (United States)

    Boulikas, T

    1995-01-01

    Polynucleosomes are constrained into loops or domains and are insulated from the effects of chromatin structure and torsional strain from flanking domains by the cross-complexation of matrix-attached regions (MARs) and matrix proteins. MARs or SARs have an average size of 500 bp, are spaced about every 30 kb, and are control elements maintaining independent realms of gene activity. A fraction of MARs may cohabit with core origins of replication (ORIs) and another fraction might cohabit with transcriptional enhancers. DNA replication, transcription, repair, splicing, and recombination seem to take place on the nuclear matrix. Classical AT-rich MARs have been proposed to anchor the core enhancers and core origins complexed with low-abundance transcription factors to the nuclear matrix via the cooperative binding to MARs of abundant classical matrix proteins (topoisomerase II, histone H1, lamins, SP120, ARBP, SATB1); this creates a unique nuclear microenvironment rich in regulatory proteins able to sustain transcription, replication, repair, and recombination. Theoretical searches and experimental data strongly support a model of activation of MARs and ORIs by transcription factors. A set of 21 characteristics is deduced or proposed for MAR/ORI sequences including their enrichment in inverted repeats, AT tracts, DNA unwinding elements, replication initiator protein sites, homooligonucleotide repeats (i.e., AAA, TTT, CCC), curved DNA, DNase I-hypersensitive sites, nucleosome-free stretches, polypurine stretches, and motifs with a potential for left-handed and triplex structures. We are establishing banks of ORI and MAR sequences and have undertaken a large project of sequencing a large number of MARs in an effort to determine classes of DNA sequences in these regulatory elements and to understand their role at the origins of replication and transcriptional enhancers.

  3. Tablet—next generation sequence assembly visualization

    OpenAIRE

    Milne, Iain; Bayer, Micha; Cardle, Linda; Shaw, Paul; Stephen, Gordon; Wright, Frank; Marshall, David

    2009-01-01

    Summary: Tablet is a lightweight, high-performance graphical viewer for next-generation sequence assemblies and alignments. Supporting a range of input assembly formats, Tablet provides high-quality visualizations showing data in packed or stacked views, allowing instant access and navigation to any region of interest, and whole contig overviews and data summaries. Tablet is both multi-core aware and memory efficient, allowing it to handle assemblies containing millions of reads, even on a 32...

  4. Subresultants in Recursive Polynomial Remainder Sequence

    OpenAIRE

    Terui, Akira

    2008-01-01

    We introduce concepts of "recursive polynomial remainder sequence (PRS)" and "recursive subresultant," and investigate their properties. In calculating PRS, if there exists the GCD (greatest common divisor) of initial polynomials, we calculate "recursively" with new PRS for the GCD and its derivative, until a constant is derived. We call such a PRS a recursive PRS. We define recursive subresultants to be determinants representing the coefficients in recursive PRS by coefficients of initial po...
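
    The idea described in the record above, of restarting a remainder sequence on the GCD and its derivative until a constant is reached, can be sketched as follows. This is an illustration using plain Euclidean remainders over the rationals; the paper's exact definition of a recursive PRS may differ, and all function names are hypothetical. Polynomials are coefficient lists, highest degree first.

```python
from fractions import Fraction


def trim(p):
    """Drop leading zero coefficients; the zero polynomial becomes []."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]


def poly_rem(f, g):
    """Remainder of f divided by g, with exact rational arithmetic."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    while len(f) >= len(g):
        q = f[0] / g[0]
        for k in range(len(g)):
            f[k] -= q * g[k]
        f = trim(f)  # the leading coefficient has just been cancelled
    return f


def derivative(f):
    """Formal derivative of a polynomial given highest degree first."""
    n = len(f) - 1
    return trim([Fraction(c) * (n - k) for k, c in enumerate(f[:-1])])


def prs(f, g):
    """Euclidean remainder sequence; the last entry is a GCD up to a constant."""
    seq = [trim([Fraction(c) for c in f]), trim([Fraction(c) for c in g])]
    while seq[-1]:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            break
        seq.append(r)
    return seq


def recursive_prs(f):
    """Chain of remainder sequences for (p, p'), restarted on each GCD."""
    chains = []
    current = trim([Fraction(c) for c in f])
    while len(current) > 1:  # stop once the GCD is a constant
        chain = prs(current, derivative(current))
        chains.append(chain)
        current = chain[-1]
    return chains


# (x - 1)^3 has a triple root, so the computation restarts twice.
for chain in recursive_prs([1, -3, 3, -1]):
    print(chain)
```

    For a square-free input the first remainder sequence already ends in a constant, so only one chain is produced; repeated roots are what trigger the recursion.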

  5. Unsupervised statistical clustering of environmental shotgun sequences

    Directory of Open Access Journals (Sweden)

    Bhatnagar Srijak

    2009-10-01

    Full Text Available Abstract Background The development of effective environmental shotgun sequence binning methods remains an ongoing challenge in algorithmic analysis of metagenomic data. While previous methods have focused primarily on supervised learning involving extrinsic data, a first-principles statistical model combined with a self-training fitting method has not yet been developed. Results We derive an unsupervised, maximum-likelihood formalism for clustering short sequences by their taxonomic origin on the basis of their k-mer distributions. The formalism is implemented using a Markov Chain Monte Carlo approach in a k-mer feature space. We introduce a space transformation that reduces the dimensionality of the feature space and a genomic fragment divergence measure that strongly correlates with the method's performance. Pairwise analysis of over 1000 completely sequenced genomes reveals that the vast majority of genomes have sufficient genomic fragment divergence to be amenable to binning using the present formalism. Using a high-performance implementation, the binner is able to classify fragments as short as 400 nt with accuracy over 90% in simulations of low-complexity communities of 2 to 10 species, given sufficient genomic fragment divergence. The method is available as an open source package called LikelyBin. Conclusion An unsupervised binning method based on statistical signatures of short environmental sequences is a viable stand-alone binning method for low complexity samples. For medium and high complexity samples, we discuss the possibility of combining the current method with other methods as part of an iterative process to enhance the resolving power of sorting reads into taxonomic and/or functional bins.
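
    The k-mer distributions on which the record above bases its clustering can be computed as a simple feature vector. The sketch below covers only this feature-extraction step, not the maximum-likelihood model or the MCMC fitting of LikelyBin; the function name and the default k are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers in `seq`.

    Windows containing characters other than A, C, G, T (for example N)
    are ignored. The vector is ordered lexicographically: AAAA, AAAC, ...
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Dinucleotide profile of a toy fragment: AC, CG and GT each occur once.
profile = kmer_frequencies("ACGT", k=2)
print(len(profile), profile[1])
```

    Fragments from the same genome tend to have similar profiles, which is the kind of statistical signature an unsupervised binner can cluster on.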

  6. Detecting overlapping coding sequences in virus genomes

    Directory of Open Access Journals (Sweden)

    Brown Chris M

    2006-02-01

    Background: Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret – especially within overlapping genes; and viruses often employ non-canonical translational mechanisms – e.g. frameshifting, stop codon read-through, leaky-scanning and internal ribosome entry sites – which can conceal potentially coding open reading frames (ORFs). Results: In a previous paper we introduced a new statistic – MLOGD (Maximum Likelihood Overlapping Gene Detector) – for detecting and analysing overlapping CDSs. Here we present (a) an improved MLOGD statistic, (b) a greatly extended suite of software using MLOGD, (c) a database of results for 640 virus sequence alignments, and (d) a web-interface to the software and database. Tests show that, from an alignment with just 20 mutations, MLOGD can discriminate non-overlapping CDSs from non-coding ORFs with a typical accuracy of up to 98%, and can detect CDSs overlapping known CDSs with a typical accuracy of 90%. In addition, the software produces a variety of statistics and graphics, useful for analysing an input multiple sequence alignment. Conclusion: MLOGD is an easy-to-use tool for virus genome annotation, detecting new CDSs – in particular overlapping or short CDSs – and for analysing overlapping CDSs following frameshift sites. The software, web-server, database and supplementary material are available at http://guinevere.otago.ac.nz/mlogd.html.

  7. Periodic pattern detection in sparse boolean sequences

    Directory of Open Access Journals (Sweden)

    Hérisson Joan

    2010-09-01

    Background: The specific position of functionally related genes along the DNA has been shown to reflect the interplay between chromosome structure and genetic regulation. By investigating the statistical properties of the distances separating such genes, several studies have highlighted various periodic trends. In many cases, however, groups built up from co-functional or co-regulated genes are small and contain wrong information (data contamination), so that the statistics are hard to exploit. In addition, gene positions are not expected to satisfy a perfectly ordered pattern along the DNA. Within this scope, we present an algorithm that aims to highlight periodic patterns in sparse boolean sequences, i.e. sequences of the type 010011011010... where the ratio of the number of 1's (denoting here the transcription start of a gene) to 0's is small. Results: The algorithm is particularly robust with respect to strong signal distortions such as the addition of 1's at arbitrary positions (contaminated data), the deletion of existing 1's in the sequence (missing data) and the presence of disorder in the position of the 1's (noise). This robustness property stems from an appropriate exploitation of the remarkable alignment properties of periodic points in solenoidal coordinates. Conclusions: The efficiency of the algorithm is demonstrated in situations where standard Fourier-based spectral methods are poorly adapted. We also show how the proposed framework makes it possible to identify the 1's that participate in the periodic trends, i.e. how it allows a positional score to be allocated to genes, in the same spirit as the sequence score. The software is available for public use at http://www.issb.genopole.fr/MEGA/Softwares/iSSB_SolenoidalApplication.zip.
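
    To give a feel for why periodic points "align", the sketch below wraps the positions of the 1's around a circle whose circumference is a trial period and measures how tightly the resulting angles cluster. This is a deliberately simplified stand-in, closer to a circular-statistics score than to the authors' solenoidal-coordinate algorithm, and the names `phase_coherence` and `best_period` are hypothetical.

```python
import cmath
import math

def phase_coherence(positions, period):
    """Mean resultant length of positions wrapped on a circle of the given period (0..1)."""
    if not positions:
        return 0.0
    total = sum(cmath.exp(2j * math.pi * p / period) for p in positions)
    return abs(total) / len(positions)

def best_period(bits, candidates):
    """Pick the candidate period whose wrapped 1-positions are most tightly aligned."""
    positions = [i for i, b in enumerate(bits) if b == "1"]
    return max(candidates, key=lambda period: phase_coherence(positions, period))

# A sparse boolean sequence with a 1 every 7 positions
bits = "".join("1" if i % 7 == 0 else "0" for i in range(70))
period = best_period(bits, range(5, 12))
```

    A score near 1 means nearly all 1's fall at the same phase for that period; extra or missing 1's lower the score gradually rather than destroying it, although this sketch makes no claim to match the robustness reported in the abstract.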

  8. Minimum degree and density of binary sequences

    DEFF Research Database (Denmark)

    Brandt, Stephan; Müttel, J.; Rautenbach, D.;

    2010-01-01

    For $d,k\in\mathbb{N}$ with $k \le 2d$, let $g(d,k)$ denote the infimum density of binary sequences $(x_i)_{i\in\mathbb{Z}}\in\{0,1\}^{\mathbb{Z}}$ which satisfy the minimum degree condition $\sum_{j=1}^{d}(x_{i+j}+x_{i-j}) \ge k$ for all $i\in\mathbb{Z}$ with $x_i=1$. We reduce the problem of computing $g(d,k)$ to a combinatorial problem related to the generalized k-girth of a graph G which...
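
    A small checker can make the condition concrete. The sketch below assumes that the minimum degree condition means every position holding a 1 must have at least k ones among the 2d positions within distance d, and it only handles periodic sequences, represented by one period. Both that reading and the function names are assumptions for illustration.

```python
def satisfies_min_degree(pattern, d, k):
    """Check the condition on the bi-infinite sequence obtained by repeating `pattern`."""
    n = len(pattern)
    for i, x in enumerate(pattern):
        if x == 1:
            # Count ones at distance 1..d on both sides, wrapping around the period.
            degree = sum(pattern[(i + j) % n] + pattern[(i - j) % n]
                         for j in range(1, d + 1))
            if degree < k:
                return False
    return True

def density(pattern):
    """Fraction of ones in one period, i.e. the density of the periodic sequence."""
    return sum(pattern) / len(pattern)

ok = satisfies_min_degree([1, 1, 0], d=1, k=1)       # each 1 has a neighbouring 1
isolated = satisfies_min_degree([1, 0, 0], d=1, k=1)  # the single 1 has no neighbour
```

    Searching over periodic patterns with such a checker gives upper bounds on the density for small d and k, under the stated reading of the condition.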

  9. Whole-exome/genome sequencing and genomics.

    Science.gov (United States)

    Grody, Wayne W; Thompson, Barry H; Hudgins, Louanne

    2013-12-01

    As medical genetics has progressed from a descriptive entity to one focused on the functional relationship between genes and clinical disorders, emphasis has been placed on genomics. Genomics, a subelement of genetics, is the study of the genome, the sum total of all the genes of an organism. The human genome, which is contained in the 23 pairs of nuclear chromosomes and in the mitochondrial DNA of each cell, comprises >6 billion nucleotides of genetic code. There are some 23,000 protein-coding genes, a surprisingly small fraction of the total genetic material, with the remainder composed of noncoding DNA, regulatory sequences, and introns. The Human Genome Project, launched in 1990, produced a draft of the genome in 2001 and then a finished sequence in 2003, on the 50th anniversary of the initial publication of Watson and Crick's paper on the double-helical structure of DNA. Since then, this mass of genetic information has been translated at an ever-increasing pace into useable knowledge applicable to clinical medicine. The recent advent of massively parallel DNA sequencing (also known as shotgun, high-throughput, and next-generation sequencing) has brought whole-genome analysis into the clinic for the first time, and most of the current applications are directed at children with congenital conditions that are undiagnosable by using standard genetic tests for single-gene disorders. Thus, pediatricians must become familiar with this technology, what it can and cannot offer, and its technical and ethical challenges. Here, we address the concepts of human genomic analysis and its clinical applicability for primary care providers.

  10. Supercomputers and biological sequence comparison algorithms.

    Science.gov (United States)

    Core, N G; Edmiston, E W; Saltz, J H; Smith, R M

    1989-12-01

    Comparison of biological (DNA or protein) sequences provides insight into molecular structure, function, and homology and is increasingly important as the available databases become larger and more numerous. One method of increasing the speed of the calculations is to perform them in parallel. We present the results of initial investigations using two dynamic programming algorithms on the Intel iPSC hypercube and the Connection Machine as well as an inexpensive, heuristically-based algorithm on the Encore Multimax.
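
    The abstract does not say which dynamic programming algorithms were used, so the following is only a generic sketch of the technique: a global alignment score computed row by row with a match/mismatch/gap scheme. The scoring values and the name `global_alignment_score` are assumptions for this example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of a and b, keeping only two table rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align the two symbols, gap in b, gap in a
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

score = global_alignment_score("ACGT", "AGT")
```

    Each cell depends only on its left, upper and upper-left neighbours, so all cells on one anti-diagonal can be computed independently, which is the property parallel implementations typically exploit.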

  11. Quantifying Next Generation Sequencing Sample Pre-Processing Bias in HIV-1 Complete Genome Sequencing.

    Science.gov (United States)

    Vrancken, Bram; Trovão, Nídia Sequeira; Baele, Guy; van Wijngaerden, Eric; Vandamme, Anne-Mieke; van Laethem, Kristel; Lemey, Philippe

    2016-01-07

    Genetic analyses play a central role in infectious disease research. Massively parallelized "mechanical cloning" and sequencing technologies were quickly adopted by HIV researchers in order to broaden the understanding of the clinical importance of minor drug-resistant variants. These efforts have, however, remained largely limited to small genomic regions. The growing need to monitor multiple genome regions for drug resistance testing, as well as the obvious benefit for studying evolutionary and epidemic processes makes complete genome sequencing an important goal in viral research. In addition, a major drawback for NGS applications to RNA viruses is the need for large quantities of input DNA. Here, we use a generic overlapping amplicon-based near full-genome amplification protocol to compare low-input enzymatic fragmentation (Nextera™) with conventional mechanical shearing for Roche 454 sequencing. We find that the fragmentation method has only a modest impact on the characterization of the population composition and that for reliable results, the variation introduced at all steps of the procedure--from nucleic acid extraction to sequencing--should be taken into account, a finding that is also relevant for NGS technologies that are now more commonly used. Furthermore, by applying our protocol to deep sequence a number of pre-therapy plasma and PBMC samples, we illustrate the potential benefits of a near complete genome sequencing approach in routine genotyping.

  12. Expressed sequence tags (ESTs) and simple sequence repeat (SSR) markers from octoploid strawberry (Fragaria × ananassa)

    Directory of Open Access Journals (Sweden)

    Bies Dawn H

    2005-06-01

    Background: Cultivated strawberry (Fragaria × ananassa) represents one of the most valued fruit crops in the United States. Despite its economic importance, the octoploid genome presents a formidable barrier to efficient study of genome structure and molecular mechanisms that underlie agriculturally-relevant traits. Many potentially fruitful research avenues, especially large-scale gene expression surveys and development of molecular genetic markers, have been limited by a lack of sequence information in public databases. As a first step to remedy this discrepancy a cDNA library has been developed from salicylate-treated, whole-plant tissues and over 1800 expressed sequence tags (ESTs) have been sequenced and analyzed. Results: A putative unigene set of 1304 sequences – 133 contigs and 1171 singlets – has been developed, and the transcripts have been functionally annotated. Homology searches indicate that 89.5% of sequences share significant similarity to known/putative proteins or Rosaceae ESTs. The ESTs have been functionally characterized and genes relevant to specific physiological processes of economic importance have been identified. A set of tools useful for SSR development and mapping is presented. Conclusion: Sequences derived from this effort may be used to speed gene discovery efforts in Fragaria and the Rosaceae in general and also open avenues of comparative mapping. This report represents a first step in expanding molecular-genetic analyses in strawberry and demonstrates how computational tools can be used to optimally mine a large body of useful information from a relatively small data set.

  13. Sequence assembly using next generation sequencing data--challenges and solutions.

    Science.gov (United States)

    Chin, Francis Y L; Leung, Henry C M; Yiu, S M

    2014-11-01

    Sequence assembly is an important step in bioinformatics studies. With next generation sequencing (NGS) technology, DNA fragments (reads) can be sampled at random and at high throughput from a DNA or RNA sequence. However, because the positions of the sampled reads are unknown, an assembly step is required to combine overlapping reads and reconstruct the original DNA or RNA sequence. Compared with traditional Sanger sequencing, NGS offers higher throughput, but its reads are shorter and have a higher error rate, which introduces several problems in assembly. Moreover, paired-end reads, which carry more information than single-end reads, can be sampled, yet existing assemblers cannot fully utilize this information and fail to assemble longer contigs. In this article, we revisit the major problems of assembling NGS reads from genomic, transcriptomic, metagenomic and metatranscriptomic data. We also describe our IDBA package for solving these problems. The IDBA package adopts several novel ideas in assembly, including the use of multiple k values, local assembly and progressive depth removal. Compared with existing assemblers, IDBA performs better on many simulated and real sequencing datasets.
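
    Why the choice of k matters can be seen in a toy de Bruijn graph, the structure commonly used for short-read assembly. The sketch below builds the graph for a given k and counts branching nodes; it is an illustration under simplifying assumptions (error-free reads, one strand) and not a description of how the IDBA package is implemented. The function names are hypothetical.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branching_nodes(graph):
    """Number of nodes with more than one successor, i.e. ambiguous extensions."""
    return sum(1 for successors in graph.values() if len(successors) > 1)

reads = ["ATGCGTGCA"]  # contains the repeat TGC twice
small_k = branching_nodes(de_bruijn(reads, 3))
large_k = branching_nodes(de_bruijn(reads, 5))
```

    A small k keeps low-coverage regions connected but turns short repeats into branches, while a larger k resolves those repeats at the cost of more gaps, which is one motivation for combining several k values.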

  14. PN Sequence Preestimator Scheme for DS-SS Signal Acquisition Using Block Sequence Estimation

    Directory of Open Access Journals (Sweden)

    Sang Kyu Park

    2005-03-01

    An m-sequence (PN sequence) preestimator scheme for direct-sequence spread spectrum (DS-SS) signal acquisition by using block sequence estimation (BSE) is proposed and analyzed. The proposed scheme consists of an estimator and a verifier which work according to the PN sequence chip clock, and provides not only the enhanced chip estimates with a threshold decision logic and one-chip error correction among the first m received chips, but also the reliability check of the estimates with additional decision logic. The probabilities of the estimator and verifier operations are calculated. With these results, the detection, the false alarm, and the missing probabilities of the proposed scheme are derived. In addition, using a signal flow graph, the average acquisition time is calculated. The proposed scheme can be used as a preestimator and easily implemented by changing the internal signal path of a generally used digital matched filter (DMF) correlator or any other correlator that has a lot of sampling data memories for sampled PN sequence. The numerical results show rapid acquisition performance in a relatively good CNR.
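
    The role of the first m received chips can be illustrated with a linear feedback shift register (LFSR): once m consecutive chips of an m-sequence are known without error, the rest of the sequence follows from the feedback rule. The sketch below uses the recurrence s[n+3] = s[n] XOR s[n+1], which corresponds to the primitive polynomial x^3 + x + 1; the function name and the choice of polynomial are assumptions, and the threshold decision and verification logic of the proposed scheme are not modelled.

```python
def lfsr_sequence(seed, taps, length):
    """Generate `length` chips from an LFSR whose window starts as `seed`.

    The new chip is the XOR of the window entries at the indices in `taps`.
    """
    state = list(seed)
    out = []
    for _ in range(length):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out

taps = (0, 1)                                # s[n+3] = s[n] ^ s[n+1]
chips = lfsr_sequence([1, 0, 0], taps, 14)   # two periods of a length-7 m-sequence

# Loading any 3 consecutive (correct) chips reproduces the chips that follow them.
resynced = lfsr_sequence(chips[3:6], taps, 7)
```

    If one of the m loaded chips is wrong, the regenerated sequence diverges from the received one, which is why a scheme of this kind needs some form of reliability check on the estimates.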

  15. An approach to sequence DNA without tagging

    Science.gov (United States)

    Niu, Sanjun; Saraf, Ravi F.

    2002-10-01

    Microarray technology is playing an increasingly important role in biology and medicine and its application to genomics for gene expression analysis has already reached the market with a variety of commercially available instruments. In these combinatorial analysis methods, known probe single-strand DNA (ssDNA) 'primers' are attached in clusters of typically 100 µm × 100 µm pixels. Each pixel of the array has a slightly different sequence. On exposure to 'unknown' target ssDNA, the pixels with the right complementary probe ssDNA sequence convert to double-stranded DNA (dsDNA) by a hybridization reaction. To transduce the conversion of the pixel to dsDNA, the target ssDNA is labelled with a photoluminescent tag during the polymerase chain reaction (PCR) amplification process. Due to the statistical distribution of the tags in the target ssDNA, it becomes significantly difficult to implement these methods as a diagnostic tool in a pathology laboratory. A method to sequence DNA without tagging the molecule is developed. The fabrication process is compatible with current microelectronics and (emerging) soft-material fabrication technologies, allowing the method to be integrable with micro-electromechanical systems (MEMS) and lab-on-a-chip devices. An estimated sensitivity of 10⁻¹² g on a 1 cm² device area is obtained.

  16. Mapping and Sequencing the Human Genome

    Science.gov (United States)

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  17. Multiple sequence alignment accuracy and phylogenetic inference.

    Science.gov (United States)

    Ogden, T Heath; Rosenberg, Michael S

    2006-04-01

    Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.

  18. Nucleosome DNA sequence structure of isochores

    Directory of Open Access Journals (Sweden)

    Trifonov Edward N

    2011-04-01

    Background: Significant differences in G+C content between different isochore types suggest that the nucleosome positioning patterns in DNA of the isochores should be different as well. Results: Extraction of the patterns from the isochore DNA sequences by Shannon N-gram extension reveals that while the general motif YRRRRRYYYYYR is characteristic for all isochore types, the dominant positioning patterns of the isochores vary between TAAAAATTTTTA and CGGGGGCCCCCG due to the large differences in G+C composition. This is observed in human, mouse and chicken isochores, demonstrating that the variations of the positioning patterns are largely G+C dependent rather than species-specific. The species-specificity of nucleosome positioning patterns is revealed by dinucleotide periodicity analyses in isochore sequences. While human sequences show CG periodicity, chicken isochores display AG (CT) periodicity. Mouse isochores show only very weak CG periodicity. Conclusions: The nucleosome positioning pattern as revealed by Shannon N-gram extension is strongly dependent on G+C content and differs between isochores. Species-specificity of the pattern is subtle. It is reflected in the choice of preferentially periodical dinucleotides.
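
    One simple way to look for dinucleotide periodicity is to histogram the distances between occurrences of a given dinucleotide and look for peaks at multiples of a period. The sketch below does this for a single sequence; it is an assumed, minimal procedure with a hypothetical function name, not the analysis pipeline used in the study.

```python
from collections import Counter

def dinucleotide_distance_counts(seq, dinuc, max_dist=30):
    """Count how often two occurrences of `dinuc` are separated by each distance."""
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    counts = Counter()
    for idx, start in enumerate(positions):
        for other in positions[idx + 1:]:
            distance = other - start
            if distance > max_dist:
                break  # positions are sorted, so later ones are even farther away
            counts[distance] += 1
    return counts

# Toy sequence with CG placed every 10 bases
toy = "CGAAAAAAAA" * 5
counts = dinucleotide_distance_counts(toy, "CG")
```

    In real nucleosomal DNA the signal of interest would be an excess of distances near multiples of roughly 10 bases, visible only after averaging over many sequences.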

  19. A Fast Generic Sequence Matching Algorithm

    CERN Document Server

    Musser, David R

    2008-01-01

    A string matching -- and more generally, sequence matching -- algorithm is presented that has a linear worst-case computing time bound, a low worst-case bound on the number of comparisons (2n), and sublinear average-case behavior that is better than that of the fastest versions of the Boyer-Moore algorithm. The algorithm retains its efficiency advantages in a wide variety of sequence matching problems of practical interest, including traditional string matching; large-alphabet problems (as in Unicode strings); and small-alphabet, long-pattern problems (as in DNA searches). Since it is expressed as a generic algorithm for searching in sequences over an arbitrary type T, it is well suited for use in generic software libraries such as the C++ Standard Template Library. The algorithm was obtained by adding to the Knuth-Morris-Pratt algorithm one of the pattern-shifting techniques from the Boyer-Moore algorithm, with provision for use of hashing in this technique. In situations in which a hash function or random a...
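
    For orientation, here is a sketch of the Knuth-Morris-Pratt component that the abstract names as the starting point, written so that it works on any indexable sequence of comparable items rather than only on strings. It does not include the Boyer-Moore-style shifting or hashing that the described hybrid adds, and the function names are chosen for this example.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = failure_table(pattern)
    k = 0  # number of pattern items currently matched
    for i, item in enumerate(text):
        while k and item != pattern[k]:
            k = fail[k - 1]
        if item == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1

dna_hit = kmp_search("GATTACA", "TAC")
list_hit = kmp_search([1, 2, 3, 1, 2, 4], [1, 2, 4])
```

    Because the text index never moves backwards, the search is linear in the worst case; sublinear average behaviour requires skipping text positions, which is where the Boyer-Moore-style technique comes in.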

  20. Multilocus sequence analysis (MLSA) in prokaryotic taxonomy.

    Science.gov (United States)

    Glaeser, Stefanie P; Kämpfer, Peter

    2015-06-01

    To obtain a higher resolution of the phylogenetic relationships of species within a genus or genera within a family, multilocus sequence analysis (MLSA) is currently a widely used method. In MLSA studies, partial sequences of genes coding for proteins with conserved functions ('housekeeping genes') are used to generate phylogenetic trees and subsequently deduce phylogenies. However, MLSA is not only suggested as a phylogenetic tool to support and clarify the resolution of bacterial species with a higher resolution, as in 16S rRNA gene-based studies, but has also been discussed as a replacement for DNA-DNA hybridization (DDH) in species delineation. Nevertheless, despite the fact that MLSA has become an accepted and widely used method in prokaryotic taxonomy, no common generally accepted recommendations have been devised to date for either the whole area of microbial taxonomy or for taxa-specific applications of individual MLSA schemes. The different ways MLSA is performed can vary greatly for the selection of genes, their number, and the calculation method used when comparing the sequences obtained. Here, we provide an overview of the historical development of MLSA and critically review its current application in prokaryotic taxonomy by highlighting the advantages and disadvantages of the method's numerous variations. This provides a perspective for its future use in forthcoming genome-based genotypic taxonomic analyses.