WorldWideScience

Sample records for abrf edman sequencing

  1. Broad coverage identification of multiple proteolytic cleavage site sequences in complex high molecular weight proteins using quantitative proteomics as a complement to edman sequencing.

    Science.gov (United States)

    Doucet, Alain; Overall, Christopher M

    2011-05-01

    Proteolytic processing modifies the pleiotropic functions of many large, complex, and modular proteins and can generate cleavage products with new biological activity. The identification of exact proteolytic cleavage sites in the extracellular matrix laminins, fibronectin, and other extracellular matrix proteins is not only important for understanding protein turnover but is needed for the identification of new bioactive cleavage products. Several such products have recently been recognized that are suggested to play important cellular regulatory roles in processes, including angiogenesis. However, identifying multiple cleavage sites in extracellular matrix proteins and other large proteins is challenging as N-terminal Edman sequencing of multiple and often closely spaced cleavage fragments on SDS-PAGE gels is difficult, thus limiting throughput and coverage. We developed a new liquid chromatography-mass spectrometry approach we call amino-terminal oriented mass spectrometry of substrates (ATOMS) for the N-terminal identification of protein cleavage fragments in solution. ATOMS utilizes efficient and low cost dimethylation isotopic labeling of original N-terminal and proteolytically generated N termini of protein cleavage fragments followed by quantitative tandem mass spectrometry analysis. Being a peptide-centric approach, ATOMS is not dependent on the SDS-PAGE resolution limits for protein fragments of similar mass. We demonstrate that ATOMS reliably identifies multiple proteolytic sites per reaction in complex proteins. Fifty-five neutrophil elastase cleavage sites were identified in laminin-1 and fibronectin-1 with 34 more identified by matrix metalloproteinase cleavage. Hence, our degradomics approach offers a complementary alternative to Edman sequencing with broad applicability in identifying N termini such as cleavage sites in complex high molecular weight extracellular matrix proteins after in vitro cleavage assays. ATOMS can therefore be useful in

  2. Broad Coverage Identification of Multiple Proteolytic Cleavage Site Sequences in Complex High Molecular Weight Proteins Using Quantitative Proteomics as a Complement to Edman Sequencing*

    OpenAIRE

    Doucet, Alain; Overall, Christopher M

    2010-01-01

    Proteolytic processing modifies the pleiotropic functions of many large, complex, and modular proteins and can generate cleavage products with new biological activity. The identification of exact proteolytic cleavage sites in the extracellular matrix laminins, fibronectin, and other extracellular matrix proteins is not only important for understanding protein turnover but is needed for the identification of new bioactive cleavage products. Several such products have recently been recognized t...

  3. The 2012/2013 ABRF Proteomic Research Group Study: Assessing Longitudinal Intralaboratory Variability in Routine Peptide Liquid Chromatography Tandem Mass Spectrometry Analyses.

    Science.gov (United States)

    Bennett, Keiryn L; Wang, Xia; Bystrom, Cory E; Chambers, Matthew C; Andacht, Tracy M; Dangott, Larry J; Elortza, Félix; Leszyk, John; Molina, Henrik; Moritz, Robert L; Phinney, Brett S; Thompson, J Will; Bunger, Maureen K; Tabb, David L

    2015-12-01

    Questions concerning longitudinal data quality and reproducibility of proteomic laboratories spurred the Protein Research Group of the Association of Biomolecular Resource Facilities (ABRF-PRG) to design a study to systematically assess the reproducibility of proteomic laboratories over an extended period of time. Developed as an open study, initially 64 participants were recruited from the broader mass spectrometry community to analyze provided aliquots of a six bovine protein tryptic digest mixture every month for a period of nine months. Data were uploaded to a central repository, and the operators answered an accompanying survey. Ultimately, 45 laboratories submitted a minimum of eight LC-MSMS raw data files collected in data-dependent acquisition (DDA) mode. No standard operating procedures were enforced; rather the participants were encouraged to analyze the samples according to usual practices in the laboratory. Unlike previous studies, this investigation was not designed to compare laboratories or instrument configuration, but rather to assess the temporal intralaboratory reproducibility. The outcome of the study was reassuring with 80% of the participating laboratories performing analyses at a medium to high level of reproducibility and quality over the 9-month period. For the groups that had one or more outlying experiments, the major contributing factor that correlated to the survey data was the performance of preventative maintenance prior to the LC-MSMS analyses. Thus, the Protein Research Group of the Association of Biomolecular Resource Facilities recommends that laboratories closely scrutinize the quality control data following such events. Additionally, improved quality control recording is imperative. This longitudinal study provides evidence that mass spectrometry-based proteomics is reproducible. When quality control measures are strictly adhered to, such reproducibility is comparable among many disparate groups. Data from the study are

  4. Complete amino acid sequence of the myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani.

    Science.gov (United States)

    Jones, B N; Wang, C C; Dwulet, F E; Lehman, L D; Meuth, J L; Bogardt, R A; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani, was determined by the automated Edman degradation of several large peptides obtained by specific cleavage of the protein. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. By subjecting four of these peptides and the apomyoglobin to automated Edman degradation, over 80% of the primary structure of the protein was obtained. The remainder of the covalent structure was determined by the sequence analysis of peptides that resulted from further digestion of the central cyanogen bromide fragment. This fragment was cleaved at its glutamyl residues with staphylococcal protease and its lysyl residues with trypsin. The action of trypsin was restricted to the lysyl residues by chemical modification of the single arginyl residue of the fragment with 1,2-cyclohexanedione. The primary structure of this myoglobin proved to be identical with that from the Atlantic bottlenosed dolphin and Pacific common dolphin but differs from the myoglobins of the killer whale and pilot whale at two positions. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea. PMID:454657
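
    The cleavage strategy described in this abstract (cyanogen bromide at methionyl residues, trypsin restricted to arginyl residues once the lysines are blocked) can be illustrated with a small in silico digestion. This is a minimal sketch under simplifying assumptions: the function name `cleave_after` and the toy sequence are hypothetical, the sequence is not dolphin myoglobin, and real enzymes show additional specificity rules that are ignored here.

```python
def cleave_after(sequence, residues):
    """Split a one-letter protein sequence on the C-terminal side of the given residues."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # trailing fragment without a cleavage residue
    return fragments


# Hypothetical toy sequence, used only to show the bookkeeping.
toy = "GLSDMEWQLRVLNKAMGHR"

cnbr_fragments = cleave_after(toy, {"M"})      # CNBr-like: cleave after Met
arg_fragments = cleave_after(toy, {"R"})       # trypsin with lysines blocked: cleave after Arg only
lys_fragments = cleave_after(toy, {"K"})       # trypsin with Arg blocked: cleave after Lys only

print(cnbr_fragments)
print(arg_fragments)
print(lys_fragments)
```

    Fragments from different cleavage rules overlap one another, which is what allows the sequenced pieces to be ordered into the full chain.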

  5. cDNA and deduced amino acid sequence of human pulmonary surfactant-associated proteolipid SPL(Phe).

    OpenAIRE

    Glasser, S W; Korfhagen, T R; Weaver, T; Pilot-Matias, T; Fox, J L; Whitsett, J A

    1987-01-01

    Hydrophobic surfactant-associated protein of Mr 6000-14,000 was isolated from ether/ethanol or chloroform/methanol extracts of mammalian pulmonary surfactant. Automated Edman degradation in a gas-phase sequencer showed the major N-terminus of the human low molecular weight protein to be Phe-Pro-Ile-Pro-Leu-Pro-Tyr-Cys-Trp-Leu-Cys-Arg-Ala-Leu-. Because of the N-terminal phenylalanine, the surfactant protein was designated SPL(Phe). Antiserum generated against hydrophobic surfactant protein(s) ...

  6. Sequence homology between the subunits of two immunologically and functionally distinct types of fimbriae of Actinomyces spp.

    OpenAIRE

    Yeung, M K; Cisar, J O

    1990-01-01

    Nucleotide sequencing of the type 1 fimbrial subunit gene of Actinomyces viscosus T14V revealed a consensus ribosome-binding site followed by an open reading frame of 1,599 nucleotides. The encoded protein of 533 amino acids (Mr = 56,899) was predominantly hydrophilic except for an amino-terminal signal peptide and a carboxy-terminal region identified as a potential membrane-spanning segment. Edman degradation of the cloned protein expressed in Escherichia coli and the type 1 fimbriae of A. v...

  7. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    Science.gov (United States)

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor. PMID:2708331

  8. Protein chemotaxonomy. XIII. Amino acid sequence of ferredoxin from Panax ginseng.

    Science.gov (United States)

    Mino, Yoshiki

    2006-08-01

    The complete amino acid sequence of [2Fe-2S] ferredoxin from Panax ginseng (Araliaceae) has been determined by automated Edman degradation of the entire S-carboxymethylcysteinyl protein and of the peptides obtained by enzymatic digestion. This ferredoxin has a unique amino acid sequence, which includes an insertion of Tyr at the 3rd position from the amino-terminus and a deletion of two amino acid residues at the carboxyl terminus. This ferredoxin had 18 differences in its amino acid sequence compared to that of Petroselinum sativum (Umbelliferae). In contrast, 23-33 differences were observed compared to other dicotyledonous plants. This suggests that Panax ginseng is related taxonomically to umbelliferous plants. PMID:16880642

  9. Amino acid sequences of neuropeptides in the sinus gland of the land crab Cardisoma carnifex: a novel neuropeptide proteolysis site.

    Science.gov (United States)

    Newcomb, R W

    1987-08-01

    The sinus gland is a major neurosecretory structure in Crustacea. Five peptides, labeled C, D, E, F, and I, isolated from the sinus gland of the land crab have been hypothesized to arise from the incomplete proteolysis at two internal sites on a single biosynthetic intermediate peptide "H", based on amino acid composition additivities and pulse-chase radiolabeling studies. The presence of only a single major precursor for the sinus gland peptides implies that peptide H may be synthesized on a common precursor with crustacean hyperglycemic hormone forms, "J" and "L," and a peptide, "K," similar to peptides with molt inhibiting activity. Here I report amino acid sequences of these peptides. The amino terminal sequence of the parent peptide, H, (and the homologous fragments) proved refractory to Edman degradation. Data from amino acid analysis and carboxypeptidase digestion of the naturally occurring fragments and of fragments produced by endopeptidase digestion were used together with Edman degradation to obtain the sequences. Amino acid analysis of fragments of the naturally occurring "overlap" peptides (those produced by internal cleavage at one site on H) was used to obtain the sequences across the cleavage sites. The amino acid sequence of the land crab peptide H is Arg-Ser-Ala-Asp-Gly-Phe-Gly-Arg-Met-Glu-Ser-Leu-Leu-Thr-Ser-Leu-Arg-Gly-Ser-Ala-Glu-Ser-Pro-Ala-Ala-Leu-Gly-Glu-Ala-Ser-Ala-Ala-His-Pro-Leu-Glu. In vivo cleavage at one site involves excision of arginine from the sequence Leu-Arg-Gly, whereas cleavage at the other site involves excision of serine from the sequence Glu-Ser-Leu. Proteolysis at the latter sequence has not been previously reported in intact secretory granules. The aspartate at position 4 is possibly covalently modified.

  10. The amino acid sequence of Escherichia coli cyanase.

    Science.gov (United States)

    Chin, C C; Anderson, P M; Wold, F

    1983-01-10

    The amino acid sequence of the enzyme cyanase (cyanate hydrolase) from Escherichia coli has been determined by automatic Edman degradation of the intact protein and of its component peptides. The primary peptides used in the sequencing were produced by cyanogen bromide cleavage at the methionine residues, yielding 4 peptides plus free homoserine from the NH2-terminal methionine, and by trypsin cleavage at the 7 arginine residues after acetylation of the lysines. Secondary peptides required for overlaps and COOH-terminal sequences were produced by chymotrypsin or clostripain cleavage of some of the larger peptides. The complete sequence of the cyanase subunit consists of 156 amino acid residues (Mr 16,350). Based on the observation that the cysteine-containing peptide is obtained as a disulfide-linked dimer, it is proposed that the covalent structure of cyanase is made up of two subunits linked by a disulfide bond between the single cysteine residue in each subunit. The native enzyme (Mr 150,000) then appears to be a complex of four or five such subunit dimers.

  11. Two distinct ferredoxins from Rhodobacter capsulatus: complete amino acid sequences and molecular evolution.

    Science.gov (United States)

    Saeki, K; Suetsugu, Y; Yao, Y; Horio, T; Marrs, B L; Matsubara, H

    1990-09-01

    Two distinct ferredoxins were purified from Rhodobacter capsulatus SB1003. Their complete amino acid sequences were determined by a combination of protease digestion, BrCN cleavage and Edman degradation. Ferredoxins I and II were composed of 64 and 111 amino acids, respectively, with molecular weights of 6,728 and 12,549 excluding iron and sulfur atoms. Both contained two Cys clusters in their amino acid sequences. The first cluster of ferredoxin I and the second cluster of ferredoxin II had a sequence, CxxCxxCxxxCP, in common with the ferredoxins found in Clostridia. The second cluster of ferredoxin I had a sequence, CxxCxxxxxxxxCxxxCM, with extra amino acids between the second and third Cys, which has been reported for other photosynthetic bacterial ferredoxins and putative ferredoxins (nif-gene products) from nitrogen-fixing bacteria, and with a unique occurrence of Met. The first cluster of ferredoxin II had a CxxCxxxxCxxxCP sequence, with two additional amino acids between the second and third Cys, a characteristic feature of Azotobacter-[3Fe-4S] [4Fe-4S]-ferredoxin. Ferredoxin II was also similar to Azotobacter-type ferredoxins with an extended carboxyl (C-) terminal sequence compared to the common Clostridium-type. The evolutionary relationship of the two together with a putative one recently found to be encoded in nifENXQ region in this bacterium [Moreno-Vivian et al. (1989) J. Bacteriol. 171, 2591-2598] is discussed. PMID:2277040
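
    The Cys-cluster motifs quoted in this abstract translate directly into regular expressions, with each "x" standing for any residue. The sketch below is illustrative only: the dictionary keys, the helper names and the toy sequence are hypothetical, and the toy sequence is not a Rhodobacter capsulatus ferredoxin.

```python
import re

# Motifs as written in the abstract; 'x' means any amino acid.
MOTIFS = {
    "clostridial_type": "CxxCxxCxxxCP",
    "extended_with_met": "CxxCxxxxxxxxCxxxCM",
    "azotobacter_type": "CxxCxxxxCxxxCP",
}


def motif_to_regex(motif):
    """Turn a motif such as 'CxxCxxCxxxCP' into a compiled regular expression."""
    return re.compile("".join("." if ch == "x" else ch for ch in motif))


def find_motifs(sequence):
    """Return, for every motif, the 0-based start positions of non-overlapping matches."""
    return {
        name: [m.start() for m in motif_to_regex(motif).finditer(sequence)]
        for name, motif in MOTIFS.items()
    }


# Hypothetical toy sequence containing one clostridial-type cluster.
toy = "MKCAACGGCTTTCPEE"
hits = find_motifs(toy)
print(hits)
```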

  12. Localization of an O-glycosylated site in the recombinant barley alpha-amylase 1 produced in yeast and correction of the amino acid sequence using matrix-assisted laser desorption/ionization mass spectrometry of peptide mixtures

    DEFF Research Database (Denmark)

    Andersen, Jens S.; Søgaard, M; Svensson, B;

    1994-01-01

    Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) of peptide mixtures was used to characterize recombinant barley alpha-amylase 1, produced in yeast. Three peptide mixtures were generated by cleavage with CNBr, digestion with endoproteinase Lys-C and Asp-N, respectively, and analyzed directly by MALDI-MS. Based on the three mass spectrometric peptide maps, an error in the sequence deduced from cDNA, resulting in a mass difference of 28 Da, was located to a sequence stretch of 5 amino acid residues; furthermore, a dihexose substituent was identified on Thr410. Subsequent Edman degradation of two selected peptides isolated from the endoproteinase Lys-C digest corrected the sequence to be Val instead of Ala in position 284 and confirmed the O-glycosylation. These results demonstrate that the direct peptide mixture analysis by MALDI-MS is a rapid and sensitive method for protein...

  13. Automatic sequences

    CERN Document Server

    Haeseler, Friedrich

    2003-01-01

    Automatic sequences are sequences which are produced by a finite automaton. Although they are not random, they may look random. They are complicated in the sense of not being ultimately periodic, and they may look rather complicated in the sense that it may not be easy to name the rule by which the sequence is generated; however, there exists a rule which generates the sequence. The concept of automatic sequences has special applications in algebra, number theory, finite automata and formal languages, and combinatorics on words. The text deals with different aspects of automatic sequences, in particular:
    · a general introduction to automatic sequences
    · the basic (combinatorial) properties of automatic sequences
    · the algebraic approach to automatic sequences
    · geometric objects related to automatic sequences.
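
    As an illustration of what "produced by a finite automaton" means, the Thue-Morse sequence is a standard example of an automatic sequence; it is chosen here for illustration and is not named in the record above. A two-state automaton reads the binary digits of n and flips its state on every digit 1; the final state is the n-th term.

```python
def thue_morse(n):
    """n-th Thue-Morse term, computed by a 2-state automaton reading n in base 2."""
    state = 0
    for digit in bin(n)[2:]:      # binary digits of n, most significant first
        if digit == "1":
            state = 1 - state     # a 1 flips the state, a 0 leaves it unchanged
    return state


first_16 = [thue_morse(n) for n in range(16)]
print(first_16)  # [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
```

    The output is fully determined by a simple rule, yet it is not ultimately periodic, which matches the description in the abstract.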

  14. CSTX-9, a toxic peptide from the spider Cupiennius salei: amino acid sequence, disulphide bridge pattern and comparison with other spider toxins containing the cystine knot structure.

    Science.gov (United States)

    Schaller, J; Kämpfer, U; Schürch, S; Kuhn-Nentwig, L; Haeberli, S; Nentwig, W

    2001-09-01

    CSTX-9 (68 residues, 7530.9 Da) is one of the most abundant toxic polypeptides in the venom of the wandering spider Cupiennius salei. The amino acid sequence was determined by Edman degradation using reduced and alkylated CSTX-9 and peptides generated by cleavages with endoproteinase Asp-N and trypsin, respectively. Sequence comparison with CSTX-1, the most abundant and the most toxic polypeptide in the crude spider venom, revealed a high degree of similarity (53% identity). By means of limited proteolysis with immobilised trypsin and RP-HPLC, the cystine-containing peptides of CSTX-9 were isolated and the disulphide bridges were assigned by amino acid analysis, Edman degradation and nanospray tandem mass spectrometry. The four disulphide bonds present in CSTX-9 are arranged in the following pattern: 1-4, 2-5, 3-8 and 6-7 (Cys6-Cys21, Cys13-Cys30, Cys20-Cys48, Cys32-Cys46). Sequence comparison of CSTX-1 with CSTX-9 clearly indicates the same disulphide bridge pattern, which is also found in other spider polypeptide toxins, e.g. agatoxins (omega-AGA-IVA, omega-AGA-IVB, mu-AGA-I and mu-AGA-VI) from Agelenopsis aperta, SNX-325 from Segestria florentina and curtatoxins (CT-I, CT-II and CT-III) from Hololena curta. CSTX-1/CSTX-9 belong to the family of ion channel toxins containing the inhibitor cystine knot structural motif. CSTX-9, lacking the lysine-rich C-terminal tail of CSTX-1, exhibits a ninefold lower toxicity to Drosophila melanogaster than CSTX-1. This is in accordance with previous observations of CSTX-2a and CSTX-2b, two truncated forms of CSTX-1 which, like CSTX-9, also lack the C-terminal lysine-rich tail. PMID:11693532
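
    The disulphide pattern notation in this abstract (1-4, 2-5, 3-8, 6-7) refers to the cysteines numbered in order of appearance. The short sketch below shows how such a pattern maps onto residue positions; the function name is hypothetical, and the cysteine positions are simply those listed in the abstract's bridge assignments.

```python
def bridges_from_pattern(cys_positions, pattern):
    """Map a pattern of cysteine ordinals (1-based) onto residue positions."""
    ordered = sorted(cys_positions)
    return [(ordered[a - 1], ordered[b - 1]) for a, b in pattern]


# Cysteine positions taken from the bridges quoted in the abstract.
cys_positions = [6, 13, 20, 21, 30, 32, 46, 48]
pattern = [(1, 4), (2, 5), (3, 8), (6, 7)]

bridges = bridges_from_pattern(cys_positions, pattern)
print(bridges)  # [(6, 21), (13, 30), (20, 48), (32, 46)]
```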

  15. Isolation, characterization and cDNA sequencing of a Kazal family proteinase inhibitor from seminal plasma of turkey (Meleagris gallopavo).

    Science.gov (United States)

    Słowińska, Mariola; Olczak, Mariusz; Wojtczak, Mariola; Glogowski, Jan; Jankowski, Jan; Watorek, Wiesław; Amarowicz, Ryszard; Ciereszko, Andrzej

    2008-06-01

    The turkey reproductive tract and seminal plasma contain a serine proteinase inhibitor that seems to be unique for the reproductive tract. Our experimental objective was to isolate, characterize and cDNA sequence the Kazal family proteinase inhibitor from turkey seminal plasma and testis. Seminal plasma contains two forms of a Kazal family inhibitor: virgin (Ia) represented by an inhibitor of moderate electrophoretic migration rate (present also in the testis) and modified (Ib, a split peptide bond) represented by an inhibitor with a fast migration rate. The inhibitor from the seminal plasma was purified by affinity, ion-exchange and reverse phase chromatography. The testis inhibitor was purified by affinity and ion-exchange chromatography. N-terminal Edman sequencing showed that the two seminal plasma inhibitors and the testis inhibitor had identical sequences. This sequence was used to construct primers and obtain a cDNA sequence from the testis. Analysis of a cDNA sequence indicated that turkey proteinase inhibitor belongs to Kazal family inhibitors (pancreatic secretory trypsin inhibitors, mammalian acrosin inhibitors) and caltrin. The turkey seminal plasma Kazal inhibitor belongs to low molecular mass inhibitors and is characterized by a high value of the equilibrium association constant for inhibitor/trypsin complexes.

  16. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  17. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria;

    2009-01-01

    and plays an important role in processing the information generated by these methods. Here, we provide a comprehensive overview of the current publicly available sequence assembly programs. We describe the basic principles of computational assembly along with the main concerns, such as repetitive sequences...

  18. DNA Sequencing

    Science.gov (United States)

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.

  19. De novo sequencing of two novel peptides homologous to calcitonin-like peptides, from skin secretion of the Chinese Frog, Odorrana schmackeri

    Directory of Open Access Journals (Sweden)

    Geisa P.C. Evaristo

    2015-09-01

    An MS/MS based analytical strategy was followed to solve the complete sequence of two new peptides from frog (Odorrana schmackeri) skin secretion. This involved reduction and alkylation with two different alkylating agents followed by high resolution tandem mass spectrometry. De novo sequencing was achieved by complementary CID and ETD fragmentations of full-length peptides and of selected tryptic fragments. Heavy and light isotope dimethyl labeling assisted with annotation of sequence ion series. The identified primary structures are GCD[I/L]STCATHN[I/L]VNE[I/L]NKFDKSKPSSGGVGPESP-NH2 and SCNLSTCATHNLVNELNKFDKSKPSSGGVGPESF-NH2, i.e. two carboxyamidated 34 residue peptides with an aminoterminal intramolecular ring structure formed by a disulfide bridge between Cys2 and Cys7. Edman degradation analysis of the second peptide positively confirmed the exact sequence, resolving I/L discriminations. Both peptide sequences are novel and share homology with calcitonin, calcitonin gene related peptide (CGRP) and adrenomedullin from other vertebrates. Detailed sequence analysis as well as the 34 residue length of both O. schmackeri peptides suggest they do not fully qualify as either calcitonins (32 residues) or CGRPs (37 amino acids) and may justify their classification in a novel peptide family within the calcitonin gene related peptide superfamily. Smooth muscle contractility assays with synthetic replicas of the S–S linked peptides on rat tail artery, uterus, bladder and ileum did not reveal myotropic activity.

  20. Sequences of metanicins, 20-residue peptaibols from the ascomycetous fungus CBS 597.80.

    Science.gov (United States)

    Kimonyo, Anastase; Brückner, Hans

    2013-05-01

    Four linear 20-residue peptaibols, named metanicins (MTCs) A-D, were isolated from submerged cultures of the ascomycetous fungus CBS 597.80. Structure elucidation was performed by a combination of fast-atom-bombardment mass spectrometry (FAB-MS), electrospray ionization MS, Edman degradation of isolated fragments, and amino acid analysis by ion-exchange and gas chromatography, and enantioselective HPLC. The sequences of MTC A(B) are (amino acid exchange in B and C in parentheses): Ac-Aib-Ala-Aib-Ala-Aib-Ala-Gln-Aib-Val-Aib-Gly-Leu-Aib-Pro-Val-Aib-Aib(D-Iva)-Gln-Gln-Pheol and of MTC C(D) Ac-Aib-Ala-Aib-Ala-Aib-Ala-Gln-Aib-Val-Aib-Gly-Leu-Aib-Pro-Val-Aib-Aib(D-Iva)-Gln-Gln-Pheol (Ac, acetyl; Aib, α-aminoisobutyric acid; Iva, isovaline; Pheol, L-phenylalaninol). The peptides are related, and some of the sequences are identical, to other 20-residue peptaibols isolated from Trichoderma species. MTCs show moderate activities against Micrococcus luteus, Enterococcus faecalis, and Staphylococcus aureus, and very low activities against Bacillus subtilis. The producer was originally identified and deposited as Metarhizium anisopliae var. anisopliae CBS 597.80. Although this identification has been withdrawn by Centraalbureau voor Schimmelcultures (CBS) in the meantime, the accession number will be retained - independently from any taxonomic revisions. PMID:23681727

  1. Main: Sequences [KOME

    Lifescience Database Archive (English)

    Sequences Nucleotide Sequence Nucleotide sequence of full length cDNA (trimmed sequence) kome_ine_full_seque...nce_db.fasta.zip kome_ine_full_sequence_db.zip kome_ine_full_sequence_db ...

  2. [Sequencing babies?].

    Science.gov (United States)

    Jordan, Bertrand

    2015-10-01

    An extension of newborn screening to genome sequencing is now feasible but raises a number of scientific, organisational and ethical issues. This is being explored in discussions and in several funded trials, in order to maximize benefits and avoid some identified risks. As some companies are already offering such a service, this is quite an urgent matter. PMID:26481033

  3. Novel proline-hydroxyproline glycopeptides from the dandelion (Taraxacum officinale Wigg.) flowers: de novo sequencing and biological activity.

    Science.gov (United States)

    Astafieva, Alexandra A; Enyenihi, Atim A; Rogozhin, Eugene A; Kozlov, Sergey A; Grishin, Eugene V; Odintsova, Tatyana I; Zubarev, Roman A; Egorov, Tsezi A

    2015-09-01

    Two novel homologous peptides named ToHyp1 and ToHyp2 that show no similarity to any known proteins were isolated from Taraxacum officinale Wigg. flowers by multidimensional liquid chromatography. Amino acid and mass spectrometry analyses demonstrated that the peptides have unusual structure: they are cysteine-free, proline-hydroxyproline-rich and post-translationally glycosylated by pentoses, with 5 carbohydrates in ToHyp2 and 10 in ToHyp1. The ToHyp2 peptide with a monoisotopic molecular mass of 4350.3Da was completely sequenced by a combination of Edman degradation and de novo sequencing via top down multistage collision induced dissociation (CID) and higher energy dissociation (HCD) tandem mass spectrometry (MS(n)). ToHyp2 consists of 35 amino acids, contains eighteen proline residues, of which 8 prolines are hydroxylated. The peptide displays antifungal activity and inhibits growth of Gram-positive and Gram-negative bacteria. We further showed that carbohydrate moieties have no significant impact on the peptide structure, but are important for antifungal activity although not absolutely necessary. The deglycosylated ToHyp2 peptide was less active against the susceptible fungus Bipolaris sorokiniana than the native peptide. Unique structural features of the ToHyp2 peptide place it into a new family of plant defense peptides. The discovery of ToHyp peptides in T. officinale flowers expands the repertoire of molecules of plant origin with practical applications.

  4. Main: Sequences [KOME

    Lifescience Database Archive (English)

    Sequences Amino Acid Sequence Amino Acid sequence of full length cDNA (Longest ORF) kome_ine_full_sequence..._amino_db.fasta.zip kome_ine_full_sequence_amino_db.zip kome_ine_full_sequence_amino_db ...

  5. Amino acid sequence and posttranslational modifications of human factor VIIa from plasma and transfected baby hamster kidney cells

    International Nuclear Information System (INIS)

    Blood coagulation factor VII is a vitamin K dependent glycoprotein which in its activated form, factor VIIa, participates in the coagulation process by activating factor X and/or factor IX in the presence of Ca2+ and tissue factor. Three types of potential posttranslational modifications exist in the human factor VIIa molecule, namely, 10 γ-carboxylated, N-terminally located glutamic acid residues, 1 β-hydroxylated aspartic acid residue, and 2 N-glycosylated asparagine residues. In the present study, the amino acid sequence and posttranslational modifications of recombinant factor VIIa as purified from the culture medium of a transfected baby hamster kidney cell line have been compared to human plasma factor VIIa. By use of HPLC, amino acid analysis, peptide mapping, and automated Edman degradation, the protein backbone of recombinant factor VIIa was found to be identical with human factor VIIa. Asparagine residues 145 and 322 were found to be fully N-glycosylated in human plasma factor VIIa. In the recombinant factor VIIa, asparagine residue 322 was fully glycosylated whereas asparagine residue 145 was only partially (approximately 66%) glycosylated. Besides minor differences in the sialic acid and fucose contents, the overall carbohydrate compositions were nearly identical in recombinant factor VIIa and human plasma factor VIIa. These results show that factor VIIa as produced in the transfected baby hamster kidney cells is very similar to human plasma factor VIIa and that this cell line thus might represent an alternative source for human factor VIIa

  6. Purification and amino acid sequence of a bacteriocin produced by Lactobacillus salivarius K7 isolated from chicken intestine

    Directory of Open Access Journals (Sweden)

    Kenji Sonomoto

    2006-03-01

    A bacteriocin-producing strain, Lactobacillus K7, was isolated from a chicken intestine. The inhibitory activity was determined by spot-on-lawn technique. Identification of the strain was performed on a morphological, biochemical (API 50 CH kit) and molecular genetic (16S rDNA) basis. Bacteriocin purification processes were carried out by amberlite adsorption, cation exchange and reverse-phase high-performance liquid chromatography. N-terminal amino acid sequences were determined by Edman degradation. Molecular mass was determined by electrospray-ionization (ESI) mass spectrometry (MS). Lactobacillus K7 showed inhibitory activity against Lactobacillus sakei subsp. sakei JCM 1157T, Leuconostoc mesenteroides subsp. mesenteroides JCM 6124T and Bacillus coagulans JCM 2257T. This strain was identified as Lb. salivarius. The antimicrobial substance was destroyed by proteolytic enzymes, indicating its proteinaceous structure, designated as a bacteriocin type. The purification of bacteriocin by amberlite adsorption, cation exchange, and reverse-phase chromatography resulted in only one single active peak, which was designated FK22. Molecular weight of this fraction was 4331.70 Da. By amino acid sequence, this peptide was homologous to Abp 118 beta produced by Lb. salivarius UCC118. In addition, Lb. salivarius UCC118 produced a 2-peptide bacteriocin, consisting of Abp 118 alpha and beta. Based on the partial amino acid sequences of Abp 118 beta, specific primers were designed from nucleotide sequences according to data from GenBank. The result showed that the deduced peptide was highly homologous to the 2-peptide bacteriocin, Abp 118 alpha and beta.

  7. Classifying Genomic Sequences by Sequence Feature Analysis

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that the functional regions of the genome can be classified using sequence features and discriminant analysis.
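
    One of the features named above, dinucleotide relative abundance, can be sketched as follows. This is a minimal illustration that assumes the common odds-ratio definition rho_xy = f_xy / (f_x * f_y); the abstract does not state which exact variant the authors used, and the function name is hypothetical.

```python
from collections import Counter


def dinucleotide_relative_abundance(seq):
    """Return {xy: f_xy / (f_x * f_y)} for every dinucleotide observed in seq.

    Assumption: single-strand frequencies are used; some studies instead
    combine the sequence with its reverse complement before counting.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two bases")
    base_counts = Counter(seq)
    # Overlapping dinucleotides: positions 0..n-2.
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for pair, count in pair_counts.items():
        f_xy = count / (n - 1)
        f_x = base_counts[pair[0]] / n
        f_y = base_counts[pair[1]] / n
        rho[pair] = f_xy / (f_x * f_y)
    return rho
```

    Values above 1 indicate a dinucleotide that is over-represented relative to its base composition, values below 1 an under-represented one; a vector of such values per region is the kind of input that principal component or discriminant analysis would consume.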

  8. Multi-task Sequence to Sequence Learning

    OpenAIRE

    Luong, Minh-Thang; Le, Quoc V.; Sutskever, Ilya; Vinyals, Oriol; Kaiser, Lukasz

    2015-01-01

    Sequence to sequence learning has recently emerged as a new paradigm in supervised learning. To date, most of its applications focused on only one task and not much work explored this framework for multiple tasks. This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting - where the encoder is shared between several tasks such as machine translation and syntactic parsing, (b) the many-to-one setting - useful when only the decoder ...

  9. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternative spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.

  10. Fibonacci-Like Sequence

    OpenAIRE

    Shikha Bhatnagar; Bijendra Singh; Omprakash Sikhwal

    2013-01-01

    The Fibonacci and Lucas sequences are well-known examples of second order recurrence sequences, which belong to particular class of recursive sequences. In this article, Fibonacci-Like sequence is introduced and defined by The Binet’s formula and generating function of Fibonacci-Like sequence are presented with some identities and connection formulae.

  11. Whole Genome Sequencing

    Science.gov (United States)

    What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  12. Multimodal sequence learning.

    Science.gov (United States)

    Kemény, Ferenc; Meier, Beat

    2016-02-01

    While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence.

  13. Sequence Read Archive (SRA)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome...

  14. Consensus Sequence Zen

    OpenAIRE

    Schneider, Thomas D.

    2002-01-01

    Consensus sequences are widely used in molecular biology but they have many flaws. As a result, binding sites of proteins and other molecules are missed during studies of genetic sequences and important biological effects cannot be seen. Information theory provides a mathematically robust way to avoid consensus sequences. Instead of using consensus sequences, sequence conservation can be quantitatively presented in bits of information by using sequence logo graphics to repre...

  15. The generalized quaternion sequence

    Science.gov (United States)

    Deveci, Ömür

    2016-04-01

    In this work, we define a recurrence sequence by using the relation matrix of the generalized quaternion group, and then we obtain miscellaneous properties of this sequence. We also obtain the cyclic groups and the semigroups that are produced by the generating matrix of the sequence when read modulo m. Furthermore, we study this sequence modulo m, and then we derive the relationship between the orders of the cyclic groups obtained and the periods of the sequence.

  16. Polynomially Bounded Sequences and Polynomial Sequences

    Directory of Open Access Journals (Sweden)

    Okazaki Hiroyuki

    2015-09-01

    In this article, we formalize polynomially bounded sequences, which play an important role in computational complexity theory. Class P is a fundamental computational complexity class that contains all polynomial-time decision problems [11], [12]. It takes a polynomially bounded amount of computation time to solve polynomial-time decision problems by the deterministic Turing machine. Moreover, we formalize polynomial sequences [5].

  17. Production, purification, sequencing and activity spectra of mutacins D-123.1 and F-59.1

    Directory of Open Access Journals (Sweden)

    LaPointe Gisèle

    2011-04-01

    Abstract Background: The increase in bacterial resistance to antibiotics impels the development of new anti-bacterial substances. Mutacins (bacteriocins) are small antibacterial peptides produced by Streptococcus mutans showing activity against bacterial pathogens. The objective of the study was to produce and characterise additional mutacins in order to find new useful antibacterial substances. Results: Mutacin F-59.1 was produced in liquid media by S. mutans 59.1 while production of mutacin D-123.1 by S. mutans 123.1 was obtained in semi-solid media. Mutacins were purified by hydrophobic chromatography. The amino acid sequences of the mutacins were obtained by Edman degradation and their molecular mass was determined by mass spectrometry. Mutacin F-59.1 consists of 25 amino acids, containing the YGNGV consensus sequence of pediocin-like bacteriocins, with a molecular mass calculated at 2719 Da. Mutacin D-123.1 has a molecular mass (2364 Da) and first 9 amino acids identical to those of mutacin I. Mutacins D-123.1 and F-59.1 have wide activity spectra inhibiting human and food-borne pathogens. The lantibiotic mutacin D-123.1 possesses a broader activity spectrum than mutacin F-59.1 against the bacterial strains tested. Conclusion: Mutacin F-59.1 is the first pediocin-like bacteriocin identified and characterised that is produced by Streptococcus mutans. Mutacin D-123.1 appears to be identical to mutacin I previously identified in different strains of S. mutans.

  18. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  19. Contamination of sequence databases with adaptor sequences

    Energy Technology Data Exchange (ETDEWEB)

    Yoshikawa, Takeo; Sanders, A.R.; Detera-Wadleigh, S.D. [National Institute of Mental Health, Bethesda, MD (United States)

    1997-02-01

    Because of the exponential increase in the amount of DNA sequences being added to the public databases on a daily basis, it has become imperative to identify sources of contamination rapidly. Previously, contaminations of sequence databases have been reported to alert the scientific community to the problem. These contaminations can be divided into two categories. The first category comprises host sequences that have been difficult for submitters to manage or control. Examples include anomalous sequences derived from Escherichia coli, which are inserted into the chromosomes (and plasmids) of the bacterial hosts. Insertion sequences are highly mobile and are capable of transposing themselves into plasmids during cloning manipulation. Another example of the first category is the infection with yeast genomic DNA or with bacterial DNA of some commercially available cDNA libraries from Clontech. The second category of database contamination is due to the inadvertent inclusion of nonhost sequences. This category includes incorporation of cloning-vector sequences and multicloning sites in the database submission. M13-derived artifacts have been common, since M13-based vectors have been widely used for subcloning DNA fragments. Recognizing this problem, the National Center for Biotechnology Information (NCBI) started to screen, in April 1994, all sequences directly submitted to GenBank, against a set of vector data retrieved from GenBank by use of key-word searches, such as "vector." In this report, we present evidence for another sequence artifact that is widespread but that, to our knowledge, has not yet been reported. 11 refs., 1 tab.

  20. Anomaly Detection in Sequences

    Data.gov (United States)

    National Aeronautics and Space Administration — We present a set of novel algorithms which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that...

  1. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  2. sequenceMiner algorithm

    Data.gov (United States)

    National Aeronautics and Space Administration — Detecting and describing anomalies in large repositories of discrete symbol sequences. sequenceMiner has been open-sourced! Download the file below to try it out....

  3. Sequence information signal processor

    Science.gov (United States)

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
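
    The per-element scoring described above resembles the dynamic-programming recurrence used for local sequence alignment. The following is a software sketch of that recurrence only, not the circuit itself: the scoring values (match, mismatch, gap) and the function name are assumptions chosen for illustration, since the abstract does not give them.

```python
def best_local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Highest score of any pair of segments, one from a and one from b.

    Each cell holds the best score of an alignment ending at that pair of
    elements; a cell needs only its left, upper and upper-left neighbours,
    which is why the work can be spread over a linear array of processors.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous element of a
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Clamping at 0 lets a new segment start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

    Only one row of scores is kept at a time, mirroring the way each processor in the array passes its scoring parameter on rather than storing the whole comparison table.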

  4. Roles of repetitive sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological function, but also because they provide information on genome organization, evolution and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences will have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences, notably minisatellite sequences and telomere sequences as examples of tandem repeats, without and with respectively known function, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  5. DNA sequences encoding erythropoietin

    Energy Technology Data Exchange (ETDEWEB)

    Lin, F.K.

    1987-10-27

    A purified and isolated DNA sequence is described consisting essentially of a DNA sequence encoding a polypeptide having an amino acid sequence sufficiently duplicative of that of erythropoietin to allow possession of the biological property of causing bone marrow cells to increase production of reticulocytes and red blood cells, and to increase hemoglobin synthesis or iron uptake.

  6. On Asymptotically Orthonormal Sequences

    OpenAIRE

    Fricain, Emmanuel; Rupam, Rishika

    2016-01-01

    An asymptotically orthonormal sequence is a sequence which is 'nearly' orthonormal in the sense that it satisfies the Parseval equality up to two constants close to one. In this paper, we explore such sequences formed by normalized reproducing kernels of model spaces and de Branges Rovnyak spaces.

  7. Low autocorrelation binary sequences

    Science.gov (United States)

    Packebusch, Tom; Mertens, Stephan

    2016-04-01

    Binary sequences with minimal autocorrelations have applications in communication engineering, mathematics and computer science. In statistical physics they appear as groundstates of the Bernasconi model. Finding these sequences is a notoriously hard problem, that so far can be solved only by exhaustive search. We review recent algorithms and present a new algorithm that finds optimal sequences of length N in time $O(N \cdot 1.73^N)$. We computed all optimal sequences for $N \le 66$ and all optimal skewsymmetric sequences for $N \le 119$.
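
    To make the search problem concrete, here is a naive exhaustive search for very small N. It is only a sketch: it assumes the usual objective, the sum of squared aperiodic autocorrelations over all nonzero lags, and it enumerates all sequences in roughly O(N^2 * 2^N) time, so it is not the faster algorithm the abstract refers to. Function names are hypothetical.

```python
from itertools import product


def autocorrelation(s, k):
    """Aperiodic autocorrelation of the +/-1 sequence s at lag k."""
    return sum(s[i] * s[i + k] for i in range(len(s) - k))


def energy(s):
    """Sum of squared autocorrelations over all nonzero lags."""
    return sum(autocorrelation(s, k) ** 2 for k in range(1, len(s)))


def min_energy(n):
    """Minimum energy over all +/-1 sequences of length n (brute force)."""
    # Negating a sequence leaves its energy unchanged, so the first
    # element can be fixed to +1, halving the search space.
    return min(energy((1,) + rest) for rest in product((1, -1), repeat=n - 1))
```

    For example, the length-13 Barker sequence (1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1) has energy 6 under this definition, which matches what min_energy(13) finds.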

  8. Repdigits in k-Lucas Sequences

    Indian Academy of Sciences (India)

    Jhon J J Bravo; Florian Luca

    2014-05-01

    For an integer $k \ge 2$, let $(L_n^{(k)})_n$ be the $k$-Lucas sequence which starts with $0,\ldots,0,2,1$ ($k$ terms) and each term afterwards is the sum of the $k$ preceding terms. In 2000, Luca (Port. Math. 57(2) 2000 243-254) proved that 11 is the largest number with only one distinct digit (the so-called repdigit) in the sequence $(L_n^{(2)})_n$. In this paper, we address a similar problem in the family of $k$-Lucas sequences. We also show that the $k$-Lucas sequences have similar properties to those of $k$-Fibonacci sequences and occur in formulae simultaneously with the latter.
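
    The definition above translates directly into a short generator. The sketch below uses hypothetical function names and simply builds the first terms of a k-Lucas sequence from the stated initial values (k terms ending in 2, 1) and checks terms for the repdigit property; it illustrates the definitions and does not reproduce the paper's proof.

```python
def k_lucas(k, count):
    """First `count` terms of the k-Lucas sequence for an integer k >= 2."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Initial values: k terms 0, ..., 0, 2, 1.
    terms = [0] * (k - 2) + [2, 1]
    while len(terms) < count:
        # Each later term is the sum of the k preceding terms.
        terms.append(sum(terms[-k:]))
    return terms[:count]


def is_repdigit(n):
    """True if the positive integer n has only one distinct decimal digit."""
    return n > 0 and len(set(str(n))) == 1
```

    With k = 2 this produces the classical Lucas numbers 2, 1, 3, 4, 7, 11, 18, ..., and scanning the first 200 terms for repdigits finds none larger than 11, consistent with the result attributed to Luca above.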

  9. cDNA and deduced amino acid sequence of human pulmonary surfactant-associated proteolipid SPL(Phe)

    Energy Technology Data Exchange (ETDEWEB)

    Glasser, S.W.; Korfhagen, T.R.; Weaver, T.; Pilot-Matias, T.; Fox, J.L.; Whitsett, J.A.

    1987-06-01

    Hydrophobic surfactant-associated protein of M_r 6000-14,000 was isolated from ether/ethanol or chloroform/methanol extracts of mammalian pulmonary surfactant. Automated Edman degradation in a gas-phase sequencer showed the major N-terminus of the human low molecular weight protein to be Phe-Pro-Ile-Pro-Leu-Pro-Try-Cys-Trp-Leu-Cys-Arg-Ala-Leu-. Because of the N-terminal phenylalanine, the surfactant protein was designated SPL(Phe). Antiserum generated against hydrophobic surfactant protein(s) from bovine pulmonary surfactant recognized protein of M_r 6000-14,000 in immunoblot analysis and was used to screen a λgt11 expression library constructed from adult human lung poly(A)+ RNA. This resulted in identification of a 1.4-kilobase cDNA clone that was shown to encode the N-terminus of the surfactant polypeptide SPL(Phe) (Phe-Pro-Ile-Pro-Leu-Pro-) within an open reading frame for a larger protein. Expression of a fused β-galactosidase-SPL(Phe) gene in Escherichia coli yielded an immunoreactive M_r 34,000 fusion peptide. Hybrid-arrested translation with the cDNA and immunoprecipitation of [35S]methionine-labeled in vitro translation products of human poly(A)+ RNA with a surfactant polyclonal antibody resulted in identification of a M_r 40,000 precursor protein. Blot hybridization analysis of electrophoretically fractionated RNA from human lung detected a 2.0-kilobase RNA that was more abundant in adult lung than in fetal lung. These proteins, and specifically SPL(Phe), may therefore be useful for synthesis of replacement surfactants for treatment of hyaline membrane disease in newborn infants or of other surfactant-deficient states.

  10. cDNA and deduced amino acid sequence of human pulmonary surfactant-associated proteolipid SPL(Phe)

    International Nuclear Information System (INIS)

    Hydrophobic surfactant-associated protein of M_r 6000-14,000 was isolated from ether/ethanol or chloroform/methanol extracts of mammalian pulmonary surfactant. Automated Edman degradation in a gas-phase sequencer showed the major N-terminus of the human low molecular weight protein to be Phe-Pro-Ile-Pro-Leu-Pro-Try-Cys-Trp-Leu-Cys-Arg-Ala-Leu-. Because of the N-terminal phenylalanine, the surfactant protein was designated SPL(Phe). Antiserum generated against hydrophobic surfactant protein(s) from bovine pulmonary surfactant recognized protein of M_r 6000-14,000 in immunoblot analysis and was used to screen a λgt11 expression library constructed from adult human lung poly(A)+ RNA. This resulted in identification of a 1.4-kilobase cDNA clone that was shown to encode the N-terminus of the surfactant polypeptide SPL(Phe) (Phe-Pro-Ile-Pro-Leu-Pro-) within an open reading frame for a larger protein. Expression of a fused β-galactosidase-SPL(Phe) gene in Escherichia coli yielded an immunoreactive M_r 34,000 fusion peptide. Hybrid-arrested translation with the cDNA and immunoprecipitation of [35S]methionine-labeled in vitro translation products of human poly(A)+ RNA with a surfactant polyclonal antibody resulted in identification of a M_r 40,000 precursor protein. Blot hybridization analysis of electrophoretically fractionated RNA from human lung detected a 2.0-kilobase RNA that was more abundant in adult lung than in fetal lung. These proteins, and specifically SPL(Phe), may therefore be useful for synthesis of replacement surfactants for treatment of hyaline membrane disease in newborn infants or of other surfactant-deficient states.

  11. Sequences of commutator operations

    CERN Document Server

    Aichinger, Erhard

    2012-01-01

    Given the congruence lattice L of a finite algebra A with a Mal'cev term, we look for those sequences of operations on L that are sequences of higher commutator operations of expansions of A. The properties of higher commutators proved so far delimit the number of such sequences: the number is always at most countably infinite; if it is infinite, then L is the union of two proper subintervals with nonempty intersection.

  12. Next-generation sequencing

    DEFF Research Database (Denmark)

    Rieneck, Klaus; Bak, Mads; Jønson, Lars;

    2013-01-01

    the feasibility of predicting the fetal KEL1 phenotype using next-generation sequencing (NGS) technology. STUDY DESIGN AND METHODS: The KEL1/2 single-nucleotide polymorphism was polymerase chain reaction (PCR) amplified with one adjoining base, and the PCR product was sequenced using a genome analyzer (GAIIx......, Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence...

  13. Cosmetology: Scope and Sequence.

    Science.gov (United States)

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  14. A calculation of all possible oligosaccharide isomers both branched and linear yields 1.05 × 10^12 structures for a reducing hexasaccharide: the Isomer Barrier to development of single-method saccharide sequencing or synthesis systems.

    Science.gov (United States)

    Laine, R A

    1994-12-01

    The number of all possible linear and branched isomers of a hexasaccharide was calculated and found to be > 1.05 × 10^12. This large number defines the Isomer Barrier, a persistent technological barrier to the development of a single analytical method for the absolute characterization of carbohydrates, regardless of sample quantity. Because of this isomer barrier, no single method can be employed to determine complete oligosaccharide structure in 100 nmol amounts with the same assurance that can be achieved for 100 pmol amounts with single-procedure Edman peptide or Sanger DNA sequencing methods. Difficulties in the development of facile synthetic schemes for oligosaccharides are also explained by this large number. No current method of chemical or physical analysis has the resolution necessary to distinguish among 10^12 structures having the same mass. Therefore the 'characterization' of a middle-weight oligosaccharide solely by NMR or mass spectrometry necessarily contains a very large margin of error. Greater uncertainty accompanies results performed solely by sequential enzyme degradation followed by gel-permeation chromatography or electrophoresis, as touted by some commercial advertisements. Much of the literature which uses these single methods to 'characterize' complex carbohydrates is, therefore, in question, and journals should beware of publishing structural characterizations unless the authors reveal all alternate possible structures which could result from their analysis. (ABSTRACT TRUNCATED AT 250 WORDS)

  15. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  16. Mapping sequences by parts

    Directory of Open Access Journals (Sweden)

    Guziolowski Carito

    2007-09-01

    Abstract Background: We present the N-map method, a pairwise and asymmetrical approach which allows us to compare sequences by taking into account evolutionary events that produce shuffled, reversed or repeated elements. Basically, the optimal N-map of a sequence s over a sequence t is the best way of partitioning the first sequence into N parts and placing them, possibly complementary reversed, over the second sequence in order to maximize the sum of their gapless alignment scores. Results: We introduce an algorithm computing an optimal N-map with time complexity O(|s| × |t| × N) using O(|s| × |t| × N) memory space. Among all the numbers of parts taken in a reasonable range, we select the value N for which the optimal N-map has the most significant score. To evaluate this significance, we study the empirical distributions of the scores of optimal N-maps and show that they can be approximated by normal distributions with a reasonable accuracy. We test the functionality of the approach over random sequences on which we apply artificial evolutionary events. Practical Application: The method is illustrated with four case studies of pairs of sequences involving non-standard evolutionary events.

  17. Evolution of DNA Sequencing

    International Nuclear Information System (INIS)

    Sanger and coworkers introduced DNA sequencing in the 1970s. It relied principally on termination of the growing nucleotide chain when a dideoxythymidine triphosphate (ddTTP) was inserted into it. Detection of terminated sequences was done radiographically after polyacrylamide gel electrophoresis (PAGE). Improvements to the original Sanger method that have evolved over time include replacement of radiography with fluorescence, use of separate fluorescent markers for each nucleotide, use of capillary electrophoresis instead of polyacrylamide gel electrophoresis, and then the introduction of capillary array electrophoresis. However, this technique suffered from a few inherent limitations, such as decreased sensitivity for low-level mutant alleles, complexities in analyzing highly polymorphic regions like the Major Histocompatibility Complex (MHC), and the high DNA concentrations required. Several Next Generation Sequencing (NGS) technologies that tend to overcome the limitations of Sanger sequencing have been introduced by Roche, Illumina and other commercial manufacturers, and are reviewed here. The introduction of NGS in clinical research and medical diagnostics is expected to change the entire diagnostic approach. Applications include the study of cancer variants, detection of minimal residual disease, exome sequencing, detection of Single Nucleotide Polymorphisms (SNPs) and their disease association, epigenetic regulation of gene expression, and sequencing of microorganism genomes. (author)

  18. Classification of base sequences

    CERN Document Server

    Djokovic, Dragomir Z

    2010-01-01

    Base sequences BS(n+1,n) are quadruples of {1,-1}-sequences (A;B;C;D), with A and B of length n+1 and C and D of length n, such that the sum of their nonperiodic autocorrelation functions is a delta-function. The base sequence conjecture, asserting that BS(n+1,n) exist for all n, is stronger than the famous Hadamard matrix conjecture. We introduce a new definition of equivalence for base sequences BS(n+1,n) and construct a canonical form. By using this canonical form, we have enumerated the equivalence classes of BS(n+1,n) for n <= 30. Due to excessive size of the equivalence classes, the tables in the paper cover only the cases n <= 12.
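
    The defining condition can be checked mechanically. The sketch below (hypothetical function names) tests whether four {1,-1}-sequences of lengths n+1, n+1, n, n have nonperiodic autocorrelation functions that sum to zero at every nonzero shift, which is what "delta-function" means here; it verifies a candidate and makes no attempt at the enumeration or canonical form described in the paper.

```python
def npaf(seq, shift):
    """Nonperiodic autocorrelation of seq at the given shift."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def is_base_sequence(a, b, c, d):
    """True if (a; b; c; d) satisfies the BS(n+1, n) condition."""
    n = len(c)
    if len(a) != n + 1 or len(b) != n + 1 or len(d) != n:
        return False
    if any(x not in (1, -1) for s in (a, b, c, d) for x in s):
        return False
    # The summed autocorrelations must vanish at every shift 1..n
    # (shifts beyond a sequence's length contribute 0).
    return all(
        sum(npaf(s, k) for s in (a, b, c, d)) == 0
        for k in range(1, n + 1)
    )
```

    The smallest case, n = 1, is satisfied by A = (1, 1), B = (1, -1), C = (1), D = (1): at shift 1 the contributions of A and B are +1 and -1, which cancel.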

  19. Pierre Robin sequence

    Science.gov (United States)

    Pierre Robin syndrome; Pierre Robin complex; Pierre Robin anomaly ... The exact causes of Pierre Robin sequence are unknown. It may be part of many genetic syndromes. The lower jaw develops slowly before birth, but ...

  20. Scope and Sequence.

    Science.gov (United States)

    Callison, Daniel

    2002-01-01

    Discusses scope and sequence plans for curriculum coordination in elementary and secondary education related to school libraries. Highlights include library skills; levels of learning objectives; technology skills; media literacy skills; and information inquiry skills across disciplines by grade level. (LRW)

  1. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  2. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose;

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  3. HIV Sequence Compendium 2010

    Energy Technology Data Exchange (ETDEWEB)

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  4. Text Mining: (Asynchronous Sequences)

    Directory of Open Access Journals (Sweden)

    Sheema Khan

    2014-12-01

    In this paper we try to correlate text sequences that share common topics, using those topics as semantic clues. We propose a two-step method for asynchronous text mining. Step one checks for the common topics in the sequences and isolates them with their timestamps. Step two takes a topic and tries to assign a timestamp to the text document. After multiple repetitions of step two, we could obtain an optimum result.

  5. Nanopore Sequencing with MspA

    Science.gov (United States)

    Gundlach, Jens H.

    2011-10-01

    Nanopore sequencing is the simplest concept of converting the sequence of a single DNA molecule directly into an electronic signal. We introduced the protein pore MspA, derived from Mycobacterium smegmatis, to nanopore sequencing [1]. MspA has a single, narrow (~1.2 nm) and short (MspA is reproducible with sub-nanometer precision and is engineerable using genetic mutations. DNA moves through the pore at rates exceeding 1 nt/microsec, too fast to observe the passage of each nucleotide. However, when DNA is held with double-stranded DNA sections or an avidin anchor, single nucleotides resident in MspA's constriction can be identified with highly resolved current differences. We have provided proof of principle of a nanopore sequencing method [2] in which we use DNA modified by inserting double-stranded DNA sections between every nucleotide. The double-stranded sections are designed to halt translocation for long enough to sequentially read the sequence of the original DNA molecule. Prospects and developments to sequence unmodified native DNA using MspA will be discussed. [1] T.Z. Butler, et al, PNAS 105 20647 (2008) [2] I.M. Derrington, et al, PNAS 107 16060 (2010).

  6. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed Affan

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
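
    The distribution step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the patented method: the function name `split_database` is hypothetical, and the splitting ratio is assumed to be a fraction of the sequence count (the abstract does not say whether it applies to the count or to total residues).

```python
def split_database(sequences, cpu_fraction):
    """Split database sequences into a CPU portion and a GPU portion.

    The shortest sequences go to the CPU and the longest to the GPU,
    as in the abstract. `cpu_fraction` is the assumed splitting ratio,
    expressed as the fraction of sequences sent to the CPU.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must lie in [0, 1]")
    ordered = sorted(sequences, key=len)      # shortest first
    cut = int(len(ordered) * cpu_fraction)    # number of CPU sequences
    return ordered[:cut], ordered[cut:]


database = ["ACGT", "AC", "ACGTACGT", "A", "ACG"]
cpu_part, gpu_part = split_database(database, 0.4)
print(cpu_part)  # the two shortest sequences
print(gpu_part)  # the three longest sequences
```

    Each portion would then be scored against the query on its own device, giving the first and second alignment scores the abstract mentions.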

  7. Program Synthesizes UML Sequence Diagrams

    Science.gov (United States)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas previously, the construction of sequence diagrams was a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.

  8. What is next generation sequencing?

    OpenAIRE

    Behjati, Sam; Tarpey, Patrick S.

    2013-01-01

    Next generation sequencing (NGS), massively parallel or deep sequencing are related terms that describe a DNA sequencing technology which has revolutionised genomic research. Using NGS an entire human genome can be sequenced within a single day. In contrast, the previous Sanger sequencing technology, used to decipher the human genome, required over a decade to deliver the final draft. Although in genome research NGS has mostly superseded conventional Sanger sequencing, it has not yet translat...

  9. Controlled processing during sequencing.

    Science.gov (United States)

    Thothathiri, Malathi; Rattinger, Michelle

    2015-01-01

    Longstanding evidence has identified a role for the frontal cortex in sequencing within both linguistic and non-linguistic domains. More recently, neuropsychological studies have suggested a specific role for the left premotor-prefrontal junction (BA 44/6) in selection between competing alternatives during sequencing. In this study, we used neuroimaging with healthy adults to confirm and extend knowledge about the neural correlates of sequencing. Participants reproduced visually presented sequences of syllables and words using manual button presses. Items in the sequence were presented either consecutively or concurrently. Concurrent presentation is known to trigger the planning of multiple responses, which might compete with one another. Therefore, we hypothesized that regions involved in controlled processing would show greater recruitment during the concurrent than the consecutive condition. Whole-brain analysis showed concurrent > consecutive activation in sensory, motor and somatosensory cortices and notably also in rostral-dorsal anterior cingulate cortex. Region of interest analyses showed increased activation within left BA 44/6 and correlation between this region's activation and behavioral response times. Functional connectivity analysis revealed increased connectivity between left BA 44/6 and the posterior lobe of the cerebellum during the concurrent than the consecutive condition. These results corroborate recent evidence and demonstrate the involvement of BA 44/6 and other control regions when ordering co-activated representations.

  10. Controlled processing during sequencing

    Directory of Open Access Journals (Sweden)

    Malathi Thothathiri

    2015-10-01

    Longstanding evidence has identified a role for the frontal cortex in sequencing within both linguistic and non-linguistic domains. More recently, neuropsychological studies have suggested a specific role for the left premotor-prefrontal junction (BA 44/6) in selection between competing alternatives during sequencing. In this study, we used neuroimaging with healthy adults to confirm and extend knowledge about the neural correlates of sequencing. Participants reproduced visually presented sequences of syllables and words using manual button presses. Items in the sequence were presented either consecutively or concurrently. Concurrent presentation is known to trigger the planning of multiple responses, which might compete with one another. Therefore, we hypothesized that regions involved in controlled processing would show greater recruitment during the concurrent than the consecutive condition. Whole-brain analysis showed concurrent > consecutive activation in sensory, motor and somatosensory cortices and notably also in rostral-dorsal anterior cingulate cortex (ACC). Region of interest analyses showed increased activation within left BA 44/6 and correlation between this region’s activation and behavioral response times. Functional connectivity analysis revealed increased connectivity between left BA 44/6 and the posterior lobe of the cerebellum during the concurrent than the consecutive condition. These results corroborate recent evidence and demonstrate the involvement of BA 44/6 and other control regions when ordering co-activated representations.

  11. Ternary Chaotic Pulse Compression Sequences

    Directory of Open Access Journals (Sweden)

    J. B. Seventline

    2010-09-01

    In this paper a method for generating ternary sequences is discussed. These sequences are useful in many applications, specifically in synchronization of block codes and pulse compression in radar. The ternary sequences are derived from chaotic maps. It is feasible to achieve simultaneously superior performance in detection range and range resolution using the proposed ternary sequences. The properties of these sequences, such as autocorrelation function, Peak Side Lobe Ratio (PSLR), ambiguity diagram and performance under an AWGN noise background, have been studied. The generation of these sequences is much simpler, and the available number of sequences is virtually infinite and not limited by the length of the sequence.
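
    To make the idea concrete, here is a minimal sketch of deriving a ternary sequence from a chaotic map and measuring its PSLR. Everything specific is an assumption made for illustration: the abstract does not name the map or the quantization rule, so the logistic map with r = 4, the thresholds 1/3 and 2/3, and the function names are all hypothetical. The PSLR uses one common convention, 20·log10 of the largest sidelobe magnitude over the mainlobe peak.

```python
import math


def logistic_orbit(x0, length, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x) (assumed chaotic map)."""
    x, orbit = x0, []
    for _ in range(length):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def ternary_from_chaos(x0, length, low=1 / 3, high=2 / 3):
    """Quantize the orbit to {-1, 0, +1} with two assumed thresholds."""
    return [-1 if x < low else (1 if x > high else 0)
            for x in logistic_orbit(x0, length)]


def autocorr(seq, shift):
    """Aperiodic autocorrelation at a non-negative shift."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def pslr_db(seq):
    """Peak sidelobe ratio in dB (more negative is better)."""
    if len(seq) < 2:
        raise ValueError("need at least two elements")
    peak = autocorr(seq, 0)
    if peak == 0:
        raise ValueError("all-zero sequence has no mainlobe")
    side = max(abs(autocorr(seq, s)) for s in range(1, len(seq)))
    return float("-inf") if side == 0 else 20.0 * math.log10(side / peak)


seq = ternary_from_chaos(x0=0.3, length=64)
print(seq[:10], pslr_db(seq))
```

    Different initial values `x0` give different sequences of any requested length, which is the sense in which the supply is "virtually infinite".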

  12. Sequencing BPS Spectra

    CERN Document Server

    Gukov, Sergei; Saberi, Ingmar; Stosic, Marko; Sulkowski, Piotr

    2015-01-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular $S$-matrix. This leads to the identifi...

  13. Instruction sequence processing operators

    NARCIS (Netherlands)

    J.A. Bergstra; C.A. Middelburg

    2009-01-01

    This paper concerns instruction sequences whose execution involves the processing of instructions by an execution environment that offers a family of services and may yield a Boolean value at termination. We introduce a composition operator for families of services and three operators that have a di

  14. THE RHIC SEQUENCER.

    Energy Technology Data Exchange (ETDEWEB)

    VAN ZEIJTS,J.; DOTTAVIO,T.; FRAK,B.; MICHNOFF,R.

    2001-06-18

    The Relativistic Heavy Ion Collider (RHIC) has a high-level asynchronous time-line driven by a controlling program called the "Sequencer". Most high-level magnet and beam related issues are orchestrated by this system. The system also plays an important role in coordinated data acquisition and saving. We present the program, operator interface, operational impact and experience.

  15. Twin anemia polycythemia sequence

    NARCIS (Netherlands)

    Slaghekke, Femke

    2014-01-01

    In this thesis we describe that Twin Anemia Polycythemia Sequence (TAPS) is a form of chronic feto-fetal transfusion in monochorionic (identical) twins based on a small amount of blood transfusion through very small anastomoses. For the antenatal diagnosis of TAPS, Middle Cerebral Artery – Peak Syst

  16. Properties of Semijoin Sequences

    Institute of Scientific and Technical Information of China (English)

    Beng C. Ooi; B. Srinivasan

    1989-01-01

    The problem of finding an optimum semijoin sequence for an arbitrary query under a linear cost function for the transmission cost is NP-hard. Hence heuristic algorithms with desirable properties are explored. In this paper four properties of semijoin programs for distributed query processing are identified. The use of these properties in constructing semijoin sequences is justified. An existing algorithm is modified to incorporate these properties. Empirical comparison with existing algorithms shows the superiority of the proposed algorithm.

  17. Learning Sequence Neighbourhood Metrics

    CERN Document Server

    Bayer, Justin; van der Smagt, Patrick

    2011-01-01

    Recurrent neural networks (RNNs) in combination with a pooling operator and the neighbourhood components analysis (NCA) objective function are able to detect the characterizing dynamics of sequences and embed them into a fixed-length vector space of arbitrary dimensionality. Subsequently, the resulting features are meaningful and can be used for visualization or nearest neighbour classification in linear time. This kind of metric learning for sequential data enables the use of algorithms tailored towards fixed length vector spaces such as R^n.

  18. Sequence Classification: 885394 [

    Lifescience Database Archive (English)

    703); The expression pattern of this gene is described in PMID:12000842; possible frameshift detected when compared...Non-TMB TMH Non-TMB Non-TMB Non-TMB Non-TMB >gi|23619146|ref|NP_705108.1| Slight difference exist when compa...red to the published sequence of EBL-1 from Dd2 strain of P. falciparum (PMID:10613

  19. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  20. Information Theory of DNA Sequencing

    CERN Document Server

    Motahari, Abolfazl; Tse, David

    2012-01-01

    DNA sequencing is the basic workhorse of modern day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and provides a fundamental limit to the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process. Using this framework, we also study the impact of noise in the read process on the sequencing capacity.

  1. Psychoacoustic Properties of Fibonacci Sequences

    Directory of Open Access Journals (Sweden)

    J. Sokoll

    2008-01-01

    In 1202, Fibonacci set up one of the most interesting sequences in number theory. This sequence can be represented by so-called Fibonacci Numbers, and by a binary sequence of zeros and ones. If such a binary Fibonacci Sequence is played back as an audio file, a very dissonant sound results. This is caused by the “almost-periodic”, “self-similar” property of the binary sequence. The ratio of zeros and ones converges to the golden ratio, as do the primary and secondary spectral components in their frequencies and amplitudes. These Fibonacci Sequences will be characterized using listening tests and psychoacoustic analyses.
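
    The convergence of the zero-to-one ratio can be checked numerically. The sketch below assumes the binary sequence is the standard Fibonacci word generated by the substitution 0 → 01, 1 → 0; the abstract does not spell out its construction, and the function name `fibonacci_word` is hypothetical.

```python
import math


def fibonacci_word(iterations):
    """Apply the substitution 0 -> 01, 1 -> 0 repeatedly, starting from '0'."""
    word = "0"
    for _ in range(iterations):
        word = "".join("01" if ch == "0" else "0" for ch in word)
    return word


golden_ratio = (1 + math.sqrt(5)) / 2

word = fibonacci_word(20)
ratio = word.count("0") / word.count("1")
print(len(word), ratio, golden_ratio)
```

    The counts of zeros and ones after each iteration are consecutive Fibonacci numbers, which is why their ratio approaches the golden ratio.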

  2. Allele Re-sequencing Technologies

    DEFF Research Database (Denmark)

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    2013-01-01

    The development of next-generation sequencing technologies has made sequencing an affordable approach for detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing still remains too high to be feasible for many plant species with large and com...... alternative to whole genome re-sequencing to identify causative genetic variations in plants. One challenge, however, will be efficient bioinformatics strategies for data handling and analysis from the increasing amount of sequence information.

  3. Spaces of Ideal Convergent Sequences

    Directory of Open Access Journals (Sweden)

    M. Mursaleen

    2014-01-01

    In the present paper, we introduce some sequence spaces using ideal convergence and a Musielak-Orlicz function ℳ = (M_k). We also examine some topological properties of the resulting sequence spaces.

  4. Novel sequences propel familiar folds.

    Science.gov (United States)

    Jawad, Zahra; Paoli, Massimo

    2002-04-01

    Recent structure determinations have made new additions to a set of strikingly different sequences that give rise to the same topology. Proteins with a beta propeller fold are characterized by extreme sequence diversity despite the similarity in their three-dimensional structures. Several fold predictions, based in part on sequence repeats thought to match modular beta sheets, have been proved correct.

  5. Sequencing Games with Repeated Players

    NARCIS (Netherlands)

    Estevez Fernandez, M.A.; Borm, P.E.M.; Calleja, P.; Hamers, H.J.M.

    2004-01-01

    Two classes of one machine sequencing situations are considered in which each job corresponds to exactly one player but a player may have more than one job to be processed, so-called RP (repeated player) sequencing situations. In max-RP sequencing situations it is assumed that each player's cost funct

  6. Rapid Polymer Sequencer

    Science.gov (United States)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.
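
    The identification step, comparing each measured current change with a database of reference signals, amounts to a nearest-match lookup. The sketch below is only an illustration of that step: the reference values are invented placeholders in arbitrary units, not data from the patent, and the function names are hypothetical.

```python
# Placeholder reference signals (arbitrary units); NOT values from the patent.
REFERENCE_SIGNALS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def identify_component(measured_change, reference=REFERENCE_SIGNALS):
    """Return the reference component whose signal is closest to the measurement."""
    return min(reference, key=lambda name: abs(reference[name] - measured_change))


def call_sequence(measured_changes, reference=REFERENCE_SIGNALS):
    """Identify each polymer component in order of passage through the tip."""
    return "".join(identify_component(m, reference) for m in measured_changes)


print(call_sequence([1.1, 3.9, 2.2, 2.8]))
```

    A real system would also need a rule for rejecting measurements that are far from every reference signal; that is omitted here.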

  7. The Galaxy End Sequence

    CERN Document Server

    Eales, Stephen; Smith, Matthew; Appah, Kiran; Ciesla, Laure; Duffield, Chris; Schofield, Simon

    2016-01-01

    A common assumption is that galaxies fall in two distinct regions on a plot of specific star-formation rate (SSFR) versus galaxy stellar mass: a star-forming Galaxy Main Sequence (GMS) and a separate region of 'passive' or 'red and dead galaxies'. Starting from a volume-limited sample of nearby galaxies designed to contain most of the stellar mass in this volume, and thus being a fair representation of the Universe at the end of 12 billion years of galaxy evolution, we investigate the distribution of galaxies in this diagram today. We show that galaxies follow a strongly curved extended GMS with a steep negative slope at high galaxy stellar masses. There is a gradual change in the morphologies of the galaxies along this distribution, but there is no clear break between early-type and late-type galaxies. Examining the other evidence that there are two distinct populations, we argue that the 'red sequence' is the result of the colours of galaxies changing very little below a critical value of the SSFR, rather t...

  8. Complete Convergence of Exchangeable Sequences

    Directory of Open Access Journals (Sweden)

    George Stoica

    2011-01-01

    We prove that exchangeable sequences converge completely in the Baum-Katz sense under the same conditions as i.i.d. sequences do. Problem statement: The research was needed as the rate of convergence in the law of large numbers for exchangeable sequences was previously obtained under restricted hypotheses. Approach: We applied powerful techniques involving inequalities for independent sequences of random variables. Results: We obtained the maximal rate of convergence and provided an example to show that our findings are sharp. Conclusion/Recommendations: The technique used in the paper may be adapted to a similar study of identically distributed sequences.

  9. Solid phase sequencing of biopolymers

    Energy Technology Data Exchange (ETDEWEB)

    Cantor, Charles (Del Mar, CA); Koster, Hubert (La Jolla, CA)

    2010-09-28

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  10. The Evolution of Nanopore Sequencing

    Directory of Open Access Journals (Sweden)

    Yue Wang

    2015-01-01

    The $1,000 Genome project has been drawing increasing attention since its launch a decade ago. Nanopore sequencing, a third-generation technology, is believed to be one of the most promising sequencing technologies to reach the four gold standards set for the $1,000 Genome, while the second-generation sequencing technologies are bringing about a revolution in life sciences, particularly in genome sequencing-based personalized medicine. Both protein and solid-state nanopores have been extensively investigated for a series of issues, from detection of ionic current blockage to field-effect-transistor (FET) sensors. A newly released protein nanopore sequencer has shown encouraging potential that nanopore sequencing will ultimately fulfill the gold standards. In this review, we address advances, challenges, and possible solutions of nanopore sequencing according to these standards.

  11. Universal sequence map (USM) of arbitrary discrete sequences

    Directory of Open Access Journals (Sweden)

    Almeida Jonas S

    2002-02-01

    Background: For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale-independent sequence analysis – without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results: We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order-free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. Conclusions: USM is shown to enable a statistical mechanics approach to sequence analysis. The scale-independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.
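
    The Chaos Game Representation that USM builds on is easy to sketch for DNA. The code below illustrates plain CGR only, not the USM procedure itself; the corner assignment for A, C, G and T is one common convention and is an assumption here, as is the function name `cgr`.

```python
# Assumed corner assignment on the unit square (a common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr(sequence, start=(0.5, 0.5)):
    """Return the CGR coordinates of a DNA sequence.

    Each symbol moves the current point halfway towards its corner,
    so the position after k symbols encodes the last k symbols.
    """
    x, y = start
    points = []
    for base in sequence:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


# Two sequences that share their last five symbols end up close together.
p = cgr("GGGGACGTA")[-1]
q = cgr("TTTTACGTA")[-1]
print(p, q)
```

    Because every step halves the distance to the previous position, two sequences sharing a suffix of length k end within 2^-k of each other in each coordinate, which is the sense in which map distance reflects sequence similarity.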

  12. Solid phase sequencing of biopolymers

    Energy Technology Data Exchange (ETDEWEB)

    Cantor, Charles R.; Hubert, Koster

    2014-06-24

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Probes may be affixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  13. Turtle Graphics of Morphic Sequences

    Science.gov (United States)

    Zantema, Hans

    2016-02-01

    The simplest infinite sequences that are not ultimately periodic are pure morphic sequences: fixed points of particular morphisms mapping single symbols to strings of symbols. A basic way to visualize a sequence is by a turtle curve: for every alphabet symbol fix an angle, and then consecutively for all sequence elements draw a unit segment and turn the drawing direction by the corresponding angle. This paper investigates turtle curves of pure morphic sequences. In particular, criteria are given for turtle curves being finite (consisting of finitely many segments), and for being fractal or self-similar: it contains an up-scaled copy of itself. Also space-filling turtle curves are considered, and a turtle curve that is dense in the plane. As a particular result we give an exact relationship between the Koch curve and a turtle curve for the Thue-Morse sequence, where until now for such a result only approximations were known.
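
    The turtle-curve construction described above translates directly into code. This is a minimal sketch: the function names are hypothetical, and the angles chosen for the Thue-Morse example are arbitrary illustrative values, not the ones the paper uses for its Koch curve result.

```python
import math


def thue_morse(n):
    """First n elements of the Thue-Morse sequence (parity of the bit count)."""
    return [bin(i).count("1") % 2 for i in range(n)]


def turtle_curve(sequence, angles_deg):
    """Vertices of the turtle curve for a sequence.

    For every element: draw a unit segment in the current direction,
    then turn by the angle assigned to that symbol.
    """
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for symbol in sequence:
        x += math.cos(heading)
        y += math.sin(heading)
        points.append((x, y))
        heading += math.radians(angles_deg[symbol])
    return points


# Illustrative angles only (assumption): 60 degrees for 0, -120 degrees for 1.
points = turtle_curve(thue_morse(64), {0: 60.0, 1: -120.0})
print(len(points), points[-1])
```

    The returned vertices can be passed to any plotting library to draw the curve.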

  14. Graphene nanodevices for DNA sequencing

    Science.gov (United States)

    Heerema, Stephanie J.; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology.

  15. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by

  16. Application of difference sequences theory

    OpenAIRE

    Khantarzhiev, Georgii

    2013-01-01

    The results of difference sequences theory are applied to analytic function theory and Diophantine equations. As a result we obtain an equation that connects the $n$-th derivative of a function with the difference sequence for the values of this function. The results of difference sequences theory also help to discover some features of a whole class of Diophantine equations. The method presented makes it possible to find limits within which a Diophantine equation has no integer solutions. The higher pow...

  17. Nonlinear analysis of biological sequences

    Energy Technology Data Exchange (ETDEWEB)

    Torney, D.C.; Bruno, W.; Detours, V. [and others]

    1998-11-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The main objectives of this project involved deriving new capabilities for analyzing biological sequences. The authors focused on tabulating the statistical properties exhibited by Human coding DNA sequences and on techniques of inferring the phylogenetic relationships among protein sequences related by descent.

  18. Blazar Sequence in Fermi Era

    Indian Academy of Sciences (India)

    Liang Chen

    2014-09-01

    In this paper, we review the latest research results on the topic of the blazar sequence. It seems that the blazar sequence is phenomenally ruled out, while the theoretical blazar sequence still holds. We point out that black hole mass is a dominant parameter accounting for high-power-high-synchrotron-peaked and low-power-low-synchrotron-peaked blazars. Because most blazars have a similar size of emission region, the theoretical blazar sequence implies that the break of the Spectral Energy Distribution (SED) is a cooling break in nature.

  19. SNMR pulse sequence phase cycling

    Science.gov (United States)

    Walsh, David O; Grunewald, Elliot D

    2013-11-12

    Technologies applicable to SNMR pulse sequence phase cycling are disclosed, including SNMR acquisition apparatus and methods, SNMR processing apparatus and methods, and combinations thereof. SNMR acquisition may include transmitting two or more SNMR pulse sequences and applying a phase shift to a pulse in at least one of the pulse sequences, according to any of a variety of cycling techniques. SNMR processing may include combining SNMR from a plurality of pulse sequences comprising pulses of different phases, so that desired signals are preserved and undesired signals are canceled.
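
    The idea of combining acquisitions with different pulse phases can be illustrated with a toy two-step phase cycle. This sketch is not the disclosed method: the signal model, the frequencies, the decay constant, and the function names are all hypothetical. It assumes only that the NMR response changes sign when the pulse phase is shifted by pi, while an unwanted artifact does not depend on the pulse phase.

```python
import math


def acquire(pulse_phase, n_samples=64, dt=0.001):
    """Simulate one record: phase-dependent NMR signal plus phase-independent artifact."""
    record = []
    for k in range(n_samples):
        t = k * dt
        # Toy NMR response: its sign follows the transmit pulse phase (0 or pi).
        nmr = math.cos(pulse_phase) * math.exp(-t / 0.02) * math.cos(2 * math.pi * 100 * t)
        # Toy artifact that is identical in every acquisition.
        artifact = 0.3 * math.cos(2 * math.pi * 50 * t)
        record.append(nmr + artifact)
    return record


def combine(record_a, record_b):
    """Subtract the two records: the NMR signal adds up, the artifact cancels."""
    return [(a - b) / 2.0 for a, b in zip(record_a, record_b)]


combined = combine(acquire(0.0), acquire(math.pi))
```

    Under these assumptions `combined` contains only the decaying NMR term; the artifact is removed without any filtering.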

  20. Assembly sequencing with toleranced parts

    Energy Technology Data Exchange (ETDEWEB)

    Latombe, J.C. [Stanford Univ., CA (United States). Robotics Lab.; Wilson, R.H. [Sandia National Labs., Albuquerque, NM (United States). Intelligent Systems and Robotics Center

    1995-02-21

    The goal of assembly sequencing is to plan a feasible series of operations to construct a product from its individual parts. Previous research has thoroughly investigated assembly sequencing under the assumption that parts have nominal geometry. This paper considers the case where parts have toleranced geometry. Its main contribution is an efficient procedure that decides if a product admits an assembly sequence with infinite translations that is feasible for all possible instances of the components within the specified tolerances. If the product admits one such sequence, the procedure can also generate it. For the cases where there exists no such assembly sequence, another procedure is proposed which generates assembly sequences that are feasible only for some values of the toleranced dimensions. If this procedure produces no such sequence, then no instance of the product is assemblable. Finally, this paper analyzes the relation between assembly and disassembly sequences in the presence of toleranced parts. This work assumes a simple, but non-trivial tolerance language that falls short of capturing all imperfections of a manufacturing process. Hence, it is only one step toward assembly sequencing with toleranced parts.

  1. The ontology of biological sequences

    Directory of Open Access Journals (Sweden)

    Kelso Janet

    2009-11-01

    Abstract Background: Biological sequences play a major role in molecular and computational biology. They are studied as information-bearing entities that make up DNA, RNA or proteins. The Sequence Ontology, which is part of the OBO Foundry, contains descriptions and definitions of sequences and their properties. Yet the most basic question about sequences remains unanswered: what kind of entity is a biological sequence? An answer to this question benefits formal ontologies that use the notion of biological sequences and analyses in computational biology alike. Results: We provide both an ontological analysis of biological sequences and a formal representation that can be used in knowledge-based applications and other ontologies. We distinguish three distinct kinds of entities that can be referred to as "biological sequence": chains of molecules, syntactic representations such as those in biological databases, and the abstract information-bearing entities. For use in knowledge-based applications and inclusion in biomedical ontologies, we implemented the developed axiom system for use in automated theorem proving. Conclusion: Axioms are necessary to achieve the main goal of ontologies: to formally specify the meaning of terms used within a domain. The axiom system for the ontology of biological sequences is the first elaborate axiom system for an OBO Foundry ontology and can serve as a starting point for the development of more formal ontologies and ultimately of knowledge-based applications.

  2. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered to be a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with two well-known alignment algorithms: FASTA (which is heuristic) and Needleman-Wunsch (which is optimal). The proposed algorithm achieves up to a 76% enhancement in alignment score when compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.
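
    For reference, the optimal baseline named above, Needleman-Wunsch global alignment, can be sketched as a short dynamic program. This is the textbook algorithm, not the ABS method, whose details the abstract does not give. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]
```

    The table has (len(a)+1) x (len(b)+1) cells, which is the quadratic cost that heuristic methods try to avoid on large databases.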

  3. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics databases are growing exponentially in size. Processing these large amounts of data may take hours even if supercomputers are used. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with two well-known sequence alignment algorithms: 'GAP' (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to a 51% enhancement in alignment score when compared with the GAP algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  4. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon;

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS)...

  5. Finite extensions of Bessel sequences

    OpenAIRE

    Bakić, Damir; Berić, Tomislav

    2015-01-01

    The paper studies finite extensions of Bessel sequences in infinite-dimensional Hilbert spaces. We provide a characterization of Bessel sequences that can be extended to frames by adding finitely many vectors. We also characterize frames that can be converted to Parseval frames by finite-dimensional perturbations. Finally, some results on excesses of frames and near-Riesz bases are derived.

  6. Sequence conserved for subcellular localization

    Science.gov (United States)

    Nair, Rajesh; Rost, Burkhard

    2002-01-01

    The more proteins diverged in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we presented the first large-scale analysis of the relation between sequence similarity and identity in subcellular localization. Three results stood out: (1) The subcellular compartment is generally more conserved than what might have been expected given that short sequence motifs like nuclear localization signals can alter the native compartment; (2) the sequence conservation of localization is similar between different compartments; and (3) it is similar to the conservation of structure and enzymatic activity. In particular, we found the transition between the regions of conserved and nonconserved localization to be very sharp, although the thresholds for conservation were less well defined than for structure and enzymatic activity. We found that a simple measure for sequence similarity accounting for pairwise sequence identity and alignment length, the HSSP distance, distinguished accurately between protein pairs of identical and different localizations. In fact, BLAST expectation values outperformed the HSSP distance only for alignments in the subtwilight zone. We succeeded in slightly improving the accuracy of inferring localization through homology by fine tuning the thresholds. Finally, we applied our results to the entire SWISS-PROT database and five entirely sequenced eukaryotes. PMID:12441382
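
    The HSSP distance mentioned above combines percentage sequence identity with alignment length. The abstract does not state the formula; the sketch below uses the length-dependent threshold curve as it is commonly quoted from Rost (1999), so the constants are an assumption here and should be checked against that paper. The function names are hypothetical.

```python
import math


def hssp_curve(alignment_length):
    """Length-dependent identity threshold (percent), as commonly quoted from Rost (1999)."""
    if alignment_length <= 11:
        return 100.0
    if alignment_length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-alignment_length / 1000.0))
        return 480.0 * alignment_length ** exponent
    return 19.5


def hssp_distance(percent_identity, alignment_length):
    """Distance of an alignment from the threshold curve, in percentage points."""
    return percent_identity - hssp_curve(alignment_length)
```

    A positive distance means the pair lies above the curve. Short alignments need much higher identity than long ones to reach the same distance.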

  7. PERIODIC COMPLEMENTARY BINARY SEQUENCE PAIRS

    Institute of Scientific and Technical Information of China (English)

    Xu Chengqian; Zhao Xiaoqun

    2002-01-01

    A new set of binary sequences, the Periodic Complementary Binary Sequence Pair (PCSP), is proposed. A new class of block design, the Difference Family Pair (DFP), is also proposed. The relationship between PCSP and DFP, the properties and existence conditions of PCSP, and the recursive constructions for PCSP are given.

  8. PERIODIC COMPLEMENTARY BINARY SEQUENCE PAIRS

    Institute of Scientific and Technical Information of China (English)

    Xu Chengqian; Zhao Xiaoqun

    2002-01-01

    A new set of binary sequences, the Periodic Complementary Binary Sequence Pair (PCSP), is proposed. A new class of block design, the Difference Family Pair (DFP), is also proposed. The relationship between PCSP and DFP, the properties and existence conditions of PCSP, and the recursive constructions for PCSP are given.

  9. Rapid Diagnostics of Onboard Sequences

    Science.gov (United States)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database, while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command

  10. Spatiotemporal correlations of aftershock sequences

    CERN Document Server

    Peixoto, Tiago P; Davidsen, Jörn

    2010-01-01

    Aftershock sequences are of particular interest in seismic research since they may condition seismic activity in a given region over long time spans. While they are typically identified with periods of enhanced seismic activity after a large earthquake as characterized by the Omori law, our knowledge of the spatiotemporal correlations between events in an aftershock sequence is limited. Here, we study the spatiotemporal correlations of two aftershock sequences from California (Parkfield and Hector Mine) using the recently introduced concept of "recurrent" events. We find that both sequences have very similar properties and that most of them are captured by the space-time epidemic-type aftershock sequence (ETAS) model if one takes into account catalog incompleteness. However, the stochastic model does not capture the spatiotemporal correlations leading to the observed structure of seismicity on small spatial scales.

  11. Multilocus sequence typing of total-genome-sequenced bacteria.

    Science.gov (United States)

    Larsen, Mette V; Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W; Aarestrup, Frank M; Lund, Ole

    2012-04-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST. PMID:22238442
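
    The final step described above, deriving the sequence type from the combination of identified alleles, amounts to a table lookup. The sketch below shows only that step and is not the published service's code: the locus names, allele numbers, and sequence types are invented placeholders, and the BLAST-based allele ranking is assumed to have already produced one allele number per locus.

```python
def sequence_type(best_alleles, scheme_loci, profiles):
    """Look up the sequence type for the allele combination, or None if it is unknown."""
    # Order the allele numbers by the loci of the chosen MLST scheme.
    profile = tuple(best_alleles[locus] for locus in scheme_loci)
    return profiles.get(profile)


# Hypothetical three-locus scheme and profile table (placeholders, not real data).
SCHEME_LOCI = ("locus1", "locus2", "locus3")
PROFILES = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 5): 7,
}

isolate_alleles = {"locus1": 3, "locus2": 2, "locus3": 5}
isolate_st = sequence_type(isolate_alleles, SCHEME_LOCI, PROFILES)
```

    An allele combination that is absent from the profile table returns `None`, which corresponds to an isolate whose sequence type has not yet been assigned.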

  12. Ossification sequence heterochrony among amphibians.

    Science.gov (United States)

    Harrington, Sean M; Harrison, Luke B; Sheil, Christopher A

    2013-01-01

    Heterochrony is an important mechanism in the evolution of amphibians. Although studies have centered on the relationship between size and shape and the rates of development, ossification sequence heterochrony also may have been important. Rigorous, phylogenetic methods for assessing sequence heterochrony are relatively new, and a comprehensive study of the relative timing of ossification of skeletal elements has not been used to identify instances of sequence heterochrony across Amphibia. In this study, a new version of the program Parsimov-based genetic inference (PGi) was used to identify shifts in ossification sequences across all extant orders of amphibians, for all major structural units of the skeleton. PGi identified a number of heterochronic sequence shifts in all analyses, the most interesting of which seem to be tied to differences in metamorphic patterns among major clades. Early ossification of the vomer, premaxilla, and dentary is retained by Apateon caducus and members of Gymnophiona and Urodela, which lack the strongly biphasic development seen in anurans. In contrast, bones associated with the jaws and face were identified as shifting late in the ancestor of Anura. The bones that do not shift late, and thereby occupy the earliest positions in the anuran cranial sequence, are those in regions of the skull that undergo the least restructuring throughout anuran metamorphosis. Additionally, within Anura, bones of the hind limb and pelvic girdle were also identified as shifting early in the sequence of ossification, which may be a result of functional constraints imposed by the drastic metamorphosis of most anurans.

  13. Variational Sequences, Representation Sequences and Applications in Physics

    OpenAIRE

    Palese, Marcella; Rossi, Olga; Winterroth, Ekkehart; Musilová, Jana

    2015-01-01

    This paper is a review containing new original results on the finite order variational sequence and its different representations with emphasis on applications in the theory of variational symmetries and conservation laws in physics.

  14. A Criterion for Regular Sequences

    Indian Academy of Sciences (India)

    D P Patil; U Storch; J Stückrad

    2004-05-01

    Let $R$ be a commutative noetherian ring and $f_1,\ldots,f_r \in R$. In this article we give (cf. the Theorem in §2) a criterion for $f_1,\ldots,f_r$ to be a regular sequence for a finitely generated module over $R$ which strengthens and generalises a result in [2]. As an immediate consequence we deduce that if $V(g_1,\ldots,g_r) \subseteq V(f_1,\ldots,f_r)$ in Spec $R$ and if $f_1,\ldots,f_r$ is a regular sequence in $R$, then $g_1,\ldots,g_r$ is also a regular sequence in $R$.

  15. New Orthogonal Small Set Kasami Code Sequence

    OpenAIRE

    I Nyoman Pramaita; I G.A.G.K. Diafari; DNKP Negara; Agus Dharma

    2015-01-01

    In this paper, the authors propose the design of a new orthogonal small set Kasami code sequence generated using a combination of a non-orthogonal m-sequence and a small set Kasami code sequence. The authors demonstrate that the proposed code sequence has auto-correlation function (ACF), cross-correlation function (CCF), and peak cross-correlation values comparable to those of the existing orthogonal small set Kasami code sequence. Though the proposed code sequence has less code sequence sets than th...

  16. ISIS Individualized Support In Sequencing

    NARCIS (Netherlands)

    Drachsler, Hendrik; Hummel, Hans

    2007-01-01

    Drachsler, H., & Hummel, H. G. K. (2007). ISIS Individualized Support In Sequencing. Presentation given during the PIP meeting on March 22, 2007. Open University of the Netherlands: Heerlen, The Netherlands.

  17. DNA Sequencing Using Capillary Electrophoresis

    Energy Technology Data Exchange (ETDEWEB)

    Dr. Barry Karger

    2011-05-09

    The overall goal of this program was to develop capillary electrophoresis as the tool to be used to sequence for the first time the Human Genome. Our program was part of the Human Genome Project. In this work, we were highly successful and the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBACE multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and success. We began our work by separating double-stranded oligonucleotides by capillary electrophoresis using cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, preparation of such cross-linked gel capillaries was difficult with poor reproducibility, and even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g. sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of the linear polyacrylamide so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, while testing many linear polymers, we selected linear polyacrylamide as the best matrix as it was the most hydrophilic polymer available. Under our DOE program, we demonstrated initially the success of the linear polyacrylamide to separate double-stranded DNA. We note that the method is used even today to assay purity of double-stranded DNA fragments. Our focus, of course, was on the separation of single-stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other

  18. Guitars, Violins, and Geometric Sequences

    Science.gov (United States)

    Barger, Rita; Haehl, Martha

    2007-01-01

    This article describes middle school mathematics activities that relate measurement, ratios, and geometric sequences to finger positions or the placement of frets on stringed musical instruments. (Contains 2 figures and 2 tables.)
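
    The link between fret placement and geometric sequences can be made concrete with a short calculation. The abstract gives no formulas, so this sketch assumes standard twelve-tone equal temperament, in which each fret shortens the vibrating string by the constant ratio 2^(-1/12). The function names and the 650 mm scale length are illustrative assumptions.

```python
def vibrating_length(scale_length, fret):
    """String length still vibrating when stopped at `fret` (fret 0 is the open string)."""
    return scale_length * 2.0 ** (-fret / 12.0)


def fret_distance_from_nut(scale_length, fret):
    """Distance from the nut to the given fret."""
    return scale_length - vibrating_length(scale_length, fret)


SCALE_LENGTH_MM = 650.0  # illustrative scale length, not taken from the article
lengths = [vibrating_length(SCALE_LENGTH_MM, k) for k in range(13)]
```

    The values in `lengths` form a geometric sequence, and the twelfth fret falls at half the scale length, one octave above the open string.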

  19. Pythagorean Triples from Harmonic Sequences.

    Science.gov (United States)

    DiDomenico, Angelo S.; Tanner, Randy J.

    2001-01-01

    Shows how all primitive Pythagorean triples can be generated from harmonic sequences. Use inductive and deductive reasoning to explore how Pythagorean triples are connected with another area of mathematics. (KHR)

  20. Classification of Base Sequences BS(n+1, n)

    Directory of Open Access Journals (Sweden)

    Dragomir Ž. Ðoković

    2010-01-01

    Base sequences BS(n+1, n) are quadruples of {±1}-sequences (A; B; C; D), with A and B of length n+1 and C and D of length n, such that the sum of their nonperiodic autocorrelation functions is a δ-function. The base sequence conjecture, asserting that BS(n+1, n) exist for all n, is stronger than the famous Hadamard matrix conjecture. We introduce a new definition of equivalence for base sequences BS(n+1, n) and construct a canonical form. By using this canonical form, we have enumerated the equivalence classes of BS(n+1, n) for n ≤ 30. As the number of equivalence classes grows rapidly (but not monotonically) with n, the tables in the paper cover only the cases n ≤ 13.
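
    The defining property of base sequences, that the nonperiodic autocorrelations of the four ±1 sequences sum to zero at every nonzero shift, can be checked directly. This is a minimal sketch of that check only, with hypothetical function names. It does not implement the paper's equivalence relation or canonical form. The two small quadruples were verified by hand for this example and are not taken from the paper's tables.

```python
def nonperiodic_autocorrelation(seq, shift):
    """Sum of seq[i] * seq[i + shift] over all positions where both terms exist."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def is_base_sequence(a, b, c, d):
    """Check whether (a; b; c; d) has lengths n+1, n+1, n, n and a delta-like autocorrelation sum."""
    n = len(c)
    if len(a) != n + 1 or len(b) != n + 1 or len(d) != n:
        return False
    if any(x not in (1, -1) for seq in (a, b, c, d) for x in seq):
        return False
    # The summed autocorrelation must vanish at every nonzero shift.
    for shift in range(1, n + 1):
        total = sum(nonperiodic_autocorrelation(seq, shift) for seq in (a, b, c, d))
        if total != 0:
            return False
    return True


# Small hand-checked examples with n = 1 and n = 2.
example_n1 = ([1, 1], [1, -1], [1], [1])
example_n2 = ([1, 1, -1], [1, 1, 1], [1, -1], [1, -1])
```

    At shift 0 the sum always equals the total number of entries, 4n + 2, so only the nonzero shifts need to be tested.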

  1. The Conditional Sequence Information Function

    Directory of Open Access Journals (Sweden)

    Osman Guzide

    2010-01-01

    Problem statement: A great deal of attention has been given to the theory of information. It has found applications in science, especially in the area of biotechnology. Previous studies on the subject have been limited to either conditional or sequence problem solving. This study combines both conditional and sequence properties of the information function. By doing this, researchers can find solutions to more problems that are applicable in real life. Approach: First, properties of dynamical systems and the information function defined by Shannon were provided. Then, the conditional sequence information function of a dynamical system together with its proof was presented. Result: This new function now exists and is ready for use in many real-life problems, such as finding solutions to DNA sequences with conditional sickness. Conclusion: This new function opens a new avenue for researchers in solving more complex problems by using its developed properties.

  2. Mining protein sequences for motifs.

    Science.gov (United States)

    Narasimhan, Giri; Bu, Changsong; Gao, Yuan; Wang, Xuning; Xu, Ning; Mathee, Kalai

    2002-01-01

    We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. Statistical significance of the detection results is ensured by statistically determining the various parameters of the algorithm. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with existing programs. In addition, the GYM program provides a lot of useful information about a given protein sequence. PMID:12487759

  3. Modified Genetic Algorithm for DNA Sequence Assembly by Shotgun and Hybridization Sequencing Techniques

    OpenAIRE

    Prof. Narayan Kumar Sahu; Prof. Somesh Dewangan; Prof. Akash Wanjari

    2012-01-01

    Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hy...

  4. Sequence diversity and functional conformity

    OpenAIRE

    de Lange, Orlando

    2015-01-01

    At least four phylogenetically distinct groups of bacteria encode repeat proteins with the common ability to bind specific DNA sequences with a unique but conserved code. Each repeat binds a single DNA base, and specificity is determined by the amino acid residue at position 13 of each repeat. Repeats are typically 33-35 amino acids long. Comparing repeat sequences across all groups reveals that only three positions are hyper-conserved. Repeats are in most cases functionally compatible such t...

  5. On train track splitting sequences

    CERN Document Server

    Masur, Howard; Schleimer, Saul

    2010-01-01

    We show that the subsurface projection of a train track splitting sequence is an unparameterized quasi-geodesic in the curve complex of the subsurface. For the proof we introduce induced tracks, efficient position, and wide curves. This result is an important step in the proof that the disk complex is Gromov hyperbolic. As another application we show that train track sliding and splitting sequences give quasi-geodesics in the train track graph, generalizing a result of Hamenstaedt [Invent. Math.].

  6. Integrated sequence analysis. Final report

    International Nuclear Information System (INIS)

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  7. Structural complexity of DNA sequence.

    Science.gov (United States)

    Liou, Cheng-Yuan; Tseng, Shen-Han; Cheng, Wei-Chen; Tsai, Huai-Ying

    2013-01-01

    In modern bioinformatics, finding an efficient way to allocate sequence fragments with biological functions is an important issue. This paper presents a structural approach based on context-free grammars extracted from original DNA or protein sequences. This approach is radically different from all those statistical methods. Furthermore, this approach is compared with a topological entropy-based method for consistency and difference of the complexity results. PMID:23662161

  8. Structural Complexity of DNA Sequence

    Directory of Open Access Journals (Sweden)

    Cheng-Yuan Liou

    2013-01-01

    Full Text Available In modern bioinformatics, finding an efficient way to allocate sequence fragments with biological functions is an important issue. This paper presents a structural approach based on context-free grammars extracted from original DNA or protein sequences. This approach is radically different from all those statistical methods. Furthermore, this approach is compared with a topological entropy-based method for consistency and difference of the complexity results.

  9. Optimization of sequence alignment for simple sequence repeat regions

    Directory of Open Access Journals (Sweden)

    Ogbonnaya Francis C

    2011-07-01

    Full Text Available Background: Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSRs have been used as molecular markers because they are easy to detect and are used in a range of applications, including genetic diversity, genome mapping, and marker-assisted selection. They are also very mutable because of slipping in the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDEL) mutation frequency to a high ratio - more than other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units, which result in false evolutionary relationships. Findings: To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using a PERL script with a Tk graphical interface. This program is based on aligning sequences after first determining the repeated units and the positions of the last SSR nucleotides. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenetic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different lineages was observed after applying the new algorithm. Conclusions: The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic

  10. Genome Sequence of Canine Herpesvirus.

    Directory of Open Access Journals (Sweden)

    Konstantinos V Papageorgiou

    Full Text Available Canine herpesvirus is a widespread alphaherpesvirus that causes a fatal haemorrhagic disease of neonatal puppies. We have used high-throughput methods to determine the genome sequences of three viral strains (0194, V777 and V1154) isolated in the United Kingdom between 1985 and 2000. The sequences are very closely related to each other. The canine herpesvirus genome is estimated to be 125 kbp in size and consists of a unique long sequence (97.5 kbp) and a unique short sequence (7.7 kbp) that are each flanked by terminal and internal inverted repeats (38 bp and 10.0 kbp, respectively). The overall nucleotide composition is 31.6% G+C, which is the lowest among the completely sequenced alphaherpesviruses. The genome contains 76 open reading frames predicted to encode functional proteins, all of which have counterparts in other alphaherpesviruses. The availability of the sequences will facilitate future research on the diagnosis and treatment of canine herpesvirus-associated disease.

  11. Long-range barcode labeling-sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Feng; Zhang, Tao; Singh, Kanwar K.; Pennacchio, Len A.; Froula, Jeff L.; Eng, Kevin S.

    2016-10-18

    Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.

  12. Pig genome sequence - analysis and publication strategy

    NARCIS (Netherlands)

    Archibald, A.L.; Bolund, L.; Churcher, C.; Fredholm, M.; Groenen, M.A.M.; Harlizius, B.

    2010-01-01

    Background - The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. Results - Assemblies of the B

  13. Thread extraction for polyadic instruction sequences

    NARCIS (Netherlands)

    J.A. Bergstra; C.A. Middelburg

    2009-01-01

    Instruction sequences are often fragmented. An important reason for instruction sequence fragmentation is that the execution architecture at hand to execute instruction sequences sets bounds to the size of instruction sequences. In this paper, we study instruction sequences that have been split into

  14. ARC Code TI: sequenceMiner

    Data.gov (United States)

    National Aeronautics and Space Administration — The sequenceMiner was developed to address the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. sequenceMiner works...

  15. Sequencing Needs for Viral Diagnostics

    Energy Technology Data Exchange (ETDEWEB)

    Gardner, S N; Lam, M; Mulakken, N J; Torres, C L; Smith, J R; Slezak, T

    2004-01-26

    We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives ("near neighbors") that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. SARS and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near neighbor sequences are urgently needed. Our results also indicate that double stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.
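
    The selection criterion described above (conserved among target strains, unique relative to other species) can be illustrated with a minimal sketch. It assumes exact k-mer matching on toy strings; the function names `kmers` and `candidate_signatures` and the choice of k are hypothetical and are not taken from the authors' pipeline, which is not described in this record.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def candidate_signatures(targets, neighbors, k):
    """Toy signature search (illustrative assumption, not the real pipeline).

    Keeps k-mers present in every target genome (conservation) and absent
    from every near-neighbor genome (uniqueness).
    """
    if not targets:
        raise ValueError("at least one target genome is required")
    conserved = set.intersection(*(kmers(t, k) for t in targets))
    background = set().union(*(kmers(s, k) for s in neighbors))
    return conserved - background


if __name__ == "__main__":
    targets = ["ACGTAC", "TACGTA"]   # two made-up strains of the target
    neighbors = ["CGTAGG"]           # one made-up near neighbor
    print(sorted(candidate_signatures(targets, neighbors, k=4)))
```

    In this toy setting, adding target genomes can only shrink the conserved set and adding near neighbors can only shrink the unique set, which mirrors the trade-off the abstract discusses.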

  16. Sequence Patterns of Identity Authentication Protocols

    Institute of Scientific and Technical Information of China (English)

    Tao Hongcai; He Dake

    2006-01-01

    From the viewpoint of protocol sequence, analyses are made of the sequence patterns of possible identity authentication protocols under two cases: with or without a trusted third party (TTP). Ten feasible sequence patterns of authentication protocols with TTP and 5 sequence patterns without TTP are obtained. These sequence patterns meet the requirements for identity authentication, and cover almost all current authentication protocols with and without TTP. All of the sequence patterns obtained are classified into unilateral or bilateral authentication. Then, according to the sequence symmetry, several good sequence patterns with TTP are evaluated. The results can provide a reference for the design of new identity authentication protocols.

  17. Transgressive Surface as Sequence Boundary

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    Analysis of the four cases of the sequence boundary (SB)-transgressive surface (TS) relation in nature shows that applying transgressive surfaces as sequence boundaries has the following merits: it improves the methodology of stratigraphic subdivision; the position of transgressive surface in a sea level curve is relatively fixed; the transgressive surface is a transforming surface of the stratal structure; in platforms or ramps, the transgressive surface is the only choice for determining the sequence boundary; the transgressive surface is a readily recognized physical surface reflected by seismic records in seismostratigraphy. The paper reaches a conclusion that to delineate a SB in terms of the TS is theoretically and practically better than to delineate it between highstand and lowstand sediments as has been done traditionally.

  18. On the base sequence conjecture

    CERN Document Server

    Djokovic, Dragomir Z

    2010-01-01

    Let BS(m,n) denote the set of base sequences (A;B;C;D), with A and B of length m and C and D of length n. The base sequence conjecture (BSC) asserts that BS(n+1,n) exist (i.e., are non-empty) for all n. This is known to be true for n <= 36 and when n is a Golay number. We show that it is also true for n=37 and n=38. It is worth pointing out that BSC is stronger than the famous Hadamard matrix conjecture. In order to demonstrate the abundance of base sequences, we have previously attached to BS(n+1,n) a graph Gamma_n and computed the Gamma_n for n <= 27. We now extend these computations and determine the Gamma_n for n=28,...,35. We also propose a conjecture describing these graphs in general.
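
    The abstract does not restate what makes (A;B;C;D) a base sequence. Under the standard definition (an assumption here, since it is not in this record), the four sequences have entries ±1 and their non-periodic autocorrelation functions sum to zero at every nonzero shift. A minimal checker, with hypothetical function names, might look like this:

```python
def npaf(seq, shift):
    """Non-periodic autocorrelation of seq at the given shift."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def is_base_sequence(a, b, c, d):
    """Check the (assumed) defining property of a member of BS(m, n).

    a and b must have length m, c and d length n, all entries +1 or -1,
    and the four autocorrelations must cancel at every nonzero shift.
    """
    m, n = len(a), len(c)
    if len(b) != m or len(d) != n:
        return False
    if any(x not in (1, -1) for s in (a, b, c, d) for x in s):
        return False
    return all(
        npaf(a, s) + npaf(b, s) + npaf(c, s) + npaf(d, s) == 0
        for s in range(1, max(m, n))
    )


if __name__ == "__main__":
    # A small member of BS(2, 1): the shift-1 correlations are +1 and -1.
    print(is_base_sequence([1, 1], [1, -1], [1], [1]))
```

    A checker of this kind only verifies a given quadruple; it says nothing about how the exhaustive searches for n=37 and n=38 were organised.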

  19. Vector sequences - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available Budding yeast cDNA sequencing project. Vector sequences. Data detail. Data name: Vector sequences. Description of data contents: Vector seq... ...uences used for sequencing. Multi FASTA format. 7 entries. Data file. File name: vec

  20. KERNEL WORDS AND GAP SEQUENCE OF THE TRIBONACCI SEQUENCE

    Institute of Scientific and Technical Information of China (English)

    Yuke HUANG; Zhiying WEN

    2016-01-01

    In this paper, we investigate the factor properties and gap sequence of the Tribonacci sequence, the fixed point of the substitution σ(a, b, c) = (ab, ac, a). Let ω_p be the p-th occurrence of ω and G_p(ω) be the gap between ω_p and ω_{p+1}. We introduce a notion of kernel for each factor ω, and then give the decomposition of the factor ω with respect to its kernel. Using the kernel and the decomposition, we prove the main result of this paper: for each factor ω, the gap sequence {G_p(ω)}_{p≥1} is the Tribonacci sequence over the alphabet {G_1(ω), G_2(ω), G_4(ω)}, and the expressions of gaps are determined completely. As an application, for each factor ω and p∈N, we determine the position of ω_p. Finally we introduce a notion of spectrum for studying some typical combinatorial properties, such as power, overlap and separate of factors.
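
    The main result can be checked by hand for the simplest factor, ω = a. The sketch below builds a prefix of the Tribonacci word by iterating the substitution, extracts the gap words between consecutive occurrences of a factor, and relabels G_1, G_2, G_4 as a, b, c. The function names are hypothetical, and the sketch only handles factors whose occurrences do not overlap.

```python
SUBSTITUTION = {"a": "ab", "b": "ac", "c": "a"}


def tribonacci_word(iterations):
    """Apply the substitution repeatedly, starting from the letter 'a'."""
    word = "a"
    for _ in range(iterations):
        word = "".join(SUBSTITUTION[ch] for ch in word)
    return word


def gap_words(word, factor):
    """Words lying between consecutive occurrences of factor in word."""
    starts = [i for i in range(len(word) - len(factor) + 1)
              if word.startswith(factor, i)]
    gaps = []
    for current, following in zip(starts, starts[1:]):
        end = current + len(factor)
        if following < end:
            raise ValueError("overlapping occurrences are not handled here")
        gaps.append(word[end:following])
    return gaps


def encode_gaps(gaps):
    """Relabel the gaps G_1, G_2, G_4 as the letters a, b, c."""
    alphabet = {gaps[0]: "a", gaps[1]: "b", gaps[3]: "c"}
    return "".join(alphabet[g] for g in gaps)


if __name__ == "__main__":
    word = tribonacci_word(8)
    encoded = encode_gaps(gap_words(word, "a"))
    print(word[:24])
    print(encoded[:24])
```

    For ω = a the three gap words are b, c and the empty word, and the relabelled gap sequence reproduces the word itself, because each letter's image under σ contains exactly one a, at its start.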

  1. A repetitive sequence assembler based on next-generation sequencing.

    Science.gov (United States)

    Lian, S; Tu, Y; Wang, Y; Chen, X; Wang, L

    2016-01-01

    Repetitive sequences of variable length are common in almost all eukaryotic genomes, and most of them are presumed to have important biomedical functions and can cause genomic instability. Next-generation sequencing (NGS) technologies provide the possibility of identifying and capturing these repetitive sequences directly from the NGS data. In this study, we assessed the performance of leading assemblers, such as Velvet, SOAPdenovo, SGA, MSR-CA, Bambus2, ALLPATHS-LG, and ABySS, in identifying and capturing repeats using three real NGS datasets. Our results indicated that most of them performed poorly in capturing the repeats. Consequently, we proposed a repetitive sequence assembler, named NGSReper, for capturing repeats from NGS data. Simulated datasets were used to validate the feasibility of NGSReper. The results indicate that the completeness of capturing repeats is up to 99%. Cross validation was performed in three real NGS datasets, and extensive comparisons indicate that NGSReper performed best in terms of completeness and accuracy in capturing repeats. In conclusion, NGSReper is an appropriate and suitable tool for capturing repeats directly from NGS data. PMID:27525861

  2. Sequences and series involving the sequence of composite numbers

    Directory of Open Access Journals (Sweden)

    Panayiotis Vlamos

    2002-01-01

    Full Text Available Denoting by $p_n$ and $c_n$ the $n$th prime number and the $n$th composite number, respectively, we prove that both the sequence $(x_n)_{n\geq 1}$, defined by $x_n=\sum_{k=1}^{n} (c_{k+1}-c_k)/k - p_n/n$, and the series $\sum_{n=1}^{\infty} (p_{c_n}-c_{p_n})/(n p_n)$ are convergent.
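
    A small numerical sketch can make the sequence concrete. It assumes the formula reads x_n = sum over k from 1 to n of (c_{k+1} - c_k)/k, minus p_n/n, with p_n the nth prime and c_n the nth composite number; that grouping is inferred from the record and the helper names are hypothetical. Exact fractions are used so the first terms can be checked by hand.

```python
from fractions import Fraction


def primes_and_composites(limit):
    """Sieve of Eratosthenes: primes and composites up to limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for multiple in range(i * i, limit + 1, i):
                is_prime[multiple] = False
    primes = [i for i in range(2, limit + 1) if is_prime[i]]
    composites = [i for i in range(4, limit + 1) if not is_prime[i]]
    return primes, composites


def x_values(n_max):
    """Return [x_1, ..., x_n_max] under the assumed reading of the formula."""
    limit = 16
    primes, composites = primes_and_composites(limit)
    # Grow the sieve until p_n_max and c_(n_max + 1) are both available.
    while len(primes) < n_max or len(composites) < n_max + 1:
        limit *= 2
        primes, composites = primes_and_composites(limit)
    values, partial_sum = [], Fraction(0)
    for n in range(1, n_max + 1):
        partial_sum += Fraction(composites[n] - composites[n - 1], n)
        values.append(partial_sum - Fraction(primes[n - 1], n))
    return values


if __name__ == "__main__":
    for n, x in enumerate(x_values(10), start=1):
        print(n, float(x))
```

    Printing a few hundred terms gives only an informal impression of the behaviour; it is no substitute for the convergence proof.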

  3. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subproject 3 'integrated sequence analysis' (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term 'methodology' denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, the project is discussed and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  4. $\delta$-Quasi Cauchy Sequences

    OpenAIRE

    Cakalli, Huseyin

    2010-01-01

    Recently, a concept of forward continuity and a concept of forward compactness were introduced in the senses that a function $f$ is forward continuous if $\lim_{n\to\infty} \Delta f(x_{n})=0$ whenever $\lim_{n\to\infty} \Delta x_{n}=0$, and a subset $E$ of $\textbf{R}$ is forward compact if any sequence $\textbf{x}=(x_{n})$ of points in $E$ has a subsequence $\textbf{z}=(z_{k})=(x_{n_{k}})$ of the sequence $\textbf{x}$ such that $\lim_{k\to \infty} \Delta z_{k}=0$ where $\Delta z_{k}=z_{k+1}...

  5. The origin of biased sequence depth in sequence-independent nucleic acid amplification and optimization for efficient massive parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Toon Rosseel

    Full Text Available Sequence Independent Single Primer Amplification (SISPA) is one of the most widely used random amplification approaches in virology for sequencing template preparation. This technique relies on oligonucleotides consisting of a 3' random part used to prime complementary DNA synthesis and a 5' defined tag sequence for subsequent amplification. Recently, this amplification method was combined with next generation sequencing to obtain viral sequences. However, these studies showed a biased distribution of the resulting sequence reads over the analyzed genomes. The aim of this study was to elucidate the mechanisms that lead to biased sequence depth when using random amplification. Avian paramyxovirus type 8 was used as a model RNA virus to investigate these mechanisms. We showed, based on in silico analysis of the sequence depth in relation to GC-content, predicted RNA secondary structure and sequence complementarity to the 3' part of the tag sequence, that the tag sequence has the main contribution to the observed bias in sequence depth. We confirmed this finding experimentally using both fragmented and non-fragmented viral RNAs as well as primers differing in random oligomer length (6 or 12 nucleotides) and in the sequence of the amplification tag. The observed oligonucleotide annealing bias can be reduced by extending the random oligomer sequence and by in silico combining sequence data from SISPA experiments using different 5' defined tag sequences. These findings contribute to the optimization of random nucleic acid amplification protocols that are currently required for downstream applications such as viral metagenomics and microarray analysis.

  6. Sequences in language and text

    CERN Document Server

    Mikros, George K

    2015-01-01

    The aim of this volume is to present the diverse but highly interesting area of the quantitative analysis of the sequence of various linguistic structures. The collected articles present a wide spectrum of quantitative analyses of linguistic syntagmatic structures and explore novel sequential linguistic entities. This volume will be interesting to all researchers studying linguistics using quantitative methods.

  7. Single-primer fluorescent sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Ruth, J.L.; Morgan, C.A.; Middendorf, L.R.; Grone, D.L.; Brumbaugh, J.A.

    1987-05-01

    Modified linker arm oligonucleotides complementary to standard M13 priming sites were synthesized, labelled with either one, two, or three fluoresceins, and purified by reverse-phase HPLC. When used as primers in standard dideoxy M13 sequencing with ³²P-dNTPs, normal autoradiographic patterns were obtained. To eliminate the radioactivity, direct on-line fluorescence detection was achieved by the use of a scanning 10 mW Argon laser emitting 488 nm light. Fluorescent bands were detected directly in standard 0.2 or 0.35 mm thick polyacrylamide gels at a distance of 24 cm from the loading wells by a photomultiplier tube filtered at 520 nm. Horizontal and temporal location of each band was displayed by computer as a band in real time, providing visual appearance similar to normal 4-lane autoradiograms. Using a single primer labelled with two fluoresceins, sequences of between 500 and 600 bases have been read in a single loading with better than 98% accuracy; up to 400 bases can be read reproducibly with no errors. More than 50 sequences have been determined by this method. This approach requires only 1-2 µg of cloned template, and produces continuous sequence data at about one band per minute.

  8. Single-cell semiconductor sequencing.

    Science.gov (United States)

    Kohn, Andrea B; Moroz, Tatiana P; Barnes, Jeffrey P; Netherton, Mandy; Moroz, Leonid L

    2013-01-01

    RNA-seq or transcriptome analysis of individual cells and small-cell populations is essential for virtually any biomedical field. It is especially critical for developmental, aging, and cancer biology as well as neuroscience, where the enormous heterogeneity of cells presents a significant methodological and conceptual challenge. Here we present two methods that allow for fast and cost-efficient transcriptome sequencing from ultra-small amounts of tissue or even from individual cells using semiconductor sequencing technology (Ion Torrent, Life Technologies). The first method is a reduced representation sequencing which maximizes capture of RNAs and preserves transcripts' directionality. The second, a template-switch protocol, is designed for small mammalian neurons. Both protocols, from cell/tissue isolation to final sequence data, take up to 4 days. The efficiency of these protocols has been validated with single hippocampal neurons and various invertebrate tissues including individually identified neurons within a simpler memory-forming circuit of Aplysia californica and early (1-, 2-, 4-, 8-cells) embryonic and developmental stages from basal metazoans.

  9. Farey Sequences and Resistor Networks

    Indian Academy of Sciences (India)

    Sameen Ahmed Khan

    2012-05-01

    In this article, we employ the Farey sequence and Fibonacci numbers to establish strict upper and lower bounds for the order of the set of equivalent resistances for a circuit constructed from equal resistors combined in series and in parallel. The method is applicable for networks involving bridge and non-planar circuits.
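
    The set whose order is being bounded can be enumerated directly for small circuits. The sketch below is a brute-force illustration, not the Farey-sequence argument of the article: it assumes unit resistors, builds only series-parallel combinations (bridge and non-planar circuits are left out), and uses a hypothetical function name.

```python
from fractions import Fraction


def equivalent_resistances(n):
    """Distinct resistances obtainable from exactly n unit resistors
    using only series and parallel combinations (brute-force sketch)."""
    by_count = {1: {Fraction(1)}}
    for total in range(2, n + 1):
        values = set()
        # Split the circuit into two sub-circuits with i and total - i resistors.
        for i in range(1, total // 2 + 1):
            for r1 in by_count[i]:
                for r2 in by_count[total - i]:
                    values.add(r1 + r2)                # series
                    values.add(r1 * r2 / (r1 + r2))    # parallel
        by_count[total] = values
    return by_count[n]


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, len(equivalent_resistances(n)))
```

    The enumeration grows quickly with n, which is why analytic upper and lower bounds of the kind described in the abstract are of interest.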

  10. Instruction Sequences for Computer Science

    NARCIS (Netherlands)

    J.A. Bergstra; C.A. Middelburg

    2012-01-01

    This book demonstrates that the concept of an instruction sequence offers a novel and useful viewpoint on issues relating to diverse subjects in computer science. Selected issues relating to well-known subjects from the theory of computation and the area of computer architecture are rigorously inves

  11. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance, and even in biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how can we distinguish coding and noncoding sequences, and the problems of classification and evolution relationship of organisms are key problems in bioinformatics. Although much research has been carried out by taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other people, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view) to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  12. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol;

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  13. Stream cipher based on GSS sequences

    Institute of Scientific and Technical Information of China (English)

    HU Yupu; XIAO Guozhen

    2004-01-01

    Generalized self-shrinking sequences, simply named the GSS sequences, are novel periodic sequences that have many advantages in cryptography. In this paper, we give several results about the application of GSS sequences to cryptography. First, we give a simple method for selecting those GSS sequences whose least periods reach the maximum. Second, we give a method for describing and computing the auto-correlation coefficients of GSS sequences. Finally, we point out that some GSS sequences, when used as stream ciphers, have a security weakness.

  14. The Toothpick Sequence and Other Sequences from Cellular Automata

    CERN Document Server

    Applegate, David; Sloane, N J A

    2010-01-01

    A two-dimensional arrangement of toothpicks is constructed by the following iterative procedure. At stage 1, place a single toothpick of length 1 on a square grid, aligned with the y-axis. At each subsequent stage, for every exposed toothpick end, place an orthogonal toothpick centered at that end. The resulting structure has a fractal-like appearance. We will analyze the toothpick sequence, which gives the total number of toothpicks after n steps. We also study several related sequences that arise from enumerating active cells in cellular automata. Some unusual recurrences appear: a typical example is that instead of the Fibonacci recurrence, which we may write as a(2+i) = a(i) + a(i+1), we set n = 2^k+i (0 <= i < 2^k), and then a(n) = a(2^k+i) = 2a(i) + a(i+1). The corresponding generating functions look like Prod_{k >= 0} (1+x^{2^k-1}+2x^{2^k}) and variations thereof.
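
    The iterative procedure is short enough to simulate directly. The sketch below is one possible implementation, not the authors' code: it doubles the coordinates so that a toothpick's centre and both ends fall on integer lattice points, and it treats an end as exposed when no other toothpick touches that point. The function name is hypothetical.

```python
def toothpick_sequence(stages):
    """Total number of toothpicks after each of the first `stages` stages."""
    touch = {}                    # lattice point -> number of toothpicks touching it
    totals, total = [], 0
    pending = [(0, 0, True)]      # (centre x, centre y, is_vertical) to place next
    for _ in range(stages):
        new_ends = []
        for cx, cy, vertical in pending:
            dx, dy = (0, 1) if vertical else (1, 0)
            ends = [(cx + dx, cy + dy), (cx - dx, cy - dy)]
            # A toothpick touches its centre and its two ends.
            for point in [(cx, cy)] + ends:
                touch[point] = touch.get(point, 0) + 1
            new_ends.extend((x, y, vertical) for x, y in ends)
        total += len(pending)
        totals.append(total)
        # Only ends of the newest toothpicks can still be exposed; each exposed
        # end receives an orthogonal toothpick centred on it at the next stage.
        pending = [(x, y, not vertical)
                   for x, y, vertical in new_ends if touch[(x, y)] == 1]
    return totals


if __name__ == "__main__":
    print(toothpick_sequence(16))
```

    All toothpicks of a stage are placed before exposure is re-evaluated, so two new toothpicks that meet end to end correctly block each other.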

  15. Sequence-structure relations of biopolymers

    CERN Document Server

    Barrett, Christopher; Reidys, Christian M

    2015-01-01

    Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we pose the question to what extent sequence- and structure-information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded "patterns" in DNA and RNA sequences. Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence-structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for the three PDB-structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold ...

  16. Orthogonal Basis Spreading Sequence for Optimal CDMA

    OpenAIRE

    Tsuda, Hirofumi; Umeno, Ken

    2016-01-01

    Recently, new spreading sequences have been proposed to increase user capacity. In particular, Weyl spreading sequences support a larger user capacity than Gold codes. This paper shows that Weyl spreading sequences appear in the bit restoring model and that they form an orthogonal basis in a particular situation. This result explains why they have the large capacity and shows that any spreading sequence can be expressed as a sum of Weyl spreading sequences.

  17. Hardware Acceleration of Bioinformatics Sequence Alignment Applications

    OpenAIRE

    Hasan, L.

    2011-01-01

    Biological sequence alignment is an important and challenging task in bioinformatics. Alignment may be defined as an arrangement of two or more DNA or protein sequences to highlight the regions of their similarity. Sequence alignment is used to infer the evolutionary relationship between a set of protein or DNA sequences. An accurate alignment can provide valuable information for experimentation on the newly found sequences. It is indispensable in basic research as well as in practical applic...

  18. Hardware Accelerated Sequence Alignment with Traceback

    OpenAIRE

    Scott Lloyd; Snell, Quinn O

    2009-01-01

    Biological sequence alignment is an essential tool used in molecular biology and biomedical applications. The growing volume of genetic data and the complexity of sequence alignment present a challenge in obtaining alignment results in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is pres...
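
    For readers unfamiliar with global alignment with traceback, a plain software sketch is given below. It is the textbook Needleman-Wunsch dynamic programme with a full score matrix, shown only to illustrate the task; it is not the space-efficient algorithm or hardware architecture of the paper, and the scoring values are assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Traceback from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

    The full matrix costs memory proportional to the product of the sequence lengths, which is the kind of bottleneck that motivates space-efficient and hardware-accelerated designs.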

  19. Nonparametric Inference for Periodic Sequences

    KAUST Repository

    Sun, Ying

    2012-02-01

    This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
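
    The idea of leave-out-one-cycle cross-validation can be sketched as follows. This is an illustrative reading of the abstract, not the authors' estimator: for a candidate period p, each observation is predicted by the mean of the observations at the same phase in all other cycles, and the candidate with the smallest mean squared prediction error is chosen. The function names and the tie-breaking rule are assumptions.

```python
import math


def cv_score(y, p):
    """Mean squared leave-out-one-cycle prediction error for period p."""
    n = len(y)
    total = 0.0
    for i in range(n):
        cycle = i // p
        # Same phase as i, but taken from every other cycle.
        others = [y[j] for j in range(i % p, n, p) if j // p != cycle]
        if not others:
            return math.inf   # too few cycles to predict this point
        prediction = sum(others) / len(others)
        total += (y[i] - prediction) ** 2
    return total / n


def estimate_period(y, candidates):
    """Candidate period with the smallest CV score (first one on ties)."""
    return min(candidates, key=lambda p: cv_score(y, p))


if __name__ == "__main__":
    series = [1.0, 5.0, 2.0] * 6      # noise-free toy series with period 3
    print(estimate_period(series, range(1, 7)))
```

    With noisy data, a multiple of the true period averages fewer observations per phase and so tends to predict worse, which is one way to see the implicit penalty mentioned in the abstract.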

  20. Cassini Mission Sequence Subsystem (MSS)

    Science.gov (United States)

    Alland, Robert

    2011-01-01

    This paper describes my work with the Cassini Mission Sequence Subsystem (MSS) team during the summer of 2011. It gives some background on the motivation for this project and describes the expected benefit to the Cassini program. It then introduces the two tasks that I worked on - an automatic system auditing tool and a series of corrections to the Cassini Sequence Generator (SEQ_GEN) - and the specific objectives these tasks were to accomplish. Next, it details the approach I took to meet these objectives and the results of this approach, followed by a discussion of how the outcome of the project compares with my initial expectations. The paper concludes with a summary of my experience working on this project, lists what the next steps are, and acknowledges the help of my Cassini colleagues.

  1. Genome Sequence of Mycobacteriophage Momo.

    Science.gov (United States)

    Pope, Welkin H; Bina, Elizabeth A; Brahme, Indraneel S; Hill, Amy B; Himmelstein, Philip H; Hunsicker, Sara M; Ish, Amanda R; Le, Tinh S; Martin, Mary M; Moscinski, Catherine N; Shetty, Sameer A; Swierzewski, Tomasz; Iyengar, Varun B; Kim, Hannah; Schafer, Claire E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-06-18

    Momo is a newly discovered phage of Mycobacterium smegmatis mc(2)155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages.

  2. Sequencing Trade and Monetary Integration

    OpenAIRE

    Richard Pomfret

    2005-01-01

    Regional integration for at least the last sixty years has focused on trade integration. Balassa’s canonical taxonomy of regional trading arrangements is often interpreted as a sequence from free trade area through customs union and common market to economic union. In the 1980s the concept of deep integration went beyond trade with its focus on policy harmonization, which came to include monetary integration, but it presupposed trade integration as the first step in the regional integration s...

  3. Sequencing technologies to maximize recovery

    Energy Technology Data Exchange (ETDEWEB)

    Dusseault, M.B. [Waterloo Univ., ON (Canada)]

    2006-07-01

    The deliberate sequencing of extraction technologies for viscous oil may optimize financial returns. Sequencing is based on an understanding that some technologies improve reservoir transport parameters through phenomena such as shear dilation; fracturing of shales through high pressure injection; thermal consolidation; shear rupture of clay dustings and shale laminae; and the dislodging of pore blocking agents. Cold heavy oil production with sand (CHOPS) increases reservoir permeability, porosity and compressibility, and can alter stress fields and flow paths, which can in turn lead to more effective subsequent thermal or gravity methods of extraction. High pressure thermal extraction processes such as steam flood (SF) and cyclic steam stimulation (CSS) generate reservoir dilation and viscosity reduction which mean that subsequent gravity methods such as VAPEX will achieve increased extraction efficiency. Massive dilation during cyclic injection phases means that recompaction drive mechanisms will improve recovery rates and recovery factors. Successful planning means that the physics and impacts on the reservoir of all commercial technologies must be understood, and initial systems of exploitation should be designed to minimize future well needs. It was concluded that if implemented correctly, the impact of sequencing on technically recoverable reserves estimates in heavy oil will be considerable. Recoverable reserves increases exceeding a trillion barrels are anticipated. 12 refs., 8 figs.

  4. Sequencing technologies to maximize recovery

    International Nuclear Information System (INIS)

    The deliberate sequencing of extraction technologies for viscous oil may optimize financial returns. Sequencing is based on an understanding that some technologies improve reservoir transport parameters through phenomena such as shear dilation; fracturing of shales through high pressure injection; thermal consolidation; shear rupture of clay dustings and shale laminae; and the dislodging of pore blocking agents. Cold heavy oil production with sand (CHOPS) increases reservoir permeability, porosity and compressibility, and can alter stress fields and flow paths, which can in turn lead to more effective subsequent thermal or gravity methods of extraction. High pressure thermal extraction processes such as steam flood (SF) and cyclic steam stimulation (CSS) generate reservoir dilation and viscosity reduction which mean that subsequent gravity methods such as VAPEX will achieve increased extraction efficiency. Massive dilation during cyclic injection phases means that recompaction drive mechanisms will improve recovery rates and recovery factors. Successful planning means that the physics and impacts on the reservoir of all commercial technologies must be understood, and initial systems of exploitation should be designed to minimize future well needs. It was concluded that if implemented correctly, the impact of sequencing on technically recoverable reserves estimates in heavy oil will be considerable. Recoverable reserves increases exceeding a trillion barrels are anticipated. 12 refs., 8 figs

  5. Bernoulli measure of complex admissible kneading sequences

    CERN Document Server

    Bruin, Henk

    2012-01-01

    Iterated quadratic polynomials give rise to a rich collection of different dynamical systems that are parametrized by a simple complex parameter $c$. The different dynamical features are encoded by the \emph{kneading sequence}, which is an infinite sequence over $\{0,1\}$. Not every such sequence actually occurs in complex dynamics. The set of admissible kneading sequences was described by Milnor and Thurston for real quadratic polynomials, and by the authors in the complex case. We prove that the set of admissible kneading sequences has positive Bernoulli measure within the set of sequences over $\{0,1\}$.

  6. Static multiplicities in heterogeneous azeotropic distillation sequences

    DEFF Research Database (Denmark)

    Esbjerg, Klavs; Andersen, Torben Ravn; Jørgensen, Sten Bay;

    1998-01-01

    In this paper the results of a bifurcation analysis on heterogeneous azeotropic distillation sequences are given. Two sequences suitable for ethanol dehydration are compared: the 'direct' and the 'indirect' sequence. It is shown that the two sequences, despite their similarities, exhibit very...... performances are compared for minimal impurities in both products, where the direct sequence exhibits output multiplicity, while the indirect sequence exhibits state multiplicity. The latter multiplicity may be avoided by accepting a slightly increased impurity in the ethanol product. Copyright (C) 1998 IFAC.

  7. Modified Genetic Algorithm for DNA Sequence Assembly by Shotgun and Hybridization Sequencing Techniques

    Directory of Open Access Journals (Sweden)

    Prof.Narayan Kumar Sahu

    2012-09-01

    Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly [1].
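
    The two kinds of input described above can be illustrated with a small sketch. It does not reproduce the genetic algorithm proposed in the paper; the helper names `kmer_spectrum`, `overlap`, and `merge` are hypothetical and only show what an SBH spectrum and a pairwise fragment overlap look like.

```python
def kmer_spectrum(seq, k):
    """SBH-style input: the set of all length-k subwords of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlap(a, b):
    """Shotgun-style input: length of the longest suffix of a that is a prefix of b."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def merge(a, b):
    """Join two fragments, writing their overlapping part only once."""
    return a + b[overlap(a, b):]
```

    For example, `merge("ACGTAC", "TACGG")` joins the two fragments on their shared `TAC`.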

  8. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    OpenAIRE

    Marta Brozynska; Agnelo Furtado; Robert James Henry

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genom...

  9. Integration of retinal image sequences

    Science.gov (United States)

    Ballerini, Lucia

    1998-10-01

    In this paper a method for noise reduction in ocular fundus image sequences is described. The eye is the only part of the human body where the capillary network can be observed along with the arterial and venous circulation using a non-invasive technique. The study of the retinal vessels is very important both for the study of the local pathology (retinal disease) and for the large amount of information it offers on systemic haemodynamics, such as hypertension, arteriosclerosis, and diabetes. In this paper a method for image integration of ocular fundus image sequences is described. The procedure can be divided into two steps: registration and fusion. First we describe an automatic alignment algorithm for registration of ocular fundus images. In order to enhance vessel structures, we used a spatially oriented bank of filters designed to match the properties of the objects of interest. To evaluate interframe misalignment we adopted a fast cross-correlation algorithm. The performance of the alignment method has been estimated by simulating shifts between image pairs and by using a cross-validation approach. Then we propose a temporal integration technique of image sequences so as to compute enhanced pictures of the overall capillary network. Image registration is combined with image enhancement by fusing subsequent frames of the same region. To evaluate the attainable results, the signal-to-noise ratio was estimated before and after integration. Experimental results on synthetic images of vessel-like structures with different kinds of Gaussian additive noise as well as on real fundus images are reported.

  10. Infinite matrices and sequence spaces

    CERN Document Server

    Cooke, Richard G

    2014-01-01

    This clear and correct summation of basic results from a specialized field focuses on the behavior of infinite matrices in general, rather than on properties of special matrices. Three introductory chapters guide students to the manipulation of infinite matrices, covering definitions and preliminary ideas, reciprocals of infinite matrices, and linear equations involving infinite matrices. From the fourth chapter onward, the author treats the application of infinite matrices to the summability of divergent sequences and series from various points of view. Topics include consistency, mutual consi

  11. Instruction Sequences with Indirect Jumps

    Directory of Open Access Journals (Sweden)

    C.A. Middelburg

    2007-01-01

    We study sequential programs that are instruction sequences with direct and indirect jump instructions. The intuition is that indirect jump instructions are jump instructions where the position of the instruction to jump to is the content of some memory cell. We consider several kinds of indirect jump instructions. For each kind, we define the meaning of programs with indirect jump instructions of that kind by means of a translation into programs without indirect jump instructions. For each kind, the intended behaviour of a program with indirect jump instructions of that kind under execution is the behaviour of the translated program under execution on interaction with some memory device.

  12. Arithmetic Self-Similarity of Infinite Sequences

    CERN Document Server

    Hendriks, Dimitri; Endrullis, Joerg; Dow, Mark; Klop, Jan Willem

    2012-01-01

    We define the arithmetic self-similarity (AS) of a one-sided infinite sequence sigma to be the set of arithmetic progressions through sigma which are a vertical shift of sigma. We classify the AS of several well-known sequences, such as the Thue-Morse sequence, the period doubling sequence, and the regular paperfolding sequence. The latter two are examples of (completely) additive sequences as well as of Toeplitz words. We investigate the intersection of these families. We give a complete characterization of single-gap patterns that yield additive Toeplitz words, and classify their AS. Moreover, we show that every arithmetic progression through a Toeplitz word generated by a one-gap pattern is again a Toeplitz word. Finally, we establish that generalized Morse sequences are specific sum-of-digits sequences, and show that their first difference is a Toeplitz word.
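
    As a small illustration of the definition, the sketch below checks numerically whether an arithmetic progression through the Thue-Morse sequence is a vertical shift of the sequence itself. It relies on the standard description of the Thue-Morse sequence as the binary sum of digits modulo 2; the function `vertical_shift` is a hypothetical helper, and it only tests finitely many terms, so it provides evidence rather than proof.

```python
def thue_morse(n):
    """n-th Thue-Morse term: parity of the number of 1 bits in n."""
    return bin(n).count("1") % 2


def vertical_shift(a, b, terms=256):
    """Return s if t(a + b*n) == (t(n) + s) mod 2 for all n < terms, else None."""
    shifts = {(thue_morse(a + b * n) - thue_morse(n)) % 2 for n in range(terms)}
    return shifts.pop() if len(shifts) == 1 else None
```

    Under this check, the progressions `2n` and `2n + 1` come out as shifts by 0 and 1 respectively, while `3n` does not give a constant shift.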

  13. An Assignment Sequence for Underprepared Writers.

    Science.gov (United States)

    Nimmo, Kristi

    2000-01-01

    Presents a sequenced writing assignment on shopping to aid basic writers. Describes a writing assignment focused around online and mail-order shopping. Notes steps in preparing for the assignment, the sequence, and discusses responses to the assignments. (SC)

  14. On Paranorm Zweier -Convergent Sequence Spaces

    Directory of Open Access Journals (Sweden)

    Vakeel A. Khan

    2013-01-01

    In this paper, we introduce the paranorm Zweier -convergent sequence spaces , , and , a sequence of positive real numbers. We study some topological properties, prove the decomposition theorem, and study some inclusion relations on these spaces.

  15. Goldbach conjecture sequences in quantum mechanics

    OpenAIRE

    Prudencio, Thiago; Silva, Edilberto O.

    2013-01-01

    We show that there is a correspondence between Goldbach conjecture sequences (GCS) and expectation values of the number operator in Fock states. We demonstrate that depending on the normalization or not of Fock state superpositions, we have sequences that are equivalent and sequences that are not equivalent to GCS. We propose an algorithm where sequences equivalent to GCS can be derived in terms of expectation values with normalized states. Defining states whose projections generate GCS, we r...

  16. Mappings of Type Special Space of Sequences

    Directory of Open Access Journals (Sweden)

    Awad A. Bakery

    2016-01-01

    We give sufficient conditions on a special space of sequences defined by Mohamed and Bakery (2013) such that the finite rank operators are dense in the complete space of operators whose approximation numbers belong to this sequence space. Hence, under a few conditions, every compact operator would be approximated by finite rank operators. We apply it on the sequence space defined by Tripathy and Mahanta (2003). Our results match those known for p-absolutely summable sequences of reals.

  17. Approximate word matches between two random sequences

    OpenAIRE

    Burden, Conrad J.; Kantorovitz, Miriam R; Wilson, Susan R

    2008-01-01

    Given two sequences over a finite alphabet $\mathcal{L}$, the D2 statistic is the number of m-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag database searches. Here we study a generalization of the D2 statistic in the context of DNA sequences, under the assumption of strand symmetric Bernoulli text. For k
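
    The basic D2 statistic as defined in the first sentence can be computed directly by counting words. This is a minimal sketch of the exact-match statistic only, not of the approximate-match generalization studied in the paper; the function name `d2` is an assumption.

```python
from collections import Counter


def d2(a, b, m):
    """Number of pairs (i, j) with a[i:i+m] == b[j:j+m]."""
    words_a = Counter(a[i:i + m] for i in range(len(a) - m + 1))
    words_b = Counter(b[j:j + m] for j in range(len(b) - m + 1))
    # Each occurrence of a word in a matches each occurrence of it in b.
    return sum(count * words_b[word] for word, count in words_a.items())
```

    Counting words once per sequence avoids comparing every pair of positions explicitly.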

  18. Discovering motifs that induce sequencing errors

    OpenAIRE

    Allhoff, Manuel; Schoenhuth, Alexander; Martin, M.(Deutsches Elektronen-Synchrotron, Hamburg, Germany); Costa, I.G.; Rahmann, S.; Marschall, Tobias

    2013-01-01

    Background: Elevated sequencing error rates are the most predominant obstacle in single-nucleotide polymorphism (SNP) detection, which is a major goal in the bulk of current studies using next-generation sequencing (NGS). Beyond routinely handled generic sources of errors, certain base calling errors relate to specific sequence patterns. Statistically principled ways to associate sequence patterns with base calling errors have not been previously described. Extant approaches either incur decis...

  19. Nanopore DNA sequencing with MspA

    OpenAIRE

    Derrington, Ian M.; Butler, Tom Z.; Collins, Marcus D.; Manrao, Elizabeth; Pavlenok, Mikhail; Niederweis, Michael; Gundlach, Jens H.

    2010-01-01

    Nanopore sequencing has the potential to become a direct, fast, and inexpensive DNA sequencing technology. The simplest form of nanopore DNA sequencing utilizes the hypothesis that individual nucleotides of single-stranded DNA passing through a nanopore will uniquely modulate an ionic current flowing through the pore, allowing the record of the current to yield the DNA sequence. We demonstrate that the ionic current through the engineered Mycobacterium smegmatis porin A, MspA, has the ability...

  20. Quantitative phenotyping via deep barcode sequencing

    OpenAIRE

    Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

    2009-01-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied...

  1. PacBio Sequencing and Its Applications

    Institute of Scientific and Technical Information of China (English)

    Anthony Rhoads; Kin Fai Au

    2015-01-01

    Single-molecule, real-time sequencing developed by Pacific BioSciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio's sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable, especially for small-size laboratories, than using PacBio sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone.

  2. Incidental Sequence Learning across the Lifespan

    Science.gov (United States)

    Weiermann, Brigitte; Meier, Beat

    2012-01-01

    The purpose of the present study was to investigate incidental sequence learning across the lifespan. We tested 50 children (aged 7-16), 50 young adults (aged 20-30), and 50 older adults (aged >65) with a sequence learning paradigm that involved both a task and a response sequence. After several blocks of practice, all age groups slowed down…

  3. RNAome sequencing delineates the complete RNA landscape

    NARCIS (Netherlands)

    K.W.J. Derks (Kasper); J. Pothof (Joris)

    2015-01-01

    Standard RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species. For example, small and large RNAs from the same sample cannot be sequenced in a single sequence run. We designed RNAome sequencing, which is a strand-specif

  4. Hardware Acceleration of Bioinformatics Sequence Alignment Applications

    NARCIS (Netherlands)

    Hasan, L.

    2011-01-01

    Biological sequence alignment is an important and challenging task in bioinformatics. Alignment may be defined as an arrangement of two or more DNA or protein sequences to highlight the regions of their similarity. Sequence alignment is used to infer the evolutionary relationship between a set of pr

  5. The recurrence sequence via the Fibonacci groups

    Science.gov (United States)

    Aküzüm, Yeşim; Deveci, Ömür

    2016-04-01

    This work develops properties of the recurrence sequence defined by the aid of the relation matrix of the Fibonacci groups. The study of this sequence modulo m yields cyclic groups and semigroups from generating matrix. Finally, we extend the sequence defined to groups and then, we obtain its period in the Fibonacci groups.
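
    The abstract does not give the recurrence itself, so the sketch below uses the ordinary Fibonacci recurrence as a stand-in (an assumption, not the sequence from the paper) to show what "the period of a recurrence sequence modulo m" means. The function name `pisano_period` is illustrative.

```python
def pisano_period(m):
    """Period of the Fibonacci sequence F(n) mod m, for m >= 2."""
    if m < 2:
        raise ValueError("m must be at least 2")
    a, b = 0, 1
    # The period modulo m is known to be at most 6*m.
    for k in range(1, 6 * m + 1):
        a, b = b, (a + b) % m
        if (a, b) == (0, 1):  # the initial pair has come back
            return k
    raise RuntimeError("period not found within the expected bound")
```

    The same loop works for any two-term linear recurrence once the update rule and the initial pair are replaced.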

  6. Sequence Analysis in Demographic Research

    Directory of Open Access Journals (Sweden)

    Billari, Francesco C.

    2001-01-01

    This paper examines the salient features of sequence analysis in demographic research. The new approach allows a holistic perspective on life course analysis and is based on a representation of lives as sequences of states. Some of the methods for analyzing such data are sketched, from complex description to optimal matching to monothetic divisive algorithms. After a short illustration of a demographically relevant example, the needs in terms of data collection and the opportunities of applying the same approach to synthetic data are discussed.

  7. Artificial sequences and complexity measures

    Science.gov (United States)

    Baronchelli, Andrea; Caglioti, Emanuele; Loreto, Vittorio

    2005-04-01

    In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools for extracting, in an automatic and agnostic way, information from a generic string of characters. We introduce in particular a class of methods which use in a crucial way data compression techniques in order to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques could be used to introduce the notion of dictionary of a given sequence and of artificial text and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method that applies to any kind of corpora of character strings independently of the type of coding behind them. We consider as a case study linguistically motivated problems and we present results for automatic language recognition, authorship attribution and self-consistent classification.
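
    A compression-based distance of this general kind can be sketched with a standard compressor. The example below uses the normalized compression distance with `zlib`, which is a related, widely used measure and not necessarily the one defined in the paper; the function names are illustrative.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data at maximum compression."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x helps to compress y."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

    Two texts in the same language usually share many substrings, so concatenating them compresses better than concatenating unrelated data; that is the intuition behind using such a distance for language recognition or authorship attribution.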

  8. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi.

    Directory of Open Access Journals (Sweden)

    Leho Tedersoo

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source was supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  9. Exome sequencing: what clinicians need to know

    Directory of Open Access Journals (Sweden)

    Sastre L

    2014-03-01

    Leandro Sastre. Instituto de Investigaciones Biomédicas, CSIC/UAM, C/Arturo Duperier 4, Madrid, Spain; Terapias Experimentales y Biomarcadores en Cáncer, IdiPaz, Madrid, Spain; CIBER de Enfermedades Raras, CIBERER, Valencia, Spain. Abstract: The recent development of high throughput methods of deoxyribonucleic acid (DNA) sequencing has made it possible to determine individual genome sequences and their specific variations. A region of particular interest is the protein-coding part of the genome, or exome, which is composed of gene exons. The principles of exome purification and sequencing will be described in this review, as well as analyses of the data generated. Results will be discussed in terms of their possible functional and clinical significance. The advantages and limitations of exome sequencing will be compared to those of other massive sequencing approaches such as whole-genome sequencing, ribonucleic acid sequencing or selected DNA sequencing. Exome sequencing has been used recently in the study of various diseases. Monogenic diseases with Mendelian inheritance are among these, but studies have also been carried out on genetic variations that represent risk factors for complex diseases. Cancer is another intensive area for exome sequencing studies. Several examples of the use of exome sequencing in the diagnosis, prognosis, and treatment of these diseases will be described. Finally, remaining challenges and some practical and ethical considerations for the clinical application of exome sequencing will be discussed. Keywords: massively parallel sequencing, RNA sequencing, whole-genome sequencing, genetic variants, molecular diagnosis, pharmacogenomics, personalized medicine, NGS, SGS, SNP, SNV

  10. Problems of Sequence Stratigraphy in China

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    A comprehensive study of outcrop sequence stratigraphy in China began in the early 1990s. The investigated strata range from Mesoproterozoic to Quaternary and the studied areas cover the three platforms and margins, the Southern Himalayas and the East China and South China seas. Problems of general concern in the sequence stratigraphy of China are discussed. These are: the hierarchy for sequence stratigraphy, the third-order Sequence and eustasy, the chronostratigraphic boundaries and GSSP, and the International Stratigraphic Chart and the sequence chronostratigraphy of China. The average time interval of Mesosequence (25-40 Ma) and of the Sequence (2-5 Ma) is suggested and the minor sequences below the Sequence are discussed. The time interval of the Sequence shows no evident decrease with time, but several epochs with remarkably short intervals occur in the Phanerozoic, which may represent a planetary behavior denoting the special development stages in earth's evolution. Sea level change curves are given separately for the three platforms and the different regions. The Global Stratotype Section and Point (GSSP) concept and practice are discussed, and a comparison between the first appearance point of biozone and the first flooding surface in the Sequence is made for designation of the chronostratigraphic boundary. It is suggested that the chronostratigraphic boundaries might be set at the first flooding surface in the Sequence for easy recognition. The idea of sequence chronostratigraphy is recommended, and a comparison between the International Stratigraphic Chart and the sequence chronostratigraphy of China is made. The close relation between chronostratigraphy and sequence stratigraphy makes it possible for sequence stratigraphy to improve chronostratigraphic research. It is pointed out that multidisciplinary study in chronostratigraphy is necessary and should be promising and profitable.

  11. Permutation Entropy for Random Binary Sequences

    Directory of Open Access Journals (Sweden)

    Lingfeng Liu

    2015-12-01

    In this paper, we generalize the permutation entropy (PE) measure to binary sequences, which is based on Shannon’s entropy, and theoretically analyze this measure for random binary sequences. We deduce the theoretical value of PE for random binary sequences, which can be used to measure the randomness of binary sequences. We also reveal the relationship between this PE measure with other randomness measures, such as Shannon’s entropy and Lempel–Ziv complexity. The results show that PE is consistent with these two measures. Furthermore, we use PE as one of the randomness measures to evaluate the randomness of chaotic binary sequences.
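
    For orientation, the sketch below computes the ordinary permutation entropy (the Shannon entropy of ordinal patterns in sliding windows). It is not the binary generalization proposed in the paper: ties, which are unavoidable in binary data, are broken here by position, and that tie rule is an assumption of this example.

```python
import math
from collections import Counter


def permutation_entropy(x, order):
    """Shannon entropy (in bits) of the ordinal patterns of length `order` in x."""
    patterns = Counter(
        # Ordinal pattern: window positions sorted by value, ties by position.
        tuple(sorted(range(order), key=lambda j: (x[i + j], j)))
        for i in range(len(x) - order + 1)
    )
    total = sum(patterns.values())
    return -sum((c / total) * math.log2(c / total) for c in patterns.values())
```

    A monotone sequence produces a single pattern and therefore zero entropy, while a sequence in which all patterns are equally frequent reaches the maximum of `log2(order!)` bits.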

  12. Chip-based sequencing nucleic acids

    Science.gov (United States)

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  13. MatrixPlot: visualizing sequence constraints

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Stærfeldt, Hans Henrik; Lund, Ole;

    1999-01-01

    Summary: MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information...... about the sequences (e.g. a sequence logo profile) along the edges of the plot, as well as zoom in on any region in the plot. Availability: MatrixPlot can be obtained on request, and can also be accessed online at http://www.cbs.dtu.dk/services/MatrixPlot. Contact: gorodkin@cbs.dtu.dk...

  14. Design of Digital Hybrid Chaotic Sequence Generator

    Institute of Scientific and Technical Information of China (English)

    RAO Nini; ZENG Dong

    2004-01-01

    The feasibility of hybrid chaotic sequences as the spreading codes in code division multiple access (CDMA) systems is analyzed. The design and realization of a digital hybrid chaotic sequence generator in very high speed integrated circuit hardware description language (VHDL) are described. A valid hazard-cancellation method is presented. Computer simulations show that stable digital sequence waveforms can be produced. The correlations of the digital hybrid chaotic sequences are compared with those of m-sequences. The results show that the correlations of the digital hybrid chaotic sequences are almost as good as those of m-sequences. The work in this paper explores a path toward practical applications of chaos.
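
    The m-sequence baseline used in such comparisons can be sketched in a few lines. The example generates one period of a length-15 m-sequence from the recurrence a[n+4] = a[n] XOR a[n+1] (feedback polynomial x^4 + x + 1) and computes its periodic autocorrelation. The hybrid chaotic generator itself is not shown, and the choice of polynomial, seed, and Python rather than VHDL are assumptions made for illustration.

```python
def m_sequence(length=15):
    """One period of the m-sequence defined by a[n+4] = a[n] ^ a[n+1]."""
    bits = [1, 0, 0, 0]  # any nonzero seed gives a shift of the same sequence
    while len(bits) < length:
        bits.append(bits[-4] ^ bits[-3])
    return bits[:length]


def periodic_autocorrelation(bits, shift):
    """Cyclic autocorrelation of the sequence after mapping 0 -> +1, 1 -> -1."""
    n = len(bits)
    signs = [1 - 2 * b for b in bits]
    return sum(signs[i] * signs[(i + shift) % n] for i in range(n))
```

    The autocorrelation equals the sequence length at shift 0 and -1 at every other shift, which is the property that makes m-sequences a natural reference for spreading codes.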

  15. Nimble Protein Sequence Alignment in Grid (NPSAG)

    Directory of Open Access Journals (Sweden)

    K. Somasundaram

    2008-01-01

    In bioinformatics applications, the analysis of protein sequences is a computation-driven science with rapidly growing biological data. The databases used in these applications are heterogeneous in nature, and alignment of protein sequences using physical techniques is expensive and slow, and the results are not always guaranteed to be accurate. Such applications therefore require cross-platform, cost-effective algorithms with more computing power for sequence matching and for searching a sequence in a database. Grid is an emerging, cost-effective computing paradigm for a large class of data- and compute-intensive applications; it enables large-scale aggregation and sharing of computational, data, and other resources across institutional boundaries. We propose a Grid architecture for searching distributed, heterogeneous genomic databases containing protein sequences, to speed up the analysis of large-scale sequence data, and we perform sequence alignment for residue matching.

  16. Computing with Hereditarily Finite Sequences

    CERN Document Server

    Tarau, Paul

    2011-01-01

    We use Prolog as a flexible meta-language to provide executable specifications of some fundamental mathematical objects and their transformations. In the process, isomorphisms are unraveled between natural numbers and combinatorial objects (rooted ordered trees representing hereditarily finite sequences and rooted ordered binary trees representing Gödel's System T types). This paper focuses on an application that can be seen as an unexpected "paradigm shift": we provide recursive definitions showing that the resulting representations are directly usable to perform symbolically arbitrary-length integer computations. Besides the theoretically interesting fact of "breaking the arithmetic/symbolic barrier", the arithmetic operations performed with symbolic objects like trees or types turn out to be genuinely efficient -- we derive implementations with asymptotic performance comparable to ordinary bitstring implementations of arbitrary-length integer arithmetic. The source code of the paper, organized as a ...

  17. Automatic Sequencing for Experimental Protocols

    Science.gov (United States)

    Hsieh, Paul F.; Stern, Ivan

    We present a paradigm and implementation of a system for the specification of the experimental protocols to be used for the calibration of AXAF mirrors. For the mirror calibration, several thousand individual measurements need to be defined. For each measurement, over one hundred parameters need to be tabulated for the facility test conductor and several hundred instrument parameters need to be set. We provide a high level protocol language which allows for a tractable representation of the measurement protocol. We present a procedure dispatcher which automatically sequences a protocol more accurately and more rapidly than is possible by an unassisted human operator. We also present back-end tools to generate printed procedure manuals and database tables required for review by the AXAF program. This paradigm has been tested and refined in the calibration of detectors to be used in mirror calibration.

  18. The RNA world, automatic sequences and oncogenetics

    International Nuclear Information System (INIS)

    We construct a model of the RNA world in terms of naturally evolving nucleotide sequences assuming only Crick-Watson base pairing and self-cleaving/splicing capability. These sequences have the following properties. 1) They are recognizable by an automaton (or automata). That is, for each k-sequence, there exists a k-automaton which accepts, recognizes or generates the k-sequence. These are known as automatic sequences. Fibonacci and Morse-Thue sequences are the most natural outcome of pre-biotic chemical conditions. 2) Infinite (resp. large) sequences are self-similar (resp. nearly self-similar) under certain rewrite rules and consequently give rise to fractal (resp. fractal-like) structures. Computationally, such sequences can also be generated by their corresponding deterministic parallel rewrite system, known as a DOL system. The self-similar sequences are fixed points of their respective rewrite rules. Some of these automatic sequences have the capability to read or 'accept' other sequences, while others can detect errors and trigger error-correcting mechanisms. They can be enlarged and have block and/or palindrome structure. Linear recurring sequences such as the Fibonacci sequence are simply feedback shift registers, a well-known model of information processing machines. We show that a mutation of any rewrite rule can cause a combinatorial explosion of error and relate this to oncogenetical behavior. On the other hand, a mutation of sequences that are not rewrite rules leads to normal evolutionary change. Known experimental results support our hypothesis. (author)
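
    The deterministic parallel rewrite systems mentioned in this abstract can be illustrated with a minimal sketch. The rules below are the standard textbook rules for the Fibonacci word and the Thue-Morse sequence; the function and variable names are illustrative assumptions and are not taken from the paper.

```python
def iterate_d0l(axiom: str, rules: dict[str, str], steps: int) -> str:
    """Apply a deterministic parallel rewrite system `steps` times.

    Every symbol of the current word is rewritten simultaneously,
    which is what distinguishes this from a sequential grammar.
    """
    word = axiom
    for _ in range(steps):
        word = "".join(rules[symbol] for symbol in word)
    return word


# Standard rules (assumed here for illustration, not quoted from the paper).
FIBONACCI_RULES = {"a": "ab", "b": "a"}
THUE_MORSE_RULES = {"0": "01", "1": "10"}

fib_word = iterate_d0l("a", FIBONACCI_RULES, 6)
tm_word = iterate_d0l("0", THUE_MORSE_RULES, 4)

# Each iterate is a prefix of the next one, so the infinite limit word
# is a fixed point of its own rewrite rule (the self-similarity property).
next_fib_word = iterate_d0l("a", FIBONACCI_RULES, 7)
print(fib_word, tm_word, next_fib_word.startswith(fib_word))
```

    In this toy setting, changing a single rule (for example `"b": "a"` to `"b": "b"`) alters every later iterate, which gives an intuition for why the abstract treats a mutation of a rewrite rule differently from a mutation of an ordinary sequence.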

  19. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm

    OpenAIRE

    Kumar, Manish

    2015-01-01

    One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic problem in multiple sequence alignment is to determine the most biologically plausible alignments of protein or DNA sequences. In this paper, an alignment method using a genetic algorithm for multiple sequence alignment is proposed. Two genetic operators, crossover and mutation, were defined and implemented with the proposed method in order to know the pop...

  20. Software for pre-processing Illumina next-generation sequencing short read sequences

    OpenAIRE

    Chen, Chuming; Khaleel, Sari S; Huang, Hongzhan; Cathy H Wu

    2014-01-01

    Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been devel...

  1. Next-Generation Sequencing Techniques for Eukaryotic Microorganisms: Sequencing-Based Solutions to Biological Problems

    OpenAIRE

    Nowrousian, Minou

    2010-01-01

    Over the past 5 years, large-scale sequencing has been revolutionized by the development of several so-called next-generation sequencing (NGS) technologies. These have drastically increased the number of bases obtained per sequencing run while at the same time decreasing the costs per base. Compared to Sanger sequencing, NGS technologies yield shorter read lengths; however, despite this drawback, they have greatly facilitated genome sequencing, first for prokaryotic genomes and within the las...

  2. Assembly Sequence Planning for Mechanical Products

    Institute of Scientific and Technical Information of China (English)

    1999-01-01

    A method for assembly sequence planning is proposed in this paper. First, two methods for assembly sequence planning, the indirect method and the direct method, are compared. Then, the limits of the previous assembly planning system are pointed out. On the basis of the indirect method, an improved method for assembly sequence planning is put forward. This method is composed of four parts: assembly modeling for products, assembly sequence representation, assembly sequence planning, and evaluation and optimization. The assembly model is established by human-machine interaction and contains the components' information and the assembly relations among the components. The assembly sequence planning is based on breaking up the assembly model. An and/or graph is used to represent the assembly sequence set. Every component which satisfies the disassembly condition is recorded as a node of the and/or graph. After the disassembly sequence and/or graph is generated, a heuristic algorithm, the AO* algorithm, is used to search it, and the optimum assembly sequence plan is obtained. This method is proved to be effective in a prototype system which is a sub-project of a state 863/CIMS research project of China, ‘Concurrent Engineering’.

  3. Randomness in Sequence Evolution Increases over Time.

    Science.gov (United States)

    Wang, Guangyu; Sun, Shixiang; Zhang, Zhang

    2016-01-01

    The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, here we detect sequence randomness based on a collection of eight statistical randomness tests and investigate the randomness variation of coding sequences with an application to Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results clearly show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness indeed increases over time, consistent with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study presents an important finding, for the first time, that sequence randomness increases over time, which may provide profound insights for unveiling the underlying mechanisms of molecular sequence evolution. PMID:27224236

  4. On the activation of bovine plasma factor XIII. Amino acid sequence of the peptide released by thrombin and the terminal residues of the subunit polypeptides.

    Science.gov (United States)

    Nakamura, S; Iwanaga, S; Suzuki, T

    1975-12-01

    A blood coagulation factor, Factor XIII, was highly purified from bovine fresh plasma by a method similar to those used for human plasma Factor XIII. The isolated Factor XIII consisted of two subunit polypeptides, a and b chains, with molecular weights of 79,000 +/- 2,000 and 75,000 +/- 2,000, respectively. In the conversion of Factor XIII to the active enzyme, Factor XIIIa, by bovine thrombin [EC 3.4.21.5], a peptide was liberated. This peptide, designated tentatively as "activation peptide," was isolated by gel-filtration on a Sephadex G-75 column. It contained a total of 37 amino acid residues with a masked N-terminal residue and C-terminal arginine. The whole amino acid sequence of "Activation peptide" was established by the dansyl-Edman method and standard enzymatic techniques, and the masked N-terminal residue was identified as N-acetylserine by using a rat liver acylamino acid-releasing enzyme. This enzyme specifically cleaved the N-acetylserylglutamyl peptide bond serine and the remaining peptide, which was now reactive to 1-dimethylamino-naphthalene-5-sulfonyl chloride. A comparison of the sequences of human and bovine "Activation peptide" revealed five amino acid replacements: Ser-3 to Thr; Gly-5 to Arg; Ile-14 to Val; Thr-18 to Asn; and Pro-26 to Leu. Another difference was the deletion of Leu-34 in the human peptide. Adsorption chromatography on a hydroxylapatite column in the presence of 0.1% sodium dodecyl sulfate was developed as a preparative procedure for the resolution of the two subunit polypeptides, a or a' chain and b chain, constituting the protein molecule of Factor XIII or Factor XIIIa. End group analyses on the isolated pure chains revealed that the structural change of Factor XIII during activation with thrombin occurs only in the N-terminal portion of the a chain, not in the N-terminal end of the b chain or in the C-terminal ends of the a and b chains. From these results, it was concluded that the activation of bovine plasma Factor XIII

  5. Predicting Contextual Sequences via Submodular Function Maximization

    CERN Document Server

    Dey, Debadeepta; Hebert, Martial; Bagnell, J Andrew

    2012-01-01

    Sequence optimization, where the items in a list are ordered to maximize some reward has many applications such as web advertisement placement, search, and control libraries in robotics. Previous work in sequence optimization produces a static ordering that does not take any features of the item or context of the problem into account. In this work, we propose a general approach to order the items within the sequence based on the context (e.g., perceptual information, environment description, and goals). We take a simple, efficient, reduction-based approach where the choice and order of the items is established by repeatedly learning simple classifiers or regressors for each "slot" in the sequence. Our approach leverages recent work on submodular function maximization to provide a formal regret reduction from submodular sequence optimization to simple cost-sensitive prediction. We apply our contextual sequence prediction algorithm to optimize control libraries and demonstrate results on two robotics problems: ...
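
    The static, context-free ordering that this abstract contrasts against can be sketched with the usual greedy rule for a monotone submodular objective. This is not the authors' contextual method, which learns a predictor per slot; the coverage objective, the toy "control library" and all names below are hypothetical.

```python
def greedy_sequence(items: dict[str, set[str]], budget: int) -> list[str]:
    """Build an ordered list by repeatedly taking the largest marginal gain.

    The reward is set coverage (a monotone submodular function): the
    benefit of an item is the number of new elements it adds to what
    earlier slots already cover. Ties go to the alphabetically first name.
    """
    chosen: list[str] = []
    covered: set[str] = set()
    for _ in range(budget):
        best, best_gain = None, 0
        for name in sorted(items):
            if name in chosen:
                continue
            gain = len(items[name] - covered)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # nothing left adds any reward
            break
        chosen.append(best)
        covered |= items[best]
    return chosen


# Hypothetical library: each trajectory "works" in a set of environments.
library = {
    "traj_a": {"env1", "env2", "env3"},
    "traj_b": {"env3", "env4"},
    "traj_c": {"env1", "env2"},
    "traj_d": {"env5"},
}
order = greedy_sequence(library, budget=3)
print(order)
```

    A contextual variant would replace the fixed `items` table with a learned estimate of each item's marginal benefit given features of the current problem, which is the direction the abstract describes.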

  6. Evolutionarily conserved sequences on human chromosome 21

    Energy Technology Data Exchange (ETDEWEB)

    Frazer, Kelly A.; Sheehan, John B.; Stokowski, Renee P.; Chen, Xiyin; Hosseini, Roya; Cheng, Jan-Fang; Fodor, Stephen P.A.; Cox, David R.; Patil, Nila

    2001-09-01

    Comparison of human sequences with the DNA of other mammals is an excellent means of identifying functional elements in the human genome. Here we describe the utility of high-density oligonucleotide arrays as a rapid approach for comparing human sequences with the DNA of multiple species whose sequences are not presently available. High-density arrays representing approximately 22.5 Mb of nonrepetitive human chromosome 21 sequence were synthesized and then hybridized with mouse and dog DNA to identify sequences conserved between humans and mice (human-mouse elements) and between humans and dogs (human-dog elements). Our data show that sequence comparison of multiple species provides a powerful empiric method for identifying actively conserved elements in the human genome. A large fraction of these evolutionarily conserved elements are present in regions on chromosome 21 that do not encode known genes.

  7. Comparison of next-generation sequencing systems.

    Science.gov (United States)

    Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

    2012-01-01

    With the fast development and wide application of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve quality of life. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience is summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. Finally, applications of NGS are summarized.

  8. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled "intractable", resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  9. Nanopore DNA sequencing with MspA.

    Science.gov (United States)

    Derrington, Ian M; Butler, Tom Z; Collins, Marcus D; Manrao, Elizabeth; Pavlenok, Mikhail; Niederweis, Michael; Gundlach, Jens H

    2010-09-14

    Nanopore sequencing has the potential to become a direct, fast, and inexpensive DNA sequencing technology. The simplest form of nanopore DNA sequencing utilizes the hypothesis that individual nucleotides of single-stranded DNA passing through a nanopore will uniquely modulate an ionic current flowing through the pore, allowing the record of the current to yield the DNA sequence. We demonstrate that the ionic current through the engineered Mycobacterium smegmatis porin A, MspA, has the ability to distinguish all four DNA nucleotides and resolve single-nucleotides in single-stranded DNA when double-stranded DNA temporarily holds the nucleotides in the pore constriction. Passing DNA with a series of double-stranded sections through MspA provides proof of principle of a simple DNA sequencing method using a nanopore. These findings highlight the importance of MspA in the future of nanopore sequencing. PMID:20798343

  10. Pareto optimal pairwise sequence alignment.

    Science.gov (United States)

    DeRonne, Kevin W; Karypis, George

    2013-01-01

    Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
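
    The notion of a Pareto frontier used in this abstract can be made concrete with a small filter over candidate score vectors. This is only the definition of non-dominance applied to a list; it is not the dynamic-programming algorithm of the paper, and the scores below are made-up illustrative numbers.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    """Keep the points that no other point dominates (quadratic-time scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical alignments scored by two scoring functions.
scores = [(0.9, 0.2), (0.7, 0.7), (0.6, 0.6), (0.2, 0.9), (0.9, 0.1)]
front = pareto_front(scores)
print(front)
```

    The surviving points trade one objective against the other, which is why, as the abstract notes, picking a single best alignment from the frontier is still a separate problem.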

  11. On the Origin of Sequence

    Directory of Open Access Journals (Sweden)

    Peter T. S. van der Gulik

    2015-11-01

    Three aspects which make planet Earth special, and which must be taken into consideration with respect to the emergence of peptides, are the mineralogical composition, the Moon which is in the same size class, and the triple environment consisting of ocean, atmosphere, and continent. GlyGly is a remarkable peptide because it stimulates peptide bond formation in the Salt-Induced Peptide Formation reaction. The role glycine and aspartic acid play in the active site of RNA polymerase is remarkable too. GlyGly might have been the original product of coded peptide synthesis because of its importance in stimulating the production of oligopeptides with a high aspartic acid content, which protected small RNA molecules by binding Mg2+ ions. The feedback loop, which is closed by having RNA molecules producing GlyGly, is proposed as the essential element fundamental to life. Having this system running, longer sequences could evolve, gradually solving the problem of error catastrophe. The basic structure of the standard genetic code (8 fourfold degenerate codon boxes and 8 split codon boxes) is an example of the way information concerning the emergence of life is frozen in the biological constitution of organisms: the structure of the code contains historical information.

  12. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    Science.gov (United States)

    de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan (http://genes.mit.edu/GENSCAN.html). PMID:11070084

  13. Predicting Contextual Sequences via Submodular Function Maximization

    OpenAIRE

    Dey, Debadeepta; Liu, Tian Yu; Hebert, Martial; Bagnell, J. Andrew

    2012-01-01

    Sequence optimization, where the items in a list are ordered to maximize some reward has many applications such as web advertisement placement, search, and control libraries in robotics. Previous work in sequence optimization produces a static ordering that does not take any features of the item or context of the problem into account. In this work, we propose a general approach to order the items within the sequence based on the context (e.g., perceptual information, environment description, ...

  14. cis sequence effects on gene expression

    Directory of Open Access Journals (Sweden)

    Jacobs Kevin

    2007-08-01

    Background: Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects) in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature. Results: We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning) to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p cis sequence effects in our study, respectively. Conclusion: Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.

  15. Improving Recurrent Neural Networks For Sequence Labelling

    OpenAIRE

    Dinarelli, Marco; Tellier, Isabelle

    2016-01-01

    In this paper we study different types of Recurrent Neural Networks (RNN) for sequence labeling tasks. We propose two new variants of RNNs integrating improvements for sequence labeling, and we compare them to the more traditional Elman and Jordan RNNs. We compare all models, either traditional or new, on four distinct tasks of sequence labeling: two on Spoken Language Understanding (ATIS and MEDIA); and two of POS tagging for the French Treebank (FTB) and the Penn Treebank (PTB) corpora. The...

  16. A classification of periodic turtle sequences

    OpenAIRE

    J. Holdener; A. Wagaman

    2003-01-01

    A turtle sequence is a word constructed from an alphabet of two letters: F, which represents the forward motion of a turtle in the plane, and L, which represents a counterclockwise turn. In this paper, we investigate such sequences and establish links between the combinatoric properties of words and the geometric properties of the curves they generate. In particular, we classify periodic turtle sequences in terms of their closure (or lack thereof).
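
    The link between a word over {F, L} and the curve it draws can be sketched with a small simulator. The abstract does not state the turn angle, so `turn_degrees` is an assumed parameter, the step length is assumed to be 1, and "closed" is taken here to mean that the turtle returns to its starting point after the whole word.

```python
import math


def trace(word: str, turn_degrees: float) -> tuple[float, float, float]:
    """Follow a turtle word and return the final (x, y, heading in degrees).

    'F' moves one unit forward; 'L' turns counterclockwise by `turn_degrees`.
    """
    x, y, heading = 0.0, 0.0, 0.0
    for symbol in word:
        if symbol == "F":
            x += math.cos(math.radians(heading))
            y += math.sin(math.radians(heading))
        elif symbol == "L":
            heading = (heading + turn_degrees) % 360.0
        else:
            raise ValueError(f"unexpected symbol {symbol!r}")
    return x, y, heading


def is_closed(word: str, turn_degrees: float, tol: float = 1e-9) -> bool:
    """True if the path ends (numerically) where it started."""
    x, y, _ = trace(word, turn_degrees)
    return math.hypot(x, y) < tol


square = "FL" * 4        # four periods of "FL" with 90-degree turns
open_path = "FFL" * 2    # two periods only: the curve has not closed yet
print(is_closed(square, 90.0), is_closed(open_path, 90.0))
```

    Repeating a finite block such as `"FL"` models a periodic turtle sequence; whether enough repetitions ever close the curve depends on both the block and the turn angle.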

  17. WEAK CONVERGENCE OF HENSTOCK INTEGRABLE SEQUENCES

    Institute of Scientific and Technical Information of China (English)

    Luisa Di Piazza

    1994-01-01

    Some relationships between pointwise and weak convergence of a sequence of Henstock integrable functions are studied. In particular, an example is provided of a sequence of Henstock integrable functions whose pointwise limit is different from the weak one. By introducing an asymptotic version of the Henstock equiintegrability notion, a necessary and sufficient condition is given for a pointwise convergent sequence of Henstock integrable functions to be weakly convergent to its pointwise limit.

  18. Next-generation sequencing: applications beyond genomes

    OpenAIRE

    Marguerat, Samuel; Wilhelm, Brian T.; Bähler, Jürg

    2008-01-01

    The development of DNA sequencing more than 30 years ago has profoundly impacted biological research. In the last couple of years, remarkable technological innovations have emerged that allow the direct and cost-effective sequencing of complex samples at unprecedented scale and speed. These next-generation technologies make it feasible to sequence not only static genomes, but also entire transcriptomes expressed under different conditions. These and other powerful applications of next-generat...

  19. Repetitive sequence environment distinguishes housekeeping genes

    OpenAIRE

    Eller, C. Daniel; Regelson, Moira; Merriman, Barry; Nelson, Stan; Horvath, Steve; Marahrens, York

    2006-01-01

    Housekeeping genes are expressed across a wide variety of tissues. Since repetitive sequences have been reported to influence the expression of individual genes, we employed a novel approach to determine whether housekeeping genes can be distinguished from tissue-specific genes by their repetitive sequence context. We show that Alu elements are more highly concentrated around housekeeping genes while various longer (>400-bp) repetitive sequences ("repeats"), including Long Interspersed Nuclear E...

  20. A statistical study of aftershock sequences

    OpenAIRE

    Giorgio Ranalli

    2010-01-01

    A comprehensive statistical study of the phenomenology of aftershock sequences is made in this paper. The spatial distribution of aftershocks indicates that they are mainly crustal events; however, deeper sequences also take place. The analysis of the distribution of aftershocks in 15 sequences with respect to time and magnitude leads to the statistical confirmation of a set of phenomenological laws describing the process, namely, the time-frequency law of hyperbolic decay of aftershock activ...

  1. Scalable Transcriptome Preparation for Massive Parallel Sequencing

    OpenAIRE

    Stranneheim, Henrik; Werne, Beata; Sherwood, Ellen; Lundeberg, Joakim

    2011-01-01

    Background The tremendous output of massive parallel sequencing technologies requires automated robust and scalable sample preparation methods to fully exploit the new sequence capacity. Methodology In this study, a method for automated library preparation of RNA prior to massively parallel sequencing is presented. The automated protocol uses precipitation onto carboxylic acid paramagnetic beads for purification and size selection of both RNA and DNA. The automated sample preparation was comp...

  2. Scalable Transcriptome Preparation for Massive Parallel Sequencing

    OpenAIRE

    Henrik Stranneheim; Beata Werne; Ellen Sherwood; Joakim Lundeberg

    2011-01-01

    BACKGROUND: The tremendous output of massive parallel sequencing technologies requires automated robust and scalable sample preparation methods to fully exploit the new sequence capacity. METHODOLOGY: In this study, a method for automated library preparation of RNA prior to massively parallel sequencing is presented. The automated protocol uses precipitation onto carboxylic acid paramagnetic beads for purification and size selection of both RNA and DNA. The automated sample preparation was co...

  3. Accurate and comprehensive sequencing of personal genomes

    OpenAIRE

    Ajay, Subramanian S.; Parker, Stephen C.J.; Ozel Abaan, Hatice; Fuentes Fajardo, Karin V.; Margulies, Elliott H.

    2011-01-01

    As whole-genome sequencing becomes commoditized and we begin to sequence and analyze personal genomes for clinical and diagnostic purposes, it is necessary to understand what constitutes a complete sequencing experiment for determining genotypes and detecting single-nucleotide variants. Here, we show that the current recommendation of ∼30× coverage is not adequate to produce genotype calls across a large fraction of the genome with acceptably low error rates. Our results are based on analyses...

  4. Sequence and structural analysis of antibodies

    OpenAIRE

    Raghavan, A. K.

    2009-01-01

    The work presented in this thesis focusses on the sequence and structural analysis of antibodies and has fallen into three main areas. First I developed a method to assess how typical an antibody sequence is of the expressed human antibody repertoire. My hypothesis was that the more "humanlike" an antibody sequence is (in other words how typical it is of the expressed human repertoire), the less likely it is to elicit an immune response when used in vivo in humans. In practi...

  5. Some properties of generalized Fibonacci sequence

    Science.gov (United States)

    Chong, Chin-Yoon; Ho, C. K.

    2015-12-01

    For all non-negative integers n and real constants a, b, p and q, the generalized Fibonacci sequence {U_n} is defined by U_{n+2} = pU_{n+1} + qU_n with the initial values U_0 = a and U_1 = b. Throughout the paper, we study some properties of the generalized Fibonacci sequence. Our results will motivate some new research problems concerning the contribution of the generalized sequence.
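
    The recurrence in the abstract can be sketched in a few lines of Python. The function name and the chosen parameter values are illustrative assumptions, not taken from the paper; the sketch only shows how familiar sequences fall out of particular choices of a, b, p and q.

```python
def generalized_fibonacci(a, b, p, q, count):
    """Return the first `count` terms of U_0 = a, U_1 = b,
    U_{n+2} = p*U_{n+1} + q*U_n (hypothetical helper name)."""
    terms = []
    current, following = a, b
    for _ in range(count):
        terms.append(current)
        # Slide the two-term window one step along the recurrence.
        current, following = following, p * following + q * current
    return terms


# Classical special cases (well-known parameter choices):
fibonacci = generalized_fibonacci(0, 1, 1, 1, 10)  # a=0, b=1, p=q=1
lucas = generalized_fibonacci(2, 1, 1, 1, 10)      # a=2, b=1, p=q=1
pell = generalized_fibonacci(0, 1, 2, 1, 8)        # a=0, b=1, p=2, q=1
```

    Because the function only uses multiplication and addition, it works unchanged for real-valued constants, as the abstract allows.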

  6. Effects of Sequence Partitioning on Compression Rate

    OpenAIRE

    Alagoz, B. Baykant

    2010-01-01

    In this paper, we theoretically investigate the effects of splitting a data sequence into packs of data sets. We prove that a partitioning of a data sequence can be found such that the entropy rate of each subsequence is lower than the entropy rate of the source. Effects of sequence partitioning on overall compression rate are argued on the basis of partitioning statistics, and then, an optimization problem for an optimal partition is defined to improve overall compression rate of a...
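
    A toy illustration of the central claim, assuming zero-order empirical entropy as a stand-in for the entropy rate (the paper's own definitions are not given in this record). The example string and the split point are made up for illustration.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Zero-order empirical entropy in bits per symbol (an assumed proxy
    for the entropy rate discussed in the abstract)."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


source = "AAAABBBB"                # hypothetical data sequence
parts = [source[:4], source[4:]]   # one possible partition into two packs

source_entropy = empirical_entropy(source)
part_entropies = [empirical_entropy(part) for part in parts]
```

    Here the whole sequence mixes two symbols evenly, while each pack contains a single symbol, so every pack has lower empirical entropy than the source.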

  7. Nucleotide sequences encoding a thermostable alkaline protease

    Science.gov (United States)

    Wilson, David B.; Lao, Guifang

    1998-01-01

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium.

  8. Detecting Emotions from Connected Action Sequences

    Science.gov (United States)

    Bernhardt, Daniel; Robinson, Peter

    In this paper we deal with the problem of detecting emotions from the body movements produced by naturally connected action sequences. Although action sequences are one of the most common forms of body motions in everyday scenarios their potential for emotion recognition has not been explored in the past. We show that there are fundamental differences between actions recorded in isolation and in natural sequences and demonstrate a number of techniques which allow us to correctly label action sequences with one of four emotions up to 86% of the time. Our results bring us an important step closer to recognizing emotions from body movements in natural scenarios.

  9. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  10. Hardware Accelerated Sequence Alignment with Traceback

    Directory of Open Access Journals (Sweden)

    Scott Lloyd

    2009-01-01

    in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop computer is demonstrated on sequence lengths of 16000. For greater performance, the architecture is scalable to more processing elements.
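
    For readers unfamiliar with the forward scan and traceback mentioned above, here is a minimal software sketch of textbook global alignment (Needleman-Wunsch). It is not the hardware architecture from the article: it stores the full score matrix, so it is not space-efficient, and the scoring values (match, mismatch, gap) are assumptions chosen for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Textbook global alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Forward scan: fill the matrix row by row.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback: walk from the bottom-right corner back to the origin.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])   # gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")        # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

    The quadratic memory of this sketch is exactly the kind of limitation the abstract says the hardware design avoids.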

  11. Researches on Sequence of Plant Cystatin: Phytocystatin

    Institute of Scientific and Technical Information of China (English)

    QIN Qingfeng; HE Wei; LIANG Jun; ZHANG Xingyao

    2005-01-01

    Plant cystatins, or phytocystatins, are cysteine proteinase inhibitors that exist widely in different plant species. Because they can kill insects by inhibiting the digestive function of the cysteine proteinase in the gut, they are believed to play an important role in plant defense against pests. Phytocystatins contain the conserved QXVXG motif and show some sequence features that differ from animal cystatins. A large number of plant cystatins have been characterized by direct protein sequencing and by sequencing cDNA clones. A multiple alignment with BLAST software and a detailed analysis of 38 phytocystatins show that phytocystatins possess a specific conserved amino acid sequence, [LRVI]-[AGT]-[RQKE]-[FY]-[AS]-[VI]-X-[EGHDQV]-[HYFQ]-N, different from the conserved sequence demonstrated by Margis in 1998. This conserved sequence is sufficient to detect phytocystatin sequences exclusively in protein data banks. These phytocystatins are classified into 3 groups according to features of their amino acid sequences, and group I can be further divided into 3 subgroups based on features of their amino acid and genomic sequences. Using the CLUSTALX software, the most conserved nucleotide sequences of phytocystatins were found, which could be used to design degenerate primers to search for new phytocystatins by PCR.
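
    The conserved pattern quoted in the abstract can be turned into a regular expression for scanning protein sequences. This is a minimal sketch: the helper names are hypothetical, X is assumed to mean "any residue", and the test protein is a made-up string, not a real phytocystatin.

```python
import re

# Pattern exactly as given in the abstract.
PHYTOCYSTATIN_PATTERN = "[LRVI]-[AGT]-[RQKE]-[FY]-[AS]-[VI]-X-[EGHDQV]-[HYFQ]-N"


def pattern_to_regex(pattern):
    """Convert a dash-separated residue pattern into a compiled regex.
    'X' is assumed to stand for any single residue."""
    parts = ["." if element.upper() == "X" else element
             for element in pattern.split("-")]
    return re.compile("".join(parts))


def find_motif(protein, regex):
    """Return (start_index, matched_text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in regex.finditer(protein)]


motif = pattern_to_regex(PHYTOCYSTATIN_PATTERN)
hits = find_motif("MKLARFAVDEHNKK", motif)  # made-up sequence for illustration
```

    Scanning a real database would only need a loop over its records; whether the pattern is truly exclusive to phytocystatins is the authors' claim and is not checked by this sketch.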

  12. Massively parallel sequencing of forensic STRs

    DEFF Research Database (Denmark)

    Parson, Walther; Ballard, David; Budowle, Bruce;

    2016-01-01

    The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data...... accessible genome assembly, and in place before the uptake of MPS by the general forensic community starts to generate sequence data on a large scale. While the established nomenclature for CE-based STR analysis will remain unchanged in the future, the nomenclature of sequence-based STR genotypes will need...

  13. Visible periodicity of strong nucleosome DNA sequences.

    Science.gov (United States)

    Salih, Bilal; Tripathi, Vijay; Trifonov, Edward N

    2015-01-01

    Fifteen years ago, Lowary and Widom assembled nucleosomes on synthetic random sequence DNA molecules, selected the strongest nucleosomes and discovered that the TA dinucleotides in these strong nucleosome sequences often appear at 10-11 bases from one another or at distances which are multiples of this period. We repeated this experiment computationally, on large ensembles of natural genomic sequences, by selecting the strongest nucleosomes--i.e. those with such distances between like-named dinucleotides, multiples of 10.4 bases, the structural and sequence period of nucleosome DNA. The analysis confirmed the periodicity of TA dinucleotides in the strong nucleosomes, and revealed as well other periodic sequence elements, notably classical AA and TT dinucleotides. The matrices of DNA bendability and their simple linear forms--nucleosome positioning motifs--are calculated from the strong nucleosome DNA sequences. The motifs are in full accord with nucleosome positioning sequences derived earlier, thus confirming that the new technique, indeed, detects strong nucleosomes. Species- and isochore-specific variations of the matrices and of the positioning motifs are demonstrated. The strong nucleosome DNA sequences manifest the highest hitherto nucleosome positioning sequence signals, showing the dinucleotide periodicities in directly observable rather than in hidden form.
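
    The distance counting behind such a periodicity analysis can be sketched as follows. This is not the authors' selection procedure for strong nucleosomes; it only tallies distances between occurrences of a dinucleotide, and the function name, distance cutoff, and toy sequence are assumptions.

```python
from collections import Counter


def dinucleotide_distances(sequence, dinucleotide="TA", max_distance=40):
    """Count distances between all pairs of `dinucleotide` occurrences
    that lie at most `max_distance` bases apart."""
    positions = [i for i in range(len(sequence) - 1)
                 if sequence[i:i + 2] == dinucleotide]
    counts = Counter()
    for index, first in enumerate(positions):
        for second in positions[index + 1:]:
            distance = second - first
            if distance > max_distance:
                break  # positions are sorted, so later ones are farther away
            counts[distance] += 1
    return counts


# Toy sequence with TA placed every 10 bases (made up for illustration).
toy = "TA" + "G" * 8 + "TA" + "G" * 8 + "TA"
histogram = dinucleotide_distances(toy)
```

    On real nucleosome sequences, peaks of such a histogram near multiples of the helical period would be the signal of interest.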

  14. A note on bifix-free sequences

    DEFF Research Database (Denmark)

    Nielsen, Peter Tolstrup

    1973-01-01

    A bifix of an L-ary n-tuple is a sequence which is both a prefix and a suffix of that n-tuple. The practical importance of bifix-free patterns is emphasized, and we devise a systematic way of generating all such sequences and determine their number.
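
    A brute-force sketch of the definition, assuming a bifix must be a proper, non-empty prefix (the abstract does not spell this out). It enumerates all n-tuples rather than using the systematic generation method of the paper, so it is only practical for small n; the function names are hypothetical.

```python
from itertools import product


def bifixes(word):
    """All proper, non-empty prefixes of `word` that are also suffixes."""
    return [word[:k] for k in range(1, len(word)) if word[:k] == word[-k:]]


def is_bifix_free(word):
    return not bifixes(word)


def count_bifix_free(alphabet_size, length):
    """Count bifix-free tuples of the given length by exhaustive search."""
    symbols = range(alphabet_size)
    return sum(1 for word in product(symbols, repeat=length)
               if is_bifix_free(word))


binary_counts = [count_bifix_free(2, n) for n in range(1, 7)]
```

    For example, `bifixes("abcab")` finds the single bifix `"ab"`, so that word is not bifix-free.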

  15. Recursive sequences in first-year calculus

    Science.gov (United States)

    Krainer, Thomas

    2016-02-01

    This article provides ready-to-use supplementary material on recursive sequences for a second-semester calculus class. It equips first-year calculus students with a basic methodical procedure based on which they can conduct a rigorous convergence or divergence analysis of many simple recursive sequences on their own without the need to invoke inductive arguments as is typically required in calculus textbooks. The sequences that are accessible to this kind of analysis are predominantly (eventually) monotonic, but also certain recursive sequences that alternate around their limit point as they converge can be considered.

  16. Representing objects, relations, and sequences.

    Science.gov (United States)

    Gallant, Stephen I; Okaywe, T Wendy

    2013-08-01

    Vector symbolic architectures (VSAs) are high-dimensional vector representations of objects (e.g., words, image parts), relations (e.g., sentence structures), and sequences for use with machine learning algorithms. They consist of a vector addition operator for representing a collection of unordered objects, a binding operator for associating groups of objects, and a methodology for encoding complex structures. We first develop constraints that machine learning imposes on VSAs; for example, similar structures must be represented by similar vectors. The constraints suggest that current VSAs should represent phrases ("The smart Brazilian girl") by binding sums of terms, in addition to simply binding the terms directly. We show that matrix multiplication can be used as the binding operator for a VSA, and that matrix elements can be chosen at random. A consequence for living systems is that binding is mathematically possible without the need to specify, in advance, precise neuron-to-neuron connection properties for large numbers of synapses. A VSA that incorporates these ideas, Matrix Binding of Additive Terms (MBAT), is described that satisfies all constraints. With respect to machine learning, for some types of problems appropriate VSA representations permit us to prove learnability rather than relying on simulations. We also propose dividing machine (and neural) learning and representation into three stages, with differing roles for learning in each stage. For neural modeling, we give representational reasons for nervous systems to have many recurrent connections, as well as for the importance of phrases in language processing. Sizing simulations and analyses suggest that VSAs in general, and MBAT in particular, are ready for real-world applications. PMID:23607563

  17. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Directory of Open Access Journals (Sweden)

    Marta Brozynska

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  18. Timing-Sequence Testing of Parallel Programs

    Institute of Scientific and Technical Information of China (English)

    LIANG Yu; LI Shu; ZHANG Hui; HAN Chengde

    2000-01-01

    Testing of parallel programs involves two parts: testing of control flow within the processes and testing of timing-sequence. This paper focuses on the latter, particularly on the timing-sequence of message-passing paradigms. Firstly, the coarse-grained SYN-sequence model is built up to describe the execution of distributed programs. All of the topics discussed in this paper are based on it. The most direct way to test a program is to run it. A fault-free parallel program should have both correct computing results and a proper SYN-sequence. In order to analyze the validity of an observed SYN-sequence, this paper presents the formal specification (Backus Normal Form) of the valid SYN-sequence. Until now there has been little work on testing coverage for distributed programs. Calculating the number of valid SYN-sequences is the key to the coverage problem, but that number is very large and it is very hard to obtain the combination law among SYN-events. To resolve this problem, this paper proposes an efficient testing strategy, atomic SYN-event testing, which first linearizes the SYN-sequence (so that it consists only of serial atomic SYN-events) and then tests each atomic SYN-event independently. This paper also provides the formula for calculating the number of valid SYN-sequences for the tree-topology atomic SYN-event (broadcast and combine). Furthermore, the number of valid SYN-sequences also, to some degree, mirrors the testability of parallel programs. Taking the tree-topology atomic SYN-event as an example, this paper demonstrates the testability and communication speed of the tree-topology atomic SYN-event under different numbers of branches in order to achieve a more satisfactory tradeoff between testability and communication efficiency.

  19. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression, an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression; the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  20. Generalized Identities of Companion Fibonacci-Like Sequences

    Directory of Open Access Journals (Sweden)

    Shikha Bhatnagar

    2013-06-01

    The Fibonacci sequence, Lucas sequence, Pell sequence, Pell-Lucas sequence, Jacobsthal sequence and Jacobsthal-Lucas sequence are the most prominent examples of second order recursive sequences. In this paper, we deal with two companion Fibonacci-Like sequences which are generalizations of the Fibonacci-Like sequence. Further, we obtain some generalized identities among the terms of companion Fibonacci-Like sequences, Jacobsthal and Jacobsthal-Lucas sequences through Binet's formulae.

  1. Generalized Identities of Companion Fibonacci-Like Sequences

    OpenAIRE

    Shikha Bhatnagar; Bijendra Singh; Omprakash Sikhwal

    2013-01-01

    The Fibonacci sequence, Lucas sequence, Pell sequence, Pell-Lucas sequence, Jacobsthal sequence and Jacobsthal-Lucas sequence are the most prominent examples of second order recursive sequences. In this paper, we deal with two companion Fibonacci-Like sequences which are generalizations of the Fibonacci-Like sequence. Further, we obtain some generalized identities among the terms of companion Fibonacci-Like sequences, Jacobsthal and Jacobsthal-Lucas sequences through Binet's formulae.
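
    As background on Binet-type formulae, the sketch below compares the standard closed forms for the Fibonacci, Lucas, Jacobsthal and Jacobsthal-Lucas sequences with their recurrences. It does not reproduce the paper's companion Fibonacci-Like sequences or its identities; function names are hypothetical, and the floating-point versions are only reliable for moderate n.

```python
import math

SQRT5 = math.sqrt(5)
PHI = (1 + SQRT5) / 2   # roots of x^2 = x + 1
PSI = (1 - SQRT5) / 2


def fibonacci_binet(n):
    return round((PHI ** n - PSI ** n) / SQRT5)


def lucas_binet(n):
    return round(PHI ** n + PSI ** n)


def jacobsthal_binet(n):
    # Roots of x^2 = x + 2 are 2 and -1, so integer arithmetic suffices.
    return (2 ** n - (-1) ** n) // 3


def jacobsthal_lucas_binet(n):
    return 2 ** n + (-1) ** n


def by_recurrence(u0, u1, p, q, n):
    """n-th term of U_{k+2} = p*U_{k+1} + q*U_k with U_0 = u0, U_1 = u1."""
    current, following = u0, u1
    for _ in range(n):
        current, following = following, p * following + q * current
    return current
```

    Each closed form is built from the roots of the characteristic equation of the corresponding recurrence, which is why identities between such sequences can be derived algebraically.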

  2. Fibonacci-triple sequences and some fundamental properties

    Directory of Open Access Journals (Sweden)

    Bijendra Singh

    2010-12-01

    The Fibonacci sequence stands as a kind of super sequence with fabulous properties. This note presents Fibonacci-Triple sequences, which may also be called 3-F sequences, a notable development in the study of the Fibonacci sequence. The purpose of this paper is to demonstrate fundamental properties of Fibonacci-Triple sequences.

  3. Genome Sequence of Pseudomonas chlororaphis Strain 189

    Science.gov (United States)

    Town, Jennifer; Audy, Patrice; Boyetchko, Susan M.

    2016-01-01

    Pseudomonas chlororaphis strain 189 is a potent inhibitor of the growth of the potato pathogen Phytophthora infestans. We determined the complete, finished sequence of the 6.8-Mbp genome of this strain, consisting of a single contiguous molecule. Strain 189 is closely related to previously sequenced strains of P. chlororaphis. PMID:27340063

  4. About a new family of sequences

    OpenAIRE

    Diniz, Felipe Bottega

    2016-01-01

    First we define a new kind of function over $\mathbb{N}$. For each $i\in\mathbb{N}$ we have an associated function, which will be called $S_i$. Then we define a new kind of sequence, to be made from the functions $S_i$. Finally, we will see that some of these sequences have a self-similarity feature.

  5. Bonobos extract meaning from call sequences.

    Directory of Open Access Journals (Sweden)

    Zanna Clay

    Studies on language-trained bonobos have revealed their remarkable abilities in representational and communication tasks. Surprisingly, however, corresponding research into their natural communication has largely been neglected. We address this issue with a first playback study on the natural vocal behaviour of bonobos. Bonobos produce five acoustically distinct call types when finding food, which they regularly mix together into longer call sequences. We found that individual call types were relatively poor indicators of food quality, while context specificity was much greater at the call sequence level. We therefore investigated whether receivers could extract meaning about the quality of food encountered by the caller by integrating across different call sequences. We first trained four captive individuals to find two types of foods, kiwi (preferred) and apples (less preferred), at two different locations. We then conducted naturalistic playback experiments during which we broadcasted sequences of four calls, originally produced by a familiar individual responding to either kiwi or apples. All sequences contained the same number of calls but varied in the composition of call types. Following playbacks, we found that subjects devoted significantly more search effort to the field indicated by the call sequence. Rather than attending to individual calls, bonobos attended to the entire sequences to make inferences about the food encountered by a caller. These results provide the first empirical evidence that bonobos are able to extract information about external events by attending to vocal sequences of other individuals and highlight the importance of call combinations in their natural communication system.

  6. Molecular selection in a unified evolutionary sequence

    Science.gov (United States)

    Fox, S. W.

    1986-01-01

    With guidance from experiments and observations that indicate internally limited phenomena, an outline of unified evolutionary sequence is inferred. Such unification is not visible for a context of random matrix and random mutation. The sequence proceeds from Big Bang through prebiotic matter, protocells, through the evolving cell via molecular and natural selection, to mind, behavior, and society.

  7. Stochastic Modelling of Daily Rainfall sequences

    NARCIS (Netherlands)

    Buishand, T.A.

    1977-01-01

    Rainfall series of different climatic regions were analysed with the aim of generating daily rainfall sequences. A survey of the data is given in I, 1. When analysing daily rainfall sequences one must be aware of the following points: a. Seasonality. Because of seasonal variation of features of the r

  8. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  9. Complete Genome Sequence of Gordonia terrae 3612

    Science.gov (United States)

    Russell, Daniel A.; Guerrero Bustamante, Carlos A.; Garlena, Rebecca A.

    2016-01-01

    Here, we report the complete genome sequence of Gordonia terrae 3612, also known by the strain designations ATCC 25594, NRRL B-16283, and NBRC 100016. The genome sequence reveals it to be free of prophage and clustered regularly interspaced short palindromic repeats (CRISPRs), and it is an effective host for the isolation and characterization of Gordonia bacteriophages. PMID:27688316

  10. Project Report: Automatic Sequence Processor Software Analysis

    Science.gov (United States)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system composed of scripts written in Perl, C shell, and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.

  11. Archaebacterial rhodopsin sequences: Implications for evolution

    Science.gov (United States)

    Lanyi, J. K.

    1991-01-01

    It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.

  12. Complete Genome Sequence of Gordonia terrae 3612.

    Science.gov (United States)

    Russell, Daniel A; Guerrero Bustamante, Carlos A; Garlena, Rebecca A; Hatfull, Graham F

    2016-01-01

    Here, we report the complete genome sequence of Gordonia terrae 3612, also known by the strain designations ATCC 25594, NRRL B-16283, and NBRC 100016. The genome sequence reveals it to be free of prophage and clustered regularly interspaced short palindromic repeats (CRISPRs), and it is an effective host for the isolation and characterization of Gordonia bacteriophages. PMID:27688316

  13. From Arithmetic Sequences to Linear Equations

    Science.gov (United States)

    Matsuura, Ryota; Harless, Patrick

    2012-01-01

    The first part of the article focuses on deriving the essential properties of arithmetic sequences by appealing to students' sense making and reasoning. The second part describes how to guide students to translate their knowledge of arithmetic sequences into an understanding of linear equations. Ryota Matsuura originally wrote these lessons for…

  14. Multilocus Sequence Typing Tool for Cyclospora cayetanensis

    Science.gov (United States)

    Guo, Yaqiong; Roellig, Dawn M.; Li, Na; Tang, Kevin; Frace, Michael; Ortega, Ynes; Arrowood, Michael J.; Qvarnstrom, Yvonne; Wang, Lin; Moss, Delynn M.; Zhang, Longxian; Xiao, Lihua

    2016-01-01

    Because the lack of typing tools for Cyclospora cayetanensis has hampered outbreak investigations, we sequenced its genome and developed a genotyping tool. We observed 2 to 10 geographically segregated sequence types at each of 5 selected loci. This new tool could be useful for case linkage and infection/contamination source tracking. PMID:27433881

  15. Concept For Generation Of Long Pseudorandom Sequences

    Science.gov (United States)

    Wang, C. C.

    1990-01-01

    Conceptual very-large-scale integrated (VLSI) digital circuit performs exponentiation in finite field. Algorithm that generates unusually long sequences of pseudorandom numbers executed by digital processor that includes such circuits. Concepts particularly advantageous for such applications as spread-spectrum communications, cryptography, and generation of ranging codes, synthetic noise, and test data, where usually desirable to make pseudorandom sequences as long as possible.

  16. Acoustic Packaging of Action Sequences by Infants

    Science.gov (United States)

    Brand, Rebecca J.; Tapscott, Stephanie

    2007-01-01

    This study investigated whether acoustic input, in the form of infant-directed speech, influenced infants' segmenting of action sequences. Thirty-two 7.5- to 11.5-month-old infants were familiarized with video sequences made up of short action clips. Narration coincided with portions of the action stream to package certain pairs of clips together.…

  17. What's Next? Judging Sequences of Binary Events

    Science.gov (United States)

    Oskarsson, An T.; Van Boven, Leaf; McClelland, Gary H.; Hastie, Reid

    2009-01-01

    The authors review research on judgments of random and nonrandom sequences involving binary events with a focus on studies documenting gambler's fallacy and hot hand beliefs. The domains of judgment include random devices, births, lotteries, sports performances, stock prices, and others. After discussing existing theories of sequence judgments,…

  18. SPARSE SEQUENCE CONSTRUCTION OF LDPC CODES

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    This letter proposes a novel and simple construction of regular Low-Density Parity-Check (LDPC) codes using sparse binary sequences. It utilizes the cyclic cross correlation function of sparse sequences to generate codes with girth 8. The new codes perform well using sum-product decoding. Low encoding complexity can also be achieved due to the inherent quasi-cyclic structure of the codes.

  19. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    OpenAIRE

    de Souza, Sandro J.; Anamaria A Camargo; Briones, Marcelo R. S.; Fernando F. Costa; NAGAI, MARIA APARECIDA; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 ...

  20. EXTENSION OF SEQUENCE STRATIGRAPHIC MODEL AND SEQUENCE STRATIGRAPHIC ANALYSIS IN LIMNIC DEPOSITIONAL BASIN

    Institute of Scientific and Technical Information of China (English)

    李增学; 魏久传; 王民镇; 李守春; 李青山; 金秀昆; 兰恒星

    1996-01-01

    The architectural patterns of sedimentary succession are diverse in different depositional basins. The sedimentary architecture and geological conditions of basins such as epicontinental seas and intraplate limnic basins differ clearly from those of continental margin basins. The classical sequence stratigraphic model therefore needs to be extended, supplemented, and refined when studying various depositional basins. For this reason, this paper expounds the concepts, principles of sequence division, methodology, and technology of sequence stratigraphic study in epicontinental and limnic basins.

  1. NGS-based deep bisulfite sequencing.

    Science.gov (United States)

    Lee, Suman; Kim, Joomyeong

    2016-01-01

    We have developed an NGS-based deep bisulfite sequencing protocol for the DNA methylation analysis of genomes. This approach allows the rapid and efficient construction of NGS-ready libraries with a large number of PCR products that have been individually amplified from bisulfite-converted DNA. This approach also employs a bioinformatics strategy to sort the raw sequence reads generated from NGS platforms and subsequently to derive DNA methylation levels for individual loci. The results demonstrated that this NGS-based deep bisulfite sequencing approach provides not only DNA methylation levels but also informative DNA methylation patterns that have not been seen through other existing methods.
    • This protocol provides an efficient method for generating NGS-ready libraries from individually amplified PCR products.
    • This protocol provides a bioinformatics strategy for sorting NGS-derived raw sequence reads.
    • This protocol provides deep bisulfite sequencing results that can measure DNA methylation levels and patterns of individual loci.
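
    As a rough illustration of how a per-locus methylation level might be derived from sorted reads, the sketch below assumes the usual bisulfite convention: an unmethylated cytosine reads as T after conversion, while a methylated cytosine still reads as C. The function name and the toy reads are hypothetical and are not taken from the protocol itself.

```python
def methylation_level(reads, cpg_position):
    """Fraction of reads showing C (methylated) at one CpG position.

    Assumes the reads are already aligned to the same amplicon, so that
    index `cpg_position` refers to the same cytosine in every read.
    """
    methylated = 0
    unmethylated = 0
    for read in reads:
        if cpg_position >= len(read):
            continue  # read too short to cover this position
        base = read[cpg_position]
        if base == "C":
            methylated += 1
        elif base == "T":
            unmethylated += 1
        # any other base is treated as a sequencing error and ignored
    covered = methylated + unmethylated
    return methylated / covered if covered else None


# Toy reads (hypothetical): index 2 is the cytosine of a CpG site.
toy_reads = ["ATCGA", "ATTGA", "ATCGA", "ATCGA"]
level = methylation_level(toy_reads, 2)
```

    Keeping the per-read calls, rather than only the final ratio, is what would allow methylation patterns across neighbouring sites to be inspected as well.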

  2. Aligning Sequences by Minimum Description Length

    Directory of Open Access Journals (Sweden)

    John S. Conery

    2008-01-01

    This paper presents a new information theoretic framework for aligning sequences in bioinformatics. A transmitter compresses a set of sequences by constructing a regular expression that describes the regions of similarity in the sequences. To retrieve the original set of sequences, a receiver generates all strings that match the expression. An alignment algorithm uses minimum description length to encode and explore alternative expressions; the expression with the shortest encoding provides the best overall alignment. When two substrings contain letters that are similar according to a substitution matrix, a code length function based on conditional probabilities defined by the matrix will encode the substrings with fewer bits. In one experiment, alignments produced with this new method were found to be comparable to alignments from CLUSTALW. A second experiment measured the accuracy of the new method on pairwise alignments of sequences from the BAliBASE alignment benchmark.
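
    The idea that similar substrings cost fewer bits when encoded together can be sketched with Shannon code lengths. This is only a toy under stated assumptions: the two-letter alphabet, the probability tables, and the function names are invented for illustration and are not the paper's encoding scheme.

```python
import math

# Hypothetical two-letter alphabet with made-up probabilities.
BACKGROUND = {"A": 0.5, "B": 0.5}
# CONDITIONAL[x][y]: probability of seeing y aligned opposite x.
CONDITIONAL = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.1, "B": 0.9},
}


def bits(probability):
    """Shannon code length, in bits, of an event with this probability."""
    return -math.log2(probability)


def independent_cost(s, t):
    """Bits needed to send both strings letter by letter, unaligned."""
    return sum(bits(BACKGROUND[c]) for c in s + t)


def aligned_cost(s, t):
    """Bits needed to send s, then t conditioned on the aligned letters of s.

    Assumes s and t have equal length and are aligned without gaps.
    """
    first = sum(bits(BACKGROUND[x]) for x in s)
    second = sum(bits(CONDITIONAL[x][y]) for x, y in zip(s, t))
    return first + second


similar = aligned_cost("AAB", "AAB")      # cheaper than sending separately
dissimilar = aligned_cost("AAA", "BBB")   # more expensive than sending separately
separate = independent_cost("AAB", "AAB")
```

    Under these toy numbers, aligning pays off only when the letters actually agree, which is the trade-off a minimum description length criterion weighs.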

  3. Sequence Affects the Cyclization of DNA Minicircles.

    Science.gov (United States)

    Wang, Qian; Pettitt, B Montgomery

    2016-03-17

    Understanding how the sequence of a DNA molecule affects its dynamic properties is a central problem affecting biochemistry and biotechnology. The process of cyclizing short DNA, as a critical step in molecular cloning, lacks a comprehensive picture of the kinetic process containing sequence information. We have elucidated this process by using coarse-grained simulations, enhanced sampling methods, and recent theoretical advances. We are able to identify the types and positions of structural defects during the looping process at a base-pair level. Correlations along a DNA molecule dictate critical sequence positions that can affect the looping rate. Structural defects change the bending elasticity of the DNA molecule from a harmonic to subharmonic potential with respect to bending angles. We explore the subelastic chain as a possible model in loop formation kinetics. A sequence-dependent model is developed to qualitatively predict the relative loop formation time as a function of DNA sequence. PMID:26938490

  4. M-sequences in ophthalmic electrophysiology.

    Science.gov (United States)

    Müller, Philipp L; Meigen, Thomas

    2016-01-01

    The aim of this review is to use the multimedia aspects of a purely digital online publication to explain and illustrate the highly capable technique of m-sequences in multifocal ophthalmic electrophysiology. M-sequences have been successfully applied in clinical routines during the past 20 years. However, the underlying mathematical rationale is often daunting. These mathematical properties of m-sequences allow one not only to separate the responses from different fields but also to analyze adaptational effects and impacts of former events. By explaining the history, the formation, and the different aspects of application, a better comprehension of the technique is intended. With this review we aim to clarify the opportunities of m-sequences in order to motivate scientists to use m-sequences in their future research. PMID:26818968
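
    For readers who find the mathematics daunting, a small sketch may help. It generates a short m-sequence from a linear feedback recurrence and checks the two-valued circular autocorrelation that makes shifted copies nearly orthogonal. The register length and taps (recurrence s[k] = s[k-3] XOR s[k-4], corresponding to the primitive polynomial x^4 + x + 1) are chosen here for illustration; clinical systems use much longer sequences.

```python
def m_sequence(n_bits=4, taps=(3, 4)):
    """One period of the binary sequence s[k] = XOR of s[k - t] for t in taps."""
    seq = [1] + [0] * (n_bits - 1)   # any nonzero seed works
    period = 2 ** n_bits - 1
    while len(seq) < period:
        bit = 0
        for t in taps:
            bit ^= seq[-t]
        seq.append(bit)
    return seq


def circular_autocorrelation(bits, lag):
    """Correlation of the +/-1 version of the sequence with a cyclic shift."""
    signs = [1 - 2 * b for b in bits]   # 0 -> +1, 1 -> -1
    n = len(signs)
    return sum(signs[i] * signs[(i + lag) % n] for i in range(n))


seq = m_sequence()
correlations = [circular_autocorrelation(seq, lag) for lag in range(len(seq))]
```

    The correlation equals the sequence length at lag 0 and -1 at every other lag, so responses driven by differently shifted copies of the same sequence can be separated by cross-correlation.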

  5. Genotyping-by-Sequencing in Plants

    Directory of Open Access Journals (Sweden)

    Gregory D. May

    2012-09-01

    The advent of next-generation DNA sequencing (NGS) technologies has led to the development of rapid genome-wide Single Nucleotide Polymorphism (SNP) detection applications in various plant species. Recent improvements in sequencing throughput combined with an overall decrease in costs per gigabase of sequence are allowing NGS to be applied to not only the evaluation of small subsets of parental inbred lines, but also the mapping and characterization of traits of interest in much larger populations. Such an approach, where sequences are used simultaneously to detect and score SNPs, therefore bypassing the entire marker assay development stage, is known as genotyping-by-sequencing (GBS). This review will summarize the current state of GBS in plants and the promises it holds as a genome-wide genotyping application.

  6. Transcriptome sequencing goals, assembly, and assessment.

    Science.gov (United States)

    Wheat, Christopher W; Vogel, Heiko

    2011-01-01

    Transcriptome sequencing provides quick, direct access to the mRNA. With this information, one can design primers for PCR of thousands of different genes, SNP markers, probes for microarrays and qPCR, or just use the sequence data itself in comparative studies. Transcriptome sequencing, while getting cheaper, is still an expensive endeavor, with an examination of data quality and its assembly infrequently performed in depth. Here, we outline many of the important issues we think need consideration when starting a transcriptome sequencing project. We also walk the reader through a detailed analysis of an example transcriptome dataset, highlighting the importance of both within-dataset analysis and comparative inferences. Our hope is that with greater attention focused upon assessing assembly performance, advances in transcriptome assembly will increase as prices continue to drop and new technologies, such as Illumina sequencing, start to be used. PMID:22065435

  7. Smgcd: metrics for biological sequence data

    International Nuclear Information System (INIS)

    In the realm of bioinformatics, the key challenges are to manage, store and retrieve biological data efficiently. Such data can be classified into structured, unstructured and semi-structured content. Typically, semi-structured biological data comprises biological sequences. Complex biological sequences produce huge volumes of data, which in turn create further problems for management, storage and retrieval. This paper proposes the following metrics to address these problems in biological sequence data: symmetry measure, molecular weight measure, similarity or diversity measure, size base measure, size gap measure, complexity measure and size complexity diversity measure. These metrics measure sequence complexity, molecular weight, length with and without gaps, symmetry and similarity through mathematical formulations. The metrics are demonstrated and validated using the proposed hybrid technique, which combines empirical evidence with theoretical formulation. This research opens new horizons for efficient management by measuring the functionality and quality of metadata for single and multiple biological sequences. (author)

  8. The DNA sequence of human chromosome 7.

    Science.gov (United States)

    Hillier, Ladeana W; Fulton, Robert S; Fulton, Lucinda A; Graves, Tina A; Pepin, Kymberlie H; Wagner-McPherson, Caryn; Layman, Dan; Maas, Jason; Jaeger, Sara; Walker, Rebecca; Wylie, Kristine; Sekhon, Mandeep; Becker, Michael C; O'Laughlin, Michelle D; Schaller, Mark E; Fewell, Ginger A; Delehaunty, Kimberly D; Miner, Tracie L; Nash, William E; Cordes, Matt; Du, Hui; Sun, Hui; Edwards, Jennifer; Bradshaw-Cordum, Holland; Ali, Johar; Andrews, Stephanie; Isak, Amber; Vanbrunt, Andrew; Nguyen, Christine; Du, Feiyu; Lamar, Betty; Courtney, Laura; Kalicki, Joelle; Ozersky, Philip; Bielicki, Lauren; Scott, Kelsi; Holmes, Andrea; Harkins, Richard; Harris, Anthony; Strong, Cynthia Madsen; Hou, Shunfang; Tomlinson, Chad; Dauphin-Kohlberg, Sara; Kozlowicz-Reilly, Amy; Leonard, Shawn; Rohlfing, Theresa; Rock, Susan M; Tin-Wollam, Aye-Mon; Abbott, Amanda; Minx, Patrick; Maupin, Rachel; Strowmatt, Catrina; Latreille, Phil; Miller, Nancy; Johnson, Doug; Murray, Jennifer; Woessner, Jeffrey P; Wendl, Michael C; Yang, Shiaw-Pyng; Schultz, Brian R; Wallis, John W; Spieth, John; Bieri, Tamberlyn A; Nelson, Joanne O; Berkowicz, Nicolas; Wohldmann, Patricia E; Cook, Lisa L; Hickenbotham, Matthew T; Eldred, James; Williams, Donald; Bedell, Joseph A; Mardis, Elaine R; Clifton, Sandra W; Chissoe, Stephanie L; Marra, Marco A; Raymond, Christopher; Haugen, Eric; Gillett, Will; Zhou, Yang; James, Rose; Phelps, Karen; Iadanoto, Shawn; Bubb, Kerry; Simms, Elizabeth; Levy, Ruth; Clendenning, James; Kaul, Rajinder; Kent, W James; Furey, Terrence S; Baertsch, Robert A; Brent, Michael R; Keibler, Evan; Flicek, Paul; Bork, Peer; Suyama, Mikita; Bailey, Jeffrey A; Portnoy, Matthew E; Torrents, David; Chinwalla, Asif T; Gish, Warren R; Eddy, Sean R; McPherson, John D; Olson, Maynard V; Eichler, Evan E; Green, Eric D; Waterston, Robert H; Wilson, Richard K

    2003-07-10

    Human chromosome 7 has historically received prominent attention in the human genetics community, primarily related to the search for the cystic fibrosis gene and the frequent cytogenetic changes associated with various forms of cancer. Here we present more than 153 million base pairs representing 99.4% of the euchromatic sequence of chromosome 7, the first metacentric chromosome completed so far. The sequence has excellent concordance with previously established physical and genetic maps, and it exhibits an unusual amount of segmentally duplicated sequence (8.2%), with marked differences between the two arms. Our initial analyses have identified 1,150 protein-coding genes, 605 of which have been confirmed by complementary DNA sequences, and an additional 941 pseudogenes. Of genes confirmed by transcript sequences, some are polymorphic for mutations that disrupt the reading frame. PMID:12853948

  9. Locomotor sequence learning in visually guided walking

    DEFF Research Database (Denmark)

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-01-01

    Voluntary limb modifications must be integrated with basic walking patterns during visually guided walking. Here we tested whether voluntary gait modifications can become more automatic with practice. We challenged walking control by presenting visual stepping targets that instructed subjects to modify step length from one trial to the next. Our sequence learning paradigm is derived from the serial reaction-time (SRT) task that has been used in upper limb studies. Both random and ordered sequences of step lengths were used to measure sequence-specific and sequence non-specific learning during ... of step lengths over 300 training steps. Younger children (age 6-10 years, N = 8) have lower baseline performance, but their magnitude and rate of sequence learning was the same compared to older children (11-16 years, N = 10) and healthy adults. In addition, learning capacity may be more limited...

  10. Metagenomics using next-generation sequencing.

    Science.gov (United States)

    Bragg, Lauren; Tyson, Gene W

    2014-01-01

    Traditionally, microbial genome sequencing has been restricted to the small number of species that can be grown in pure culture. The progressive development of culture-independent methods over the last 15 years now allows researchers to sequence microbial communities directly from environmental samples. This approach is commonly referred to as "metagenomics" or "community genomics". However, the term metagenomics is applied liberally in the literature to describe any culture-independent analysis of microbial communities. Here, we define metagenomics as shotgun ("random") sequencing of the genomic DNA of a sample taken directly from the environment. The metagenome can be thought of as a sampling of the collective genome of the microbial community. We outline the considerations and analyses that should be undertaken to ensure the success of a metagenomic sequencing project, including the choice of sequencing platform and methods for assembly, binning, annotation, and comparative analysis.

  11. Sequencing and comparing whole mitochondrial genomes of animals

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  12. Reading biological processes from nucleotide sequences

    Science.gov (United States)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. 
    Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical

  13. Finding Common Sequence and Structure Motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, Laurie J.; Stormo, Gary D.

    1997-01-01

    We present a computational scheme to search for the most common motif, composed of a combination of sequence and structure constraints, among a collection of RNA sequences. The method uses a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences...

  14. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    DEFF Research Database (Denmark)

    de Souza, S J; Camargo, A A; Briones, M R;

    2000-01-01

    Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central ...

  15. Sequencing and Analysis of a Genomic Fragment Provide an Insight into the Dunaliella viridis Genomic Sequence

    Institute of Scientific and Technical Information of China (English)

    Xiao-Ming SUN; Yuan-Ping TANG; Xiang-Zong MENG; Wen-Wen ZHANG; Shan LI; Zhi-Rui DENG; Zheng-Kai XU; Ren-Tao SONG

    2006-01-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance studies. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large amounts of simple sequence repeats (microsatellites) were found, with strong bias towards (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation should be made to reveal the biological significance of these unique sequence features.

  16. SOME GEOMETRIC PROPERTIES OF A NEW DIFFERENCE SEQUENCE SPACE INVOLVING LACUNARY SEQUENCES

    Institute of Scientific and Technical Information of China (English)

    Murat KARAKAŞ; Mikail ET; Vatan KARAKAYA

    2013-01-01

    In this paper, we define a new generalized difference sequence space involving lacunary sequence. Then, we examine k-NUC property and property (β) for this space and also show that it is not rotund where p=(pr) is a bounded sequence of positive real numbers with pr ≥1 for all r∈N.

  17. Targeted next-generation sequencing can replace Sanger sequencing in clinical diagnostics

    NARCIS (Netherlands)

    Sikkema-Raddatz, B.; Johansson, L.F.; de Boer, E.N.; Almomani, R.; Boven, L.G.; van den Berg, M.P.; van Spaendonck-Zwarts, K.Y.; van Tintelen, J.P.; Sijmons, R.H.; Jongbloed, J.D.H.; Sinke, R.J.

    2013-01-01

    Mutation detection through exome sequencing allows simultaneous analysis of all coding sequences of genes. However, it cannot yet replace Sanger sequencing (SS) in diagnostics because of incomplete representation and coverage of exons leading to missing clinically relevant mutations. Targeted next-g

  18. Sequencing and analysis of a genomic fragment provide an insight into the Dunaliella viridis genomic sequence.

    Science.gov (United States)

    Sun, Xiao-Ming; Tang, Yuan-Ping; Meng, Xiang-Zong; Zhang, Wen-Wen; Li, Shan; Deng, Zhi-Rui; Xu, Zheng-Kai; Song, Ren-Tao

    2006-11-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance studies. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large amounts of simple sequence repeats (microsatellites) were found, with strong bias towards (AC)(n) type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation should be made to reveal the biological significance of these unique sequence features. PMID:17091199

  19. Whole-Genome Shotgun Sequencing of a Colonizing Multilocus Sequence Type 17 Streptococcus agalactiae Strain

    OpenAIRE

    Singh, Pallavi; Springman, A. Cody; Davies, H Dele; Manning, Shannon D.

    2012-01-01

    This report highlights the whole-genome shotgun draft sequence for a Streptococcus agalactiae strain representing multilocus sequence type (ST) 17, isolated from a colonized woman at 8 weeks postpartum. This sequence represents an important addition to the published genomes and will promote comparative genomic studies of S. agalactiae recovered from diverse sources.

  20. Large Zero Autocorrelation Zone of Golay Sequences and $4^q$-QAM Golay Complementary Sequences

    CERN Document Server

    Gong, Guang; Yang, Yang

    2011-01-01

    Sequences with good correlation properties have been widely adopted in modern communications, radar and sonar applications. In this paper, we present our new findings on some constructions of single $H$-ary Golay sequence and $4^q$-QAM Golay complementary sequence with a large zero autocorrelation zone, where $H\ge 2$ is an arbitrary even integer and $q\ge 2$ is an arbitrary integer. Those new results on Golay sequences and QAM Golay complementary sequences can be explored during synchronization and detection at the receiver end and thus improve the performance of the communication system.
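
    As background for the correlation properties mentioned here, the sketch below checks the defining property of a Golay complementary pair: the aperiodic autocorrelations of the two sequences cancel at every nonzero lag. The binary length-4 pair is a standard textbook example chosen for illustration; it is not one of the $H$-ary or QAM constructions from the paper.

```python
def aperiodic_autocorrelation(seq, lag):
    """Sum of seq[i] * seq[i + lag] over the overlapping part only."""
    return sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))


# A well-known binary Golay complementary pair of length 4.
a = [1, 1, 1, -1]
b = [1, 1, -1, 1]

# Add the two autocorrelations lag by lag.
sums = [
    aperiodic_autocorrelation(a, lag) + aperiodic_autocorrelation(b, lag)
    for lag in range(len(a))
]
```

    The sum is 8 at lag 0 and 0 at lags 1 to 3, even though neither sequence has zero sidelobes on its own.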

  1. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV)

    OpenAIRE

    Martin, A. C. R.

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and ’dotifying’ repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI li...

  2. Value of a newly sequenced bacterial genome.

    Science.gov (United States)

    Barbosa, Eudes Gv; Aburjaile, Flavia F; Ramos, Rommel Tj; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2014-05-26

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information. PMID:24921006

  3. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    Directory of Open Access Journals (Sweden)

    Arabi E. keshk

    2014-05-01

    The merging of biology and computer science has created a new field, computational biology (bioinformatics), that explores the capacity of computers to gain knowledge from biological data. Computational biology is rooted in the life sciences as well as in computer and information sciences and technologies. A central problem in computational biology is sequence alignment, a way of arranging DNA, RNA, or protein sequences to identify regions of similarity and relationships between sequences. This paper introduces an enhanced dynamic algorithm for genome sequence alignment, called EDAGSA. It fills only the three main diagonals rather than filling the entire matrix with unused data. It obtains the optimal solution while decreasing execution time, so performance improves. To illustrate the effectiveness of the proposed algorithm, it is compared with traditional methods such as the Needleman-Wunsch, Smith-Waterman, and longest common subsequence algorithms. In addition, a database is implemented so that the algorithm can be used in multi-sequence alignment to search for the optimal sequence matching a given sequence.
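
    The idea of filling only the diagonals near the main one can be sketched as a generic banded Needleman-Wunsch score. This is not the EDAGSA algorithm itself, whose details the abstract does not give; the function name, scoring values, and the `band` parameter are assumptions made for illustration. With `band=1` only the three central diagonals are computed.

```python
def banded_alignment_score(a, b, match=1, mismatch=-1, gap=-1, band=1):
    """Global alignment score using only cells with |i - j| <= band.

    The result equals the full-matrix optimum only when an optimal
    alignment path stays inside the band.
    """
    if abs(len(a) - len(b)) > band:
        raise ValueError("length difference exceeds the band width")

    # Row 0: aligning an empty prefix of a against prefixes of b.
    prev = {j: j * gap for j in range(min(len(b), band) + 1)}

    for i in range(1, len(a) + 1):
        cur = {}
        for j in range(max(0, i - band), min(len(b), i + band) + 1):
            if j == 0:
                cur[j] = i * gap          # a[:i] against an empty prefix
                continue
            candidates = []
            if j - 1 in prev:             # diagonal move: pair a[i-1] with b[j-1]
                pair = match if a[i - 1] == b[j - 1] else mismatch
                candidates.append(prev[j - 1] + pair)
            if j - 1 in cur:              # left move: gap in a
                candidates.append(cur[j - 1] + gap)
            if j in prev:                 # up move: gap in b
                candidates.append(prev[j] + gap)
            cur[j] = max(candidates)
        prev = cur                        # keep only one row of the band

    return prev[len(b)]


identical = banded_alignment_score("GATTACA", "GATTACA")
one_deletion = banded_alignment_score("GATTACA", "GATACA")
```

    Only about three cells per row are stored and computed, which is where the time and memory savings of a diagonal-restricted approach would come from.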

  4. CATEGORIZATION OF EVENT SEQUENCES FOR LICENSE APPLICATION

    Energy Technology Data Exchange (ETDEWEB)

    G.E. Ragan; P. Mecheret; D. Dexheimer

    2005-04-14

    The purposes of this analysis are: (1) Categorize (as Category 1, Category 2, or Beyond Category 2) internal event sequences that may occur before permanent closure of the repository at Yucca Mountain. (2) Categorize external event sequences that may occur before permanent closure of the repository at Yucca Mountain. This includes examining DBGM-1 seismic classifications and upgrading to DBGM-2, if appropriate, to ensure Beyond Category 2 categorization. (3) State the design and operational requirements that are invoked to make the categorization assignments valid. (4) Indicate the amount of material put at risk by Category 1 and Category 2 event sequences. (5) Estimate frequencies of Category 1 event sequences at the maximum capacity and receipt rate of the repository. (6) Distinguish occurrences associated with normal operations from event sequences. It is beyond the scope of the analysis to propose design requirements that may be required to control radiological exposure associated with normal operations. (7) Provide a convenient compilation of the results of the analysis in tabular form. The results of this analysis are used as inputs to the consequence analyses in an iterative design process that is depicted in Figure 1. Categorization of event sequences for permanent retrieval of waste from the repository is beyond the scope of this analysis. Cleanup activities that take place after an event sequence and other responses to abnormal events are also beyond the scope of the analysis.

  5. Exploration of noncoding sequences in metagenomes.

    Directory of Open Access Journals (Sweden)

    Fabián Tobar-Tosse

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.
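
    Two of the sequence patterns named above are simple to compute, and a minimal sketch is given here. The function names are hypothetical, and counting trinucleotides in overlapping windows is an assumption; the study may have counted them differently.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq) if seq else 0.0


def trinucleotide_usage(seq):
    """Relative frequency of each trinucleotide over overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()}


gc = gc_content("GGCCAATT")
usage = trinucleotide_usage("CAGCAGCAG")
```

    Because neither function depends on finding ORFs, both apply equally to coding and noncoding regions.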

  6. Value of a newly sequenced bacterial genome

    Institute of Scientific and Technical Information of China (English)

    Eudes GV Barbosa; Flavia F Aburjaile; Rommel TJ Ramos; Adriana R Carneiro; Yves Le Loir; Jan Baumbach; Anderson Miyoshi; Artur Silva; Vasco Azevedo

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information.

  7. Fungal genome sequencing: basic biology to biotechnology.

    Science.gov (United States)

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research. PMID:25721271

  9. PHASE TRANSITION IN SEQUENCE UNIQUE RECONSTRUCTION

    Institute of Scientific and Technical Information of China (English)

    Li XIA; Chan ZHOU

    2007-01-01

    In this paper, sequence unique reconstruction refers to the property that a sequence is uniquely reconstructable from all its K-tuples. We propose and study the phase transition behavior of the probability P(K) of unique reconstruction with regard to tuple size K in random sequences (iid model). Based on Monte Carlo experiments, artificial proteins generated from the iid model exhibit a phase transition when P(K) abruptly jumps from a low value phase (e.g. <0.1) to a high value phase (e.g. >0.9). With a generalization to any alphabet, we prove that for a random sequence of length L, as L is large enough, P(K) undergoes a sharp phase transition when p≤0.1015, where p=P(two random letters match). Besides, formulas are derived to estimate the transition points, which may be of practical use in sequencing DNA by hybridization. Our study concludes that most proteins do not deviate greatly from random sequences in the sense of sequence unique reconstruction, while there are some "stubborn" proteins which only become uniquely reconstructable at a very large K and probably have biological implications.
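
    The notion of unique reconstruction from K-tuples can be made concrete with a brute-force check. The sketch below is only an illustration of the definition, not the method of the paper: it assumes the K-tuples are given as a multiset (with multiplicities), the function names are hypothetical, and the exhaustive search is exponential, so it is only suitable for short sequences.

```python
from collections import Counter

def k_tuples(seq: str, k: int) -> Counter:
    """Multiset of all overlapping K-tuples of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def reconstructions(seq: str, k: int) -> set:
    """All sequences that use exactly the same multiset of K-tuples as seq."""
    remaining = k_tuples(seq, k)
    total = sum(remaining.values())
    found = set()

    def extend(current: str, used: int) -> None:
        if used == total:
            found.add(current)
            return
        # The next tuple must overlap the last k-1 letters of the current string.
        suffix = current[-(k - 1):] if k > 1 else ""
        for tup in list(remaining):
            if remaining[tup] > 0 and tup.startswith(suffix):
                remaining[tup] -= 1
                extend(current + tup[-1], used + 1)
                remaining[tup] += 1

    for start in list(remaining):
        remaining[start] -= 1
        extend(start, 1)
        remaining[start] += 1
    return found

def is_uniquely_reconstructable(seq: str, k: int) -> bool:
    return len(reconstructions(seq, k)) == 1
```

    With this check, the toy string `ABACAB` is ambiguous at K=2 (it shares its 2-tuples with `ACABAB`) but becomes uniquely reconstructable at K=4, which mirrors the dependence on tuple size that the abstract describes.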

  10. Robot Sequencing and Visualization Program (RSVP)

    Science.gov (United States)

    Cooper, Brian K.; Maxwell, Scott A.; Hartman, Frank R.; Wright, John R.; Yen, Jeng; Toole, Nicholas T.; Gorjian, Zareh; Morrison, Jack C

    2013-01-01

    The Robot Sequencing and Visualization Program (RSVP) is being used in the Mars Science Laboratory (MSL) mission for downlink data visualization and command sequence generation. RSVP reads and writes downlink data products from the operations data server (ODS) and writes uplink data products to the ODS. The primary users of RSVP are members of the Rover Planner team (part of the Integrated Planning and Execution Team (IPE)), who use it to perform traversability/articulation analyses, take activity plan input from the Science and Mission Planning teams, and create a set of rover sequences to be sent to the rover every sol. The primary inputs to RSVP are downlink data products and activity plans in the ODS database. The primary outputs are command sequences to be placed in the ODS for further processing prior to uplink to each rover. RSVP is composed of two main subsystems. The first, called the Robot Sequence Editor (RoSE), understands the MSL activity and command dictionaries and takes care of converting incoming activity level inputs into command sequences. The Rover Planners use the RoSE component of RSVP to put together command sequences and to view and manage command level resources like time, power, temperature, etc. (via a transparent realtime connection to SEQGEN). The second component of RSVP is called HyperDrive, a set of high-fidelity computer graphics displays of the Martian surface in 3D and in stereo. The Rover Planners can explore the environment around the rover, create commands related to motion of all kinds, and see the simulated result of those commands via its underlying tight coupling with flight navigation, motor, and arm software. This software is the evolutionary replacement for the Rover Sequencing and Visualization software used to create command sequences (and visualize the Martian surface) for the Mars Exploration Rover mission.

  11. Stem kernels for RNA sequence analyses.

    Science.gov (United States)

    Sakakibara, Yasubumi; Popendorf, Kris; Ogawa, Nana; Asai, Kiyoshi; Sato, Kengo

    2007-10-01

    Several computational methods based on stochastic context-free grammars have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNA, and are used for structural alignment of RNA sequences. However, such stochastic models cannot sufficiently discriminate member sequences of an RNA family from nonmembers and hence detect noncoding RNA regions from genome sequences. A novel kernel function, stem kernel, for the discrimination and detection of functional RNA sequences using support vector machines (SVMs) is proposed. The stem kernel is a natural extension of the string kernel, specifically the all-subsequences kernel, and is tailored to measure the similarity of two RNA sequences from the viewpoint of secondary structures. The stem kernel examines all possible common base pairs and stem structures of arbitrary lengths, including pseudoknots between two RNA sequences, and calculates the inner product of common stem structure counts. An efficient algorithm is developed to calculate the stem kernels based on dynamic programming. The stem kernels are then applied to discriminate members of an RNA family from nonmembers using SVMs. The study indicates that the discrimination ability of the stem kernel is strong compared with conventional methods. Furthermore, the potential application of the stem kernel is demonstrated by the detection of remotely homologous RNA families in terms of secondary structures. This is because the string kernel is proven to work for the remote homology detection of protein sequences. These experimental results have convinced us to apply the stem kernel in order to find novel RNA families from genome sequences. PMID:17933013

  12. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    Directory of Open Access Journals (Sweden)

    Abdulaziz M Al-Swailem

    Despite its economic, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% hits extending to the start codons of full length cDNAs, suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism.

  13. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2012-11-01

    Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb, and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.

  15. Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality.

    Science.gov (United States)

    Churchill, Jennifer D; King, Jonathan L; Chakraborty, Ranajit; Budowle, Bruce

    2016-09-01

    Massively parallel sequencing (MPS) offers substantial improvements over current forensic DNA typing methodologies such as increased resolution, scalability, and throughput. The Ion PGM™ is a promising MPS platform for analysis of forensic biological evidence. The system employs a sequencing-by-synthesis chemistry on a semiconductor chip that measures a pH change due to the release of hydrogen ions as nucleotides are incorporated into the growing DNA strands. However, implementation of MPS into forensic laboratories requires a robust chemistry. Ion Torrent's Hi-Q™ Sequencing Chemistry was evaluated to determine if it could improve on the quality of the generated sequence data in association with selected genetic marker targets. The whole mitochondrial genome and the HID-Ion STR 10-plex panel were sequenced on the Ion PGM™ system with the Ion PGM™ Sequencing 400 Kit and the Ion PGM™ Hi-Q™ Sequencing Kit. Concordance, coverage, strand balance, noise, and deletion ratios were assessed in evaluating the performance of the Ion PGM™ Hi-Q™ Sequencing Kit. The results indicate that reliable, accurate data are generated and that sequencing through homopolymeric regions can be improved with the use of Ion Torrent's Hi-Q™ Sequencing Chemistry. Overall, the quality of the generated sequencing data supports the potential for use of the Ion PGM™ in forensic genetic laboratories. PMID:27025714

  16. Pittosporum cryptic virus 1: genome sequence completion using next-generation sequencing.

    Science.gov (United States)

    Elbeaino, Toufic; Kubaa, Raied Abou; Tuzlali, Hasan Tuna; Digiaro, Michele

    2016-07-01

    Next-generation sequencing (NGS) was applied to dsRNAs extracted from an Italian pittosporum plant infected with pittosporum cryptic virus 1 (PiCV1). NGS allowed assembly of the full genome sequence of PiCV1, comprising dsRNA1 (1.9 kbp) and dsRNA2 (1.5 kbp), which encode the RNA-dependent RNA polymerase and capsid protein genes, respectively. Phylogenetic and sequence analyses confirmed that PiCV1 is a new member of the genus Deltapartitivirus, family Partitiviridae. From the same plant, NGS also permitted assembly of the complete genome sequence of eggplant mottled dwarf virus (EMDV), which shared 86 % to 98 % nucleotide sequence identity with complete and partial sequences (ca 6750 nt) of other known EMDV isolates with sequences available in the GenBank database. PMID:27087112

  17. Iterative Method for Generating Correlated Binary Sequences

    CERN Document Server

    Usatenko, O V; Apostolov, S S; Makarov, N M; Krokhin, A A

    2014-01-01

    We propose a new efficient iterative method for generating random correlated binary sequences with prescribed correlation function. The method is based on consecutive linear modulations of initially uncorrelated sequence into a correlated one. Each step of modulation increases the correlations until the desired level has been reached. Robustness and efficiency for the proposed algorithm are tested by generating sequences with inverse power-law correlations. The substantial increase in the strength of correlation in the iterative method with respect to the single-step filtering generation is shown for all studied correlation functions. Our results can be used for design of disordered superlattices, waveguides, and surfaces with selective transport properties.

  18. Generation of Goldbach sequences in quantum mechanics

    CERN Document Server

    Prudencio, Thiago

    2013-01-01

    This Letter reports for the first time the problem of generating Goldbach sequences in terms of Fock states and examines their equivalence with the corresponding Goldbach conjecture terms. We show that the expectation values of number operators in Fock states cannot generate these sequences without ambiguity due to normalization of superpositions of Fock states. Instead, we show that a complete correspondence between the Goldbach conjecture terms and the Goldbach sequences generated by Fock states can be achieved by means of projections involving number operator and superpositions of Fock states. Our proposal appears as a new alternative way to interpret measurements in fundamental quantum mechanics.

  19. Scale-PC shielding analysis sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bowman, S.M.

    1996-05-01

    The SCALE computational system is a modular code system for analyses of nuclear fuel facility and package designs. With the release of SCALE-PC Version 4.3, the radiation shielding analysis community now has the capability to execute the SCALE shielding analysis sequences contained in the control modules SAS1, SAS2, SAS3, and SAS4 on an MS-DOS personal computer (PC). In addition, SCALE-PC includes two new sequences, QADS and ORIGEN-ARP. The capabilities of each sequence are presented, along with example applications.

  20. Cloned endogenous retroviral sequences from human DNA.

    OpenAIRE

    Bonner, T I; O'Connell, C; Cohen, M.

    1982-01-01

    We have screened a human DNA library using as probe a chimpanzee sequence that contains homology to the polymerase gene of the endogenous baboon virus. One set of overlapping clones spans about 20 kilobases and contains regions of DNA sequence homology to the gag p30, gag p15, and polymerase genes of Moloney murine leukemia virus. Furthermore, the spacings are the same as in Moloney virus between these sequences and a 480-nucleotide region that has the structural characteristics of a 3' copy ...

  1. Sequence and structural analysis of antibodies.

    OpenAIRE

    Raghavan, A. K.

    2008-01-01

    The work presented in this thesis focuses on the sequence and structural analysis of antibodies and has fallen into three main areas. First I developed a method to assess how typical an antibody sequence is of the expressed human antibody repertoire. My hypothesis was that the more "human like" an antibody sequence is (in other words how typical it is of the expressed human repertoire), the less likely it is to elicit an immune response when used in vivo in humans. In practice, I found that, ...

  2. Initial retrieval sequence and blending strategy

    Energy Technology Data Exchange (ETDEWEB)

    Pemwell, D.L.; Grenard, C.E.

    1996-09-01

    This report documents the initial retrieval sequence and the methodology used to select it. Waste retrieval, storage, pretreatment and vitrification were modeled for candidate single-shell tank retrieval sequences. Performance of the sequences was measured by a set of metrics (for example, high-level waste glass volume, relative risk and schedule). Computer models were used to evaluate estimated glass volumes, process rates, retrieval dates, and blending strategy effects. The models were based on estimates of component inventories and concentrations, sludge wash factors and timing, retrieval annex limitations, etc.

  3. How Long is an Aftershock Sequence?

    Science.gov (United States)

    Godano, Cataldo; Tramelli, Anna

    2016-07-01

    The occurrence of a mainshock is always followed by aftershocks spatially distributed within the fault area. The decay of the aftershock rate with time is described by the empirical Omori law, which was inferred from catalogue analysis. Discriminating sequences within catalogues is not a straightforward operation, especially for low-magnitude mainshocks. Here, we describe the rate decay of the Omori law obtained using different sequence discrimination tools and we discover that, when the background seismicity is excluded, the sequences tend to last for the temporal extension of the catalogue.
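
    The abstract does not write out the Omori law; a common statement is the modified (Omori-Utsu) form n(t) = K / (t + c)^p, where t is the time since the mainshock. The sketch below assumes that form; the function names and the parameter values in the usage note are illustrative and are not taken from the paper.

```python
import math

def omori_rate(t: float, K: float, c: float, p: float) -> float:
    """Aftershock rate at time t after the mainshock, n(t) = K / (t + c)**p."""
    return K / (t + c) ** p

def expected_aftershocks(t1: float, t2: float, K: float, c: float, p: float) -> float:
    """Integral of the rate between t1 and t2 (closed form)."""
    if math.isclose(p, 1.0):
        return K * math.log((t2 + c) / (t1 + c))
    return K * ((t1 + c) ** (1 - p) - (t2 + c) ** (1 - p)) / (p - 1)
```

    With K = 100, c = 1 and p = 1, for instance, the rate halves between t = 0 and t = 1, and the expected count grows only logarithmically in time, which gives some intuition for why the end of a sequence is hard to pin down against background seismicity.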

  4. Deep sequencing increases hepatitis C virus phylogenetic cluster detection compared to Sanger sequencing.

    Science.gov (United States)

    Montoya, Vincent; Olmstead, Andrea; Tang, Patrick; Cook, Darrel; Janjua, Naveed; Grebely, Jason; Jacka, Brendan; Poon, Art F Y; Krajden, Mel

    2016-09-01

    Effective surveillance and treatment strategies are required to control the hepatitis C virus (HCV) epidemic. Phylogenetic analyses are powerful tools for reconstructing the evolutionary history of viral outbreaks and identifying transmission clusters. These studies often rely on Sanger sequencing which typically generates a single consensus sequence for each infected individual. For rapidly mutating viruses such as HCV, consensus sequencing underestimates the complexity of the viral quasispecies population and could therefore generate different phylogenetic tree topologies. Although deep sequencing provides a more detailed quasispecies characterization, in-depth phylogenetic analyses are challenging due to dataset complexity and computational limitations. Here, we apply deep sequencing to a characterized population to assess its ability to identify phylogenetic clusters compared with consensus Sanger sequencing. For deep sequencing, a sample specific threshold determined by the 50th percentile of the patristic distance distribution for all variants within each individual was used to identify clusters. Among seven patristic distance thresholds tested for the Sanger sequence phylogeny ranging from 0.005-0.06, a threshold of 0.03 was found to provide the maximum balance between positive agreement (samples in a cluster) and negative agreement (samples not in a cluster) relative to the deep sequencing dataset. From 77 HCV seroconverters, 10 individuals were identified in phylogenetic clusters using both methods. Deep sequencing analysis identified an additional 4 individuals and excluded 8 other individuals relative to Sanger sequencing. The application of this deep sequencing approach could be a more effective tool to understand onward HCV transmission dynamics compared with Sanger sequencing, since the incorporation of minority sequence variants improves the discrimination of phylogenetically linked clusters. PMID:27282472

  5. Papaya (Carica papaya) lysozyme is a member of the family 19 (Basic, class II) chitinases

    NARCIS (Netherlands)

    Subroto, T; Sufiati, S; Beintema, JJ

    1999-01-01

    The most comprehensive studies on a plant lysozyme (EC 3.2.1.17) are those on the enzyme from papaya (Carica papaya) latex, published in 1967 and 1969. However, the N-terminal sequence of five amino acids of this enzyme, determined by manual Edman degradation, did not allow assign

  6. Identifying driver mutations in sequenced cancer genomes

    DEFF Research Database (Denmark)

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla;

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise......, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA...... sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein...

  7. Using mobile sequencers in an academic classroom.

    Science.gov (United States)

    Zaaijer, Sophie; Erlich, Yaniv

    2016-01-01

    The advent of mobile DNA sequencers has made it possible to generate DNA sequencing data outside of laboratories and genome centers. Here, we report our experience of using the MinION, a mobile sequencer, in a 13-week academic course for undergraduate and graduate students. The course consisted of theoretical sessions that presented fundamental topics in genomics and several applied hackathon sessions. In these hackathons, the students used MinION sequencers to generate and analyze their own data and gain hands-on experience in the topics discussed in the theoretical classes. The manuscript describes the structure of our class, the educational material, and the lessons we learned in the process. We hope that the knowledge and material presented here will provide the community with useful tools to help educate future generations of genome scientists. PMID:27054412

  8. Sequencing Information Management System (SIMS). Final report

    Energy Technology Data Exchange (ETDEWEB)

    Fields, C.

    1996-02-15

    A feasibility study to develop a requirements analysis and functional specification for a data management system for large-scale DNA sequencing laboratories resulted in a functional specification for a Sequencing Information Management System (SIMS). This document reports the results of this feasibility study, and includes a functional specification for a SIMS relational schema. The SIMS is an integrated information management system that supports data acquisition, management, analysis, and distribution for DNA sequencing laboratories. The SIMS provides ad hoc query access to information on the sequencing process and its results, and partially automates the transfer of data between laboratory instruments, analysis programs, technical personnel, and managers. The SIMS user interfaces are designed for use by laboratory technicians, laboratory managers, and scientists. The SIMS is designed to run in a heterogeneous, multiplatform environment in a client/server mode. The SIMS communicates with external computational and data resources via the internet.

  9. Viral detection by high-throughput sequencing.

    Science.gov (United States)

    Motooka, Daisuke; Nakamura, Shota; Hagiwara, Katsuro; Nakaya, Takaaki

    2015-01-01

    We applied a high-throughput sequencing platform, Ion PGM, for viral detection in fecal samples from adult cows collected in Hokkaido, Japan. Random RT-PCR was performed to amplify RNA extracted from 0.25 ml of fecal specimens (N = 8), and more than 5 μg of cDNA was synthesized. Unbiased high-throughput sequencing using the 318 v2 semiconductor chip of these eight samples yielded 57-580 K (average: 270 K, after data analysis) reads in a single run. As a result, viral genome sequences were detected in each specimen. In addition to bacteriophage, mammal- and insect-derived viruses, partial genome sequences of plant, algal, and protozoal viruses were detected. Thus, this metagenomic analysis of fecal specimens could be useful to comprehensively understand viral populations of the intestine and food sources in animals. PMID:25287501

  10. Long range correlations in DNA sequences

    CERN Document Server

    Mohanty, A K

    2002-01-01

    The so-called long range correlation properties of DNA sequences are studied using the variance analyses of the density distribution of a single or a group of nucleotides in a model independent way. This new method, which was suggested earlier, has been applied to extract slope parameters that characterize the correlation properties for several intron-containing and intron-less DNA sequences. An important aspect of all the DNA sequences is the property of complementarity, by virtue of which any two complementary distributions (like GA is complementary to TC or G is complementary to ATC) have identical fluctuations at all scales although their distribution functions need not be identical. Due to this complementarity, the famous DNA walk representation, whose statistical interpretation is still unresolved, is shown to be a special case of the present formalism with a density distribution corresponding to a purine or a pyrimidine group. Another interesting aspect of most of the DNA sequences is that the factorial m...
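
    The DNA walk mentioned in this abstract can be sketched as a cumulative sum over a purine/pyrimidine encoding of the sequence. This is a minimal illustration only: the sign convention (+1 for purines, -1 for pyrimidines) is an assumption chosen here, the function name is hypothetical, and the variance analysis of the paper itself is not reproduced.

```python
from itertools import accumulate

PURINES = set("AG")
PYRIMIDINES = set("CT")

def dna_walk(seq: str) -> list:
    """Cumulative displacement: +1 per purine, -1 per pyrimidine, 0 otherwise."""
    steps = []
    for base in seq.upper():
        if base in PURINES:
            steps.append(1)
        elif base in PYRIMIDINES:
            steps.append(-1)
        else:
            steps.append(0)  # ambiguous bases leave the walk unchanged
    return list(accumulate(steps))
```

    Swapping the roles of the two groups simply mirrors the walk, so its fluctuations are unchanged, which is consistent with the complementarity property described above.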

  11. Algebraic structures of sequences of numbers

    Science.gov (United States)

    Huang, I.-Chiau

    2012-09-01

    For certain sequences of numbers, commutative rings with a module structure over a non-commutative ring are constructed. Identities of these numbers are considered as realizations of algebraic relations.

  12. New stopping criteria for segmenting DNA sequences

    CERN Document Server

    Li, W

    2001-01-01

    We propose a solution on the stopping criterion in segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on Bayesian Information Criterion (BIC) in the model selection framework. When this stopping criterion is applied to a left telomere sequence of yeast Saccharomyces cerevisiae and the complete genome sequence of bacterium Escherichia coli, borders of biologically meaningful units were identified (e.g. subtelomeric units, replication origin, and replication terminus), and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.
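
    The idea of a BIC-based stopping rule can be illustrated with a generic comparison between a one-segment and a two-segment composition model. The sketch below is not the exact criterion of the paper: it assumes a multinomial base composition per segment, BIC = -2 ln L + k ln N, and one extra free parameter for the cut position; all names are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(seq: str) -> float:
    """Maximum log-likelihood of seq under a single multinomial composition."""
    n = len(seq)
    return sum(count * math.log(count / n) for count in Counter(seq).values())

def bic_accepts_split(seq: str, cut: int, alphabet_size: int = 4) -> bool:
    """True if splitting seq at position cut lowers the BIC."""
    n = len(seq)
    if not 0 < cut < n:
        raise ValueError("cut must fall strictly inside the sequence")
    free = alphabet_size - 1  # free composition parameters per segment
    bic_one = -2 * log_likelihood(seq) + free * math.log(n)
    two_segment_ll = log_likelihood(seq[:cut]) + log_likelihood(seq[cut:])
    bic_two = -2 * two_segment_ll + (2 * free + 1) * math.log(n)
    return bic_two < bic_one
```

    Under these assumptions a split is kept only when the gain in likelihood outweighs the penalty for the extra parameters, so recursive segmentation stops once no candidate cut passes the test.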

  13. Viral detection by high-throughput sequencing.

    Science.gov (United States)

    Motooka, Daisuke; Nakamura, Shota; Hagiwara, Katsuro; Nakaya, Takaaki

    2015-01-01

    We applied a high-throughput sequencing platform, Ion PGM, for viral detection in fecal samples from adult cows collected in Hokkaido, Japan. Random RT-PCR was performed to amplify RNA extracted from 0.25 ml of fecal specimens (N = 8), and more than 5 μg of cDNA was synthesized. Unbiased high-throughput sequencing using the 318 v2 semiconductor chip of these eight samples yielded 57-580 K (average: 270 K, after data analysis) reads in a single run. As a result, viral genome sequences were detected in each specimen. In addition to bacteriophage, mammal- and insect-derived viruses, partial genome sequences of plant, algal, and protozoal viruses were detected. Thus, this metagenomic analysis of fecal specimens could be useful to comprehensively understand viral populations of the intestine and food sources in animals.

  14. Extracting biological knowledge from DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    De La Vega, F.M. [CINVESTAV-IPN (Mexico); Thieffry, D. [Universite Libre de Bruxelles, Rhode-Saint-Genese (Belgium)]|[Universidad Nacional Autonoma de Mexico, Morelos (Mexico); Collado-Vides, J. [Universidad Nacional Autonoma de Mexico, Morelos (Mexico)

    1996-12-31

    This session describes the elucidation of information from DNA sequences and what challenges computational biologists face in their task of summarizing and deciphering the human genome. Techniques discussed include methods from statistics, information theory, artificial intelligence and linguistics. 1 ref.

  15. Optimization of a sequence of reactors

    DEFF Research Database (Denmark)

    Vidal, Rene Victor Valqui

    1991-01-01

    Concerns the optimal production of sulphuric acid in a sequence of reactors. Using a suitable approximation to the objective function, this problem can easily be solved using the maximum principle. A numerical example documents the applicability of the suggested approach...

  16. Characterizing leader sequences of CRISPR loci

    DEFF Research Database (Denmark)

    Alkhnbashi, Omer; Shah, Shiraz Ali; Garrett, Roger Antony;

    2016-01-01

    The CRISPR-Cas system is an adaptive immune system in many archaea and bacteria, which provides resistance against invading genetic elements. The first phase of CRISPR-Cas immunity is called adaptation, in which small DNA fragments are excised from genetic elements and are inserted into a CRISPR...... array generally adjacent to its so called leader sequence at one end of the array. It has been shown that transcription initiation and adaptation signals of the CRISPR array are located within the leader. However, apart from promoters, there is very little knowledge of sequence or structural motifs...... sequences by focusing on the consensus repeat of the adjacent CRISPR array and weak upstream conservation signals. We applied our tool to the analysis of a comprehensive genomic database and identified several characteristic properties of leader sequences specific to archaea and bacteria, ranging from...

  17. Fibonacci Sequence and Supramolecular Structure of DNA.

    Science.gov (United States)

    Shabalkin, I P; Grigor'eva, E Yu; Gudkova, M V; Shabalkin, P I

    2016-05-01

    We proposed a new model of supramolecular DNA structure. As in our previously developed model of primary DNA structure [11-15], the 3D structure of the DNA molecule is assembled in accordance with a mathematical rule known as the Fibonacci sequence. Unlike primary DNA structure, supramolecular 3D structure is assembled from complex moieties including a regular tetrahedron and a regular octahedron consisting of monomers, elements of the primary DNA structure. The moieties of the supramolecular DNA structure forming fragments of regular spatial lattice are bound via linker (joint) sequences of the DNA chain. The lattice perceives and transmits information signals over a considerable distance without acoustic aberrations. Linker sequences expand conformational space between lattice segments, allowing them to slide relative to each other under the action of external forces. In this case, sliding is provided by stretching of the stacked linker sequences.

  18. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.;

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment...... methods to misalign, or even refuse to align, homologous ncRNAs, consequently obscuring that structural signal. We have used CMfinder, a structure-oriented local alignment tool, to search the ENCODE regions of vertebrate multiple alignments. In agreement with other studies, we find a large number...... of potential RNA structures in the ENCODE regions. We report 6587 candidate regions with an estimated false-positive rate of 50%. More intriguingly, many of these candidates may be better represented by alignments taking the RNA secondary structure into account than those based on primary sequence alone, often...

  19. Network of tRNA Gene Sequences

    Institute of Scientific and Technical Information of China (English)

    WEI Fang-ping; LI Sheng; MA Hong-ru

    2008-01-01

    A network of 3719 tRNA gene sequences was constructed using simplest alignment. Its topology, degree distribution and clustering coefficient were studied. The behaviors of the network shift from fluctuated distribution to scale-free distribution when the similarity degree of the tRNA gene sequences increases. The tRNA gene sequences with the same anticodon identity are more self-organized than those with different anticodon identities and form local clusters in the network. Some vertices of the local cluster have a high connection with other local clusters, and the probable reason was given. Moreover, a network constructed by the same number of random tRNA sequences was used to make comparisons. The relationships between the properties of the tRNA similarity network and the characters of tRNA evolutionary history were discussed.

  20. Supervised Sequence Labelling with Recurrent Neural Networks

    CERN Document Server

    Graves, Alex

    2012-01-01

    Supervised sequence labelling is a vital area of machine learning, encompassing tasks such as speech, handwriting and gesture recognition, protein secondary structure prediction and part-of-speech tagging. Recurrent neural networks are powerful sequence learning tools—robust to input noise and distortion, able to exploit long-range contextual information—that would seem ideally suited to such problems. However their role in large-scale sequence labelling systems has so far been auxiliary.    The goal of this book is a complete framework for classifying and transcribing sequential data with recurrent neural networks only. Three main innovations are introduced in order to realise this goal. Firstly, the connectionist temporal classification output layer allows the framework to be trained with unsegmented target sequences, such as phoneme-level speech transcriptions; this is in contrast to previous connectionist approaches, which were dependent on error-prone prior segmentation. Secondly, multidimensional...

  1. Sequence finishing and mapping of Drosophila melanogaster heterochromatin

    Energy Technology Data Exchange (ETDEWEB)

    Hoskins, Roger A.; Carlson, Joseph W.; Kennedy, Cameron; Acevedo, David; Evans-Holm, Martha; Frise, Erwin; Wan, Kenneth H.; Park, Soo; Mendez-Lago, Maria; Rossi, Fabrizio; Villasante, Alfredo; Dimitri, Patrizio; Karpen, Gary H.; Celniker, Susan E.

    2007-06-15

    Genome sequences for most metazoans are incomplete due to the presence of repeated DNA in the pericentromeric heterochromatin. The heterochromatic regions of D. melanogaster contain 20 Mb of sequence amenable to mapping, sequence assembly and finishing. Here we describe the generation of 15 Mb of finished or improved heterochromatic sequence using available clone resources and assembly and mapping methods. We also constructed a BAC-based physical map that spans approximately 13 Mb of the pericentromeric heterochromatin, and a cytogenetic map that positions approximately 11 Mb of BAC contigs and sequence scaffolds in specific chromosomal locations. The integrated sequence assembly and maps greatly improve our understanding of the structure and composition of this poorly understood fraction of a metazoan genome and provide a framework for functional analyses.

  2. Fibonacci Sequence and Supramolecular Structure of DNA.

    Science.gov (United States)

    Shabalkin, I P; Grigor'eva, E Yu; Gudkova, M V; Shabalkin, P I

    2016-05-01

    We proposed a new model of supramolecular DNA structure. As in our previously developed model of primary DNA structure [11-15], the 3D structure of the DNA molecule is assembled in accordance with a mathematical rule known as the Fibonacci sequence. Unlike primary DNA structure, supramolecular 3D structure is assembled from complex moieties including a regular tetrahedron and a regular octahedron consisting of monomers, elements of the primary DNA structure. The moieties of the supramolecular DNA structure forming fragments of regular spatial lattice are bound via linker (joint) sequences of the DNA chain. The lattice perceives and transmits information signals over a considerable distance without acoustic aberrations. Linker sequences expand conformational space between lattice segments, allowing them to slide relative to each other under the action of external forces. In this case, sliding is provided by stretching of the stacked linker sequences. PMID:27265133

  3. The complete DNA sequence of vaccinia virus.

    Science.gov (United States)

    Goebel, S J; Johnson, G P; Perkus, M E; Davis, S W; Winslow, J P; Paoletti, E

    1990-11-01

    The complete DNA sequence of the genome of vaccinia virus has been determined. The genome consisted of 191,636 bp with a base composition of 66.6% A + T. We have identified 198 "major" protein-coding regions and 65 overlapping "minor" regions, for a total of 263 potential genes. Genes encoded by the virus were located by examination of DNA sequence characteristics and compared with existing vaccinia virus mapping analyses, sequence data, and transcription data. These genes were found to be compactly organized along the genome with relatively few regions of noncoding sequences. Whereas several similarities to proteins of known function were discerned, the function of the majority of proteins encoded by these open reading frames is as yet undetermined.

  4. "X"-tending the Fibonacci Sequence.

    Science.gov (United States)

    Moran, Glenn T.

    2002-01-01

    Outlines a lesson on the Fibonacci and Lucas sequences that captures student interest by presenting the opportunity for computation practice, mental mathematics, and proof for algebra students. Discusses an extension for solving simultaneous equations. (YDS)
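
    The two sequences named in this record share one recurrence and differ only in their starting values, which a short sketch makes concrete. The function names are hypothetical, and the identity checked at the end (L_n = F_(n-1) + F_(n+1)) is a standard fact about these sequences rather than something taken from the lesson itself.

```python
def recurrence(first, second, count):
    """Return `count` terms of x_n = x_(n-1) + x_(n-2) from two seed values."""
    terms = []
    a, b = first, second
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b
    return terms


def fibonacci(count):
    """Fibonacci numbers F_0, F_1, ... (seeds 0 and 1)."""
    return recurrence(0, 1, count)


def lucas(count):
    """Lucas numbers L_0, L_1, ... (seeds 2 and 1)."""
    return recurrence(2, 1, count)


F = fibonacci(12)
L = lucas(11)
print(F)
print(L)

# Standard identity linking the two: L_n = F_(n-1) + F_(n+1) for n >= 1.
print(all(L[n] == F[n - 1] + F[n + 1] for n in range(1, 11)))
```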

  5. Fathead minnow genome sequencing and assembly

    Data.gov (United States)

    U.S. Environmental Protection Agency — The dataset provides the URLs for accessing the genome sequence data and two draft assemblies as well as fathead minnow genotyping data associated with estimating...

  6. ASAP: Amplification, sequencing & annotation of plastomes

    Directory of Open Access Journals (Sweden)

    Folta Kevin M

    2005-12-01

    Background: Availability of DNA sequence information is vital for pursuing structural, functional and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome requires sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures like chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA) is being used currently to amplify the chloroplast genome from purified chloroplast DNA and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal high-throughput, rapid PCR-based technique to amplify, sequence and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1 – 1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera. Results: 100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine and Equisetum. Sequence from the inverted repeat region of strawberry and peach plastome was obtained, annotated and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to tobacco plastome thus exhibiting the utility of this method for structural and

  7. Probabilistic motor sequence yields greater offline and less online learning than fixed sequence

    Directory of Open Access Journals (Sweden)

    Yue Du

    2016-03-01

    It is well acknowledged that motor sequences can be learned quickly through online learning. Subsequently, the initial acquisition of a motor sequence is boosted or consolidated by offline learning. However, little is known about whether offline learning can drive the fast learning of motor sequences (i.e., initial sequence learning in the first training session). To examine offline learning in the fast learning stage, we asked four groups of young adults to perform the serial reaction time (SRT) task with either a fixed or probabilistic sequence and with or without preliminary knowledge of the presence of a sequence. The sequence and instruction types were manipulated to emphasize either procedural (probabilistic sequence; no preliminary knowledge) or declarative (fixed sequence; with preliminary knowledge) memory, which were found to either facilitate or inhibit offline learning. In the SRT task, there were six learning blocks with a two-minute break between each consecutive block. Throughout the session, stimuli followed the same fixed or probabilistic pattern except in Block 5, in which stimuli appeared in a random order. We found that preliminary knowledge facilitated the learning of a fixed sequence, but not a probabilistic sequence. In addition to overall learning measured by the mean reaction time (RT), we examined the progressive changes in RT within and between blocks (i.e., online and offline learning, respectively). It was found that the two groups who performed the fixed sequence, regardless of preliminary knowledge, showed greater online learning than the other two groups who performed the probabilistic sequence. The groups who performed the probabilistic sequence, regardless of preliminary knowledge, did not display online learning, as indicated by a decline in performance within the learning blocks. However, they did demonstrate remarkably greater offline improvement in RT, which suggests that they are learning the probabilistic sequence

  8. Visually lossless compression of digital hologram sequences

    OpenAIRE

    Darakis E.; Kowiel M.; Nasanen R.; Naughton T.J.

    2010-01-01

    Digital hologram sequences have great potential for the recording of 3D scenes of moving macroscopic objects as their numerical reconstruction can yield a range of perspective views of the scene. Digital holograms inherently have large information content and lossless coding of holographic data is rather inefficient due to the speckled nature of the interference fringes they contain. Lossy coding of still holograms and hologram sequences has shown promising results. By definition,...

  9. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate; or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer.

  10. Symbolic Representations in Motor Sequence Learning

    OpenAIRE

    Bo, J.; Peltier, S.J.; Noll, D.C.; Seidler, R.D.

    2010-01-01

    It has been shown that varying the spatial versus symbolic nature of stimulus presentation and response production, which affects stimulus-response (S-R) mapping requirements, influences the magnitude of implicit sequence learning (Koch & Hoffman, 2000). Here, we evaluated how spatial and symbolic stimuli and responses affect the neural bases of sequence learning. We selectively eliminated the spatial component of stimulus presentation (spatial versus symbolic), response execution (manual ver...

  11. Next generation sequencing in cardiovascular diseases

    OpenAIRE

    Faita, Francesca; Vecoli, Cecilia; Foffa, Ilenia; Andreassi, Maria Grazia

    2012-01-01

    In the last few years, the advent of next generation sequencing (NGS) has revolutionized the approach to genetic studies, making whole-genome sequencing a possible way of obtaining global genomic information. NGS has very recently been shown to be successful in identifying novel causative mutations of rare or common Mendelian disorders. At the present time, it is expected that NGS will be increasingly important in the study of inherited and complex cardiovascular diseases (CVDs). However, the...

  12. Chromatid interchanges at intrachromosomal telomeric DNA sequences

    International Nuclear Information System (INIS)

    Chinese hamster Don cells were exposed to X-rays, mitomycin C and teniposide (VM-26) to induce chromatid exchanges (quadriradials and triradials). After fluorescence in situ hybridization (FISH) of telomere sequences it was found that interstitial telomere-like DNA sequence arrays presented around five times more breakage-rearrangements than the genome overall. This high recombinogenic capacity was independent of the clastogen, suggesting that this susceptibility is not related to the initial mechanisms of DNA damage. (author)

  13. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    OpenAIRE

    Arabi E. keshk

    2014-01-01

    The merging of biology and computer science has created a new field, computational biology (bioinformatics), that explores the capacity of computers to gain knowledge from biological data. Computational biology is rooted in life sciences as well as computers, information sciences, and technologies. The main problem in computational biology is sequence alignment, a way of arranging the sequences of DNA, RNA or protein to identify the region of similarity and relationship between se...

  14. Logarithm of Irrationals and Beatty Sequences

    OpenAIRE

    E, Geremías Polanco

    2015-01-01

    In this paper we find an identity that gives a representation for the logarithm of any two irrational numbers $a, b >1$ in terms of a series whose terms are ratios of elements from the Beatty Sequences generated by these two numbers. We also show that Sturmian sequences can be defined in terms of these ratios. Furthermore, we find an identity for such series that bears a superficial resemblance to (a discrete version of) Frullani's Integral.
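
    For readers unfamiliar with the objects involved, the Beatty sequence generated by an irrational a > 1 is floor(n*a) for n = 1, 2, 3, ... The sketch below does not reproduce the paper's logarithm identity. It only illustrates the definition for the golden ratio, using exact integer arithmetic to avoid floating-point rounding, and checks the classical fact that the sequences for a and b partition the positive integers when 1/a + 1/b = 1. The function names are hypothetical.

```python
from math import isqrt


def beatty_golden(n):
    """floor(n * phi), phi = (1 + sqrt(5)) / 2, computed with integers only."""
    return (n + isqrt(5 * n * n)) // 2


def beatty_golden_squared(n):
    """floor(n * phi**2); since phi**2 = phi + 1 this is floor(n * phi) + n."""
    return beatty_golden(n) + n


def partitions_positive_integers(limit):
    """Check that the two sequences together hit each of 1..limit exactly once."""
    lower = {beatty_golden(n) for n in range(1, limit + 1)}
    upper = {beatty_golden_squared(n) for n in range(1, limit + 1)}
    covered = {v for v in lower | upper if v <= limit}
    return lower.isdisjoint(upper) and covered == set(range(1, limit + 1))


print([beatty_golden(n) for n in range(1, 9)])
print([beatty_golden_squared(n) for n in range(1, 6)])
print(partitions_positive_integers(1000))
```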

  15. Suicidal nucleotide sequences for DNA polymerization.

    OpenAIRE

    Samadashwily, G M; Dayn, A; Mirkin, S M

    1993-01-01

    Studying the activity of T7 DNA polymerase (Sequenase) on open circular DNAs, we observed virtually complete termination within potential triplex-forming sequences. Mutations destroying the triplex potential of the sequences prevented termination, while compensatory mutations restoring triplex potential restored it. We hypothesize that strand displacement during DNA polymerization of double-helical templates brings three DNA strands (duplex DNA downstream of the polymerase plus a displaced ov...

  16. Sequencing antibody repertoires: The next generation

    OpenAIRE

    Fischer, Nicolas

    2011-01-01

    Genomic studies have been revolutionized by the use of next generation sequencing (NGS), which delivers huge amounts of sequence information in a short span of time. The number of applications for NGS is rapidly expanding and significantly transforming many areas of life sciences. The field of antibody research and discovery is no exception. Several recent studies have harnessed the power of NGS for analyzing natural or synthetic immunoglobulin repertoires with unprecedented resolution and ex...

  17. Bernuau spline wavelets and Sturmian sequences

    OpenAIRE

    Andrle, Miroslav; Burdik, Cestmir; Gazeau, Jean-Pierre

    2003-01-01

    A spline wavelets construction of class C^n(R) supported by sequences of aperiodic discretizations of R is presented. The construction is based on multiresolution analysis recently elaborated by G. Bernuau. At a given scale, we consider discretizations that are sets of left-hand ends of tiles in a self-similar tiling of the real line with finite local complexity. Corresponding tilings are determined by two-letter Sturmian substitution sequences. We illustrate the construction with examples ha...

  18. A Method to Construct Generalized Fibonacci Sequences

    Directory of Open Access Journals (Sweden)

    Adalberto García-Máynez

    2016-01-01

    The main purpose of this paper is to study the convergence properties of Generalized Fibonacci Sequences and the series of partial sums associated with them. When the proper values (eigenvalues) of an s×s real matrix A are real and different, we give a necessary and sufficient condition for the convergence of the matrix sequence A, A^2, A^3, … to a matrix B.
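
    A small numerical sketch can make the convergence question concrete. It rests on a standard linear-algebra fact that may be stated differently in the paper: when the eigenvalues of A are real and distinct, A is diagonalizable, so the powers of A converge exactly when every eigenvalue lies in the interval (-1, 1]. The example matrix, the step count and the function names are assumptions chosen for illustration.

```python
def mat_mul(x, y):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(y), len(y[0])
    return [[sum(row[k] * y[k][j] for k in range(inner)) for j in range(cols)]
            for row in x]


def high_power(a, steps=200):
    """Return A^(steps + 1) by repeated multiplication."""
    power = a
    for _ in range(steps):
        power = mat_mul(power, a)
    return power


def powers_converge(eigenvalues):
    """Criterion for real, distinct eigenvalues: all must lie in (-1, 1]."""
    return all(-1.0 < lam <= 1.0 for lam in eigenvalues)


# Upper-triangular example, so the eigenvalues are the diagonal entries 1 and 0.5.
A = [[1.0, 0.5],
     [0.0, 0.5]]
B = high_power(A)
print(powers_converge([1.0, 0.5]))
print(B)
```

    For this A, the k-th power is [[1, 1 - 0.5^k], [0, 0.5^k]], so the limit B is [[1, 1], [0, 0]]. An eigenvalue of exactly -1 would make the powers oscillate, which is why the interval is open on the left.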

  19. A Sequencer for the LHC ERA

    CERN Document Server

    Baggiolini, V; Gorbonosov, R; Khasbulatov, D; Lamont, M

    2009-01-01

    The Sequencer is a high level software application that helps operators and physicists to commission and control the LHC. It is an important operational tool for the LHC and a core part of the control system that interacts with all LHC sub-systems. This paper describes the architecture and design of the sequencer and illustrates some innovative parts of the implementation, based on modern Java technology.

  20. Comparison of Next-Generation Sequencing Systems

    OpenAIRE

    Lin Liu; Yinhu Li; Siliang Li; Ni Hu; Yimin He; Ray Pong; Danni Lin; Lihua Lu; Maggie Law

    2012-01-01

    With fast development and wide applications of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve life qualities. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world’s ...

  1. Fluorescent signatures for variable DNA sequences

    OpenAIRE

    Rice, John E; Arthur H. Reis; Rice, Lisa M.; Carver-Brown, Rachel K.; Wangh, Lawrence J.

    2012-01-01

    Life abounds with genetic variations writ in sequences that are often only a few hundred nucleotides long. Rapid detection of these variations for identification of genetic diseases, pathogens and organisms has become the mainstay of molecular science and medicine. This report describes a new, highly informative closed-tube polymerase chain reaction (PCR) strategy for analysis of both known and unknown sequence variations. It combines efficient quantitative amplification of single-stranded DN...

  2. Calling SNPs without a reference sequence

    OpenAIRE

    Schuster Stephan C; Hayes Vanessa M; Zhang Yu; Ratan Aakrosh; Miller Webb

    2010-01-01

    Abstract Background The most common application for the next-generation sequencing technologies is resequencing, where short reads from the genome of an individual are aligned to a reference genome sequence for the same species. These mappings can then be used to identify genetic differences among individuals in a population, and perhaps ultimately to explain phenotypic variation. Many algorithms capable of aligning short reads to the reference, and determining differences between them have b...

  3. Calling SNPs without a reference sequence

    OpenAIRE

    Ratan, Aakrosh; Zhang, Yu; Hayes, Vanessa M.; Stephan C Schuster; Miller, Webb

    2010-01-01

    Background The most common application for the next-generation sequencing technologies is resequencing, where short reads from the genome of an individual are aligned to a reference genome sequence for the same species. These mappings can then be used to identify genetic differences among individuals in a population, and perhaps ultimately to explain phenotypic variation. Many algorithms capable of aligning short reads to the reference, and determining differences between them have been repor...

  4. Fibonacci difference sequence spaces for modulus functions

    Directory of Open Access Journals (Sweden)

    Kuldip Raj

    2015-05-01

    In the present paper we introduce Fibonacci difference sequence spaces l(F, Ƒ, p, u) and l_∞(F, Ƒ, p, u) by using a sequence of modulus functions and a new band matrix F. We also make an effort to study some inclusion relations, topological and geometric properties of these spaces. Furthermore, the alpha, beta, gamma duals and matrix transformation of the space l(F, Ƒ, p, u) are determined.

  5. Sequence characteristics of T4-like bacteriophage IME08 genome termini revealed by high throughput sequencing

    Directory of Open Access Journals (Sweden)

    An Xiaoping

    2011-04-01

    Background: T4 phage is a model species that has contributed broadly to our understanding of molecular biology. T4 DNA replication and packaging share various mechanisms with human double-stranded DNA viruses such as herpes virus. The literature indicates that T4-like phage genomes have permuted terminal sequences, and are generated by a DNA terminase in a sequence-independent manner. Methods: Genomic DNA of T4-like bacteriophage IME08 was subjected to high throughput sequencing, and the read sequences with extraordinarily high occurrences were analyzed. Results: We demonstrate that both the 5' and 3' termini of the IME08 genome start with base G or A. The presence of a consensus sequence TTGGA|G around the breakpoint of the high frequency read sequences suggests that the terminase cuts the branched pre-genome in a sequence-preferred manner. Our analysis also shows that terminal cleavage is asymmetric, with one end cut at a consensus sequence, and the other end generated randomly. The sequence-preferred cleavage may produce sticky-ends, but with each end being packaged with different efficiencies. Conclusions: This study illustrates how high throughput sequencing can be used to probe replication and packaging mechanisms in bacteriophages and/or viruses.

  6. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Directory of Open Access Journals (Sweden)

    Bob Zimmermann

    BACKGROUND: SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. CONCLUSIONS/SIGNIFICANCE: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

  7. The accuracy, feasibility and challenges of sequencing short tandem repeats using next-generation sequencing platforms.

    Directory of Open Access Journals (Sweden)

    Monika Zavodna

    Full Text Available To date we have little knowledge of how accurate next-generation sequencing (NGS technologies are in sequencing repetitive sequences beyond known limitations to accurately sequence homopolymers. Only a handful of previous reports have evaluated the potential of NGS for sequencing short tandem repeats (microsatellites and no empirical study has compared and evaluated the performance of more than one NGS platform with the same dataset. Here we examined yeast microsatellite variants from both long-read (454-sequencing and short-read (Illumina NGS platforms and compared these to data derived through Sanger sequencing. In addition, we investigated any locus-specific biases and differences that might have resulted from variability in microsatellite repeat number, repeat motif or type of mutation. Out of 112 insertion/deletion variants identified among 45 microsatellite amplicons in our study, we found 87.5% agreement between the 454-platform and Sanger sequencing in frequency of variant detection after Benjamini-Hochberg correction for multiple tests. For a subset of 21 microsatellite amplicons derived from Illumina sequencing, the results of short-read platform were highly consistent with the other two platforms, with 100% agreement with 454-sequencing and 93.6% agreement with the Sanger method after Benjamini-Hochberg correction. We found that the microsatellite attributes copy number, repeat motif and type of mutation did not have a significant effect on differences seen between the sequencing platforms. We show that both long-read and short-read NGS platforms can be used to sequence short tandem repeats accurately, which makes it feasible to consider the use of these platforms in high-throughput genotyping. It appears the major requirement for achieving both high accuracy and rare variant detection in microsatellite genotyping is sufficient read depth coverage. 
This might be a challenge because each platform generates a consistent pattern of non
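
    The record above reports platform agreement "after Benjamini-Hochberg correction for multiple tests". As a reminder of what that correction does, here is a minimal sketch of the standard step-up procedure. The function name and the default alpha are illustrative assumptions, not details taken from the study.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one reject flag per p-value, controlling the false discovery rate at alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            reject[idx] = True
    return reject
```

    Because the rule looks for the largest qualifying rank, a p-value that fails its own threshold can still be rejected when a larger p-value passes a later one.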

  8. Leaf sequencing algorithms for segmented multileaf collimation

    Energy Technology Data Exchange (ETDEWEB)

    Kamath, Srijit [Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL (United States); Sahni, Sartaj [Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL (United States); Li, Jonathan [Department of Radiation Oncology, University of Florida, Gainesville, FL (United States); Palta, Jatinder [Department of Radiation Oncology, University of Florida, Gainesville, FL (United States); Ranka, Sanjay [Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL (United States)

    2003-02-07

    The delivery of intensity-modulated radiation therapy (IMRT) with a multileaf collimator (MLC) requires the conversion of a radiation fluence map into a leaf sequence file that controls the movement of the MLC during radiation delivery. It is imperative that the fluence map delivered using the leaf sequence file is as close as possible to the fluence map generated by the dose optimization algorithm, while satisfying hardware constraints of the delivery system. Optimization of the leaf sequencing algorithm has been the subject of several recent investigations. In this work, we present a systematic study of the optimization of leaf sequencing algorithms for segmental multileaf collimator beam delivery and provide rigorous mathematical proofs of optimized leaf sequence settings in terms of monitor unit (MU) efficiency under most common leaf movement constraints that include minimum leaf separation constraint and leaf interdigitation constraint. Our analytical analysis shows that leaf sequencing based on unidirectional movement of the MLC leaves is as MU efficient as bidirectional movement of the MLC leaves.

  9. Estimating the entropy of DNA sequences.

    Science.gov (United States)

    Schmitt, A O; Herzel, H

    1997-10-01

    The Shannon entropy is a standard measure for the order state of symbol sequences, such as, for example, DNA sequences. In order to incorporate correlations between symbols, the entropy of n-mers (consecutive strands of n symbols) has to be determined. Here, an assay is presented to estimate such higher order entropies (block entropies) for DNA sequences when the actual number of observations is small compared with the number of possible outcomes. The n-mer probability distribution underlying the dynamical process is reconstructed using elementary statistical principles: The theorem of asymptotic equi-distribution and the Maximum Entropy Principle. Constraints are set to force the constructed distributions to adopt features which are characteristic for the real probability distribution. From the many solutions compatible with these constraints the one with the highest entropy is the most likely one according to the Maximum Entropy Principle. An algorithm performing this procedure is expounded. It is tested by applying it to various DNA model sequences whose exact entropies are known. Finally, results for a real DNA sequence, the complete genome of the Epstein Barr virus, are presented and compared with those of other information carriers (texts, computer source code, music). It seems as if DNA sequences possess much more freedom in the combination of the symbols of their alphabet than written language or computer source codes. PMID:9344742
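
    For orientation, the block entropy mentioned in the abstract above can be computed naively from observed n-mer frequencies. The sketch below is this simple plug-in estimate, not the maximum-entropy-based estimator the authors describe; the function name is hypothetical. The plug-in estimate is known to be biased low when the number of observed n-mers is small relative to the 4^n possible ones, which is the regime the paper addresses.

```python
import math
from collections import Counter


def block_entropy(sequence: str, n: int) -> float:
    """Plug-in Shannon entropy, in bits, of the overlapping n-mers of a sequence."""
    if n < 1 or len(sequence) < n:
        raise ValueError("need 1 <= n <= len(sequence)")
    # Count every overlapping window of length n.
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts.values())
    # H_n = -sum_i p_i * log2(p_i), with p_i the observed n-mer frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

    A sequence using all four bases equally often gives 2 bits for n = 1, while a homopolymer gives 0 bits for any n.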

  10. Sequence determinants of human microsatellite variability

    Directory of Open Access Journals (Sweden)

    Jakobsson Mattias

    2009-12-01

    Full Text Available Abstract Background Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database. Results Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetra-nucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity. Conclusions These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability.
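
    The calibration step in the abstract above rests on one stated assumption: differences in PCR fragment length reflect differences in repeat number. One plausible reading of that step is sketched below. The function and parameter names are hypothetical, and the exact procedure used in the study may differ.

```python
def infer_repeat_number(
    fragment_length: int,
    ref_fragment_length: int,
    ref_repeat_number: float,
    unit_length: int,
) -> float:
    """Infer a repeat count from a PCR fragment length, calibrated on a reference.

    Assumes all length variation relative to the reference fragment comes from
    gaining or losing whole or partial repeat units of size unit_length.
    """
    if unit_length <= 0:
        raise ValueError("unit_length must be positive")
    # Each extra unit_length base pairs corresponds to one extra repeat.
    return ref_repeat_number + (fragment_length - ref_fragment_length) / unit_length
```

    For example, with a tetranucleotide locus whose reference fragment of 200 bp carries 12 repeats, an observed 208 bp fragment would be scored as 14 repeats.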

  11. Sequencing proteins with transverse ionic transport

    Science.gov (United States)

    Boynton, Paul; di Ventra, Massimiliano

    2015-03-01

    De novo protein sequencing is essential for understanding cellular processes that govern the function of living organisms. By obtaining the order of the amino acids that compose a given protein one can determine both its secondary and tertiary structures through protein structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's Disease. Mass spectrometry is the current technique of choice for de novo sequencing, but because some amino acids have the same mass the sequence cannot be completely determined in many cases. In this paper we propose a new technique for de novo protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting perpendicular nanochannel, similar to that proposed for DNA sequencing. Indeed, we find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for de novo protein sequencing.

  12. Global patterns of sequence evolution in Drosophila

    Directory of Open Access Journals (Sweden)

    Marín Ignacio

    2007-11-01

    Full Text Available Abstract Background Sequencing of the genomes of several Drosophila allows for the first precise analyses of how global sequence patterns change among multiple, closely related animal species. A basic question is whether there are characteristic features that differentiate chromosomes within a species or between different species. Results We explored the euchromatin of the chromosomes of seven Drosophila species to establish their global patterns of DNA sequence diversity. Between species, differences in the types and amounts of simple sequence repeats were found. Within each species, the autosomes have almost identical oligonucleotide profiles. However, X chromosomes and autosomes have, in all species, a qualitatively different composition. The X chromosomes are less complex than the autosomes, containing both a higher amount of simple DNA sequences and, in several cases, chromosome-specific repetitive sequences. Moreover, we show that the right arm of the X chromosome of Drosophila pseudoobscura, which evolved from an autosome 10-18 million years ago, has a composition which is identical to that of the original, left arm of the X chromosome. Conclusion The consistent differences among species, differences among X chromosomes and autosomes and the convergent evolution of X and neo-X chromosomes demonstrate that strong forces are acting on drosophilid genomes to generate peculiar chromosomal landscapes. We discuss the relationships of the patterns observed with differential recombination and mutation rates and with the process of dosage compensation.

  13. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm.

    Science.gov (United States)

    Kumar, Manish

    2015-01-01

    One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic goal of multiple sequence alignment is to determine the most biologically plausible alignments of protein or DNA sequences. In this paper, an alignment method using genetic algorithm for multiple sequence alignment has been proposed. Two different genetic operators, namely crossover and mutation, were defined and implemented with the proposed method in order to know the population evolution and quality of the sequence aligned. The proposed method is assessed with protein benchmark dataset, e.g., BALIBASE, by comparing the obtained results to those obtained with other alignment algorithms, e.g., SAGA, RBT-GA, PRRP, HMMT, SB-PIMA, CLUSTALX, CLUSTAL W, DIALIGN and PILEUP8 etc. Experiments on a wide range of data have shown that the proposed algorithm is much better (in terms of score) than previously proposed algorithms in its ability to achieve high alignment quality. PMID:27065770

  14. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene.

    Science.gov (United States)

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the 'CCCGCC' motif in the GFP coding sequence. PMID:27193250

  15. Artificial duplicate reads in sequencing data of 454 Genome Sequencer FLX System

    Institute of Scientific and Technical Information of China (English)

    Hui Dong; Yangyi Chen; Yan Shen; Shengyue Wang; Guoping Zhao; Weirong Jin

    2011-01-01

    The 454 Genome Sequencer (GS) FLX System is one of the next-generation sequencing systems featured by long reads, high accuracy, and ultra-high throughput. Based on the mechanism of emulsion PCR, a unique DNA template would only generate a unique sequence read after being amplified and sequenced on GS FLX. However, biased amplification of DNA templates might occur in the process of emulsion PCR, which results in production of artificial duplicate reads. Under the condition that each DNA template is unique to another, 3.49%-18.14% of total reads in GS FLX-sequencing data were found to be artificial duplicate reads. These duplicate reads may lead to misunderstanding of sequencing data and special attention should be paid to the potential biases they introduced to the data.

  16. Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences

    Science.gov (United States)

    Osipov, V. Al.

    2016-07-01

    The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences naturally arising in the theory of dynamical systems. Aiming at the construction of analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for the study of the cluster hierarchy. The analytic power of the approach is demonstrated by the derivation of a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of the applied problem of constructing efficient DNA sequence assembly algorithms.

  17. TWIN REVERSED ARTERIAL PERFUSION SEQUENCE (TRAP SEQUENCE). THE ACARDIAC /ACEPHALIC TWIN

    OpenAIRE

    S. Saritha; Sumedha S. Anjankar

    2013-01-01

    Twin-Reversed Arterial Perfusion (TRAP sequence) is a rare complication of monochorionic twins (MC, twins sharing one placenta). TRAP sequence is known as acardius or chorioangiopagus parasiticus. It occurs in 1% of monochorionic twin pregnancies and in 1 in 35,000 pregnancies. The risk of recurrence was estimated at 1:10,000. TRAP sequence is characterized by a structurally normal pump twin perfusing an anomalous twin. In TRAP syndrome, there are mortality and deformities in both twins. ...

  18. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Directory of Open Access Journals (Sweden)

    Baldwin Stephen A

    2011-03-01

    Full Text Available Abstract Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  19. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Science.gov (United States)

    2011-01-01

    Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/. PMID:21385349

  20. Identification of 10 882 porcine microsatellite sequences and virtual mapping of 4528 of these sequences

    DEFF Research Database (Denmark)

    Karlskov-Mortensen, Peter; Hu, Z.L.; Gorodkin, Jan;

    2007-01-01

    the human genome (BLAST cut-off threshold = 1 x 10^-5). All microsatellite sequences placed on the comparative map are accessible at http://www.animalgenome.org/QTLdb/pig.html . These sequences increase the number of identified microsatellites in the porcine genome by several orders of magnitude....... They are a new resource of microsatellite sequences for generating markers to be used in linkage studies and in fine mapping and positional cloning of quantitative trait loci....

  1. Coverage statistics for sequence census methods

    Directory of Open Access Journals (Sweden)

    Evans Steven N

    2010-08-01

    Full Text Available Abstract Background We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce a coding of the shape of the coverage depth function as a tree and explain how this can be used to detect regions with anomalous coverage. This modeling perspective is especially germane to current high-throughput sequencing experiments, where both sample preparation protocols and sequencing technology particulars can affect fragment length distributions. Results Under the mild assumptions that fragment start sites are Poisson distributed and successive fragment lengths are independent and identically distributed, we observe that, regardless of fragment length distribution, the fragments produced in a sequencing experiment can be viewed as resulting from a two-dimensional spatial Poisson process. We then study the successive jumps of the coverage function, and show that they can be encoded as a random tree that is approximately a Galton-Watson tree with generation-dependent geometric offspring distributions whose parameters can be computed. Conclusions We extend standard analyses of shotgun sequencing that focus on coverage statistics at individual sites, and provide a null model for detecting deviations from random coverage in high-throughput sequence census based experiments. Our approach leads to explicit determinations of the null distributions of certain test statistics, while for others it greatly simplifies the approximation of their null distributions by simulation. Our focus on fragments also leads to a new approach to visualizing sequencing data that is of independent interest.
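
    To make the setting of the abstract above concrete, the sketch below simulates per-base coverage depth in the classic Lander-Waterman regime that the paper extends. It makes simplifying assumptions that are not from the paper: a circular genome, a fixed number of reads with uniformly random start sites (approximating Poisson starts), and a single fixed read length rather than a length distribution. All names are hypothetical.

```python
import math
import random
from typing import List


def simulate_coverage(genome_length: int, read_length: int,
                      num_reads: int, seed: int = 0) -> List[int]:
    """Per-base coverage depth for reads placed uniformly on a circular genome."""
    if not 0 < read_length <= genome_length:
        raise ValueError("need 0 < read_length <= genome_length")
    rng = random.Random(seed)
    # Difference array: +1 where a read starts covering, -1 where it stops.
    diff = [0] * (genome_length + 1)
    for _ in range(num_reads):
        start = rng.randrange(genome_length)
        end = start + read_length
        diff[start] += 1
        if end <= genome_length:
            diff[end] -= 1
        else:
            # The read wraps around the origin of the circular genome.
            diff[genome_length] -= 1
            diff[0] += 1
            diff[end - genome_length] -= 1
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def expected_uncovered_fraction(mean_depth: float) -> float:
    """Lander-Waterman approximation: P(depth = 0) = exp(-c) at mean depth c."""
    return math.exp(-mean_depth)
```

    With 2,000 reads of 100 bp on a 100,000 bp genome the mean depth is exactly 2, and the simulated fraction of uncovered bases should fall close to exp(-2), about 0.135. Deviations from such a null expectation are what the paper's tree encoding is designed to detect.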

  2. Compilation of tRNA sequences.

    Science.gov (United States)

    Sprinzl, M; Grueter, F; Spelzhaus, A; Gauss, D H

    1980-01-11

    This compilation presents in a small space the tRNA sequences so far published. The numbering of tRNAPhe from yeast is used following the rules proposed by the participants of the Cold Spring Harbor Meeting on tRNA 1978 (1,2;Fig. 1). This numbering allows comparisons with the three dimensional structure of tRNAPhe. The secondary structure of tRNAs is indicated by specific underlining. In the primary structure a nucleoside followed by a nucleoside in brackets or a modification in brackets denotes that both types of nucleosides can occupy this position. Part of a sequence in brackets designates a piece of sequence not unambiguously analyzed. Rare nucleosides are named according to the IUPAC-IUB rules (for complicated rare nucleosides and their identification see Table 1); those with lengthy names are given with the prefix x and specified in the footnotes. Footnotes are numbered according to the coordinates of the corresponding nucleoside and are indicated in the sequence by an asterisk. The references are restricted to the citation of the latest publication in those cases where several papers deal with one sequence. For additional information the reader is referred either to the original literature or to other tRNA sequence compilations (3-7). Mutant tRNAs are dealt with in a compilation by J. Celis (8). The compilers would welcome any information by the readers regarding missing material or erroneous presentation. On the basis of this numbering system computer printed compilations of tRNA sequences in a linear form and in cloverleaf form are in preparation. PMID:6986608

  3. Calling SNPs without a reference sequence

    Directory of Open Access Journals (Sweden)

    Schuster Stephan C

    2010-03-01

    Full Text Available Abstract Background The most common application for the next-generation sequencing technologies is resequencing, where short reads from the genome of an individual are aligned to a reference genome sequence for the same species. These mappings can then be used to identify genetic differences among individuals in a population, and perhaps ultimately to explain phenotypic variation. Many algorithms capable of aligning short reads to the reference, and determining differences between them have been reported. Much less has been reported on how to use these technologies to determine genetic differences among individuals of a species for which a reference sequence is not available, which drastically limits the number of species that can easily benefit from these new technologies. Results We describe a computational pipeline, called DIAL (De novo Identification of Alleles), for identifying single-base substitutions between two closely related genomes without the help of a reference genome. The method works even when the depth of coverage is insufficient for de novo assembly, and it can be extended to determine small insertions/deletions. We evaluate the software's effectiveness using published Roche/454 sequence data from the genome of Dr. James Watson (to detect heterozygous positions) and recent Illumina data from orangutan, in each case comparing our results to those from computational analysis that uses a reference genome assembly. We also illustrate the use of DIAL to identify nucleotide differences among transcriptome sequences. Conclusions DIAL can be used for identification of nucleotide differences in species for which no reference sequence is available. Our main motivation is to use this tool to survey the genetic diversity of endangered species as the identified sequence differences can be used to design genotyping arrays to assist in the species' management. The DIAL source code is freely available at http://www.bx.psu.edu/miller_lab/.

  4. Comparative sequence analysis for Brassica oleracea with similar sequences in B. rapa and Arabidopsis thaliana

    OpenAIRE

    Qiu, Dan; Gao, Muqiang; Li, Genyi; Quiros, Carlos

    2009-01-01

    We sequenced five BAC clones of Brassica oleracea doubled haploid 'Early Big' broccoli containing major genes in the aliphatic glucosinolate pathway, and comparatively analyzed them with similar sequences in A. thaliana and B. rapa. Additionally, we included in the analysis published sequences from three other B. oleracea BAC clones and a contig of this species corresponding to segments in A. thaliana chromosomes IV and V. A total of 2,946 kb of B. oleracea, 1,069 kb of B. rapa sequence and 2...

  5. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Science.gov (United States)

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  6. Microsatellite Length Scoring by Single Molecule Real Time Sequencing - Effects of Sequence Structure and PCR Regime.

    Science.gov (United States)

    Liljegren, Mikkel Meyn; de Muinck, Eric Jacques; Trosvik, Pål

    2016-01-01

    Microsatellites are DNA sequences consisting of repeated, short (1-6 bp) sequence motifs that are highly mutable by enzymatic slippage during replication. Due to their high intrinsic variability, microsatellites have important applications in population genetics, forensics, genome mapping, as well as cancer diagnostics and prognosis. The current analytical standard for microsatellites is based on length scoring by high precision electrophoresis, but due to increasing efficiency next-generation sequencing techniques may provide a viable alternative. Here, we evaluated single molecule real time (SMRT) sequencing, implemented in the PacBio series of sequencing apparatuses, as a means of microsatellite length scoring. To this end we carried out multiplexed SMRT sequencing of plasmid-carried artificial microsatellites of varying structure under different pre-sequencing PCR regimes. For each repeat structure, reads corresponding to the target length dominated. We found that pre-sequencing amplification had large effects on scoring accuracy and error distribution relative to controls, but that the effects of the number of amplification cycles were generally weak. In line with expectations enzymatic slippage decreased proportionally with microsatellite repeat unit length and increased with repetition number. Finally, we determined directional mutation trends, showing that PCR and SMRT sequencing introduced consistent but opposing error patterns in contraction and expansion of the microsatellites on the repeat motif and single nucleotide level. PMID:27414800
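
    Length scoring from sequence reads, as discussed in the abstract above, ultimately means counting repeat units in each read. The sketch below shows one simple way to do that for a perfect, uninterrupted repeat. It is an illustration only: the function name is hypothetical, the authors' actual scoring pipeline is not described in the abstract, and real reads with sequencing errors would need a more tolerant matcher.

```python
import re


def longest_repeat_run(read: str, motif: str) -> int:
    """Number of motif copies in the longest uninterrupted tandem run in a read."""
    if not motif:
        raise ValueError("motif must be non-empty")
    # A lookahead makes the scan start at every position, so runs that overlap
    # an earlier, shorter match (possible for motifs such as ACA) are not missed.
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    runs = [len(m.group(1)) // len(motif) for m in pattern.finditer(read)]
    return max(runs, default=0)
```

    Applied to every read of an amplicon, the resulting counts form a length distribution whose mode can be compared with the expected target length.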

  7. Microsatellite Length Scoring by Single Molecule Real Time Sequencing - Effects of Sequence Structure and PCR Regime.

    Directory of Open Access Journals (Sweden)

    Mikkel Meyn Liljegren

    Full Text Available Microsatellites are DNA sequences consisting of repeated, short (1-6 bp) sequence motifs that are highly mutable by enzymatic slippage during replication. Due to their high intrinsic variability, microsatellites have important applications in population genetics, forensics, genome mapping, as well as cancer diagnostics and prognosis. The current analytical standard for microsatellites is based on length scoring by high precision electrophoresis, but due to increasing efficiency next-generation sequencing techniques may provide a viable alternative. Here, we evaluated single molecule real time (SMRT) sequencing, implemented in the PacBio series of sequencing apparatuses, as a means of microsatellite length scoring. To this end we carried out multiplexed SMRT sequencing of plasmid-carried artificial microsatellites of varying structure under different pre-sequencing PCR regimes. For each repeat structure, reads corresponding to the target length dominated. We found that pre-sequencing amplification had large effects on scoring accuracy and error distribution relative to controls, but that the effects of the number of amplification cycles were generally weak. In line with expectations enzymatic slippage decreased proportionally with microsatellite repeat unit length and increased with repetition number. Finally, we determined directional mutation trends, showing that PCR and SMRT sequencing introduced consistent but opposing error patterns in contraction and expansion of the microsatellites on the repeat motif and single nucleotide level.

  8. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  9. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Directory of Open Access Journals (Sweden)

    Chun-Tien Chang

    2012-01-01

    Full Text Available The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  10. Mixed sequence reader: a program for analyzing DNA sequences with heterozygous base calling.

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  11. Only correlated sequences that are actively processed contribute to implicit sequence learning.

    Science.gov (United States)

    Meier, Beat; Weiermann, Brigitte; Cock, Josephine

    2012-09-01

    The purpose of the study was to investigate how implicit sequence learning is affected by the presence of secondary information that is correlated with the primary sequence but not necessarily relevant to performance. In a previous work, we have shown that correlation plays an important role but other prerequisites may also be involved. In Experiments 1 and 2, using a task sequence learning paradigm, we found that primary sequence learning was not affected by secondary information that was sequenced but irrelevant to performance, even though the two streams of information were correlated. In contrast, in Experiment 3, we found that sensitivity to the main sequence was greater with the provision of extra sequenced information that was relevant to performance in addition to being correlated. This suggests that sequence learning was enhanced through the integration of information. We conclude that information in secondary as well as primary sequences must be actively processed if it is to have a beneficial impact. By actively processed we mean information that is selectively attended and necessary for carrying out the tasks.

  12. Detection of M-Sequences from Spike Sequence in Neuronal Networks

    Directory of Open Access Journals (Sweden)

    Yoshi Nishitani

    2012-01-01

    Full Text Available In circuit theory, it is well known that a linear feedback shift register (LFSR) circuit generates pseudorandom bit sequences (PRBS), including an M-sequence, which has the maximum period length. In this study, we tried to detect M-sequences, the pseudorandom sequences generated by an LFSR circuit, in time series patterns of stimulated action potentials. Stimulated action potentials were recorded from dissociated cultures of hippocampal neurons grown on a multielectrode array. We could find several M-sequences from a 3-stage LFSR circuit (M3). These results show the possibility of assembling LFSR circuits or their equivalents in a neuronal network. However, since the M3 pattern was composed of only four spike intervals, the possibility of an accidental detection was not zero. Then, we detected M-sequences from random spike sequences which were not generated from an LFSR circuit and compared the result with the number of M-sequences from the originally observed raster data. As a result, a significant difference was confirmed: a greater number of “0–1” reversed 3-stage M-sequences occurred than would have been detected by accident. This result suggests that some LFSR equivalent circuits are assembled in neuronal networks.
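
    To make the LFSR terminology above concrete, here is a minimal sketch of a 3-stage Fibonacci LFSR in Python. The tap positions and the initial state are assumptions chosen for illustration (the abstract does not specify the circuit used); they were picked because they give the maximal period 2^3 - 1 = 7, which is what makes the output an M-sequence.

```python
def lfsr_step(state, taps):
    """Advance a Fibonacci LFSR by one clock.

    state: tuple of bits, one per stage; taps: 0-based stage indices that
    are XORed to form the feedback bit. Returns (output_bit, new_state).
    """
    feedback = 0
    for t in taps:
        feedback ^= state[t]
    # The last stage is shifted out; the feedback bit enters the first stage.
    return state[-1], (feedback,) + state[:-1]


def lfsr_sequence(state, taps, length):
    """Return the first `length` output bits of the register."""
    state = tuple(state)
    bits = []
    for _ in range(length):
        bit, state = lfsr_step(state, taps)
        bits.append(bit)
    return bits


def lfsr_period(state, taps):
    """Count clocks until the register returns to its initial state.

    Assumes the last stage is among the taps, so the state map is
    invertible and every state lies on a cycle.
    """
    start = tuple(state)
    _, current = lfsr_step(start, taps)
    steps = 1
    while current != start:
        _, current = lfsr_step(current, taps)
        steps += 1
    return steps


# Illustrative choice (assumption): taps on stages 2 and 3, seed 1-0-0.
TAPS = (1, 2)
SEED = (1, 0, 0)
print(lfsr_sequence(SEED, TAPS, 7))  # one full period of the M-sequence
print(lfsr_period(SEED, TAPS))       # 7 for a maximal-length 3-stage LFSR
```

    One period of a 3-stage M-sequence contains four ones and three zeros, so a short spike pattern can match it by chance. This is why the authors compared their detections against random spike sequences.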

  13. Improvement of SNR with Chaotic Spreading Sequences for CDMA

    CERN Document Server

    Umeno, K; Umeno, Ken; Kitayama, Ken-ichi

    1999-01-01

    We show that chaotic spreading sequences generated by ergodic mappings of Chebyshev orthogonal polynomials have better correlation properties for CDMA (code division multiple access) than the optimal binary sequences (Gold sequences) in the sense of ensemble average.

  14. Some Identities for Generalized Fibonacci and Lucas Sequences

    OpenAIRE

    Irmak, Nurettin; ALP, Murat

    2013-01-01

    In this study, we define a generalization of the Lucas sequence {p_n}. Then we obtain the Binet formula of the sequence {p_n}. Also, we investigate relationships between generalized Fibonacci and Lucas sequences.

  15. End Sequencing and Finger Printing of Human & Mouse BAC Libraries

    Energy Technology Data Exchange (ETDEWEB)

    Fraser, C

    2005-09-27

    This project provided for continued end sequencing of existing and new BAC libraries constructed to support human sequencing as well as to initiate BAC end sequencing from the mouse BAC libraries constructed to support mouse sequencing. The clones, the sequences, and the fingerprints are now an available resource for the community at large. Research and development of new methodologies for BAC end sequencing have reduced costs and increased throughput.

  16. BLEACHING EUCALYPTUS PULPS WITH SHORT SEQUENCES

    Directory of Open Access Journals (Sweden)

    Flaviana Reis Milagres

    2011-03-01

    Full Text Available Eucalyptus spp kraft pulp, due to its high content of hexenuronic acids, is quite easy to bleach. Therefore, investigations have been made attempting to decrease the number of stages in the bleaching process in order to minimize capital costs. This study focused on the evaluation of short ECF (Elemental Chlorine Free) and TCF (Totally Chlorine Free) sequences for bleaching oxygen delignified Eucalyptus spp kraft pulp to 90% ISO brightness: PMoDP (Molybdenum catalyzed acid peroxide, chlorine dioxide and hydrogen peroxide), PMoD/P (Molybdenum catalyzed acid peroxide, chlorine dioxide and hydrogen peroxide, without washing), PMoD(PO) (Molybdenum catalyzed acid peroxide, chlorine dioxide and pressurized peroxide), D(EPO)DP (chlorine dioxide, extraction oxidative with oxygen and peroxide, chlorine dioxide and hydrogen peroxide), PMoQ(PO) (Molybdenum catalyzed acid peroxide, DTPA and pressurized peroxide), and XPMoQ(PO) (Enzyme, molybdenum catalyzed acid peroxide, DTPA and pressurized peroxide). Uncommon pulp treatments, such as molybdenum catalyzed acid peroxide (PMo) and xylanase (X) bleaching stages, were used. Among the ECF alternatives, the two-stage PMoD/P sequence proved highly cost-effective without affecting pulp quality in relation to the traditional D(EPO)DP sequence and produced better quality effluent in relation to the reference. However, a four stage sequence, XPMoQ(PO), was required to achieve full brightness using the TCF technology. This sequence was highly cost-effective although it only produced pulp of acceptable quality.

  17. Lossless Compression and Complexity of Chaotic Sequences

    CERN Document Server

    Nagaraj, Nithin; T., Arjun Venugopal; Krishnan, Nithin

    2011-01-01

    We investigate the complexity of short symbolic sequences of chaotic dynamical systems by using lossless compression algorithms. In particular, we study Non-Sequential Recursive Pair Substitution (NSRPS), a lossless compression algorithm first proposed by W. Ebeling et al. [Math. Biosc. 52, 1980] and Jiménez-Montaño et al. [arXiv:cond-mat/0204134, 2002], which was subsequently shown to be optimal. NSRPS has also been used to estimate Entropy of written English (P. Grassberger [arXiv:physics/0207023, 2002]). We propose a new measure of complexity - defined as the number of iterations of NSRPS required to transform the input sequence into a constant sequence. We test this measure on symbolic sequences of the Logistic map for various values of the bifurcation parameter. The proposed measure of complexity is easy to compute and is observed to be highly correlated with the Lyapunov exponent of the original non-linear time series, even for very short symbolic sequences (as short as 50 samples). Finally, we ...
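
    The complexity measure described above (the number of pair-substitution passes needed to reach a constant sequence) can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: pair frequencies are counted over all adjacent positions (including overlaps), ties go to the pair seen first, and the names `substitute` and `nsrps_iterations` are hypothetical.

```python
from collections import Counter


def substitute(seq, pair, new_symbol):
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def nsrps_iterations(symbols):
    """Count pair-substitution passes until the sequence is constant."""
    seq = list(symbols)
    steps = 0
    while len(set(seq)) > 1:
        # Simplification (assumption): count every adjacent pair, overlaps included.
        pair = Counter(zip(seq, seq[1:])).most_common(1)[0][0]
        # A fresh tuple symbol cannot collide with the original string symbols.
        seq = substitute(seq, pair, ("pair", steps))
        steps += 1
    return steps


print(nsrps_iterations("aaaa"))  # already constant: 0 passes
print(nsrps_iterations("abab"))  # "ab" -> X gives "XX": 1 pass
print(nsrps_iterations("aab"))   # "aa" -> X, then "Xb" -> Y: 2 passes
```

    Each pass shortens the sequence by at least one symbol, so the loop always terminates. Sequences with more regular structure collapse in fewer passes.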

  18. Protein sequence classification using feature hashing.

    Science.gov (United States)

    Caragea, Cornelia; Silvescu, Adrian; Mitra, Prasenjit

    2012-06-21

    Recent advances in next-generation sequencing technologies have resulted in an exponential increase in the rate at which protein sequence data are being acquired. The k-gram feature representation, commonly used for protein sequence classification, usually results in prohibitively high dimensional input spaces, for large values of k. Applying data mining algorithms to these input spaces may be intractable due to the large number of dimensions. Hence, using dimensionality reduction techniques can be crucial for the performance and the complexity of the learning algorithms. In this paper, we study the applicability of feature hashing to protein sequence classification, where the original high-dimensional space is "reduced" by hashing the features into a low-dimensional space, using a hash function, i.e., by mapping features into hash keys, where multiple features can be mapped (at random) to the same hash key, and "aggregating" their counts. We compare feature hashing with the "bag of k-grams" approach. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
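
    As a rough illustration of the idea in this abstract, the sketch below hashes the k-grams of a protein sequence into a fixed number of buckets and aggregates their counts. The choice of MD5 as the hash function, the bucket count, and the helper names are assumptions made for the example; the abstract does not say which hash function the authors used.

```python
import hashlib


def kgrams(sequence, k):
    """All overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def hashed_features(sequence, k, n_buckets):
    """Map k-gram counts into a fixed-length vector via a hash function.

    Different k-grams may collide in the same bucket; their counts are
    simply added together, which is the "aggregating" step.
    """
    vector = [0] * n_buckets
    for gram in kgrams(sequence, k):
        # hashlib gives a hash that is stable across runs, unlike built-in hash().
        digest = hashlib.md5(gram.encode("ascii")).digest()
        bucket = int.from_bytes(digest[:8], "big") % n_buckets
        vector[bucket] += 1
    return vector


features = hashed_features("MKVLAAGIVGL", k=3, n_buckets=16)
print(len(features), sum(features))  # 16 buckets holding 9 k-gram counts
```

    With a 20-letter amino acid alphabet the full bag of k-grams has 20^k dimensions, whereas the hashed vector keeps whatever fixed length is chosen, at the cost of occasional collisions.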

  19. The IMGT/HLA sequence database.

    Science.gov (United States)

    Robinson, J; Marsh, S G

    2000-01-01

    The IMGT/HLA database (www.ebi.ac.uk/imgt/hla/) specialises in sequences of the polymorphic genes of the HLA system, the human major histocompatibility complex (MHC). This complex is located within the 6p21.3 region on the short arm of human chromosome 6 and contains more than 220 genes of diverse function. Many of the genes encode proteins of the immune system and these include the 21 highly polymorphic HLA genes, which influence the outcome of clinical transplantation and confer susceptibility to a wide range of non-infectious diseases. The database contains sequences for all HLA alleles officially recognised by the WHO Nomenclature Committee for Factors of the HLA System and provides users with online tools and facilities for their retrieval and analysis. These include allele reports, alignment tools, and detailed descriptions of the source cells. The online submission tool allows both new and confirmatory sequences to be submitted directly to the WHO Nomenclature Committee. The latest version (release 1.10.0 April 2001) contains 1329 HLA alleles, 61 HLA related sequences, derived from around 3350 component sequences from the EMBL/GenBank/DDBJ databases. The IMGT/HLA database provides a model that will be extended to provide specialist databases for polymorphic MHC genes of other species. PMID:12361093

  20. SNAD: sequence name annotation-based designer

    Directory of Open Access Journals (Sweden)

    Gorbalenya Alexander E

    2009-08-01

    Full Text Available Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually, which may be a tedious exercise prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer) that mediates automatic conversion of sequence UIDs (associated with multiple alignment or phylogenetic tree, or supplied as plain text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit the wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between quality of sequence annotation, and efficiency of communication and knowledge dissemination among researchers.

  1. DNA sequence of the yeast transketolase gene.

    Science.gov (United States)

    Fletcher, T S; Kwee, I L; Nakada, T; Largman, C; Martin, B M

    1992-02-18

    Transketolase (EC 2.2.1.1) is the enzyme that, together with aldolase, forms a reversible link between the glycolytic and pentose phosphate pathways. We have cloned and sequenced the transketolase gene from yeast (Saccharomyces cerevisiae). This is the first transketolase gene of the pentose phosphate shunt to be sequenced from any source. The molecular mass of the proposed translated protein is 73,976 daltons, in good agreement with the observed molecular mass of about 75,000 daltons. The 5'-nontranslated region of the gene is similar to other yeast genes. There is no evidence of 5'-splice junctions or branch points in the sequence. The 3'-nontranslated region contains the polyadenylation signal (AATAAA), 80 base pairs downstream from the termination codon. A high degree of homology is found between yeast transketolase and dihydroxyacetone synthase (formaldehyde transketolase) from the yeast Hansenula polymorpha. The overall sequence identity between these two proteins is 37%, with four regions of much greater similarity. The regions from amino acid residues 98-131, 157-182, 410-433, and 474-489 have sequence identities of 74%, 66%, 83%, and 82%, respectively. One of these regions (157-182) includes a possible thiamin pyrophosphate (TPP) binding domain, and another (410-433) may contain the catalytic domain. PMID:1737042

  2. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences: a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline. This could serve as a good starting point for experimental design and makes the comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  3. Viral genome sequencing by random priming methods

    Directory of Open Access Journals (Sweden)

    Zhang Xinsheng

    2008-01-01

    Full Text Available Abstract Background Most emerging health threats are of zoonotic origin. For the overwhelming majority, their causative agents are RNA viruses which include but are not limited to HIV, Influenza, SARS, Ebola, Dengue, and Hantavirus. Of increasing importance therefore is a better understanding of global viral diversity to enable better surveillance and prediction of pandemic threats; this will require rapid and flexible methods for complete viral genome sequencing. Results We have adapted the SISPA methodology [1, 2, 3] to genome sequencing of RNA and DNA viruses. We have demonstrated the utility of the method on various types and sources of viruses, obtaining near complete genome sequence of viruses ranging in size from 3,000–15,000 kb with a median depth of coverage of 14.33. We used this technique to generate full viral genome sequence in the presence of host contaminants, using viral preparations from cell culture supernatant, allantoic fluid and fecal matter. Conclusion The method described is of great utility in generating whole genome assemblies for viruses with little or no available sequence information, viruses from greatly divergent families, previously uncharacterized viruses, or to more fully describe mixed viral infections.

  4. SNP Discovery Using Next Generation Transcriptomic Sequencing.

    Science.gov (United States)

    De Wit, Pierre

    2016-01-01

    In this chapter, I will guide the user through methods to find new SNP markers from expressed sequence (RNA-Seq) data, focusing on the sample preparation and also on the bioinformatic analyses needed to sort through the immense flood of data from high-throughput sequencing machines. The general steps included are as follows: sample preparation, sequencing, quality control of data, assembly, mapping, SNP discovery, filtering, validation. The first few steps are traditional laboratory protocols, whereas steps following the sequencing are of bioinformatic nature. The bioinformatics described herein are by no means exhaustive, rather they serve as one example of a simple way of analyzing high-throughput sequence data to find SNP markers. Ideally, one would like to run through this protocol several times with a new dataset, while varying software parameters slightly, in order to determine the robustness of the results. The final validation step, although not described in much detail here, is also quite critical as that will be the final test of the accuracy of the assumptions made in silico. There is a plethora of downstream applications of a SNP dataset, not covered in this chapter. For an example of a more thorough protocol also including differential gene expression and functional enrichment analyses, BLAST annotation and downstream applications of SNP markers, a good starting point could be the "Simple Fool's Guide to population genomics via RNA-Seq," which is available at http://sfg.stanford.edu. PMID:27460371

  5. Probabilistic Sequence Learning in Mild Cognitive Impairment

    Directory of Open Access Journals (Sweden)

    Dezso Nemeth

    2013-07-01

    Full Text Available Mild Cognitive Impairment (MCI) causes slight but noticeable disruption in cognitive systems, primarily executive and memory functions. However, it is not clear if the development of sequence learning is affected by an impaired cognitive system and, if so, how. The goal of our study was to investigate the development of probabilistic sequence learning, from the initial acquisition to consolidation, in MCI and healthy elderly control groups. We used the Alternating Serial Reaction Time task (ASRT) to measure probabilistic sequence learning. Individuals with MCI showed weaker learning performance than the healthy elderly group. However, using the reaction times only from the second half of each learning block (after the reactivation phase), we found intact learning in MCI. Based on the assumption that the first part of each learning block is related to reactivation/recall processes, we suggest that these processes are affected in MCI. The 24-hour offline period showed no effect on sequence-specific learning in either group but did on general skill learning: the healthy elderly group showed offline improvement in general reaction times while individuals with MCI did not. Our findings deepen our understanding regarding the underlying mechanisms and time course of sequence acquisition and consolidation.

  6. Directed forgetting benefits motor sequence encoding.

    Science.gov (United States)

    Tempel, Tobias; Frings, Christian

    2016-04-01

    Two experiments investigated directed forgetting of newly learned motor sequences. Concurrently with the list method of directed forgetting, participants successively learned two lists of motor sequences. Each sequence consisted of four consecutive finger movements. After a short distractor task, a recall test was given. Both experiments compared a forget group that was instructed to forget list-1 items with a remember group not receiving a forget instruction. We found that the instruction to forget list 1 enhanced recall of subsequently learned motor sequences. This benefit of directed forgetting occurred independently of costs for list 1. A mediation analysis showed that the encoding accuracy of list 2 was a mediator of the recall benefit, that is, the more accurate execution of motor sequences of list 2 after receiving a forget instruction for list 1 accounted for better recall of list 2. Thus, the adaptation of the list method to motor action provided more direct evidence on the effect of directed forgetting on subsequent learning. The results corroborate the assumption of a reset of encoding as a consequence of directed forgetting. PMID:26471189

  7. DNA sequencing by synthesis with degenerate primers

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The degenerate primer-based sequencing was developed by a synthesis method (DP-SBS) for high-throughput DNA sequencing, in which a set of degenerate primers are hybridized on the arrayed DNA templates and extended by DNA polymerase on microarrays. In this method, a different set of degenerate primers containing a given number (n) of degenerate nucleotides at the 3'-ends were annealed to the sequenced templates that were immobilized on the solid surface. The nucleotides (n+1) on the template sequences were determined by detecting the incorporation of fluorescent labeled nucleotides. The fluorescent labeled nucleotide was incorporated into the primer in a base-specific manner after the enzymatic primer extension reactions, and nine-base lengths were read out accurately. The main advantage of the DP-SBS is that the method only uses very conventional biochemical reagents and avoids the complicated special chemical reagents for removing the labeled nucleotides and reactivating the primer for further extension. From the present study, it is found that the DP-SBS method is reliable, simple, and cost-effective for laboratory-sequencing a large amount of short DNA fragments.

  8. Sequence analysis by iterated maps, a review.

    Science.gov (United States)

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
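
    For readers unfamiliar with the Chaos Game Representation mentioned above, a minimal Python sketch follows. It assumes the commonly used layout in which the four nucleotides sit at the corners of the unit square and each new point lies halfway between the previous point and the corner of the current base; the corner assignment and the function name `cgr_points` are illustrative assumptions, not details taken from the review.

```python
# Assumed corner layout for the four nucleotides on the unit square.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence, start=(0.5, 0.5)):
    """Return the chaos-game coordinates visited while reading `sequence`."""
    x, y = start
    points = []
    for base in sequence:
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


print(cgr_points("A"))       # [(0.25, 0.25)]
print(cgr_points("AG")[-1])  # (0.625, 0.625)
```

    Because each step halves the distance to a corner, the last k bases fix which sub-square of side 2^-k the point falls in. This is one way to see the scale-free property the review refers to.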

  9. Generation of physical map contig-specific sequences useful for whole genome sequence scaffolding.

    Directory of Open Access Journals (Sweden)

    Yanliang Jiang

    Full Text Available Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. A two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, which dramatically reduced the cost. A total of 94,111,841 100-bp reads and 315,277 assembled contigs were identified as containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of the whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link the physical map, genetic linkage map and draft whole genome sequences, and consequently have the capability to improve whole genome sequence assembly and scaffolding, and to improve genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge.

  10. Ion Torrent Semiconductor Sequencing Allows Rapid, Low Cost Sequencing of the Human Exome (7th Annual SFAF Meeting, 2012)

    Energy Technology Data Exchange (ETDEWEB)

    Jenkins, David [EdgeBio

    2012-06-01

    David Jenkins on "Ion Torrent semiconductor sequencing allows rapid, low-cost sequencing of the human exome" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  11. Auditory sequence analysis and phonological skill.

    Science.gov (United States)

    Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E; Turton, Stuart; Griffiths, Timothy D

    2012-11-01

    This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739

  12. TRAP Sequence - An Interesting Entity in Twins

    Directory of Open Access Journals (Sweden)

    R H Srinivas Prasad

    2012-01-01

    Full Text Available Twin reversed arterial perfusion (TRAP) sequence is a rare malformation occurring in monozygotic multiple gestations. One well-developed normal (pump) twin and the other twin with absent cardiac structure (acardiac), who is hemodynamically dependent on the normal (pump) twin, are characteristic of this syndrome. The acardiac twin develops multiple anomalies that make survival difficult. The prognosis of the pump twin is variable with mortality rate ranging from 50% to 70%. Complications that affect the prognosis of the pump twin include complications of congestive cardiac failure due to increased cardiac demand, prematurity secondary to preterm delivery, and polyhydramnios. Because of these complications, prompt detection, follow-up, and treatment of this condition are very important. We report two cases of TRAP sequence that emphasize the importance of gray-scale and color Doppler imaging in diagnosis, detection of poor prognostic features, follow-up, and management of TRAP sequence.

  13. A glass menagerie of low complexity sequences.

    Science.gov (United States)

    Halfmann, Randal

    2016-06-01

    Remarkably simple proteins play outsize roles in the execution of developmental complexity within biological systems. Sequence information determines structure and hence function, so how do low complexity sequences fulfill their functions? Recent discoveries are raising the curtain on a new dimension of the sequence-structure paradigm. In it, function derives not from the structures of individual proteins, but instead, from dynamic material properties of entire ensembles of the proteins acting in unison through phase changes. These phases include liquids, one-dimensional crystals, and - as elaborated herein - even glasses. The peculiar thermodynamics of glass-like protein assemblies, in particular, illuminate new principles of information flow through and, at times, orthogonal to the central dogma of molecular biology. PMID:27258703

  14. Constructing circuit codes by permuting initial sequences

    CERN Document Server

    Wynn, Ed

    2012-01-01

    Two new constructions are presented for coils and snakes in the hypercube. Improvements are made on the best known results for snake-in-the-box coils of dimensions 9, 10 and 11, and for some other circuit codes of dimensions between 8 and 13. In the first construction, circuit codes are generated from permuted copies of an initial transition sequence; the multiple copies constrain the search, so that long codes can be found relatively efficiently. In the second construction, two lower-dimensional paths are joined together with only one or two changes in the highest dimension; this requires a search for a permutation of the second sequence to fit around the first. It is possible to investigate sequences of vertices of the hypercube, including circuit codes, by connecting the corresponding vertices in an extended graph related to the hypercube. As an example of this, invertible circuit codes are briefly discussed.

  15. Bias of purine stretches in sequenced chromosomes

    DEFF Research Database (Denmark)

    Ussery, David; Soumpasis, Dikeos Mario; Brunak, Søren;

    2002-01-01

    We examined more than 700 DNA sequences (full length chromosomes and plasmids) for stretches of purines (R) or pyrimidines (Y) and alternating YR stretches; such regions will likely adopt structures which are different from the canonical B-form. Since one turn of the DNA helix is roughly 10 bp, we...... measured the fraction of each genome which contains purine (or pyrimidine) tracts of lengths of 10 bp or longer (hereafter referred to as 'purine tracts'), as well as stretches of alternating pyrimidines/purine ('pyr/pur tracts') of the same length. Using these criteria, a random sequence would be expected......, in eukaryotes there is an abundance of long stretches of purines or alternating purine/pyrimidine tracts, which cannot be explained in this way; these sequences are likely to play an important role in eukaryotic chromosome organisation.
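
    The tract measurement described in this abstract can be sketched roughly as follows. The function below computes the fraction of a sequence covered by runs of at least 10 consecutive purines (A/G) or pyrimidines (C/T) on the given strand. It is an illustrative assumption about how such a fraction might be computed, not the authors' code, and it does not handle the alternating pyr/pur tracts.

```python
import re


def tract_fraction(sequence, min_len=10):
    """Fraction of bases lying in purine or pyrimidine runs >= min_len."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    # A run of purines (A or G) or a run of pyrimidines (C or T).
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    covered = sum(m.end() - m.start() for m in pattern.finditer(sequence))
    return covered / len(sequence)


# A 10-bp purine run followed by 10 bp of alternating C/A (no run longer than 1).
print(tract_fraction("AGGAAGGAGA" + "CACACACACA"))  # 0.5
print(tract_fraction("ACGT" * 5))                   # 0.0
```

    Comparing this fraction for a real chromosome against the value for a shuffled sequence of the same base composition is one simple way to see whether long tracts are over-represented.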

  16. Degree sequences of k-multi-hypertournaments

    Institute of Scientific and Technical Information of China (English)

    Pirzada S

    2009-01-01

    Let n and k (n ≥ k > 1) be two non-negative integers. A k-multi-hypertournament on n vertices is a pair (V, A), where V is a set of vertices with |V|= n, and A is a set of k-tuples of vertices, called arcs, such that for any k-subset S of V, A contains at least one (at most k!) of the k! k-tuples whose entries belong to S. The necessary and sufficient conditions for a non-decreasing sequence of non-negative integers to be the out-degree sequence (in-degree sequence) of some k-multi-hypertournament are given.

  17. Finding Sequential Patterns from Large Sequence Data

    CERN Document Server

    Esmaeili, Mahdi

    2010-01-01

    Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining finds sets of data items that occur together frequently in some sequences. Sequential pattern mining, which extracts frequent subsequences from a sequence database, has attracted a great deal of interest in recent data mining research because it is the basis of many applications, such as web user analysis, stock trend prediction, DNA sequence analysis, finding language or linguistic patterns in natural language texts, and using the history of symptoms to predict certain kinds of disease. Given the diversity of these applications, it may not be possible to apply a single sequential pattern model to all of these problems. Each application may require a unique model and solution. A number of research projects were established in recent years to develop meaningful sequential pattern...
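
    The notion of a frequent subsequence can be made concrete with a small sketch. It assumes the common definition in which a pattern is supported by a sequence if its items appear in the same order, not necessarily contiguously; the names `is_subsequence` and `support` are illustrative and do not come from the paper.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order
    (gaps allowed)."""
    it = iter(sequence)
    # 'item in it' advances the iterator until the item is found.
    return all(item in it for item in pattern)

def support(pattern, database):
    """Fraction of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)

database = ["abcd", "acd", "bd"]
share = support("ad", database)  # pattern a -> d occurs in two of three sequences
```

    A mining algorithm would enumerate candidate patterns and keep those whose support exceeds a chosen threshold; only the support test is shown here.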

  18. Transforming clinical microbiology with bacterial genome sequencing

    Science.gov (United States)

    2016-01-01

    Whole genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here we review the current status of clinical microbiology and how it has already begun to be transformed by the use of next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. The application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow. PMID:22868263

  19. Inferring interaction partners from protein sequences

    CERN Document Server

    Bitbol, Anne-Florence; Colwell, Lucy J; Wingreen, Ned S

    2016-01-01

    Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multi-protein complexes, and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners. Hence, the sequences of interacting partners are correlated. Here we exploit these correlations to accurately identify which proteins are specific interaction partners from sequence data alone. Our general approach, which employs a pairwise maximum entropy model to infer direct couplings between residues, has been successfully used to predict the three-dimensional structures of proteins from sequences. Building on this approach, we introduce an iterative algorithm to predict specific interaction partners from among the members of two protein families. We assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. The algorithm proves successful without any a pri...

  20. Developmental sequence of Cambrian embryo Markuelia

    Institute of Scientific and Technical Information of China (English)

    DONG XiPing

    2007-01-01

    Based on more exquisitely preserved specimens of Markuelia hunanensis recently recovered from Middle and Upper Cambrian in western Hunan and in the light of Synchrotron radiation X-ray tomographic microscopy, the developmental sequence from cleavage through organogenesis to the pre-hatching of Cambrian embryo Markuelia, especially the developmental sequence during the pre-hatching stage, i.e. from the earliest period when the scalids and tail spines only took shape to the latest period (just about hatching), is established. This developmental sequence provides a pattern of embryonic development during the pre-hatching stage, which has not been established in the living scalidophorans (priapulids, loriciferans and kinorhynchs). Thus, it not only enriches our knowledge on the embryonic development of the extant descendants of Markuelia, but also opens a new window to the evolution and development of the animal.

  1. Physical approaches to DNA sequencing and detection

    CERN Document Server

    Zwolak, Michael

    2007-01-01

    With the continued improvement of sequencing technologies, the prospect of genome-based medicine is now at the forefront of scientific research. To realize this potential, however, we need a revolutionary sequencing method for the cost-effective and rapid interrogation of individual genomes. This capability is likely to be provided by a physical approach to probing DNA at the single nucleotide level. This is in sharp contrast to current techniques and instruments which probe, through chemical elongation, electrophoresis, and optical detection, length differences and terminating bases of strands of DNA. In this Colloquium we review several physical approaches to DNA detection that have the potential to deliver fast and low-cost sequencing. Center-fold to these approaches is the concept of nanochannels or nanopores which allow for the spatial confinement of DNA molecules. In addition to their possible impact in medicine and biology, the methods offer ideal test beds to study open scientific issues and challenge...

  2. Nager syndrome and Pierre Robin sequence.

    Science.gov (United States)

    Rosa, Rafael Fabiano Machado; Guimarães, Victória Bernardes; Beltrão, Luciana Amorim; Trombetta, Júlia Santana; Lliguin, Karen Lizeth Puma; de Mattos, Vinicius Freitas; Zen, Paulo Ricardo Gazzola

    2015-04-01

    Nager syndrome is considered a rare genetic syndrome characterized by craniofacial and radial anomalies. Pierre Robin sequence is a triad that includes micrognathia, cleft palate and glossoptosis. The present patient had typical findings of Nager syndrome and Pierre Robin sequence. He progressed to severe respiratory distress, requiring mechanical ventilation and tracheostomy. At 1 year and 11 months, he had episodes of cardiorespiratory arrest and died. In the literature review, we identified the clinical description of 44 patients with Nager syndrome. Among them, 93.1% had micrognathia, 38.6% cleft palate and 11.3% glossoptosis. Only one (2.3%) had all three features, as observed in the present patient. Therefore, despite the fact that the features of Pierre Robin sequence are common, there are few patients who have the complete triad. It is noteworthy, however, that they may be associated with respiratory distress, which may put the patient's life at risk.

  3. LARGE DEVIATIONS FOR SOME DEPENDENT SEQUENCES

    Institute of Scientific and Technical Information of China (English)

    Hu Shuhe; Wang Xuejun

    2008-01-01

    Let $(X_i)$ be a martingale difference sequence and $S_n=\sum_{i=1}^{n} X_i$. Suppose $(X_i)$ is bounded in $L^p$. In the case $p \ge 2$, Lesigne and Volny (Stochastic Process. Appl. 96 (2001) 143) obtained the estimate $\mu(S_n > n) < c\,n^{-p/2}$; Yulin Li (Statist. Probab. Lett. 62 (2003) 317) generalized the result to the case $p \in (1, 2)$ and obtained $\mu(S_n > n) < c\,n^{1-p}$. These bounds are optimal in a certain sense. In this article, the authors study the large deviation of $S_n$ for some dependent sequences and obtain upper bounds for $\mu(S_n > n)$ of the same optimal order as those for martingale difference sequences.

  4. Transforming clinical microbiology with bacterial genome sequencing.

    Science.gov (United States)

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  5. Microbial species delineation using whole genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.

  6. DNA extraction columns contaminated with murine sequences.

    Directory of Open Access Journals (Sweden)

    Otto Erlwein

    Sequences of the novel gammaretrovirus, xenotropic murine leukemia virus-related virus (XMRV), have been described in human prostate cancer tissue, although the amounts of DNA are low. Furthermore, XMRV sequences and polytropic (p) murine leukemia viruses (MLVs) have been reported in patients with chronic fatigue syndrome (CFS). In assessing the prevalence of XMRV in prostate cancer tissue samples we discovered that eluates from naïve DNA purification columns, when subjected to PCR with primers designed to detect genomic mouse DNA contamination, occasionally gave rise to amplification products. Further PCR analysis, using primers to detect XMRV, revealed sequences derived from XMRV and pMLVs from mouse and human DNA and DNA of unspecified origin. Thus, DNA purification columns can present problems when used to detect minute amounts of DNA targets by highly sensitive amplification techniques.

  7. Dynamics and Control of DNA Sequence Amplification

    CERN Document Server

    Marimuthu, Karthikeyan

    2014-01-01

    DNA amplification is the process of replication of a specified DNA sequence in vitro through time-dependent manipulation of its external environment. A theoretical framework for determination of the optimal dynamic operating conditions of DNA amplification reactions, for any specified amplification objective, is presented based on first-principles biophysical modeling and control theory. Amplification of DNA is formulated as a problem in control theory with optimal solutions that can differ considerably from strategies typically used in practice. Using the Polymerase Chain Reaction (PCR) as an example, sequence-dependent biophysical models for DNA amplification are cast as control systems, wherein the dynamics of the reaction are controlled by a manipulated input variable. Using these control systems, we demonstrate that there exists an optimal temperature cycling strategy for geometric amplification of any DNA sequence and formulate optimal control problems that can be used to derive the optimal tempe...

  8. A statistical study of aftershock sequences

    Directory of Open Access Journals (Sweden)

    Giorgio Ranalli

    2010-02-01

    A comprehensive statistical study of the phenomenology of aftershock sequences is made in this paper. The spatial distribution of aftershocks indicates that they are mainly crustal events; however, deeper sequences also take place. The analysis of the distribution of aftershocks in 15 sequences with respect to time and magnitude leads to the statistical confirmation of a set of phenomenological laws describing the process, namely, the time-frequency law of hyperbolic decay of aftershock activity with time, the magnitude stability law, and the exponential magnitude-frequency distribution. The hypotheses involved are checked. The grouping of data and the statistical methods employed are chosen according to some basic well-confirmed assumptions regarding the nature of the process.

  9. DNA sequencing by nanopores: advances and challenges

    Science.gov (United States)

    Agah, Shaghayegh; Zheng, Ming; Pasquali, Matteo; Kolomeisky, Anatoly B.

    2016-10-01

    Developing inexpensive and simple DNA sequencing methods capable of detecting entire genomes in short periods of time could revolutionize the world of medicine and technology. It will also lead to major advances in our understanding of fundamental biological processes. It has been shown that nanopores have the ability of single-molecule sensing of various biological molecules rapidly and at a low cost. This has stimulated significant experimental efforts in developing DNA sequencing techniques by utilizing biological and artificial nanopores. In this review, we discuss recent progress in the nanopore sequencing field with a focus on the nature of nanopores and on sensing mechanisms during the translocation. Current challenges and alternative methods are also discussed.

  10. Spectral sequences in smooth generalized cohomology

    CERN Document Server

    Grady, Daniel

    2016-01-01

    We consider spectral sequences in smooth generalized cohomology theories, including differential generalized cohomology theories. The main differential spectral sequences will be of the Atiyah-Hirzebruch (AHSS) type, where we provide a filtration by the Cech resolution of smooth manifolds. This allows for systematic study of torsion in differential cohomology. We apply this in detail to smooth Deligne cohomology, differential topological complex K-theory, and to a smooth extension of integral Morava K-theory that we introduce. In each case we explicitly identify the differentials in the corresponding spectral sequences, which exhibit an interesting and systematic interplay between (refinement of) classical cohomology operations, operations involving differential forms, and operations on cohomology with U(1) coefficients.

  11. Dynamic programming algorithms for biological sequence comparison.

    Science.gov (United States)

    Pearson, W R; Miller, W

    1992-01-01

    Efficient dynamic programming algorithms are available for a broad class of protein and DNA sequence comparison problems. These algorithms require computer time proportional to the product of the lengths of the two sequences being compared [O(N^2)] but require memory space proportional only to the sum of these lengths [O(N)]. Although the requirement for O(N^2) time limits use of the algorithms to the largest computers when searching protein and DNA sequence databases, many other applications of these algorithms, such as calculation of distances for evolutionary trees and comparison of a new sequence to a library of sequence profiles, are well within the capabilities of desktop computers. In particular, the results of library searches with rapid searching programs, such as FASTA or BLAST, should be confirmed by performing a rigorous optimal alignment. Whereas rapid methods do not overlook significant sequence similarities, FASTA limits the number of gaps that can be inserted into an alignment, so that a rigorous alignment may extend the alignment substantially in some cases. BLAST does not allow gaps in the local regions that it reports; a calculation that allows gaps is very likely to extend the alignment substantially. Although a Monte Carlo evaluation of the statistical significance of a similarity score with a rigorous algorithm is much slower than the heuristic approach used by the RDF2 program, the dynamic programming approach should take less than 1 hr on a 386-based PC or desktop Unix workstation. For descriptive purposes, we have limited our discussion to methods for calculating similarity scores and distances that use gap penalties of the form g = rk. Nevertheless, programs for the more general case (g = q+rk) are readily available. Versions of these programs that run either on Unix workstations, IBM-PC class computers, or the Macintosh can be obtained from either of the authors.
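
    The time and memory behaviour described in this abstract can be sketched as follows. This is a minimal global-alignment score with the linear gap penalty g = rk, keeping only two rows of the dynamic programming table; the scoring values (`match`, `mismatch`, `r`) and the function name are assumptions for illustration, not the authors' programs.

```python
def global_score(a, b, match=1, mismatch=-1, r=2):
    """Optimal global similarity score with gap penalty g = r*k.
    Time is proportional to len(a)*len(b); memory to the shorter length."""
    if len(b) > len(a):
        a, b = b, a  # keep the shorter sequence along the stored row
    # Row 0: aligning an empty prefix of a against j characters of b costs j gaps.
    prev = [-r * j for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [-r * i]  # column 0: i characters of a against nothing
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] - r        # gap in b
            left = curr[j - 1] - r  # gap in a
            curr.append(max(diag, up, left))
        prev = curr  # discard older rows: only the score is needed
    return prev[-1]

score = global_score("ACGT", "AGT")  # three matches and one single-residue gap
```

    Recovering the alignment itself, rather than only its score, in linear space requires a divide-and-conquer step that is not shown in this sketch.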

  12. Cladistic analysis of anuran POMC sequences.

    Science.gov (United States)

    Alrubaian, Jasem; Danielson, Phillip; Walker, David; Dores, Robert M

    2002-03-01

    Procedures for performing cladistic analyses can provide powerful tools for understanding the evolution of neuropeptide and polypeptide hormone coding genes. These analyses can be done on either amino acid data sets or nucleotide data sets and can utilize several different algorithms that are dependent on distinct sets of operating assumptions and constraints. In some cases, the results of these analyses can be used to gauge phylogenetic relationships between taxa. Selecting the proper cladistic analysis strategy is dependent on the taxonomic level of analysis and the rate of evolution within the orthologous genes being evaluated. For example, previous studies have shown that the amino acid sequence of proopiomelanocortin (POMC), the common precursor for the melanocortins and beta-endorphin, can be used to resolve phylogenetic relationships at the class and order level. This study tested the hypothesis that POMC sequences could be used to resolve phylogenetic relationships at the family taxonomic level. Cladistic analyses were performed on amphibian POMC sequences characterized from the marine toad, Bufo marinus (family Bufonidae; this study), the spadefoot toad, Spea multiplicatus (family Pelobatidae), the African clawed frog, Xenopus laevis (family Pipidae) and the laughing frog, Rana ridibunda (family Ranidae). In these analyses the sequence of Australian lungfish POMC was used as the outgroup. The analyses were done at the amino acid level using the maximum parsimony algorithm and at the nucleotide level using the maximum likelihood algorithm. For the anuran POMC genes, analysis at the nucleotide level using the maximum likelihood algorithm generated a cladogram with higher bootstrap values than the maximum parsimony analysis of the POMC amino acid data set. For anuran POMC sequences, analysis of nucleotide sequences using the maximum likelihood algorithm would appear to be the preferred strategy for resolving phylogenetic relationships at the family taxonomic

  13. Image correlation method for DNA sequence alignment.

    Directory of Open Access Journals (Sweden)

    Millaray Curilem Saldías

    The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics' most active research areas. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a fixed gray intensity pixel. Query and known database sequences are coded to their pixel representation and sequence alignment is handled as an object-recognition-in-a-scene problem. Query and database become object and scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, a one-million-base-pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%) and specificity (98.99%) and outperformed BLAST when mutation numbers increased. However, digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed of a real experimental optical correlator. By doing this, we expect to fully exploit the properties of optical correlation. As the optical correlator works jointly with the computer, digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods on sequence alignment.
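
    The idea of treating alignment as correlation between intensity-coded sequences can be illustrated with a simplified sketch. It works in one dimension rather than on 2-dimensional images, uses an assumed gray level for each base, and scores each offset with a normalized cross-correlation. The mapping `GRAY` and the function names are hypothetical and are not taken from the paper.

```python
import math

# Assumed gray intensity per base; the paper's actual levels are not given here.
GRAY = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}

def encode(seq):
    """Turn a DNA string into a list of pixel intensities."""
    return [GRAY[base] for base in seq.upper()]

def ncc(x, y):
    """Zero-mean normalized cross-correlation of two equal-length signals."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0  # a flat window carries no signal

def best_offset(query, scene):
    """Slide the query (object) over the scene; return the best offset and score."""
    q, s = encode(query), encode(scene)
    scores = [ncc(q, s[i:i + len(q)]) for i in range(len(s) - len(q) + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

offset, score = best_offset("ACGTGCA", "TTTTACGTGCATTTT")
```

    Because the score is a correlation rather than an exact comparison, a query with a few substitutions still peaks near its true position, which is the property the abstract relies on when mutation numbers increase.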

  14. Indexing for Large DNA Database Sequences

    Directory of Open Access Journals (Sweden)

    S. M. Wohoush & M.H. Saheb

    2011-10-01

    Bioinformatics data consist of a huge amount of information due to the large number of sequences, the very long sequence lengths, and the daily new additions. These data need to be efficiently accessed for many purposes. What makes one DNA data item distinct from another is its DNA sequence. A DNA sequence consists of a combination of four characters, A, C, G, and T, and sequences have different lengths. Using a suitable representation of DNA sequences, and a suitable index structure to hold this representation in main memory, leads to efficient processing by accessing the DNA sequences through the index, and reduces the number of disk I/O accesses. I/O operations are needed at the end to avoid false hits; we reduce the number of candidate DNA sequences that need to be checked by pruning, so there is no need to search the whole database. We need a suitable index for searching DNA sequences efficiently, with suitable index size and search time. The selection of the relation fields on which the index is built has a big effect on index size and search time. Our experiments use the n-gram wavelet transformation upon single-field and multi-field index structures under the relational DBMS environment. Results show the need to consider index size and search time carefully when using indexing. Increasing the window size decreases the number of I/O references. The use of single-field and multiple-field indexing is highly affected by the window size value. Increasing the window size leads to better search time with a special type of index using single-field indexing, while the search time is almost equally good with most index types when using multiple-field indexing. The storage space needed for RDBMS indexing types is almost the same as or greater than the actual data.
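
    The prune-then-verify pattern described in this abstract can be sketched with a plain in-memory n-gram index. This is a simplification under stated assumptions: it omits the wavelet transformation and the relational DBMS layer used in the paper, and the names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict

def build_index(sequences, n=3):
    """Map every n-gram to the set of sequence IDs that contain it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - n + 1):
            index[seq[i:i + n]].add(seq_id)
    return index

def search(query, sequences, index, n=3):
    """Return IDs of sequences containing query as a substring."""
    grams = {query[i:i + n] for i in range(len(query) - n + 1)}
    if not grams:
        candidates = set(sequences)  # query shorter than n: no pruning possible
    else:
        # Pruning: a hit must contain every n-gram of the query.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Verification: check the surviving candidates to discard false hits.
    return sorted(s for s in candidates if query in sequences[s])

sequences = {"s1": "ACGTACGT", "s2": "TTTTGGGG", "s3": "GGACGTTT"}
index = build_index(sequences)
hits = search("ACGT", sequences, index)
```

    Larger values of `n` produce a more selective index with fewer candidates to verify, at the cost of more distinct keys, which mirrors the trade-off between index size and search time discussed in the abstract.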

  15. Complete Genome Sequence of Streptococcus agalactiae CNCTC 10/84, a Hypervirulent Sequence Type 26 Strain

    OpenAIRE

    Hooven, Thomas A.; Randis, Tara M.; Sean C Daugherty; Narechania, Apurva; Planet, Paul J.; Tettelin, Hervé; Ratner, Adam J.

    2014-01-01

    Streptococcus agalactiae (group B Streptococcus [GBS]) is a human pathogen with a propensity to cause neonatal infections. We report the complete genome sequence of GBS strain CNCTC 10/84, a hypervirulent clinical isolate frequently used to study GBS pathogenesis. Comparative analysis of this sequence may shed light on novel pathogenic mechanisms.

  16. BAC-pool 454-sequencing: A rapid and efficient approach to sequence complex tetraploid cotton genomes

    Science.gov (United States)

    New and emerging next generation sequencing technologies have been promising in reducing sequencing costs, but not significantly for complex polyploid plant genomes such as cotton. Large and highly repetitive genome of G. hirsutum (~2.5GB) is less amenable and cost-intensive with traditional BAC-by...

  17. Finding the most significant common sequence and structure motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, L.J.; Stormo, G.D.

    1997-01-01

    We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified...

  18. Sequence, Program Words, and Sequence Rationale for the 1971 Revised First-Grade Spelling Program.

    Science.gov (United States)

    Berdiansky, Betty

    Pupils participating in the 1971-1972 tryout of the Southwest Regional Laboratory (SWRL) First Grade Spelling Program were taught to combine consonants and consonant clusters with word elements to form program words. This paper presents the sequence of instruction for these elements and the rationale used in deriving this sequence. In addition, it…

  19. Cloning, sequencing, and sequence analysis of two novel plasmids from the thermophilic anaerobic bacterium Anaerocellum thermophilum

    DEFF Research Database (Denmark)

    Clausen, Anders; Mikkelsen, Marie Just; Schrøder, I.;

    2004-01-01

    The nucleotide sequence of two novel plasmids isolated from the extreme thermophilic anaerobic bacterium Anaerocellum thermophilum DSM6725 (A. thermophilum), growing optimally at 70 °C, has been determined. pBAS2 was found to be a 3653 bp plasmid with a GC content of 43%, and the sequence...

  20. Draft Genome Sequence of a Klebsiella pneumoniae Strain (New Sequence Type 2357) Carrying Tn3926

    Science.gov (United States)

    Mi, Zu-huang; Wang, Chun-xin; Zhu, Jian-ming

    2016-01-01

    We present the draft genome sequence of a Klebsiella pneumoniae carbapenemase–producing sequence type 2357 (ST2357) strain, NB60, which contains drug-resistant genes encoding resistance to beta-lactams, fluoroquinolones, aminoglycosides, trimethoprim-sulfamethoxazole, colistin, macrolides, and tetracycline. Strain NB60 was isolated from human blood, making it an important tool for studying K. pneumoniae pathogenesis. PMID:27660779

  1. A formula on linear complexity of highest coordinate sequences from maximal periodic sequences over Galois rings

    Institute of Scientific and Technical Information of China (English)

    HU Lei; SUN Nigang

    2006-01-01

    Using a polynomial expression of the highest coordinate map, we deduce an exact formula on the linear complexity of the highest coordinate sequence derived from a maximal periodic sequence over an arbitrary Galois ring of characteristic p^2, where p is a prime. This generalizes the known result of Udaya and Siddiqi for the case that the Galois ring is Z_4.

  2. Sequence embedding for fast construction of guide trees for multiple sequence alignment

    LENUS (Irish Health Repository)

    Blackshields, Gordon

    2010-05-14

    Background: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N^2 for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results: In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions: We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
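
    The embedding idea can be illustrated with a minimal sketch: each sequence is represented by its distances to a small set of seed sequences, giving N x t values instead of an N x N matrix. The k-mer Jaccard distance and the evenly spaced seed choice used here are assumptions made for illustration; they are not necessarily the distance measure or seed selection of the published method, and the function names are hypothetical.

```python
def kmer_set(seq, k=3):
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_distance(a, b, k=3):
    """1 minus the Jaccard similarity of the k-mer sets (assumed stand-in distance)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0

def embed(sequences, n_seeds, k=3):
    """Represent each sequence by its distances to n_seeds seed sequences."""
    step = max(1, len(sequences) // n_seeds)
    seeds = sequences[::step][:n_seeds]  # evenly spaced seeds, an arbitrary choice
    return [[kmer_distance(s, seed, k) for seed in seeds] for s in sequences]

seqs = ["ACGTACGT", "ACGTACGA", "TTTTTTTT", "TTTTTTTA"]
vectors = embed(seqs, n_seeds=2)  # 4 x 2 distances instead of a 4 x 4 matrix
```

    The resulting vectors can be passed to any clustering routine that works on points in a low-dimensional space, so the number of expensive sequence comparisons grows with N times the number of seeds rather than with N squared.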

  3. From Sequence to Morphology - Long-Range Correlations in Complete Sequenced Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2004-01-01

    The largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis thali

  4. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Miri Michaeli

    2012-12-01

    High throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion-Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  5. Draft Genome Sequence of a Klebsiella pneumoniae Strain (New Sequence Type 2357) Carrying Tn3926.

    Science.gov (United States)

    Weng, Xing-Bei; Mi, Zu-Huang; Wang, Chun-Xin; Zhu, Jian-Ming

    2016-01-01

    We present the draft genome sequence of a Klebsiella pneumoniae carbapenemase-producing sequence type 2357 (ST2357) strain, NB60, which contains drug-resistant genes encoding resistance to beta-lactams, fluoroquinolones, aminoglycosides, trimethoprim-sulfamethoxazole, colistin, macrolides, and tetracycline. Strain NB60 was isolated from human blood, making it an important tool for studying K. pneumoniae pathogenesis. PMID:27660779

  6. Complete genome sequence of arracacha mottle virus.

    Science.gov (United States)

    Orílio, Anelise F; Lucinda, Natalia; Dusi, André N; Nagata, Tatsuya; Inoue-Nagata, Alice K

    2013-01-01

    Arracacha mottle virus (AMoV) is the only potyvirus reported to infect arracacha (Arracacia xanthorrhiza) in Brazil. Here, the complete genome sequence of an isolate of AMoV was determined to be 9,630 nucleotides in length, excluding the 3' poly-A tail, and encoding a polyprotein of 3,135 amino acids and a putative P3N-PIPO protein. Its genomic organization is typical of a member of the genus Potyvirus, containing all conserved motifs. Its full genome sequence shared 56.2 % nucleotide identity with sunflower chlorotic mottle virus and verbena virus Y, the most closely related viruses.

  7. Comparative Amino Acid Sequences of Dengue Viruses

    OpenAIRE

    Haishi, Shozo; TANAKA Mariko; Igarashi, Akira

    1990-01-01

    Amino acid (AA) sequences of 4 serotypes of dengue viruses, deduced from the nucleotide (nt) sequences of their genomic RNA, were analyzed for each genome segment and each stretch of 10 AA residues. The precursor of membrane protein (pM) and 4 nonstructural proteins (NS1, NS3, NS4B, NS5) were highly conserved, while another nonstructural protein (NS2A) was least conserved among 5 strains of dengue viruses. When homology was compared among heterotypic viruses, type 1 and type 3 dengue viruses showed clo...

  8. Next-Generation Sequencing in Intellectual Disability.

    Science.gov (United States)

    Carvill, Gemma L; Mefford, Heather C

    2015-09-01

    Next-generation sequencing technologies have revolutionized gene discovery in patients with intellectual disability (ID) and led to an unprecedented expansion in the number of genes implicated in this disorder. We discuss the strategies that have been used to identify these novel genes for both syndromic and nonsyndromic ID and highlight the phenotypic and genetic heterogeneity that underpin this condition. Finally, we discuss the future of defining the genetic etiology of ID, including the role of whole-genome sequencing, mosaicism, and the importance of diagnostic testing in ID. PMID:27617123

  9. Sequence variability of Campylobacter temperate bacteriophages

    Directory of Open Access Journals (Sweden)

    Ng Lai-King

    2008-03-01

    Background: Prophages integrated within the chromosomes of Campylobacter jejuni isolates have been demonstrated very recently. Prior work with Campylobacter temperate bacteriophages, as well as evidence from prophages in other enteric bacteria, suggests these prophages might have a role in the biology and virulence of the organism. However, very little is known about the genetic variability of Campylobacter prophages which, if present, could lead to differential phenotypes in isolates carrying the phages versus those that do not. As a first step in the characterization of C. jejuni prophages, we investigated the distribution of prophage DNA within a C. jejuni population and assessed the DNA and protein sequence variability within a subset of the putative prophages found. Results: Southern blotting of C. jejuni DNA using probes from genes within the three putative prophages of the C. jejuni sequenced strain RM 1221 demonstrated the presence of at least one prophage gene in a large proportion (27/35) of isolates tested. Of these, 15 were positive for 5 or more of the 7 Campylobacter Mu-like phage 1 (CMLP 1, also designated Campylobacter jejuni integrated element 1, or CJIE 1) genes tested. Twelve of these putative prophages were chosen for further analysis. DNA sequencing of a 9,000 to 11,000 nucleotide region of each prophage demonstrated a close homology with CMLP 1 in both gene order and nucleotide sequence. Structural and sequence variability, including short insertions, deletions, and allele replacements, were found within the prophage genomes, some of which would alter the protein products of the ORFs involved. No insertions of novel genes were detected within the sequenced regions. The 12 prophages and RM 1221 had a % G+C very similar to C. jejuni sequenced strains, as well as promoter regions characteristic of C. jejuni. None of the putative prophages were successfully induced and propagated, so it is not known if they were functional or

  10. Digital image sequence processing, compression, and analysis

    CERN Document Server

    Reed, Todd R

    2004-01-01

    Introduction: Todd R. Reed. CONTENT-BASED IMAGE SEQUENCE REPRESENTATION: Pedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, and Charnchai Pluempitiwiriyawej. THE COMPUTATION OF MOTION: Christoph Stiller, Sören Kammel, Jan Horn, and Thao Dang. MOTION ANALYSIS AND DISPLACEMENT ESTIMATION IN THE FREQUENCY DOMAIN: Luca Lucchese and Guido Maria Cortelazzo. QUALITY OF SERVICE ASSESSMENT IN NEW GENERATION WIRELESS VIDEO COMMUNICATIONS: Gaetano Giunta. ERROR CONCEALMENT IN DIGITAL VIDEO: Francesco G.B. De Natale. IMAGE SEQUENCE RESTORATION: A WIDER PERSPECTIVE: Anil Kokaram. VIDEO SUMMARIZATION: Cuneyt M. Taskiran and Edward

  11. Bilateral maculopathy associated with Pierre Robin sequence.

    Science.gov (United States)

    Witmer, Matthew T; Vasan, Ryan; Levy, Richard; Davis, Jessica; Chan, R V Paul

    2012-08-01

    Pierre Robin sequence has been associated with a number of ocular complications, including myopia, strabismus, Möbius syndrome, nasolacrimal duct obstruction, glaucoma, cataract, microphthalmos, coloboma of choroid, and retinal detachment. We report a 10-day-old boy who presented with micrognathia, glossoptosis, and cleft palate as well as multiple congenital anomalies. Ophthalmic examination was notable for bilateral maculopathy, with focal areas of retinal and retinal pigment epithelial atrophy. The association of Pierre Robin sequence and maculopathy has been reported only twice previously.

  12. ASTRAL, a hyperspectral imaging DNA sequencer

    Science.gov (United States)

    O'Brien, Kevin M.; Wren, Jonathan; Davé, Varshal K.; Bai, Diane; Anderson, Richard D.; Rayner, Simon; Evans, Glen A.; Dabiri, Ali E.; Garner, Harold R.

    1998-05-01

    We are developing a prototype automatic DNA sequencer which utilizes polyacrylamide slab gels imaged through a novel optical detection system. The design of this prototype sequencer allows direct optical coupling over the entire read area of the gel and hyperspectrographic separation and detection of the fluorescence emission. The machine has no moving parts. All the major components incorporated in this prototype are currently available "off the shelf," thus reducing equipment development time and decreasing costs. Software developed for data acquisition, analysis, and conversion to other standard formats facilitates compatibility.

  13. The computational linguistics of biological sequences

    Energy Technology Data Exchange (ETDEWEB)

    Searls, D. [Univ. of Pennsylvania, Philadelphia, PA (United States)]

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Protein sequences are analogous in many respects, particularly their folding behavior. Proteins have a much richer variety of interactions, but in theory the same linguistic principles could come to bear in describing dependencies between distant residues that arise by virtue of three-dimensional structure. This tutorial will concentrate on nucleic acid sequences.

  14. Variations on strongly lacunary quasi Cauchy sequences

    Science.gov (United States)

    Kaplan, Huseyin; Cakalli, Huseyin

    2016-08-01

    We introduce a new function space, namely the space of N_θ(p)-ward continuous functions, which turns out to be a closed subspace of the space of continuous functions for each positive integer p. N_θ^α(p)-ward continuity is also introduced and investigated for any fixed 0 [...] k_{r-1}, k_r], and θ = (k_r) is a lacunary sequence, i.e. an increasing sequence of positive integers such that k_0 ≠ 0, and h_r: k_r - k_{r-1} → ∞.

  15. Linear recurring sequences over rings and modules

    Energy Technology Data Exchange (ETDEWEB)

    Kurakin, V.L.; Kuzmin, A.S.; Mikhalev, A.V. [and others]

    1995-10-15

    Here we present some fundamental concepts and results of the theory of linear recurring sequences over rings and modules and their applications. Of course, the authors give in more detail those results that are close to their mathematical interests. In particular, an attempt has been made to construct a general algebraic theory of k-LRS over modules, paying explicit attention to periodic k-sequences, to properties of linear recurrences over finite rings and especially over Galois rings, and also to methods of constructing codes based on such recurrences.
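
    As a small illustration of periodicity of a linear recurrence over a finite ring, the sketch below generates an ordinary (one-dimensional) linear recurring sequence over Z/mZ and finds its preperiod and period by tracking repeated states. The function names and the Fibonacci-mod-4 example are illustrative assumptions, not taken from the record, and the sketch does not cover the k-LRS over modules that the abstract discusses.

```python
def lrs(coeffs, initial, modulus, count):
    """First `count` terms of u[n+m] = sum(coeffs[j] * u[n+j]) mod modulus.

    `coeffs` and `initial` are assumed to have the same length m (the order).
    """
    order = len(coeffs)
    seq = [x % modulus for x in initial]
    while len(seq) < count:
        window = seq[-order:]
        seq.append(sum(c * u for c, u in zip(coeffs, window)) % modulus)
    return seq[:count]


def preperiod_and_period(coeffs, initial, modulus):
    """Return (preperiod, period) by detecting the first repeated state."""
    state = tuple(x % modulus for x in initial)
    seen = {}
    step = 0
    while state not in seen:
        seen[state] = step
        nxt = sum(c * u for c, u in zip(coeffs, state)) % modulus
        state = state[1:] + (nxt,)
        step += 1
    return seen[state], step - seen[state]


# Fibonacci recurrence over Z/4Z, starting from 0, 1.
print(lrs([1, 1], [0, 1], 4, 8))
print(preperiod_and_period([1, 1], [0, 1], 4))
```

    Because a finite ring admits only finitely many states, every such sequence is eventually periodic; the state-tracking loop relies on exactly that fact.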

  16. Sequence tagged microsatellite profiling (STMP): improved isolation of DNA sequence flanking target SSRs.

    Science.gov (United States)

    Hayden, M J; Good, G; Sharp, P J

    2002-12-01

    Sequence tagged microsatellite profiling (STMP) enables the rapid development of large numbers of co-dominant DNA markers, known as sequence tagged microsatellites (STMs). Each STM is amplified by PCR using a single primer specific to the conserved DNA sequence flanking the microsatellite repeat in combination with a universal primer that anchors to the 5'-ends of the microsatellites. It is also possible to convert STMs into conventional microsatellite, or simple sequence repeat (SSR), markers that are amplified using a pair of primers flanking the repeat sequence. Here, we describe a modification of the STMP procedure to significantly improve the capacity to convert STMs into conventional SSRs and, therefore, facilitate the development of highly specific DNA markers for purposes such as marker-assisted breeding. The usefulness of this technique was demonstrated in bread wheat. PMID:12466561

  17. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    Science.gov (United States)

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836
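
    The 'dotifying' display option can be pictured with the short sketch below. It assumes that dotifying means replacing a residue with a dot when it matches the residue in the same column of the first sequence; that reading, the function name `dotify`, and the toy alignment are assumptions for illustration. This is Python, not JSAV's own JavaScript code or API.

```python
def dotify(alignment):
    """Replace residues identical to the first sequence with '.'.

    `alignment` is a list of equal-length aligned strings; gap characters
    ('-') are always kept so that insertions and deletions stay visible.
    """
    reference = alignment[0]
    result = [reference]
    for seq in alignment[1:]:
        row = "".join(
            "." if a == b and a != "-" else a
            for a, b in zip(seq, reference)
        )
        result.append(row)
    return result


for row in dotify(["ACDEF", "ACDQF", "A-DEF"]):
    print(row)
```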

  19. Sequence Length Limits for Controlling False Positives in Discovering Nucleotide Sequence Motifs

    Institute of Scientific and Technical Information of China (English)

    CHEN Lei; QIAN Zi-liang

    2008-01-01

    In motif discovery, especially the discovery of transcription factor DNA binding sites, an overly long input sequence returns non-informative motifs rather than biologically functional ones. This paper gives theoretical analyses and computational experiments that suggest length limits for the input sequence. When the sequence length exceeds a certain critical point, the probability of discovering the motif decreases sharply. The work not only explains the unsatisfying results of existing motif discovery programs, namely that the input sequence may be too long and exceed this point, but also provides an estimate of the input sequence length that should be accepted to obtain more meaningful and reliable results in motif discovery.

  20. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    Directory of Open Access Journals (Sweden)

    Wadim L. Matochko

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N×1 frequency vector n = (n_i), where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N×N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N×N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process.
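
    The vector-and-operator picture above can be sketched in a few lines of plain Python. Everything here is an illustrative assumption rather than the authors' implementation: diagonal matrices are stored as lists of their diagonal entries, the sampling operator Sa is modeled by drawing `depth` reads in proportion to copy number, and biased sequencing is modeled as Sa applied after a diagonal CEN whose entries lie between 0 (read eliminated) and 1 (read unaffected).

```python
import random


def apply_diagonal(diag, vec):
    """Multiply a diagonal matrix (given by its diagonal) with a vector."""
    return [d * v for d, v in zip(diag, vec)]


def sampling_diagonal(vec, depth, rng):
    """Diagonal of a random sampling operator Sa for the library `vec`.

    Draws `depth` reads with probability proportional to the entries of
    `vec`; entry i of the result is sampled_count[i] / vec[i], so that
    diag(Sa) applied to `vec` returns the sampled counts.
    """
    indices = range(len(vec))
    counts = [0] * len(vec)
    for i in rng.choices(indices, weights=vec, k=depth):
        counts[i] += 1
    return [c / v if v else 0.0 for c, v in zip(counts, vec)]


def sequence(n, cen_diag, depth, rng):
    """Observed read counts under the assumed model Seq = Sa * CEN."""
    censored = apply_diagonal(cen_diag, n)
    sa_diag = sampling_diagonal(censored, depth, rng)
    return apply_diagonal(sa_diag, censored)


library = [100, 50, 0, 10]   # copy numbers n_i (hypothetical values)
cen = [1.0, 0.0, 1.0, 1.0]   # second sequence is fully censored
print(sequence(library, cen, depth=1000, rng=random.Random(0)))
```

    Setting every entry of `cen` to 1.0 recovers the unbiased case Seq = Sa I_N described in the abstract.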

  1. Draft genome sequence of Bacillus endophyticus 2102.

    Science.gov (United States)

    Lee, Yong-Jik; Lee, Sang-Jae; Kim, Sun Hong; Lee, Sang Jun; Kim, Byoung-Chan; Lee, Han-Seung; Jeong, Haeyoung; Lee, Dong-Woo

    2012-10-01

    Bacillus endophyticus 2102 is an endospore-forming, plant growth-promoting rhizobacterium isolated from a hypersaline pond in South Korea. Here we present the draft sequence of B. endophyticus 2102, which is of interest because of its potential use in the industrial production of algaecides and bioplastics and for the treatment of industrial textile effluents. PMID:23012284

  2. Automated Testing with Targeted Event Sequence Generation

    DEFF Research Database (Denmark)

    Jensen, Casper Svenning; Prasad, Mukul R.; Møller, Anders

    2013-01-01

    of the individual event handlers of the application. The second phase builds event sequences backward from the target, using the summaries together with a UI model of the application. Our experiments on a collection of open source Android applications show that this technique can successfully produce event...

  3. TRANSPORTATION INEQUALITIES FOR WEAKLY DEPENDENT SEQUENCES

    Institute of Scientific and Technical Information of China (English)

    Ma Yutao

    2011-01-01

    In [3], the authors gave a necessary and sufficient condition for T1C, and, as an application, T1C for weakly dependent sequences was established. In this note, based on the Gozlan-Léonard characterization of W1H-inequalities, we extend this result to W1H-inequalities.

  4. Crop Sequence Economics in Dynamic Cropping Systems

    Science.gov (United States)

    No-till production systems allow more intensified and diversified production in the northern Great Plains; however, this has increased the need for information on improving economic returns through crop sequence selection. Field research was conducted 6 km southwest of Mandan ND to determine the inf...

  5. Artificial Intelligence Controls Tape-Recording Sequence

    Science.gov (United States)

    Schwuttke, Ursula M.; Otamura, Roy M.; Zottarelli, Lawrence J.

    1989-01-01

    Developmental expert-system computer program intended to schedule recording of large amounts of data on limited amount of magnetic tape. Schedules recording using two sets of rules. First set incorporates knowledge of locations for recording of new data. Second set incorporates knowledge about issuing commands to recorder. Designed primarily for use on Voyager Spacecraft, also applicable to planning and sequencing in industry.

  6. Symmetric and Transitive Registration of Image Sequences

    Directory of Open Access Journals (Sweden)

    Oskar Škrinjar

    2008-01-01

    This paper presents a method for constructing symmetric and transitive algorithms for registration of image sequences from image registration algorithms that do not have these two properties. The method is applicable to both rigid and nonrigid registration and it can be used with linear or periodic image sequences. The symmetry and transitivity properties are satisfied exactly (up to the machine precision), that is, they always hold regardless of the image type, quality, and the registration algorithm as long as the computed transformations are invertible. These two properties are especially important in motion tracking applications since physically incorrect deformations might be obtained if the registration algorithm is not symmetric and transitive. The method was tested on two sequences of cardiac magnetic resonance images using two different nonrigid image registration algorithms. It was demonstrated that the transitivity and symmetry errors of the symmetric and transitive modification of the algorithms could be made arbitrarily small when the computed transformations are invertible, whereas the corresponding errors for the nonmodified algorithms were on the order of the pixel size. Furthermore, the symmetric and transitive modification of the algorithms had higher registration accuracy than the nonmodified algorithms for both image sequences.
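
    To make the two properties concrete, the sketch below uses one generic construction that is symmetric and transitive by design: every image i gets an invertible transform T_i to a common reference, and the pairwise transform is defined as T_ij = T_j^{-1} ∘ T_i. This is an assumption chosen for illustration and is not necessarily the construction used in the paper; the 1D affine maps and the class name `Affine1D` are likewise hypothetical stand-ins for real image transformations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine1D:
    """Invertible 1D map x -> a * x + b (requires a != 0)."""
    a: float
    b: float

    def __call__(self, x):
        return self.a * x + self.b

    def inverse(self):
        return Affine1D(1.0 / self.a, -self.b / self.a)

    def compose(self, first):
        """Return the map x -> self(first(x))."""
        return Affine1D(self.a * first.a, self.a * first.b + self.b)


def pairwise(to_reference, i, j):
    """Transform from image i to image j through the common reference."""
    return to_reference[j].inverse().compose(to_reference[i])


to_ref = [Affine1D(1.0, 0.0), Affine1D(2.0, 1.0), Affine1D(0.5, -3.0)]
x = 7.0
# Symmetry: going from 0 to 1 and back returns the starting point.
print(pairwise(to_ref, 1, 0)(pairwise(to_ref, 0, 1)(x)))
# Transitivity: 0 -> 1 -> 2 agrees with 0 -> 2.
print(pairwise(to_ref, 1, 2)(pairwise(to_ref, 0, 1)(x)), pairwise(to_ref, 0, 2)(x))
```

    Both identities hold up to floating-point rounding, which mirrors the abstract's remark that the properties hold up to machine precision when the transformations are invertible.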

  7. 32 CFR 179.7 - Sequencing.

    Science.gov (United States)

    2010-07-01

    ... influence or change the MRS priority, but may influence the sequencing for action. Examples of factors that... and social factors. (3) Economic factors, including economic considerations pertaining to... (e.g., community members of an installation's restoration advisory board or technical...

  8. Time sequence photography of Roosters Comb

    Science.gov (United States)

    Understanding natural landscape changes is key to properly determining rangeland ecology. Time sequence photography documents a landscape snapshot and makes it possible to compare natural changes over time. Photographs of Roosters Comb were taken from the same vantag...

  9. Microfossils in the productive sequence of Azerbaijan

    Science.gov (United States)

    Afandiyeva, M. A.

    2013-11-01

    Microfossils from the Productive Complex of Azerbaijan are analyzed. Despite the presence of diverse fossils in this sequence that include reworked foraminifers and nannoplankton, autochthonous foraminiferal taxa, and endemic ostracods, none of their assemblages characterizing particular stratigraphic units are as yet defined. This work is aimed at making up for a deficiency in this knowledge.

  10. Mechanisms of protein sequence divergence and incompatibility.

    Directory of Open Access Journals (Sweden)

    Alon Wellner

    Alignments of orthologous protein sequences convey a complex picture. Some positions are utterly conserved whilst others have diverged to variable degrees. Amongst the latter, many are non-exchangeable between extant sequences. How do functionally critical and highly conserved residues diverge? Why and how did these exchanges become incompatible within contemporary sequences? Our model is phosphoglycerate kinase (PGK), where lysine 219 is an essential active-site residue completely conserved throughout Eukaryota and Bacteria, and serine is found only in archaeal PGKs. Contemporary sequences tested exhibited complete loss of function upon exchanges at 219. However, a directed evolution experiment revealed that two mutations were sufficient for human PGK to become functional with serine at position 219. These two mutations made position 219 permissive not only for serine and lysine, but also to a range of other amino acids seen in archaeal PGKs. The identified trajectories that enabled exchanges at 219 show marked sign epistasis: a relatively small loss of function with respect to one amino acid (lysine) versus a large gain with another (serine, and other amino acids). Our findings support the view that, as theoretically described, the trajectories underlining the divergence of critical positions are dominated by sign epistatic interactions. Such trajectories are an outcome of rare mutational combinations. Nonetheless, as suggested by the laboratory enabled K219S exchange, given enough time and variability in selection levels, even utterly conserved and functionally essential residues may change.

  11. Galaxy LIMS for next-generation sequencing

    NARCIS (Netherlands)

    Scholtalbers, J.; Rossler, J.; Sorn, P.; Graaf, J. de; Boisguerin, V.; Castle, J.; Sahin, U.

    2013-01-01

    SUMMARY: We have developed a laboratory information management system (LIMS) for a next-generation sequencing (NGS) laboratory within the existing Galaxy platform. The system provides lab technicians standard and customizable sample information forms, barcoded submission forms, tracking of input sam

  12. Deciphering the RNA landscape by RNAome sequencing

    NARCIS (Netherlands)

    K.W.J. Derks (Kasper); B. Misovic (Branislav); M.C.G.N. van den hout (Mirjam); C. Kockx (Christel); C.P. Gomez (Cesar Payan); R.W.W. Brouwer; H. Vrieling (Harry); J.H.J. Hoeijmakers (Jan); W.F.J. van IJcken (Wilfred); J. Pothof (Joris)

    2015-01-01

    Current RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species in an unperturbed manner. We report strand-specific RNAome sequencing that determines expression of small and large RNAs from rRNA-depleted total RNA in a sin

  13. Sublinear Time Motif Discovery from Multiple Sequences

    Directory of Open Access Journals (Sweden)

    Yunhui Fu

    2013-10-01

    In this paper, a natural probabilistic model for motif discovery has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet, Σ. A motif G = g1g2 ... gm is a string of m characters. In each background sequence is implanted a probabilistically-generated approximate copy of G. For a probabilistically-generated approximate copy b1b2 ... bm of G, every character, bi, is probabilistically generated, such that the probability for bi ≠ gi is at most α. We develop two new randomized algorithms and one new deterministic algorithm. They make advancements in the following aspects: (1) The algorithms are much faster than those before. Our algorithms can even run in sublinear time. (2) They can handle any motif pattern. (3) The restriction for the alphabet size is a lower bound of four. This gives them potential applications in practical problems, since gene sequences have an alphabet size of four. (4) All algorithms have rigorous proofs about their performances. The methods developed in this paper have been used in the software implementation. We observed some encouraging results that show improved performance for motif detection compared with other software.
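
    The probabilistic input model described above (not the discovery algorithms themselves) can be sketched as a small data generator. The function name `implant_motif`, the uniform choice of implant position, and the choice to mutate each motif character with probability exactly α are assumptions made for illustration; the abstract only requires that the mismatch probability be at most α.

```python
import random


def implant_motif(motif, alphabet, k, length, alpha, rng):
    """Generate k random background sequences, each with an approximate
    copy of `motif` implanted at a random position.

    Each motif character is replaced by a different alphabet character
    with probability `alpha`. Returns (sequences, start_positions).
    """
    sequences, starts = [], []
    for _ in range(k):
        background = [rng.choice(alphabet) for _ in range(length)]
        copy = []
        for g in motif:
            if rng.random() < alpha:
                copy.append(rng.choice([c for c in alphabet if c != g]))
            else:
                copy.append(g)
        start = rng.randrange(length - len(motif) + 1)
        background[start:start + len(motif)] = copy
        sequences.append("".join(background))
        starts.append(start)
    return sequences, starts


seqs, starts = implant_motif("ACGTAC", "ACGT", k=3, length=30,
                             alpha=0.1, rng=random.Random(42))
for s, p in zip(seqs, starts):
    print(p, s)
```

    A generator like this gives test data with known ground truth, which is how such a model can be used to compare motif discovery programs experimentally.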

  14. Output-Sensitive Pattern Extraction in Sequences

    DEFF Research Database (Denmark)

    Grossi, Roberto; Menconi, Giulia; Pisanti, Nadia;

    2014-01-01

    Genomic Analysis, Plagiarism Detection, Data Mining, Intrusion Detection, Spam Fighting and Time Series Analysis are just some examples of applications where extraction of recurring patterns in sequences of objects is one of the main computational challenges. Several notions of patterns exist...

  15. Flexible, Mastery-Oriented Astrophysics Sequence.

    Science.gov (United States)

    Zeilik, Michael, II

    1981-01-01

    Describes the implementation and impact of a two-semester mastery-oriented astrophysics sequence for upper-level physics/astrophysics majors designed to handle flexibly a wide range of student backgrounds. A Personalized System of Instruction (PSI) format was used fostering frequent student-instructor interaction and role-modeling behavior in…

  16. What Makes a Protein Sequence a Prion?

    Science.gov (United States)

    Sabate, Raimon; Rousseau, Frederic; Schymkowitz, Joost; Ventura, Salvador

    2015-01-01

    Typical amyloid diseases such as Alzheimer's and Parkinson's were thought to exclusively result from de novo aggregation, but recently it was shown that amyloids formed in one cell can cross-seed aggregation in other cells, following a prion-like mechanism. Despite the large experimental effort devoted to understanding the phenomenon of prion transmissibility, it is still poorly understood how this property is encoded in the primary sequence. In many cases, prion structural conversion is driven by the presence of relatively large glutamine/asparagine (Q/N) enriched segments. Several studies suggest that it is the amino acid composition of these regions rather than their specific sequence that accounts for their priogenicity. However, our analysis indicates that it is instead the presence and potency of specific short amyloid-prone sequences that occur within intrinsically disordered Q/N-rich regions that determine their prion behaviour, modulated by the structural and compositional context. This provides a basis for the accurate identification and evaluation of prion candidate sequences in proteomes in the context of a unified framework for amyloid formation and prion propagation. PMID:25569335

  17. Hidden ribozymes in eukaryotic genome sequence

    OpenAIRE

    Sean P Ryder

    2010-01-01

    The small self-cleaving ribozymes fold into complex tertiary structures to promote autocatalytic cleavage or ligation at a precise position within their sequence. Until recently, relatively few examples had been identified. Two papers now reveal that self-cleaving ribozymes are prevalent in eukaryotic genomes and, in some cases, might play a role in regulating gene expression.

  18. The Ultramafites and layered Gabbro Sequences

    NARCIS (Netherlands)

    Oosterom, M.G.

    1963-01-01

    On Stjernöy, Seiland and the neighbouring peninsulas of Öksfjord and Bergsfjord ultramafic bodies of peridotite and pyroxenite with associated layered gabbro sequences occur within a complex of highly metamorphic gabbro gneisses, rocks akin to pyroxene-granulites and mafic charnockites. As is shown

  19. Sorghum genome sequencing by methylation filtration.

    Science.gov (United States)

    Bedell, Joseph A; Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rholfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis. PMID:15660154

  20. Gamma and Related Functions Generalized for Sequences

    Science.gov (United States)

    Ollerton, R. L.

    2008-01-01

    Given a sequence $g_k > 0$, the "g-factorial" product $\prod_{i=1}^{k} g_i$ is extended from integer k to real x by generalizing properties of the gamma function $\Gamma(x)$. The Euler-Mascheroni constant $\gamma$ and the beta and zeta functions are also generalized. Specific examples include…

  1. Linearly recursive sequences and Dynkin diagrams

    CERN Document Server

    Reutenauer, Christophe

    2012-01-01

    Motivated by a construction in the theory of cluster algebras (Fomin and Zelevinsky), one associates to each acyclic directed graph a family of sequences of natural integers, one for each vertex; this construction is called a frieze; these sequences are given by nonlinear recursions (with division), and the fact that they are integers is a consequence of the Laurent phenomenon of Fomin and Zelevinsky. If the sequences satisfy a linear recursion with constant coefficients, then the graph must be a Dynkin diagram or an extended Dynkin diagram, with an acyclic orientation. The converse also holds: the sequences of the frieze associated to an oriented Dynkin or Euclidean diagram satisfy linear recursions, and are even $\mathbb{N}$-rational. One uses in the proof objects called $SL_2$-tilings of the plane, which are fillings of the discrete plane such that each adjacent 2 by 2 minor is equal to 1. These objects, which have applications in the theory of cluster algebras, are interesting for themselves. S...
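
    A small numeric illustration of the phenomenon: the nonlinear recursion a(n+1) = (a(n)^2 + 1) / a(n-1) with a(0) = a(1) = 1 involves division yet stays in the integers, and its terms also satisfy the linear recursion a(n+1) = 3 a(n) - a(n-1). This recursion is commonly associated with the graph on two vertices joined by a double edge (extended Dynkin type Ã_1); that attribution and the function name are stated here as assumptions, since the record does not give an explicit example.

```python
def frieze_sequence(count):
    """Terms of a(n+1) = (a(n)**2 + 1) / a(n-1), starting from 1, 1.

    The division is checked to be exact at every step.
    """
    seq = [1, 1]
    while len(seq) < count:
        numerator = seq[-1] ** 2 + 1
        if numerator % seq[-2] != 0:
            raise ArithmeticError("division is not exact")
        seq.append(numerator // seq[-2])
    return seq[:count]


terms = frieze_sequence(10)
print(terms)
# Check the linear recursion a(n+1) = 3*a(n) - a(n-1) on the same terms.
print(all(terms[i + 1] == 3 * terms[i] - terms[i - 1]
          for i in range(1, len(terms) - 1)))
```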

  2. Generalized Mignotte Sequences and Software Watermarking

    Directory of Open Access Journals (Sweden)

    Hedley Morris

    2009-12-01

    Dynamic Graph Watermarking (DGW) is vulnerable to attack. We propose the use of a multiple secret sharing scheme to improve the robustness of DGW. We propose using Mignotte Sequences as the way to create the watermark shares needed by the method. The new scheme is suitable for inclusion in the SandMark test program. All necessary algorithms are identified.
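
    For orientation, here is a minimal sketch of the classical Mignotte threshold scheme, in which shares are residues of the secret modulo pairwise coprime integers and recovery uses the Chinese remainder theorem. It shows the classical scheme only, not the generalized sequences or the watermark encoding proposed in the paper; the moduli, the secret, and the function names are hypothetical.

```python
from math import prod


def make_shares(secret, moduli, threshold):
    """Split `secret` into residues modulo pairwise coprime `moduli`.

    The secret must lie strictly between the product of the
    (threshold - 1) largest moduli and the product of the `threshold`
    smallest moduli, so that any `threshold` shares determine it uniquely.
    """
    ordered = sorted(moduli)
    low = prod(ordered[-(threshold - 1):]) if threshold > 1 else 1
    high = prod(ordered[:threshold])
    if not low < secret < high:
        raise ValueError("secret outside the admissible range")
    return [(m, secret % m) for m in moduli]


def recover(shares):
    """Combine (modulus, residue) pairs with the Chinese remainder theorem."""
    modulus, value = 1, 0
    for m, r in shares:
        # Find t with value + modulus * t congruent to r modulo m.
        t = ((r - value) * pow(modulus, -1, m)) % m
        value += modulus * t
        modulus *= m
    return value


shares = make_shares(1000, [11, 13, 17, 19], threshold=3)
print(recover(shares[:3]))   # any three shares recover the secret
print(recover(shares[2:]))   # two shares give only a residue
```

    The sketch relies on `math.prod` and the three-argument `pow` with a negative exponent, both available from Python 3.8.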

  3. Recalling visual serial order for verbal sequences

    NARCIS (Netherlands)

    Logie, R.H.; Saito, S.; Morita, A.; Varma, S.; Norris, D.

    2016-01-01

    We report three experiments in which participants performed written serial recall of visually presented verbal sequences with items varying in visual similarity. In Experiments 1 and 2 native speakers of Japanese recalled visually presented Japanese Kanji characters. In Experiment 3, native speakers

  4. Zero-Till Crop Sequence Economics

    Science.gov (United States)

    No-till production systems allow more intensified and diversified production in the northern Great Plains; however, this has increased the need for information on improving economic returns through crop sequence selection. Field research was conducted near Mandan ND to determine the influences of pr...

  5. On VC-density over indiscernible sequences

    OpenAIRE

    Guingona, Vincent; Hill, Cameron Donnay

    2011-01-01

    In this paper, we study VC-density over indiscernible sequences (denoted VC_ind-density). We answer an open question in [1], showing that VC_ind-density is always integer valued. We also show that VC_ind-density and dp-rank coincide in the natural way.

  6. Applications of High Throughput Nucleotide Sequencing

    DEFF Research Database (Denmark)

    Waage, Johannes Eichler

    -sequencing, a study of the effects on alternative RNA splicing of KO of the nonsense mediated RNA decay system in Mus, using digital gene expression and a custom-built exon-exon junction mapping pipeline is presented (article I). Evolved from this work, a Bioconductor package, spliceR, for classifying alternative...

  7. Getting started in mapping-by-sequencing

    Institute of Scientific and Technical Information of China (English)

    Héctor Candela; Rubén Casanova-Sáez; José Luis Micol

    2015-01-01

    Next-generation sequencing (NGS) technologies allow the cost-effective sequencing of whole genomes and have expanded the scope of genomics to novel applications, such as the genome-wide characterization of intraspecific polymorphisms and the rapid mapping and identification of point mutations. Next-generation sequencing platforms, such as the Illumina HiSeq2000 platform, are now commercially available at affordable prices and routinely produce an enormous amount of sequence data, but their wide use is often hindered by a lack of knowledge on how to manipulate and process the information produced. In this review, we focus on the strategies that are available to geneticists who wish to incorporate these novel approaches into their research but who are not familiar with the necessary bioinformatic concepts and computational tools. In particular, we comprehensively summarize case studies where the use of NGS technologies has led to the identification of point mutations, a strategy that has been dubbed "mapping-by-sequencing", and review examples from plants and other model species such as Caenorhabditis elegans, Saccharomyces cerevisiae, and Drosophila melanogaster. As these technologies are becoming cheaper and more powerful, their use is also expanding to allow mutation identification in species with larger genomes, such as many crop plants.

  8. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    Science.gov (United States)

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR RLK) genetic…

  9. ZERO-TILL CROP SEQUENCE ECONOMICS

    Science.gov (United States)

    No-till production systems allow more intensified and diversified production in the northern Great Plains; however, this has increased the need for information on improving economic returns through crop sequence selection. Field research was conducted near Mandan ND to determine the influences of pr...

  10. The Pizza Problem: A Solution with Sequences

    Science.gov (United States)

    Shafer, Kathryn G.; Mast, Caleb J.

    2008-01-01

    This article addresses the issues of coaching and assessing. A preservice middle school teacher's unique solution to the Pizza problem was not what the professor expected. The student's solution strategy, based on sequences and a reinvention of Pascal's triangle, is explained in detail. (Contains 8 figures.)

  11. Gnathostome phylogenomics utilizing lungfish EST sequences.

    Science.gov (United States)

    Hallström, Björn M; Janke, Axel

    2009-02-01

    The relationship between the Chondrichthyes (cartilaginous fishes), the Actinopterygii (ray-finned fishes), and the piscine Sarcopterygii (lobe-finned fishes) and how the Tetrapoda (four-limbed terrestrial vertebrates) are related to these has been a contentious issue for more than a century. A general consensus about the relationship of these vertebrate clades has gradually emerged among morphologists, but no molecular study has yet provided conclusive evidence for any specific hypothesis. In order to examine these relationships on the basis of more extensive sequence data, we have produced almost 1,000,000 bp of expressed sequence tags (ESTs) from the African marbled lungfish, Protopterus aethiopicus. This new data set yielded 771 transcribed nuclear sequences that had not been previously described. The lungfish EST sequences were combined with EST data from two cartilaginous fishes and whole genome data from an agnathan, four ray-finned fishes, and four tetrapods. Phylogenomic analysis of these data yielded, for the first time, significant maximum likelihood support for a traditional gnathostome tree with a split between the Chondrichthyes and remaining (bone) gnathostomes. Also, the sister group relationship between Dipnoi (lungfishes) and Tetrapoda received conclusive support. Previously proposed hypotheses, such as the monophyly of fishes, could be rejected significantly. The divergence time between lungfishes and tetrapods was estimated to 382-388 Ma by the current data set and six calibration points. PMID:19029191

  12. Mitochondrial DNA sequence evolution in shorebird populations.

    NARCIS (Netherlands)

    Wenink, P.W.

    1994-01-01

    This thesis describes the global molecular population structure of two shorebird species, in particular of the dunlin, Calidris alpina, by means of comparative sequence analysis of the most variable part of the mitochondrial DNA (mtDNA) genome. There are several reasons why mtDNA is the molecule of

  13. Draft genome sequence of Virgibacillus halodenitrificans 1806.

    Science.gov (United States)

    Lee, Sang-Jae; Lee, Yong-Jik; Jeong, Haeyoung; Lee, Sang Jun; Lee, Han-Seung; Pan, Jae-Gu; Kim, Byoung-Chan; Lee, Dong-Woo

    2012-11-01

    Virgibacillus halodenitrificans 1806 is an endospore-forming halophilic bacterium isolated from salterns in Korea. Here, we report the draft genome sequence of V. halodenitrificans 1806, which may reveal the molecular basis of osmoadaptation and insights into carbon and anaerobic metabolism in moderate halophiles. PMID:23105070

  14. Draft Genome Sequence of Virgibacillus halodenitrificans 1806

    OpenAIRE

    Lee, Sang-Jae; Lee, Yong-Jik; Jeong, Haeyoung; Lee, Sang Jun; Lee, Han-Seung; Pan, Jae-Gu; Kim, Byoung-Chan; Lee, Dong-Woo

    2012-01-01

    Virgibacillus halodenitrificans 1806 is an endospore-forming halophilic bacterium isolated from salterns in Korea. Here, we report the draft genome sequence of V. halodenitrificans 1806, which may reveal the molecular basis of osmoadaptation and insights into carbon and anaerobic metabolism in moderate halophiles.

  15. Sorghum genome sequencing by methylation filtration.

    Science.gov (United States)

    Bedell, Joseph A; Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rholfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  16. Sorghum genome sequencing by methylation filtration.

    Directory of Open Access Journals (Sweden)

    Joseph A Bedell

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  17. Sequence Semantics for Dynamic Predicate Logic

    NARCIS (Netherlands)

    Vermeulen, C.F.M.

    2008-01-01

    In this paper a semantics for dynamic predicate logic is developed that uses sequence valued assignments. This semantics is compared with the usual relational semantics for dynamic predicate logic: it is shown that the most important intuitions of the usual semantics are preserved. Then it is shown

  18. Whole genome sequences of four Brucella strains.

    Science.gov (United States)

    Ding, Jiabo; Pan, Yuanlong; Jiang, Hai; Cheng, Junsheng; Liu, Taotao; Qin, Nan; Yang, Yi; Cui, Buyun; Chen, Chen; Liu, Cuihua; Mao, Kairong; Zhu, Baoli

    2011-07-01

    Brucella melitensis and Brucella suis are intracellular pathogens of livestock and humans. Here we report four genome sequences, those of the virulent strain B. melitensis M28-12 and vaccine strains B. melitensis M5 and M111 and B. suis S2, which show different virulences and pathogenicities, which will help to design a more effective brucellosis vaccine. PMID:21602346

  19. Monitoring Error Rates In Illumina Sequencing

    Science.gov (United States)

    Manley, Leigh J.; Ma, Duanduan; Levine, Stuart S.

    2016-01-01

    Guaranteeing high-quality next-generation sequencing data in a rapidly changing environment is an ongoing challenge. The introduction of the Illumina NextSeq 500 and the depreciation of specific metrics from Illumina's Sequencing Analysis Viewer (SAV; Illumina, San Diego, CA, USA) have made it more difficult to determine directly the baseline error rate of sequencing runs. To improve our ability to measure base quality, we have created an open-source tool to construct the Percent Perfect Reads (PPR) plot, previously provided by the Illumina sequencers. The PPR program is compatible with HiSeq 2000/2500, MiSeq, and NextSeq 500 instruments and provides an alternative to Illumina's quality value (Q) scores for determining run quality. Whereas Q scores are representative of run quality, they are often overestimated and are sourced from different look-up tables for each platform. The PPR’s unique capabilities as a cross-instrument comparison device, as a troubleshooting tool, and as a tool for monitoring instrument performance can provide an increase in clarity over SAV metrics that is often crucial for maintaining instrument health. These capabilities are highlighted. PMID:27672352

  20. Properties of aftershock sequences in southern California

    Science.gov (United States)

    Kisslinger, Carl; Jones, Lucile M.

    1991-07-01

    The temporal behavior of 39 aftershock sequences in southern California, 1933-1988, was modeled by the modified Omori relation. Minimum magnitudes for completeness of each sequence catalog were determined, and the maximum likelihood estimates of the parameters K, p, and c, with the standard errors on each, were determined by the Ogata algorithm. The b value of each sequence was also calculated. Many of the active faults in the region, both strike slip and thrust, were sampled. The p values were graded in terms of the size of the standard error relative to the p value itself. Most of the sequences were modeled well by the Omori relation. Many of the sequences had p values close to the mean of the whole data set, 1.11±0.25, but values significantly different from the mean, as low as 0.7 and as high as 1.8, exist. No correlation of p with either the b value of the sequence or the mainshock magnitude was found. The results suggest a direct correlation of p values with surface heat flow, with high values in the Salton Trough (high heat flow) and one low value in the San Bernardino Mountains and on the edge of the Ventura Basin (both low heat flow). The large fraction of the sequences with p values near the mean are at locations where the heat flow is near the regional mean, 74 mW/m². If the hypothesis that aftershock decay rate is controlled by temperature at depth is valid, the effects of other factors such as heterogeneity of the fault zone properties are superimposed on the background rate determined by temperature. Surface heat flow is taken as an indicator of crustal temperature at hypocentral depths, but the effects on heat flow of convective heat transport and variations in near-surface thermal conductivity invalidate any simple association of local variations in heat flow with details of the subsurface temperature distribution. The interpretation is that higher temperatures in the aftershock source volume caused shortened stress relaxation times in the fault
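For readers unfamiliar with the modified Omori relation named in this record, it is commonly written as n(t) = K / (t + c)^p, where n(t) is the aftershock rate at time t after the mainshock. The following is a minimal sketch of that formula and its time integral, not the authors' fitting code (the abstract attributes the parameter estimates to the Ogata algorithm, which is not reproduced here). The function names and the values of K and c are hypothetical; p = 1.11 is the data-set mean quoted in the abstract.

```python
import math


def omori_rate(t, K, p, c):
    """Aftershock rate n(t) = K / (t + c)**p (modified Omori relation)."""
    return K / (t + c) ** p


def omori_count(T, K, p, c):
    """Expected number of aftershocks in [0, T]: the integral of n(t) from 0 to T."""
    if abs(p - 1.0) < 1e-12:
        # Limit p -> 1 gives a logarithm instead of a power law.
        return K * math.log((T + c) / c)
    return K * (c ** (1.0 - p) - (T + c) ** (1.0 - p)) / (p - 1.0)


if __name__ == "__main__":
    K, c = 100.0, 0.05  # hypothetical productivity and time offset (days)
    p = 1.11            # mean p value reported in the abstract
    for days in (1, 10, 100):
        print(days, omori_rate(days, K, p, c), omori_count(days, K, p, c))
```

A larger p means a faster decay of the rate, which is the quantity the abstract relates to surface heat flow.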

  1. Evaluation measures of multiple sequence alignments.

    Science.gov (United States)

    Gonnet, G H; Korostensky, C; Benner, S

    2000-01-01

    Multiple sequence alignments (MSAs) are frequently used in the study of families of protein sequences or DNA/RNA sequences. They are a fundamental tool for the understanding of the structure, functionality and, ultimately, the evolution of proteins. A new algorithm, the Circular Sum (CS) method, is presented for formally evaluating the quality of an MSA. It is based on the use of a solution to the Traveling Salesman Problem, which identifies a circular tour through an evolutionary tree connecting the sequences in a protein family. With this approach, the calculation of an evolutionary tree and the errors that it would introduce can be avoided altogether. The algorithm gives an upper bound, the best score that can possibly be achieved by any MSA for a given set of protein sequences. Alternatively, if presented with a specific MSA, the algorithm provides a formal score for the MSA, which serves as an absolute measure of the quality of the MSA. The CS measure yields a direct connection between an MSA and the associated evolutionary tree. The measure can be used as a tool for evaluating different methods for producing MSAs. A brief example of the last application is provided. Because it weights all evolutionary events on a tree identically, but does not require the reconstruction of a tree, the CS algorithm has advantages over the frequently used sum-of-pairs measures for scoring MSAs, which weight some evolutionary events more strongly than others. Compared to other weighted sum-of-pairs measures, it has the advantage that no evolutionary tree must be constructed, because we can find a circular tour without knowing the tree.

  2. BlockLogo: Visualization of peptide and sequence motif conservation

    DEFF Research Database (Denmark)

    Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian;

    2013-01-01

    BlockLogo is a web-server application for the visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, se...

  3. Quantitative biostratigraphy and species evolutionary sequence

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Introduction of species evolutionary sequence into the quantitative biostratigraphy is a significant work, either for studying biologic evolution or for making stratigraphic correlation and reconstructing geologic history. The quantitative biostratigraphy is to determine biostratigraphic event sequences by using probabilistic analysis. The evolutionary sequence systematics can efficiently ascertain species evolutionary sequences. Two methods have been proposed to determine the sequence of species-disappearance events: (1) species extinction events can be closed by last occurrence events using quantitative biostratigraphic analysis; (2) the duration of a species may be approximately replaced by the duration of its parent species. To combine these two methods for determining the sequence of species disappearance is the best way up to now. A consulting standard sequence that consists of the speciation sequence of Permian waagenophylloid corals and the biostratigraphic event sequence of other important fossils in Permian is used as an example. The group Spearman rank-correlation test is used to test the consulting standard sequence by comparing four types of calculations and two kinds of sequences and to find abnormal events. Based on the found abnormal events in the test, the consulting standard sequence is revised to deal with different conditions. Sequences of speciation and species-disappearance, and species duration are determined. Application of species evolutionary sequence to quantitative biostratigraphy can largely improve the quality of biostratigraphic event sequence. In stratigraphic correlation, furthermore, event sequences have higher precision than range biozones.

  4. Applications of next-generation sequencing techniques in plant biology

    Science.gov (United States)

    The last several years have seen revolutionary advances in DNA sequencing technologies with the advent of next generation sequencing (NGS) techniques. NGS methods now allow millions of bases to be sequenced in one round, at a fraction of the cost relative to traditional Sanger sequencing, allowing u...

  5. Strong Limit Theorems for Arbitrary Fuzzy Stochastic Sequences

    Institute of Scientific and Technical Information of China (English)

    FEI Wei-yin

    2008-01-01

    Based on fuzzy random variables, the concept of fuzzy stochastic sequences is defined. Strong limit theorems for fuzzy stochastic sequences are established. Some known results in non-fuzzy stochastic sequences are extended. In order to prove results of this paper, the notion of fuzzy martingale difference sequences is also introduced.

  6. The Polytopic-k-Step Fibonacci Sequences in Finite Groups

    Directory of Open Access Journals (Sweden)

    Ömür Deveci

    2011-01-01

    We study the polytopic-k-step Fibonacci sequences, the polytopic-k-step Fibonacci sequences modulo m, and the polytopic-k-step Fibonacci sequences in finite groups. Also, we examine the periods of the polytopic-k-step Fibonacci sequences in semidihedral group SD2m.
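The abstract does not define the polytopic variant, so the sketch below uses the ordinary k-step Fibonacci sequence (each term is the sum of the previous k terms, starting from 0, ..., 0, 1) purely to illustrate what "period modulo m" means. The function name is hypothetical, and the result should not be read as the period of the polytopic sequences studied in the paper.

```python
def k_step_fibonacci_period(k, m):
    """Period of the ordinary k-step Fibonacci sequence reduced modulo m.

    The recurrence is invertible modulo m, so the sequence is purely periodic
    and the initial window of k terms is guaranteed to recur.
    """
    if k < 2 or m < 2:
        raise ValueError("need k >= 2 and m >= 2")
    start = (0,) * (k - 1) + (1,)
    state = start
    period = 0
    while True:
        # Slide the window: drop the oldest term, append the sum of the window mod m.
        state = state[1:] + (sum(state) % m,)
        period += 1
        if state == start:
            return period


if __name__ == "__main__":
    print(k_step_fibonacci_period(2, 10))  # k = 2 is the classical Pisano period
    print(k_step_fibonacci_period(3, 2))
```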

  7. The nucleotide sequences of two leghemoglobin genes from soybean

    DEFF Research Database (Denmark)

    Wiborg, O; Hyldig-Nielsen, J J; Jensen, E O;

    1982-01-01

    We present the complete nucleotide sequences of two leghemoglobin genes isolated from soybean DNA. Both genes contain three intervening sequences in identical positions. Comparison of the coding sequences with known amino-acid sequences of soybean leghemoglobins suggest that the two genes corresp...

  8. Sequencing of simple sequence repeat anchored polymerase chain reaction amplification products of Biomphalaria glabrata

    Directory of Open Access Journals (Sweden)

    Caldeira Roberta L

    2002-01-01

    Simple sequence repeat anchored polymerase chain reaction amplification (SSR-PCR) is a genetic typing technique based on primers anchored at the 5' or 3' ends of microsatellites, at high primer annealing temperatures. This technique has already been used in studies of genetic variability of several organisms, using different primer designs. In order to conduct a detailed study of the SSR-PCR genomic targets, we cloned and sequenced 20 unique amplification products of two commonly used primers, CAA(CT)6 and (CA)8RY, using Biomphalaria glabrata genomic DNA as template. The sequences obtained were novel B. glabrata genomic sequences. It was observed that 15 clones contained microsatellites between priming sites. Out of 40 clones, seven contained complex sequence repetitions. One of the repeats that appeared in six of the amplified fragments generated a single band in Southern analysis, indicating that the sequence was not widespread in the genome. Most of the annealing sites for the CAA(CT)6 primer contained only the six repeats found within the primer sequence. In conclusion, SSR-PCR is a useful genotyping technique. However, the premise of the SSR-PCR technique, verified with the CAA(CT)6 primer, could not be supported since the amplification products did not result necessarily from microsatellite loci amplification.

  9. Sequence validation for the identification of the white-rot fungi Bjerkandera in public sequence databases.

    Science.gov (United States)

    Jung, Paul Eunil; Fong, Jonathan J; Park, Myung Soo; Oh, Seung-Yoon; Kim, Changmu; Lim, Young Woon

    2014-10-01

    White-rot fungi of the genus Bjerkandera are cosmopolitan and have shown potential for industrial application and bioremediation. When distinguishing morphological characters are no longer present (e.g., cultures or dried specimen fragments), characterizing true sequences of Bjerkandera is crucial for accurate identification and application of the species. To build a framework for molecular identification of Bjerkandera, we carefully identified specimens of B. adusta and B. fumosa from Korea based on morphological characters, followed by sequencing the internal transcribed spacer region and 28S nuclear ribosomal large subunit. The phylogenetic analysis of Korean Bjerkandera specimens showed clear genetic differentiation between the two species. Using this phylogeny as a framework, we examined the identification accuracy of sequences available in GenBank. Analyses revealed that many Bjerkandera sequences in the database are either misidentified or unidentified. This study provides robust reference sequences for sequence-based identification of Bjerkandera, and further demonstrates the presence and dangers of incorrect sequences in GenBank.

  10. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Johanna Hasmats

    Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.

  11. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Science.gov (United States)

    Hasmats, Johanna; Gréen, Henrik; Orear, Cedric; Validire, Pierre; Huss, Mikael; Käller, Max; Lundeberg, Joakim

    2014-01-01

    Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.

  12. Concordance between whole-exome sequencing and clinical Sanger sequencing: implications for patient care.

    Science.gov (United States)

    Hamilton, Alison; Tétreault, Martine; Dyment, David A; Zou, Ruobing; Kernohan, Kristin; Geraghty, Michael T; Hartley, Taila; Boycott, Kym M

    2016-09-01

    The clinical translation of next-generation sequencing has created a paradigm shift in the diagnostic assessment of individuals with suspected rare genetic diseases. Whole-exome sequencing (WES) simultaneously examines the majority of the coding portion of the genome and is rapidly becoming accepted as an efficient alternative to clinical Sanger sequencing for diagnosing genetically heterogeneous disorders. Among reports of the clinical and diagnostic utility of WES, few studies to date have directly compared its concordance to Sanger sequencing, which is considered the clinical "gold standard". We performed a direct comparison of 391 coding and noncoding polymorphisms and variants of unknown significance identified by clinical Sanger sequencing to the WES results of 26 patients. Of the 150 well-covered coding variants identified by Sanger sequencing, 146 (97.3%) were also reported by WES. Nine genes were excluded from the comparison due to consistently low coverage in WES, which might be attributed to the use of older exome capture kits. We performed confirmatory Sanger sequencing of discordant variants, including five variants with discordant bases and four with discordant zygosity. Confirmatory Sanger sequencing supported the original Sanger report for three of the five discordant bases, one was shown to be a false positive supporting the WES data, and one result differed from both the Sanger and WES data. Two of the discordant zygosity results supported Sanger and the other two supported WES data. We report high concordance for well-covered coding variants, supporting the use of WES as a screening tool for heterogeneous disorders, and recommend the use of supplementary Sanger sequencing for poorly-covered genes when the clinical suspicion is high. Importantly, despite remaining difficulties with achieving complete coverage of the whole exome, 10 (38.5%) of the 26 compared patients were diagnosed through WES. PMID:27652278

  13. Path Optimization Algorithm For Network Problems Using Job Sequencing Technique

    Directory of Open Access Journals (Sweden)

    Punit Kumar Singh

    2012-06-01

    The job sequencing technique is used to determine an optimal sequence. It performs a series of jobs by a number of specific orders so that it calculates the optimal cost. In this paper, we propose a novel approach to find an optimal path from source to destination by taking advantage of job sequencing technique. We have used the n jobs, m machines sequencing technique, and this is divided into n jobs, 2 machine problems. Using Johnson's sequencing rule, we solved the problem and obtained the (n-1) subsequences of the route. Using the proposed algorithm, we calculated the optimal sequence, which leads to the shortest path of the network.
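Johnson's sequencing rule for the n jobs, 2 machines case, which the abstract relies on, can be sketched as follows. This illustrates only the classical rule (jobs whose first-machine time does not exceed their second-machine time go first, in increasing order of first-machine time; the rest follow in decreasing order of second-machine time). How the paper maps network paths onto jobs is not stated in the abstract and is not modelled here; the function names and the sample processing times are hypothetical.

```python
def johnson_order(times):
    """Job order for a two-machine flow shop under Johnson's rule.

    times: list of (a, b) pairs, the processing times on machine 1 and machine 2.
    Returns a list of job indices.
    """
    front = sorted(
        (i for i, (a, b) in enumerate(times) if a <= b),
        key=lambda i: times[i][0],
    )
    back = sorted(
        (i for i, (a, b) in enumerate(times) if a > b),
        key=lambda i: times[i][1],
        reverse=True,
    )
    return front + back


def makespan(times, order):
    """Completion time of the last job on machine 2 for a given job order."""
    end_m1 = end_m2 = 0
    for i in order:
        a, b = times[i]
        end_m1 += a                       # machine 1 runs jobs back to back
        end_m2 = max(end_m2, end_m1) + b  # machine 2 waits for machine 1 if needed
    return end_m2


if __name__ == "__main__":
    jobs = [(3, 6), (5, 2), (1, 2), (6, 6), (7, 5)]  # hypothetical (a, b) times
    order = johnson_order(jobs)
    print(order, makespan(jobs, order))
```

For the sample data the resulting makespan equals the lower bound (total machine-1 time plus the smallest machine-2 time), so no other order can do better on this instance.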

  14. Analysis of expressed sequence tags from Plasmodium falciparum.

    Science.gov (United States)

    Chakrabarti, D; Reddy, G R; Dame, J B; Almira, E C; Laipis, P J; Ferl, R J; Yang, T P; Rowe, T C; Schuster, S M

    1994-07-01

    An initiative was undertaken to sequence all genes of the human malaria parasite Plasmodium falciparum in an effort to gain a better understanding at the molecular level of the parasite that inflicts much suffering in the developing world. 550 random complementary DNA clones were partially sequenced from the intraerythrocytic form of the parasite as one of the approaches to analyze the transcribed sequences of its genome. The sequences, after editing, generated 389 expressed sequence tag sites and over 105 kb of DNA sequences. About 32% of these clones showed significant homology with other genes in the database. These clones represent 340 new Plasmodium falciparum expressed sequence tags.

  15. High order coherent control sequences of fat pulses

    CERN Document Server

    Pasini, S; Uhrig, G S

    2010-01-01

    We analyze the performance of sequences of fat pulses of various lengths and shapes for dynamic decoupling and we compare it with that of sequences of ideal, instantaneous pulses. The use of second order, shaped pulses represents a significant improvement. Non-equidistant sequences characterized by pulse durations scaled proportional to the duration T of the sequence strikingly outperform the sequences with pulses of constant length for small T. Interestingly, for longer durations sequences of pulses of substantial length are found to suppress dephasing better than sequences of ideal pulses.

  16. The first determination of DNA sequence of a specific gene.

    Science.gov (United States)

    Inouye, Masayori

    2016-05-10

    How and when was the first DNA sequence of a gene determined? In 1977, F. Sanger came up with an innovative technology to sequence DNA by using chain terminators, and determined the entire DNA sequence of the 5375-base genome of bacteriophage φX 174 (Sanger et al., 1977). While Sanger's achievement has been recognized as the first DNA sequencing of genes, we had determined the DNA sequence of a gene, albeit a partial sequence, 11 years before Sanger's DNA sequence (Okada et al., 1966).

  17. Sequence determination from overlapping fragments: a simple model of whole-genome shotgun sequencing.

    Science.gov (United States)

    Derrida, Bernard; Fink, Thomas M A

    2002-02-11

    Assembling fragments randomly sampled from along a sequence is the basis of whole-genome shotgun sequencing, a technique used to map the DNA of the human and other genomes. We calculate the probability that a random sequence can be recovered from a collection of overlapping fragments. We provide an exact solution for an infinite alphabet and in the case of constant overlaps. For the general problem we apply two assembly strategies and give the probability that the assembly puzzle can be solved in the limit of infinitely many fragments. PMID:11863859

  18. Sequence Determination from Overlapping Fragments: A Simple Model of Whole-Genome Shotgun Sequencing

    Science.gov (United States)

    Derrida, Bernard; Fink, Thomas M.

    2002-02-01

    Assembling fragments randomly sampled from along a sequence is the basis of whole-genome shotgun sequencing, a technique used to map the DNA of the human and other genomes. We calculate the probability that a random sequence can be recovered from a collection of overlapping fragments. We provide an exact solution for an infinite alphabet and in the case of constant overlaps. For the general problem we apply two assembly strategies and give the probability that the assembly puzzle can be solved in the limit of infinitely many fragments.
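
    The recovery probability described in this abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' exact calculation: it assumes a simplified coverage criterion in which equal-length fragments are placed uniformly at random and the sequence counts as recovered when the fragments span it end to end with every consecutive pair sharing at least a fixed minimum overlap. All names and parameters (`fragments_recover`, `estimate_recovery_probability`, `min_overlap`, `trials`) are hypothetical.

```python
import random


def fragments_recover(n, frag_len, num_frags, min_overlap, rng):
    """Return True if randomly placed fragments span positions 0..n-1 and
    every consecutive pair overlaps by at least min_overlap positions.

    Assumption: this coverage test stands in for "the sequence can be
    recovered"; it ignores ambiguity caused by repeated subsequences.
    """
    # Sample fragment start positions uniformly over all valid placements.
    starts = sorted(rng.randrange(n - frag_len + 1) for _ in range(num_frags))
    # The first fragment must start at 0 and the last must reach the end.
    if starts[0] != 0 or starts[-1] != n - frag_len:
        return False
    # Fragments starting at a and b (a <= b) share frag_len - (b - a) positions.
    return all(b - a <= frag_len - min_overlap
               for a, b in zip(starts, starts[1:]))


def estimate_recovery_probability(n, frag_len, num_frags, min_overlap,
                                  trials=2000, seed=0):
    """Estimate the recovery probability as a fraction of successful trials."""
    rng = random.Random(seed)
    hits = sum(fragments_recover(n, frag_len, num_frags, min_overlap, rng)
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Under this model the estimate should rise as fragments are added.
    for num_frags in (20, 100, 500):
        p = estimate_recovery_probability(50, 10, num_frags, min_overlap=3)
        print(num_frags, p)
```

    Increasing `num_frags` while holding the other parameters fixed gives a rough numerical picture of the many-fragment limit mentioned in the abstract, under the stated simplification.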

  19. Development of expressed sequence tag and expressed sequence tag–simple sequence repeat marker resources for Musa acuminata

    OpenAIRE

    Passos, Marco A. N.; de Oliveira Cruz, Viviane; Emediato, Flavia L; de Camargo Teixeira, Cristiane; Souza, Manoel T; Matsumoto, Takashi; Rennó Azevedo, Vânia C.; Ferreira, Claudia F; Amorim, Edson P; de Alencar Figueiredo, Lucio Flavio; Martins, Natalia F; de Jesus Barbosa Cavalcante, Maria; Baurens, Franc-Christophe; da Silva, Orzenil Bonfim; Pappas, Georgios J

    2012-01-01

    Background and aims: Banana (Musa acuminata) is a crop contributing to global food security. Many varieties lack resistance to biotic stresses, due to sterility and narrow genetic background. The objective of this study was to develop an expressed sequence tag (EST) database of transcripts expressed during compatible and incompatible banana–Mycosphaerella fijiensis (Mf) interactions. Black leaf streak disease (BLSD), caused by Mf, is a destructive disease of banana. Microsatellite markers were...

  20. Impact of Next Generation Sequencing Techniques in Food Microbiology

    OpenAIRE

    Mayo, Baltasar; Rachid, Caio T. C. C.; Alegría, Ángel; Leite, Analy M. O.; Peixoto, Raquel S; Delgado, Susana

    2014-01-01

    With Maxam-Gilbert and Sanger sequencing regarded as the first generation, recent years have seen an explosion of newly developed sequencing strategies, which are usually referred to as next generation sequencing (NGS) techniques. NGS techniques are high-throughput and produce thousands or even millions of sequences at the same time. These sequences allow for the accurate identification of microbial taxa, including uncultivable organisms and those present in small numbers. In sp...