WorldWideScience

Sample records for lysine-rich dehydrin sequence

  1. Disorder and function: a review of the dehydrin protein family

    Directory of Open Access Journals (Sweden)

    Steffen P Graether

    2014-10-01

    Full Text Available Dehydration proteins (dehydrins) are group 2 members of the late embryogenesis abundant (LEA) protein family. The protein architecture of dehydrins can be described by the presence of three types of conserved sequence motifs that have been named the K-, Y- and S-segments. By definition, a dehydrin must contain at least one copy of the lysine-rich K-segment. Abiotic stresses such as drought, cold, and salinity cause the upregulation of dehydrin mRNA and protein levels. Despite the large body of genetic and protein evidence of the importance of these proteins in stress response, the in vivo protective mechanism is not fully known. In vitro experimental evidence from biochemical assays and localization experiments suggests multiple roles for dehydrins, including membrane protection, cryoprotection of enzymes, and protection from reactive oxygen species. Membrane binding by dehydrins is likely to be as a peripheral membrane protein, since the protein sequences are highly hydrophilic and contain many charged amino acids. Because of this, dehydrins in solution are intrinsically disordered proteins, that is, they have no well-defined secondary or tertiary structure. Despite their disorder, dehydrins have been shown to gain structure when bound to ligands such as membranes, and to possibly change their oligomeric state when bound to ions. We review what is currently known about dehydrin sequences and their structures, and examine the various ligands that have been shown to bind to this family of proteins.

  2. Coacervate-like microspheres from lysine-rich proteinoid

    Science.gov (United States)

    Rohlfing, D. L.

    1975-01-01

    Microspheres form isothermally from lysine-rich proteinoid when the ionic strength of the solution is increased with NaCl or other salts. Studies with different monovalent anions and with polymers of different amino acid composition indicate that charge neutralization and hydrophobic bonding contribute to microsphere formation. The particles also form in sea water, especially if heated or made slightly alkaline. The microspheres differ from those made from acidic proteinoid but resemble coacervate droplets in some ways (isothermal formation, limited stability, stabilization by quinone, uptake of dyes). Because the constituent lysine-rich proteinoid is of simulated prebiotic origin, the study is interpreted to add emphasis to and suggest an evolutionary continuity for coacervation phenomena.

  3. Genome Analysis of Conserved Dehydrin Motifs in Vascular Plants

    Directory of Open Access Journals (Sweden)

    Ahmad A. Malik

    2017-05-01

    Full Text Available Dehydrins, a large family of abiotic stress proteins, are defined by the presence of a mostly conserved motif known as the K-segment, and may also contain two other conserved motifs known as the Y-segment and S-segment. Using the dehydrin literature, we developed a sequence motif definition of the K-segment, which we used to create a large dataset of dehydrin sequences by searching the Pfam00257 dehydrin dataset and the Phytozome 10 sequences of vascular plants. A comprehensive analysis of these sequences reveals that lysine residues are highly conserved in the K-segment, while the amino acid type is often conserved at other positions. Despite the Y-segment name, the central tyrosine is somewhat conserved, but can be substituted with two other small aromatic amino acids (phenylalanine or histidine). The S-segment contains a series of serine residues, but in some proteins is also preceded by a conserved LHR sequence. In many dehydrins containing all three of these motifs the S-segment is linked to the K-segment by a GXGGRRKK motif (where X can be any amino acid), suggesting a functional linkage between these two motifs. An analysis of the sequences shows that the dehydrin architecture and several biochemical properties (isoelectric point, molecular mass, and hydrophobicity score) are dependent on each other, and that some dehydrin architectures are overexpressed during certain abiotic stress, suggesting that they may be optimized for a specific abiotic stress while others are involved in all forms of dehydration stress (drought, cold, and salinity).

  4. De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes.

    Science.gov (United States)

    Zolotarov, Yevgen; Strömvik, Martina

    2015-01-01

    Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved.

  5. A dehydrin-dehydrin interaction: the case of SK3 from Opuntia streptacantha

    Directory of Open Access Journals (Sweden)

    Itzell E Hernandez

    2014-10-01

    Full Text Available Dehydrins belong to a large group of highly hydrophilic proteins known as Late Embryogenesis Abundant (LEA) proteins. It is well known that dehydrins are intrinsically disordered plant proteins that accumulate during the late stages of embryogenesis and in response to abiotic stresses; however, the molecular mechanisms by which their functions are carried out are still unclear. We have previously reported that transgenic Arabidopsis plants overexpressing an Opuntia streptacantha SK3 dehydrin (OpsDHN1) show enhanced tolerance to freezing stress. Herein, we show using a split-ubiquitin yeast two-hybrid system that OpsDHN1 dimerizes. We found that the deletion of regions containing K-segments and the histidine-rich region in the OpsDHN1 protein affects dimer formation. Not surprisingly, in silico protein sequence analysis suggests that OpsDHN1 is an intrinsically disordered protein, an observation that was confirmed by circular dichroism and gel filtration of the recombinantly expressed protein. The addition of zinc triggered the association of recombinantly expressed OpsDHN1 protein, likely through its histidine-rich motif. These data bring new insights about the molecular mechanism of the OpsDHN1 SK3-dehydrin.

  6. Functional characterization of a dehydrin protein from Fagus sylvatica seeds using experimental and in silico approaches.

    Science.gov (United States)

    Kalemba, Ewa Marzena; Litkowiec, Monika

    2015-12-01

    A strong increase in the expression of dehydrin/response ABA transcripts was reported from the 14th week after flowering, coincident with the accumulation of 26- and 44-kDa dehydrins in the embryonic axes of developing beech (Fagus sylvatica L.) seeds. Both transcript and protein levels were strongly correlated with maturation drying. These results suggest that the 44-kDa dehydrin protein is a putative dimer of dehydrin/response ABA protein migrating as a 26-kDa protein. Dehydrins and dehydrin-like proteins form large oligomeric complexes under native conditions and are shown as several spots differing in pI through isoelectrofocusing analyses. Detailed prediction of specific sites accessible for various post-translational modifications (PTMs) in the dehydrin/response ABA protein sequence revealed sites specific to acetylation, amidation, glycosylation, methylation, myristoylation, nitrosylation, O-linked β-N-acetylglucosamination and Yin-O-Yang modification, palmitoylation, phosphorylation, sumoylation, sulfation, and ubiquitination. Thus, these results suggest that specific PTMs might play a role in switching dehydrin function or activity, water binding ability, protein-membrane interactions, transport and subcellular localization, interactions with targeted molecules, and protein stability. Despite the ability of two Cys residues to form a disulfide bond, -SH groups are likely not involved in dimer arrangement. His-rich regions and/or polyQ-tracts are potential candidates as spatial organization modulators. Dehydrin/response ABA protein is an intrinsically disordered protein containing low complexity regions. The lack of a fixed structure and the exposure of amino acids on the surface of the protein structure enhance the accessibility of 40 predicted PTM sites, thereby facilitating dehydrin multifunctionality, which is discussed in the present study.

  7. Molecular mechanism of dehydrin in response to environmental stress in plant

    Institute of Scientific and Technical Information of China (English)

    ZHANG Yuxiu; WANG Zi; XU Jin

    2007-01-01

    Dehydrins, known as the D-11 subgroup of late embryogenesis abundant (LEA) proteins, are an immunologically distinct family of proteins that typically accumulate in desiccation-tolerant seed embryos or in vegetative tissues in response to various environmental stresses such as drought, salinity and freezing. The existence of conserved sequences designated as K, S, and Y segments is a structural feature of dehydrins, and the K segment found in all dehydrins represents a highly conserved 15 amino acid motif (EKKGIMDKIKEKLPG) and forms an amphiphilic α-helix. According to the arrangement of these domains and clustering analysis, dehydrins are subdivided into 5 subtypes: YnSK, Kn, KnS, SKn and YnK. Different types of dehydrins are induced by different environmental stresses in plants. Study results showed that dehydrins might play important protective roles under abiotic stress via a number of different mechanisms, including improving or protecting enzyme activities through cryoprotective activity in response to freeze/thaw or dehydration; stabilizing vesicles or other endomembrane structures by functioning as membrane stabilizers during freeze-induced dehydration; and preventing oxidative damage to the membrane system induced by reactive oxygen radicals by acting as radical scavengers. Here, the gene expression and molecular mechanisms of dehydrin in response to stress in plants are discussed.
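
    Subtype names such as SKn or Y2K4 encode the order and copy number of the segments. A minimal sketch of that naming convention follows, assuming the segments have already been identified and are supplied in N- to C-terminal order; the function `architecture` is hypothetical and does not reproduce the clustering analysis mentioned in the abstract.

```python
from itertools import groupby

def architecture(segments):
    """Collapse an ordered run of segment letters into a name such as 'Y2SK3'.

    segments: iterable of 'Y', 'S' or 'K' in N- to C-terminal order
    (assumed to come from an earlier motif search).
    """
    parts = []
    for letter, run in groupby(segments):
        if letter not in "YSK":
            raise ValueError(f"unknown segment type: {letter!r}")
        count = len(list(run))
        # A single copy is written without a number, e.g. 'S' rather than 'S1'.
        parts.append(letter if count == 1 else f"{letter}{count}")
    return "".join(parts)

print(architecture(["Y", "Y", "S", "K", "K"]))
print(architecture("KKK"))
```

    With these inputs the two calls print `Y2SK2` and `K3`.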

  8. Quantitative microspectral evaluation of the ratio of arginine-rich to lysine-rich histones in neurons and neuroglial cells.

    Science.gov (United States)

    Pevzner, L Z; Raygorodskaya, T G; Agroskin, L S

    1978-09-01

    Staining of nervous tissue sections with ammoniacal silver according to Black et al. has been confirmed to be a reliable histochemical colour reaction for quantitative evaluation of arginine-rich and lysine-rich histones in cell structures on the basis of determinations of the position of the spectral curve maximum. Neurons of several brain nuclei which differed in predominating neurotransmitter did not differ in the ratio of arginine-rich to lysine-rich histones, while some differences in this ratio were found in the glial satellite cells adjacent to the corresponding neurons of these nuclei. Moderate circadian fluctuations were observed in the arginine-rich to lysine-rich histone ratio, these fluctuations being rather similar in the neurons studied and in the cells of perineuronal neuroglia.

  9. Characterization of two novel cold-inducible K3 dehydrin genes from alfalfa (Medicago sativa spp. sativa L.).

    Science.gov (United States)

    Dubé, Marie-Pier; Castonguay, Yves; Cloutier, Jean; Michaud, Josée; Bertrand, Annick

    2013-03-01

    Dehydrin defines a complex family of intrinsically disordered proteins with potential adaptive value with regard to freeze-induced cell dehydration. Search within an expressed sequence tags library from cDNAs of cold-acclimated crowns of alfalfa (Medicago sativa spp. sativa L.) identified transcripts putatively encoding K(3)-type dehydrins. Analysis of full-length coding sequences unveiled two highly homologous sequence variants, K(3)-A and K(3)-B. An increase in the frequency of genotypes yielding positive genomic amplification of the K(3)-dehydrin variants in response to selection for superior tolerance to freezing and the induction of their expression at low temperature strongly support a link with cold adaptation. The presence of multiple allelic forms within single genotypes and independent segregation indicate that the two K(3) dehydrin variants are encoded by distinct genes located at unlinked loci. The co-inheritance of the K(3)-A dehydrin with a Y(2)K(4) dehydrin restriction fragment length polymorphism with a demonstrated impact on freezing tolerance suggests the presence of a genome domain where these functionally related genes are located. These results provide additional evidence that dehydrins play important roles with regard to tolerance to subfreezing temperatures. They also underscore the value of recurrent selection to help identify variants within a large multigene family in allopolyploid species like alfalfa.

  10. Identification of a new antimicrobial lysine-rich cyclolipopeptide family from Xenorhabdus nematophila.

    Science.gov (United States)

    Gualtieri, Maxime; Aumelas, André; Thaler, Jacques-Olivier

    2009-06-01

    Entomopathogenic bacteria of the genus Xenorhabdus are known to be symbiotically associated with soil-dwelling nematodes of the Steinernematidae family. These bacteria are transported by their nematode hosts into the hemocoel of the insect larvae, where they proliferate and produce insecticidal proteins, inhibitors of the insect immune system and antimicrobial molecules. In this study, we describe the discovery of a new family (PAX) of five antimicrobial compounds produced by fermentation of the Xenorhabdus nematophila F1 strain and purified by cation exchange chromatography and reversed phase chromatography. The chemical structure of PAX 3, a lysine-rich cyclolipopeptide, was obtained from the analysis of homo and heteronuclear 2D NMR and confirmed by MS-MS experiments. The five members of the PAX family showed significant activity against plant and human fungal pathogens and moderate activity against a few bacteria and yeast. No cytotoxicity was observed on CHO or insect cells.

  11. Establishing the lysine-rich protein CEST reporter gene as a CEST MR imaging detector for oncolytic virotherapy

    NARCIS (Netherlands)

    C.T. Farrar (Christian T.); J.S. Buhrman (Jason); G. Liu (Guanshu); A. Kleijn (Anne); M.L.M. Lamfers (Martine); M.T. McMahon (Michael T.); A.A. Gilad (Assaf A.); G. Fulci (Giulia)

    2015-01-01

    Purpose: To (a) evaluate whether the lysine-rich protein (LRP) magnetic resonance (MR) imaging reporter gene can be engineered into G47Δ, a herpes simplex-derived oncolytic virus that is currently being tested in clinical trials, without disrupting its therapeutic effectiveness and (b) e

  12. Identification and Characterization of the Lysine-Rich Matrix Protein Family in Pinctada fucata: Indicative of Roles in Shell Formation.

    Science.gov (United States)

    Liang, Jian; Xie, Jun; Gao, Jing; Xu, Chao-Qun; Yan, Yi; Jia, Gan-Chu; Xiang, Liang; Xie, Li-Ping; Zhang, Rong-Qing

    2016-12-01

    The mantle can secrete matrix proteins that play key roles in regulating the process of shell formation. The genes encoding lysine-rich matrix proteins (KRMPs) are among the most highly expressed matrix genes in pearl oysters. However, knowledge of the expression pattern of KRMPs is limited and their functions remain unknown. In this study, we isolated and identified six new members of the lysine-rich matrix proteins, rich in lysine, glycine and tyrosine, all of which are basic matrix proteins. Combined with four members of the KRMPs previously reported, all these proteins can be divided into three subclasses according to the results of phylogenetic analyses: KRMP1-3 belong to subclass KPI, KRMP4-5 belong to KPII, and KRMP6-10 belong to KPIII. The three subclasses of lysine-rich matrix proteins are highly expressed in the D-phase, the larvae and adult mantle. Lysine-rich matrix proteins are involved in the shell repairing process and associated with the formation of the shell and pearl. Moreover, RNA interference of these genes caused abnormal shell growth. Specifically, the KPI subgroup was critical for the initial formation of the prismatic layer; both KPII and KPIII subgroups participated in the formation of the prismatic layer and nacreous layer. Compared with temperature and salinity treatments, changes in pH had the greatest influence on KRMP gene expression. Recombinant KRMP7 significantly inhibited CaCO3 precipitation, changed the morphology of calcite, and inhibited the growth of aragonite in vitro. Our results help clarify the functions of the KRMP genes during shell formation.

  13. The role of hydrophobic amino acids of K-segments in the cryoprotection of lactate dehydrogenase by dehydrins.

    Science.gov (United States)

    Hara, Masakazu; Endo, Takuya; Kamiya, Keita; Kameyama, Ayuko

    2017-03-01

    Dehydrins, which are group 2 late embryogenesis abundant (LEA) proteins, accumulate in plants during the development of the embryo and exposure to abiotic stresses including low temperature. Dehydrins exhibit cryoprotection of freezing-sensitive enzymes, e.g. lactate dehydrogenase (LDH). Although it has been reported that K-segments conserved in dehydrins are related to their cryoprotection activity, it has not been determined which sequence features of the K-segments contribute to the cryoprotection. A cryoprotection assay using LDH indicated that 13 K-segments including 12 K-segments found in Arabidopsis dehydrins and a typical K-segment (TypK, EKKGIMEKIKEKLPG) derived from the K-segments of many plants showed similar cryoprotective activities. Mutation of the TypK sequence demonstrated that hydrophobic amino acids were clearly involved in preventing the cryoinactivation, cryoaggregation, and cryodenaturation of LDH. We propose that the cryoprotective activities of dehydrins may be made possible by the hydrophobic residues of the K-segments. Copyright © 2016 Elsevier GmbH. All rights reserved.
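
    To make the composition of the TypK sequence concrete, the sketch below lists its hydrophobic residues and lysine fraction. The hydrophobic set (A, V, I, L, M, F, W) is an assumption chosen for illustration; the abstract does not state which residues the authors mutated, and the function name is hypothetical.

```python
# TypK sequence as quoted in the abstract.
TYPK = "EKKGIMEKIKEKLPG"

# Assumed hydrophobic set for illustration only; other scales classify residues differently.
HYDROPHOBIC = set("AVILMFW")

def hydrophobic_positions(seq):
    """Return (1-based position, residue) pairs for residues in the assumed hydrophobic set."""
    return [(i, aa) for i, aa in enumerate(seq, start=1) if aa in HYDROPHOBIC]

lysine_fraction = TYPK.count("K") / len(TYPK)

print(hydrophobic_positions(TYPK))
print(round(lysine_fraction, 3))
```

    Under this assumed residue set, TypK has four hydrophobic positions (I5, M6, I9, L13) and five lysines in fifteen residues.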

  14. Unintended Changes in Genetically Modified Rice Expressing the Lysine-Rich Fusion Protein Gene Revealed by a Proteomics Approach

    Institute of Scientific and Technical Information of China (English)

    ZHAO Xiang-xiang; TANG Tang; LIU Fu-xia; LU Chang-li; HU Xiao-lan; JI Li-lian; LIU Qiao-quan

    2013-01-01

    Development of new technologies for evaluating genetically modified (GM) crops has revealed that there are unintended insertions and expression changes in GM crops. Profiling techniques are non-targeted approaches and are capable of detecting more unintended changes in GM crops. Here, we report the application of a comparative proteomic approach to investigate the protein profile differences between a GM rice line, which has a lysine-rich protein gene, and its non-transgenic parental line. Proteome analysis by two-dimensional gel electrophoresis (2-DE) and mass spectrum analysis of the seeds identified 22 differentially expressed protein spots. Apart from a number of glutelins that were detected as targeted proteins in the GM line, the majority of the other changed proteins were involved in carbohydrate metabolism, protein synthesis and stress responses. These results indicated that the altered proteins were not associated with plant allergens or toxicity.

  15. Dual Roles of the Lysine-Rich Matrix Protein (KRMP)-3 in Shell Formation of Pearl Oyster, Pinctada fucata.

    Directory of Open Access Journals (Sweden)

    Jian Liang

    Full Text Available Matrix proteins play important roles in shell formation. Our group first isolated three cDNAs encoding lysine-rich matrix proteins from Pinctada fucata in 2006. However, the functions of KRMPs are not fully understood. In addition, KRMPs contain two functional domains, the basic domain and the Gly/Tyr domain. Despite this modular organization, the roles of the two domains have been poorly characterized. Furthermore, KRMPs were later reported in two other species, P. maxima and P. margaritifera, which indicated that KRMPs might be very important for shell formation. In this study, the characterization and function of KRMP-3 and its two functional domains were studied in vitro through purification of recombinant glutathione S-transferase tagged KRMP-3 and two KRMP-3 deletion mutants. Western blot and immunofluorescence revealed that native KRMP-3 existed in the EDTA-insoluble matrix of the prismatic layer and was located in the organic sheet and the prismatic sheath. Recombinant KRMP-3 (rKRMP-3) bound tightly to chitin and this binding capacity was due to the Gly/Tyr-rich region. rKRMP-3 inhibited the precipitation of CaCO3, affected the crystal morphology of calcite and inhibited the growth of aragonite in vitro, which was almost entirely attributed to the lysine-rich region. The results present direct evidence of the roles of KRMP-3 in shell biomineralization. The functional rBR region was found to participate in the growth control of crystals and the rGYR region was responsible for binding to chitin.

  16. Cloning and expression pattern of a dehydrin-like BDN1 gene from drought-tolerant Boea crassifolia Hemsl.

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    A 500-bp cDNA fragment was amplified via RT-PCR from drought-induced total RNA of the drought-tolerant B. crassifolia Hemsl. using primers based on the sequence of published dehydrin conserved region. By using 5′RACE, full-length coding region (1 148 bp) of BDN1 gene was produced. It is a new member of the dehydrin gene family. Southern analysis indicated that BDN1 is present in the B. crassifolia genome as a single-copy gene. Northern analysis revealed that its expression is inducible by drought and cold stresses as well as ABA application.

  17. Construction and Expression of Methionine-rich and Lysine-rich Fusion Gene in Bacillus natto

    Institute of Scientific and Technical Information of China (English)

    Zhang Shuang; Luo Chao-chao; Wu Cai-xia; Gao Xue-jun

    2015-01-01

    Methionine and lysine are limiting essential amino acids for livestock; they are also the most closely watched indexes in feed production for quality control and quality evaluation. Their contents in feed directly affect livestock protein synthesis. Bacillus natto has excellent probiotic properties. In this experiment, we used a genetic engineering method, the fusion PCR technique, to connect a methionine-rich gene (zein) from maize endosperm protein with a lysine-rich gene (Cflr) from the pepper anther; the fusion gene was then inserted into the expression vector pHT43, and the recombinant plasmid pHT43/zein-Cflr was constructed. The recombinant plasmid was transferred into Bacillus natto, and expression of the fusion gene was induced by IPTG. We found an apparent band at the 40 ku site for the recombinant strain by SDS-PAGE. The contents of methionine and lysine were individually detected with HPLC; the quantities of methionine and lysine in the recombinant strain were 18.37% and 24.68% higher than in the wild-type strain, respectively. We also verified the stability of the recombinant bacterium during passaging, and found the stability was 100%. This study provides a research basis for the application of the recombinant Bacillus natto as a feed additive.

  18. Improvement of Drought Tolerance in Transgenic Tobacco Plants by a Dehydrin-Like Gene Transfer

    Institute of Scientific and Technical Information of China (English)

    SHEN Ye; JIA Wei-long; ZHANG Yan-qin; HU Yuan-lei; WU Qi; LIN Zhong-ping

    2004-01-01

    A full-length cDNA of dehydrin BcDh2 from Boea crassifolia and its antisense nucleotide sequence have been transferred into tobacco (Nicotiana tabacum) NC89 under the control of a cauliflower mosaic virus 35S promoter. Under a progressive water stress, photosynthetic rate, transpiration rate and stomatal conductance of the sense and antisense plants were reduced, and those of the control were reduced much more. Photosynthetic rate, transpiration rate and stomatal conductance of all plants tested increased significantly 24 hours after the water supply was restored, and those of the sense and antisense plants were higher than those of the control. These results indicated that overexpression of a dehydrin gene in tobacco may improve tolerance to water stress; however, the antisense BcDh2 gene in transgenic plants did not influence physiological conditions. The results of a germination experiment with the transgenic seeds showed that on MS medium with different concentrations of PEG (8000), sense seeds tolerated drought better than the control, while antisense seeds were sensitive to drought. The results suggested that the overexpression of a dehydrin gene in tobacco might improve tolerance to water stress.

  20. Seed-Specific Expression of a Lysine-Rich Protein Gene, GhLRP, from Cotton Significantly Increases the Lysine Content in Maize Seeds

    Directory of Open Access Journals (Sweden)

    Jing Yue

    2014-03-01

    Full Text Available Maize seed storage proteins are a major source of human and livestock consumption. However, these proteins have poor nutritional value, because they are deficient in lysine and tryptophan. Much research has been done to elevate the lysine content by reducing zein content or regulating the activities of key enzymes in lysine metabolism. Using the naturally lysine-rich protein genes, sb401 and SBgLR, from potato, we previously increased the lysine and protein contents of maize seeds. Here, we examined another natural lysine-rich protein gene, GhLRP, from cotton, which increased the lysine content of transgenic maize seeds at levels varying from 16.2% to 65.0% relative to the wild-type. The total protein content was not distinctly different, except in the six transgenic lines. The lipid and starch levels did not differ substantially in Gossypium hirsutum L. lysine-rich protein (GhLRP) transgenic kernels when compared to wild-type. The agronomic characteristics of all the transgenic maize were also normal. GhLRP is a high-lysine protein candidate gene for increasing the lysine content of maize. This study provided a valuable model system for improving maize lysine content.

  1. Seed-specific expression of a lysine-rich protein gene, GhLRP, from cotton significantly increases the lysine content in maize seeds.

    Science.gov (United States)

    Yue, Jing; Li, Cong; Zhao, Qian; Zhu, Dengyun; Yu, Jingjuan

    2014-03-27

    Maize seed storage proteins are a major source of human and livestock consumption. However, these proteins have poor nutritional value, because they are deficient in lysine and tryptophan. Much research has been done to elevate the lysine content by reducing zein content or regulating the activities of key enzymes in lysine metabolism. Using the naturally lysine-rich protein genes, sb401 and SBgLR, from potato, we previously increased the lysine and protein contents of maize seeds. Here, we examined another natural lysine-rich protein gene, GhLRP, from cotton, which increased the lysine content of transgenic maize seeds at levels varying from 16.2% to 65.0% relative to the wild-type. The total protein content was not distinctly different, except in the six transgenic lines. The lipid and starch levels did not differ substantially in Gossypium hirsutum L. lysine-rich protein (GhLRP) transgenic kernels when compared to wild-type. The agronomic characteristics of all the transgenic maize were also normal. GhLRP is a high-lysine protein candidate gene for increasing the lysine content of maize. This study provided a valuable model system for improving maize lysine content.

  2. The clone of wheat dehydrin-like gene wzy2 and its functional ...

    African Journals Online (AJOL)

    Yomi

    2012-05-17

    May 17, 2012 ... et al., 2005). These plants possess large genetic diversity ..... The gene characteristics were similar with the dehydrin .... associations with phenotypic traits. .... function of two dehydrins under environmental stresses in Brassica.

  3. Tr288, a rehydrin with a dehydrin twist.

    Science.gov (United States)

    Velten, J; Oliver, M J

    2001-04-01

    The rehydrin Tr288, originally isolated from a screen for differentially expressed transcripts during rehydration of desiccated moss (Tortula ruralis), was further characterized. Steady-state mRNA levels for Tr288 increase dramatically during slow drying even though protein synthesis is completely inhibited during this process. Tr288 transcripts do not accumulate during rapid drying of moss gametophytes. Conversely, during rehydration of rapidly dried tissue Tr288 transcript levels increase several-fold, while the relatively high amount of Tr288 mRNA sequestered in slowly dried material declines with time after the addition of water. Steady-state Tr288 mRNA also increases after treatment with salt (NaCl) and elevated temperature (37 degrees C). Genomic Southern analysis and isolation of a genomic clone suggest the presence of a single Tr288 gene containing two introns within the T. ruralis genome. The only sequence homology detected by a BLAST search of GenBank occurred at the 3' end of the Tr288 coding region and indicated a single copy of the K segment common to dehydrins. Computer translation of the Tr288 coding region revealed 15 copies of a protein segment (the GPN segment) that is predicted to form amphipathic alpha-helices.

  4. Dissecting the cryoprotection mechanisms for dehydrins

    Directory of Open Access Journals (Sweden)

    Cesar Luis Cuevas-Velazquez

    2014-10-01

    Full Text Available One of the common responses of plants to water deficit is the accumulation of the so-called Late Embryogenesis Abundant (LEA) proteins. In vitro studies suggest that these proteins can protect other macromolecules and cellular structural components from the impairments caused by water limitation. Their binding to phospholipids, nucleic acids and/or to divalent cations has suggested multi-functionality. Genetic analyses indicate that these proteins are required for an optimal adjustment of plants to this insult. This diverse information has led to the proposal of different models for the mechanisms of action of LEA proteins. Many of these properties are shared by group 2 LEA proteins or dehydrins (DHN), one of the LEA protein families for which a large amount of data is available. This manuscript focuses on the different mechanisms proposed for this LEA protein group by analyzing published data derived from in vitro cryoprotection assays. We compared the molar ratio of protectant:enzyme needed to preserve 50% of the initial activity per enzyme monomer to assess different mechanisms of action. Our results add evidence for protein-protein interaction as a protection mechanism but also indicate that some DHNs might protect by different means.
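
    The comparison metric named in this abstract, the protectant:enzyme molar ratio that preserves 50% of initial activity per enzyme monomer, can be illustrated with a small calculation. This is a sketch under stated assumptions: linear interpolation between measured points is a choice made here, the function name is hypothetical, and the numbers are invented for illustration, not taken from the paper.

```python
def ratio_at_half_activity(ratios, activities, subunits=1):
    """Interpolate the protectant:enzyme molar ratio that preserves 50% activity.

    ratios:     ascending molar ratios of protectant per enzyme oligomer
    activities: percent of initial enzyme activity retained at each ratio
    subunits:   monomers per enzyme oligomer; the result is given per monomer
    Returns None if the curve never crosses 50% on a rising segment.
    """
    points = list(zip(ratios, activities))
    for (r0, a0), (r1, a1) in zip(points, points[1:]):
        if a0 <= 50.0 <= a1 and a1 != a0:
            # Linear interpolation between the two bracketing measurements.
            r50 = r0 + (50.0 - a0) * (r1 - r0) / (a1 - a0)
            return r50 / subunits
    return None

# Invented example data: a four-subunit enzyme protected by increasing protectant.
example = ratio_at_half_activity([0, 4, 8, 16], [10, 30, 70, 90], subunits=4)
print(example)
```

    For the invented data the 50% point falls at a ratio of 6 per oligomer, which is 1.5 per monomer for a four-subunit enzyme.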

  5. Tibetan hulless barley dehydrin, dhn4, cloning and transforming into tobacco

    Directory of Open Access Journals (Sweden)

    Jianhui Wang

    2011-11-01

    Full Text Available A cDNA fragment of the dehydrin gene dhn4 was obtained via RT-PCR from Tibetan hulless barley (Hordeum vulgare L. var. nudum Hook. f.). Sequence analysis indicated that dhn4 encodes a YSK2-type dehydrin (DHN4). One Y segment (VDEYGNP), one S segment (SGSSSSSSS) and two K segments (RKKGIKEKIKEKLPG and EKKGIMDKIKEKLPG) were identified in the deduced amino acid sequence of dhn4. The secondary structure of the DHN4 protein predicted with the software Anthepro 5.0 is prone to α-helix, and the tertiary structure predicted by SWISS-PORT indicated that the protein is intrinsically unstructured. The coding region of dhn4 was cloned into the pBI121 binary vector under the 35S promoter and transformed into Agrobacterium tumefaciens strain DHA105. dhn4 was then introduced into tobacco leaf discs by Agrobacterium-mediated transformation, and kanamycin-resistant tobacco plantlets were regenerated on callus induction media supplemented with kanamycin and carbenicillin. The regenerated plants were transferred into pots with peat moss and grown in the greenhouse. The dhn4 insertion in the regenerated plants was confirmed separately by PCR, PCR-Southern blot and DNA sequencing using genomic DNA.
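
    The YSK2 classification mentioned in this abstract follows from the order and number of Y, S and K segments in the protein. A minimal sketch of that bookkeeping is given below. The four segment sequences are the ones quoted in the abstract; the toy protein, its linker residues and the function names are hypothetical, and exact string matching is a simplification (real dehydrin annotation tolerates variation within each segment).

```python
import re

# Segment sequences as quoted in the abstract for DHN4.
SEGMENTS = {
    "Y": ["VDEYGNP"],
    "S": ["SGSSSSSSS"],
    "K": ["RKKGIKEKIKEKLPG", "EKKGIMDKIKEKLPG"],
}


def find_segments(protein):
    """Return (start, kind, motif) for every exact segment match,
    sorted by position in the protein."""
    hits = []
    for kind, motifs in SEGMENTS.items():
        for motif in motifs:
            for match in re.finditer(re.escape(motif), protein):
                hits.append((match.start(), kind, motif))
    return sorted(hits)


def architecture(protein):
    """Collapse consecutive segments of the same kind into the usual
    shorthand, e.g. Y, S, K, K becomes 'YSK2'."""
    runs = []
    for _, kind, _ in find_segments(protein):
        if runs and runs[-1][0] == kind:
            runs[-1][1] += 1
        else:
            runs.append([kind, 1])
    return "".join(k + (str(n) if n > 1 else "") for k, n in runs)


# Hypothetical toy protein: the segments are real, the linkers are invented.
toy = ("MA" + "VDEYGNP" + "GGT" + "SGSSSSSSS" + "EDD"
       + "RKKGIKEKIKEKLPG" + "HGT" + "EKKGIMDKIKEKLPG" + "GH")
print(architecture(toy))
```

    For a real sequence, a position-specific pattern or profile search would likely be more appropriate than exact matching, since K segments differ slightly between copies, as the two variants in the abstract already show.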

  6. Tyrosine-phosphorylated Ehrlichia chaffeensis and Ehrlichia canis tandem repeat orthologs contain a major continuous cross-reactive antibody epitope in lysine-rich repeats.

    Science.gov (United States)

    McBride, Jere W; Zhang, Xiaofeng; Wakeel, Abdul; Kuriakose, Jeeba A

    2011-08-01

    A small subset of major immunoreactive proteins have been identified in Ehrlichia chaffeensis and Ehrlichia canis, including three molecularly and immunologically characterized pairs of immunoreactive tandem repeat protein (TRP) orthologs with major continuous species-specific epitopes within acidic tandem repeats (TR) that stimulate strong antibody responses during infection. In this study, we identified a fourth major immunoreactive TR-containing ortholog pair and defined a major cross-reactive epitope in homologous nonidentical 24-amino-acid lysine-rich TRs. Antibodies from patients and dogs with ehrlichiosis reacted strongly with recombinant TR regions, and epitopes were mapped to the N-terminal TR region (18 amino acids) in E. chaffeensis and the complete TR (24 amino acids) in E. canis. Two less-dominant epitopes were mapped to adjacent glutamate/aspartate-rich and aspartate/tyrosine-rich regions in the acidic C terminus of E. canis TRP95 but not in E. chaffeensis TRP75. Major immunoreactive proteins in E. chaffeensis (75-kDa) and E. canis (95-kDa) whole-cell lysates and supernatants were identified with TR-specific antibodies. Consistent with other ehrlichial TRPs, the TRPs identified in ehrlichial whole-cell lysates and the recombinant proteins migrated abnormally slowly during electrophoresis, a characteristic that was demonstrated with the positively charged TR and negatively charged C-terminal domains. E. chaffeensis TRP75 and E. canis TRP95 were immunoprecipitated with anti-pTyr antibody, demonstrating that they are tyrosine phosphorylated during infection of the host cell.

  7. Ribosomal L1 domain and lysine-rich region are essential for CSIG/RSL1D1 to regulate proliferation and senescence

    Energy Technology Data Exchange (ETDEWEB)

    Ma, Liwei; Zhao, Wenting; Zheng, Quanhui; Chen, Tianda; Qi, Ji; Li, Guodong; Tong, Tanjun, E-mail: tztong@bjmu.edu.cn

    2016-01-15

    Changes in the expression of cellular senescence-associated genes form the genetic foundation of cellular senescence. Using a suppressive subtractive hybridization system, we identified CSIG (cellular senescence-inhibited gene protein; RSL1D1) as a novel senescence-associated gene. CSIG is implicated in various processes including cell cycle regulation, apoptosis, and tumor metastasis. We previously showed that CSIG plays an important role in regulating cell proliferation and cellular senescence progression through inhibiting PTEN; however, which domain or region of CSIG contributes to this function remained unknown. To address this question, we investigated the functional importance of the ribosomal L1 domain and the lysine (Lys)-rich region of CSIG. The data showed that expression of CSIG potently reduced PTEN expression, increased cell proliferation rates, and reduced the senescent phenotype (lower SA-β-gal activity). By contrast, expression of neither the CSIG N-terminal (NT) fragment containing the ribosomal L1 domain nor the C-terminal (CT) fragment containing the Lys-rich region significantly altered the levels of PTEN; instead of promoting cell proliferation and delaying cellular senescence, expression of CSIG-NT or CSIG-CT inhibited cell proliferation and accelerated cell senescence (increased SA-β-gal activity) compared to either CSIG-overexpressing or control (empty vector-transfected) cells. Further immunofluorescence analysis showed that the CSIG-CT and CSIG-NT truncated proteins exhibited subcellular distributions different from that of wild-type CSIG. In conclusion, both the ribosomal L1 domain and the Lys-rich region of CSIG are critical for CSIG to act as a regulator of cell proliferation and cellular senescence. - Highlights: • The ribosomal L1 domain and lysine-rich region of CSIG were expressed. • They are critical for CSIG to regulate proliferation and senescence. • CSIG and its domains exhibit different subcellular distribution.

  8. The polybasic lysine-rich domain of plasma membrane-resident STIM1 is essential for the modulation of store-operated divalent cation entry by extracellular calcium.

    Science.gov (United States)

    Jardin, Isaac; Dionisio, Natalia; Frischauf, Irene; Berna-Erro, Alejandro; Woodard, Geoffrey E; López, José J; Salido, Ginés M; Rosado, Juan A

    2013-05-01

    STIM1 acts as an endoplasmic reticulum Ca(2+) sensor that communicates the filling state of the intracellular stores to the store-operated channels. In addition, STIM1 is expressed in the plasma membrane, with the Ca(2+) binding EF-hand motif facing the extracellular medium; however, its role sensing extracellular Ca(2+) concentrations in store-operated Ca(2+) entry (SOCE), as well as the underlying mechanism remains unclear. Here we report that divalent cation entry stimulated by thapsigargin (TG) is attenuated by extracellular Ca(2+) in a concentration-dependent manner. Expression of the Ca(2+)-binding defective STIM1(D76A) mutant did not alter the surface expression of STIM1 but abolishes the regulation of divalent cation entry by extracellular Ca(2+). Orai1 and TRPC1 have been shown to play a major role in SOCE. Expression of the STIM1(D76A) mutant did not alter Orai1 phosphoserine content. TRPC1 silencing significantly attenuated TG-induced Mn(2+) entry. Expression of the STIM1(K684,685E) mutant impaired the association of plasma membrane STIM1 with TRPC1, as well as the regulation of TG-induced divalent cation entry by extracellular Ca(2+), which suggests that TRPC1 might be involved in the regulation of divalent cation entry by extracellular Ca(2+) mediated by plasma membrane-resident STIM1. Expression of the STIM1(D76A) or STIM1(K684,685E) mutants reduced store-operated divalent cation entry and resulted in loss of dependence on the extracellular Ca(2+) concentration, providing evidence for a functional role of plasma membrane-resident STIM1 in the regulation of store-operated divalent cation entry, which at least involves the EF-hand motif and the C-terminal polybasic lysine-rich domain. Copyright © 2013 Elsevier Inc. All rights reserved.

  9. A dehydration-inducible gene in the truffle Tuber borchii identifies a novel group of dehydrins

    Directory of Open Access Journals (Sweden)

    Bonfante Paola

    2006-03-01

    Full Text Available Background: The expressed sequence tag M6G10 was originally isolated from a screening for differentially expressed transcripts during the reproductive stage of the white truffle Tuber borchii. mRNA levels for M6G10 increased dramatically during fruiting body maturation compared to the vegetative mycelial stage. Results: Bioinformatics tools, phylogenetic analysis and expression studies were used to support the hypothesis that this sequence, named TbDHN1, is the first dehydrin (DHN)-like coding gene isolated in fungi. Homologs of this gene, all defined as "coding for hypothetical proteins" in public databases, were exclusively found in ascomycetous fungi and in plants. Although complete (or almost complete) fungal genomes and EST collections of some Basidiomycota and Glomeromycota are already available, DHN-like proteins appear to be represented only in Ascomycota. A new and previously uncharacterized conserved signature pattern was identified and proposed to the Uniprot database as the main distinguishing feature of this new group of DHNs. Expression studies provide experimental evidence of a transcript induction of TbDHN1 during cellular dehydration. Conclusion: Expression pattern and sequence similarities to known plant DHNs indicate that TbDHN1 is the first characterized DHN-like protein in fungi. The high similarity of TbDHN1 with homolog coding sequences implies the existence of a novel fungal/plant group of LEA Class II proteins characterized by a previously undescribed signature pattern.

  10. A dehydrin gene isolated from feral olive enhances drought tolerance

    Directory of Open Access Journals (Sweden)

    Adriana Chiappetta

    2015-06-01

    Full Text Available Dehydrins belong to a protein family whose expression may be induced or enhanced by developmental processes and environmental stresses that lead to cell dehydration. A dehydrin gene named OesDHN was isolated and characterized from oleaster (Olea europaea L. subsp. europaea, var. sylvestris), the wild form of olive. To elucidate the contribution of OesDHN to the development of drought tolerance, its expression levels were investigated in oleaster plants during development and under drought stress conditions. The involvement of OesDHN in plant stress response was also evaluated in Arabidopsis transgenic lines, engineered to overexpress this gene, and exposed to a controlled mild osmotic stress. OesDHN expression was found to be modulated during development and induced under mild drought stress in oleaster plants. In addition, the Arabidopsis transgenic plants showed a better tolerance to osmotic stress than wild-type plants. The results demonstrated that OesDHN expression is induced by drought stress and is able to confer osmotic stress tolerance. We suggest a role for OesDHN as a putative functional marker of plant stress tolerance.

  11. Expression of Finger Millet EcDehydrin7 in Transgenic Tobacco Confers Tolerance to Drought Stress.

    Science.gov (United States)

    Singh, Rajiv Kumar; Singh, Vivek Kumar; Raghavendrarao, Sanagala; Phanindra, Mullapudi Lakshmi Venkata; Venkat Raman, K; Solanke, Amolkumar U; Kumar, Polumetla Ananda; Sharma, Tilak Raj

    2015-09-01

    Water scarcity is one of the most critical constraints on agriculture. In the current scenario of global warming due to climate change and unpredictable rainfall, drought is going to be a major factor and poses a serious threat to the stagnating gene pool of staple food crops. It is therefore necessary to understand the mechanisms that enable plants to cope with drought stress. In this study, the role of the EcDehydrin7 protein, obtained from a normalized cDNA library of drought-tolerant finger millet, was investigated in transgenic tobacco. Biochemical and molecular analyses of T0 transgenic plants were performed to assess stress tolerance. Leaf disc assay, seed germination test, dehydration assay, and chlorophyll estimation showed that the EcDehydrin7 protein is directly linked to drought tolerance. Northern and qRT-PCR analyses showed relatively high expression of EcDehydrin7 compared to the wild type. The T0 transgenic lines EcDehydrin7(11) and EcDehydrin7(15) showed superior expression among all lines under study. In summary, all results suggest that the EcDehydrin7 protein has a notable role in drought tolerance and may be used in sustainable crop breeding programs for other food crops.

  12. Detection and subcellular localization of dehydrin-like proteins in quinoa (Chenopodium quinoa Willd.) embryos.

    Science.gov (United States)

    Carjuzaa, P; Castellión, M; Distéfano, A J; del Vas, M; Maldonado, S

    2008-01-01

    The aim of this study was to characterize the dehydrin content in mature embryos of two quinoa cultivars, Sajama and Baer La Unión. Cultivar Sajama grows at 3600-4000 m altitude and is adapted to the very arid conditions characteristic of the salty soils of the Bolivian Altiplano, with less than 250 mm of annual rain and a minimum temperature of -1 degrees C. Cultivar Baer La Unión grows at sea-level regions of central Chile and is adapted to more humid conditions (800 to 1500 mm of annual rain), fertile soils, and temperatures above 5 degrees C. Western blot analysis of embryo tissues from plants growing under controlled greenhouse conditions clearly revealed the presence of several dehydrin bands (at molecular masses of approximately 30, 32, 50, and 55 kDa), which were common to both cultivars, although the amount of the 30 and 32 kDa bands differed. Nevertheless, when grains originated from their respective natural environments, three extra bands (at molecular masses of approximately 34, 38, and 40 kDa), which were hardly visible in Sajama, and another weak band (at a molecular mass of approximately 28 kDa) were evident in Baer La Unión. In situ immunolocalization microscopy detected dehydrin-like proteins in all axis and cotyledon tissues. At the subcellular level, dehydrins were detected in the plasma membrane, cytoplasm and nucleus. In the cytoplasm, dehydrins were found associated with mitochondria, rough endoplasmic reticulum cisternae, and proplastid membranes. The presence of dehydrins was also recognized in the matrix of protein bodies. In the nucleus, dehydrins were associated with the euchromatin. Upon examining dehydrin composition and subcellular localization in two quinoa cultivars belonging to highly contrasting environments, we conclude that most dehydrins detected here were constitutive components of the quinoa seed developmental program, but some of them (especially the 34, 38, and 40 kDa bands) may reflect quantitative molecular differences

  13. Comparison of Dehydrin Gene Expression and Freezing Tolerance in Bromus inermis and Secale cereale Grown in Controlled Environments, Hydroponics, and the Field.

    Science.gov (United States)

    Robertson, A. J.; Weninger, A.; Wilen, R. W.; Fu, P.; Gusta, L. V.

    1994-01-01

    There have been very few reports on the expression of stress-responsive genes in field-grown material. A barley dehydrin cDNA was used to investigate the expression of dehydrin-like transcripts after low-temperature and abscisic acid-induced acclimation of bromegrass (Bromus inermis Leyss) suspension cells and of bromegrass and rye (Secale cereale) plants grown in the field and under controlled environmental conditions. Field-acclimated plants accumulated high levels of dehydrin transcripts and were very freezing tolerant. Plants grown in pots and hydroponics under controlled environments also accumulated dehydrin transcripts and showed increased freezing tolerance. Simulation of a combined drought and freezing stress in pots resulted in expression of dehydrin-like transcripts comparable to those observed in field-acclimated material. PMID:12232403

  14. Pleiotropic effects of the wheat dehydrin DHN-5 on stress responses in Arabidopsis.

    Science.gov (United States)

    Brini, Faïçal; Yamamoto, Akiko; Jlaiel, Lobna; Takeda, Shin; Hobo, Tokunori; Dinh, Huy Q; Hattori, Tsukaho; Masmoudi, Khaled; Hanin, Moez

    2011-04-01

    We have previously reported that transgenic Arabidopsis plants overexpressing the wheat dehydrin DHN-5 show enhanced tolerance to osmotic stresses. In order to understand the mechanisms through which DHN-5 exerts this effect, we performed transcriptome profiling using the Affymetrix ATH1 microarray. Our data show an altered expression of 77 genes involved mainly in transcriptional regulation, cellular metabolism, stress tolerance and signaling. Among the up-regulated genes, we identified those which are known to be stress-related genes. Several late embryogenesis abundant (LEA) genes, ABA/stress-related genes (such as RD29B) and those involved in pathogen responses (PR genes) are among the most up-regulated genes. In addition, the MDHAR gene involved in the ascorbate biosynthetic pathway was also up-regulated. This up-regulation was correlated with higher ascorbate content in two dehydrin transgenic lines. In agreement with this result and as ascorbate is known to be an antioxidant, we found that both transgenic lines show enhanced tolerance to oxidative stress caused by H₂O₂. On the other hand, multiple types of transcription factors constitute the largest group of the down-regulated genes. Moreover, three members of the jasmonate-ZIM domain (JAZ) proteins which are negative regulators of jasmonate signaling were severely down-regulated. Interestingly, the dehydrin-overexpressing lines exhibit less sensitivity to jasmonate than wild-type plants and changes in regulation of jasmonate-responsive genes, in a manner similar to that in the jasmonate-insensitive jai3-1 mutant. Altogether, our data unravel the potential pleiotropic effects of DHN-5 on both abiotic and biotic stress responses in Arabidopsis.

  15. Dehydrin-like proteins in the necrotrophic fungus Alternaria brassicicola have a role in plant pathogenesis and stress response.

    Science.gov (United States)

    Pochon, Stéphanie; Simoneau, Philippe; Pigné, Sandrine; Balidas, Samuel; Bataillé-Simoneau, Nelly; Campion, Claire; Jaspard, Emmanuel; Calmes, Benoît; Hamon, Bruno; Berruyer, Romain; Juchaux, Marjorie; Guillemette, Thomas

    2013-01-01

    In this study, the roles of fungal dehydrin-like proteins in pathogenicity and protection against environmental stresses were investigated in the necrotrophic seed-borne fungus Alternaria brassicicola. Three proteins (called AbDhn1, AbDhn2 and AbDhn3), harbouring the aspartate-proline-arginine (DPR) signature pattern and sharing the characteristic features of fungal dehydrin-like proteins, were identified in the A. brassicicola genome. The expression of these genes was induced in response to various stresses and found to be regulated by the AbHog1 mitogen-activated protein kinase (MAPK) pathway. A knock-out approach showed that dehydrin-like proteins have an impact mainly on oxidative stress tolerance and on conidial survival upon exposure to high and freezing temperatures. The subcellular localization revealed that AbDhn1 and AbDhn2 were associated with peroxisomes, which is consistent with a possible perturbation of protective mechanisms to counteract oxidative stress and maintain the redox balance in AbDhn mutants. Finally, we show that the double deletion mutant ΔΔabdhn1-abdhn2 was highly compromised in its pathogenicity. By comparison to the wild-type, this mutant exhibited lower aggressiveness on B. oleracea leaves and a reduced capacity to be transmitted to Arabidopsis seeds via siliques. The double mutant was also affected with respect to conidiation, another crucial step in the epidemiology of the disease.

  16. Dehydrin-like proteins in the necrotrophic fungus Alternaria brassicicola have a role in plant pathogenesis and stress response.

    Directory of Open Access Journals (Sweden)

    Stéphanie Pochon

    Full Text Available In this study, the roles of fungal dehydrin-like proteins in pathogenicity and protection against environmental stresses were investigated in the necrotrophic seed-borne fungus Alternaria brassicicola. Three proteins (called AbDhn1, AbDhn2 and AbDhn3), harbouring the aspartate-proline-arginine (DPR) signature pattern and sharing the characteristic features of fungal dehydrin-like proteins, were identified in the A. brassicicola genome. The expression of these genes was induced in response to various stresses and found to be regulated by the AbHog1 mitogen-activated protein kinase (MAPK) pathway. A knock-out approach showed that dehydrin-like proteins have an impact mainly on oxidative stress tolerance and on conidial survival upon exposure to high and freezing temperatures. The subcellular localization revealed that AbDhn1 and AbDhn2 were associated with peroxisomes, which is consistent with a possible perturbation of protective mechanisms to counteract oxidative stress and maintain the redox balance in AbDhn mutants. Finally, we show that the double deletion mutant ΔΔabdhn1-abdhn2 was highly compromised in its pathogenicity. By comparison to the wild-type, this mutant exhibited lower aggressiveness on B. oleracea leaves and a reduced capacity to be transmitted to Arabidopsis seeds via siliques. The double mutant was also affected with respect to conidiation, another crucial step in the epidemiology of the disease.

  17. Physcomitrella patens Dehydrins (PpDHNA and PpDHNC) Confer Salinity and Drought Tolerance to Transgenic Arabidopsis Plants

    Directory of Open Access Journals (Sweden)

    Qilong Li

    2017-07-01

    Full Text Available Dehydrins (DHNs), as members of the late-embryogenesis-abundant (LEA) proteins, are involved in plant abiotic stress tolerance. Two dehydrins, PpDHNA and PpDHNC, were previously characterized from the moss Physcomitrella patens, which has been suggested to be an ideal model plant to study stress tolerance due to its adaptability to extreme environments. In this study, the functions of these two genes were analyzed by heterologous expression in Arabidopsis. Phenotype analysis revealed that lines overexpressing the PpDHN dehydrins had stronger stress resistance than wild-type and empty-vector control lines. This stress tolerance was mainly due to the up-regulation of stress-related gene expression and the mitigation of oxidative damage. The transgenic plants showed a strong ability to scavenge reactive oxygen species (ROS), which was attributed to increased levels of antioxidant enzymes such as superoxide dismutase (SOD) and catalase (CAT). Further analysis showed that the contents of chlorophyll and proline tended toward appropriate levels (close to those in a non-stress environment) and that malondialdehyde (MDA) was repressed in these transgenic plants after exposure to stress. All these results suggest that PpDHNA and PpDHNC play a crucial role in the response to drought and salt stress.

  18. Distribution, Structure and Function of Dehydrins

    Institute of Scientific and Technical Information of China (English)

    马慧; 孙檬; 安亭亭; 钟鸣

    2015-01-01

    Dehydrins (DHNs) are group 2 members of the late embryogenesis abundant (LEA) protein family and are widely present in the cytoplasm, nucleus, vacuole, chloroplasts, mitochondria, or karyoplasm of different plants. Being strongly hydrophilic, they can stabilize cell membranes, protect proteins and chelate metal ions. They therefore play an important role in plant responses to abiotic stresses such as drought, cold and freezing, and high salinity. Dehydrins contain three conserved regions, the K-, Y- and S-segments, and are divided into five categories according to their composition. This review outlines the structure and function of dehydrins and how plants use these traits to improve stress tolerance. Future directions for dehydrin research are also discussed.

  19. Nuclear localization of the dehydrin OpsDHN1 is determined by histidine-rich motif

    Directory of Open Access Journals (Sweden)

    Itzell Euridice Hernández-Sánchez

    2015-09-01

    Full Text Available The cactus OpsDHN1 dehydrin belongs to a large family of disordered and highly hydrophilic proteins known as Late Embryogenesis Abundant (LEA) proteins, which accumulate during the late stages of embryogenesis and in response to abiotic stresses. Herein, we present the in vivo OpsDHN1 subcellular localization by N-terminal GFP translational fusion; our results revealed a cytoplasmic and nuclear localization of the GFP::OpsDHN1 protein in Nicotiana benthamiana epidermal cells. In addition, dimer assembly of OpsDHN1 in planta using a Bimolecular Fluorescence Complementation (BiFC) approach was demonstrated. In order to understand the in vivo role of the histidine-rich motif, the OpsDHN1-ΔHis version was produced and assayed for its subcellular localization and dimer capability by GFP fusion and BiFC assays, respectively. We found that deletion of the OpsDHN1 histidine-rich motif restricted its localization to cytoplasm, but did not affect dimer formation. In addition, the deletion of the S-segment in the OpsDHN1 protein affected its nuclear localization. Our data suggest that the deletion of histidine-rich motif and S-segment show similar effects, preventing OpsDHN1 from getting into the nucleus. Based on these results, the histidine-rich motif is proposed as a targeting element for OpsDHN1 nuclear localization.

  20. Nuclear localization of the dehydrin OpsDHN1 is determined by histidine-rich motif

    Science.gov (United States)

    Hernández-Sánchez, Itzell E.; Maruri-López, Israel; Ferrando, Alejandro; Carbonell, Juan; Graether, Steffen P.; Jiménez-Bremont, Juan F.

    2015-01-01

    The cactus OpsDHN1 dehydrin belongs to a large family of disordered and highly hydrophilic proteins known as Late Embryogenesis Abundant (LEA) proteins, which accumulate during the late stages of embryogenesis and in response to abiotic stresses. Herein, we present the in vivo OpsDHN1 subcellular localization by N-terminal GFP translational fusion; our results revealed a cytoplasmic and nuclear localization of the GFP::OpsDHN1 protein in Nicotiana benthamiana epidermal cells. In addition, dimer assembly of OpsDHN1 in planta using a Bimolecular Fluorescence Complementation (BiFC) approach was demonstrated. In order to understand the in vivo role of the histidine-rich motif, the OpsDHN1-ΔHis version was produced and assayed for its subcellular localization and dimer capability by GFP fusion and BiFC assays, respectively. We found that deletion of the OpsDHN1 histidine-rich motif restricted its localization to cytoplasm, but did not affect dimer formation. In addition, the deletion of the S-segment in the OpsDHN1 protein affected its nuclear localization. Our data suggest that the deletion of histidine-rich motif and S-segment show similar effects, preventing OpsDHN1 from getting into the nucleus. Based on these results, the histidine-rich motif is proposed as a targeting element for OpsDHN1 nuclear localization. PMID:26442018

  1. Dehydrin, alcohol dehydrogenase, and central metabolite levels are associated with cold tolerance in diploid strawberry (Fragaria spp.).

    Science.gov (United States)

    Davik, Jahn; Koehler, Gage; From, Britta; Torp, Torfinn; Rohloff, Jens; Eidem, Petter; Wilson, Robert C; Sønsteby, Anita; Randall, Stephen K; Alsheikh, Muath

    2013-01-01

    The use of artificial freezing tests and the identification of biomarkers linked to or directly involved in low-temperature tolerance processes could prove useful in applied strawberry breeding. This study was conducted to identify genotypes of diploid strawberry that differ in their tolerance to low-temperature stress and to investigate whether a set of candidate proteins and metabolites correlate with the level of tolerance. Seventeen Fragaria vesca, two F. nilgerrensis, two F. nubicola, and one F. pentaphylla genotypes were evaluated for low-temperature tolerance. Estimates of temperatures where 50 % of the plants survived (LT₅₀) ranged from -4.7 to -12.0 °C between the genotypes. Among the F. vesca genotypes, the LT₅₀ varied from -7.7 °C to -12.0 °C. Among the most tolerant were three F. vesca ssp. bracteata genotypes (FDP821, NCGR424, and NCGR502), while a F. vesca ssp. californica genotype (FDP817) was the least tolerant (LT₅₀ = -7.7 °C). Alcohol dehydrogenase (ADH), total dehydrin expression, and content of central metabolism constituents were assayed in select plants acclimated at 2 °C. The LT₅₀ estimates and the expression of ADH and total dehydrins were highly correlated (r(adh) = -0.87, r(dehyd) = -0.82). Compounds related to the citric acid cycle were quantified in the leaves during acclimation. While several sugars and acids were significantly correlated to the LT₅₀ estimates early in the acclimation period, only galactinol proved to be a good LT₅₀ predictor after 28 days of acclimation (r(galact) = 0.79). It is concluded that ADH, dehydrins, and galactinol show great potential to serve as biomarkers for cold tolerance in diploid strawberry.

  2. Tissue-specific expression and functional role of dehydrins in heat tolerance of sugarcane (Saccharum officinarum).

    Science.gov (United States)

    Galani, Saddia; Wahid, Abdul; Arshad, Muhammad

    2013-04-01

    Studies on the functional roles of dehydrins (DHNs) in heat tolerance of plants are scarce. This study was conducted to immunohistolocalize DHNs in leaves of heat-tolerant (CP-4333) and heat-sensitive (HSF-240) sugarcane (Saccharum officinarum L.) clones at three phenological stages in order to elucidate their putative roles under heat stress. CP-4333 indicated greater amounts of heat-stable proteins than HSF-240 under heat stress. Western blotting revealed the expression of three DHNs in CP-4333 (13- and 15-kDa peptides at 48 h and an additional 18-kDa band at 72 h) and two (13 and 15 kDa at 48 h) in HSF-240 at formative stage; two DHNs in CP-4333 (20 and 25 kDa) and one in HSF-240 (20 kDa) at grand growth stage, while two DHNs in CP-4333 (20 and 22 kDa) and one in HSF-240 (20 kDa) at maturity stage. Tissue-specific immunohistolocalization showed that DHNs were expressed in stele particularly the phloem and the cells intervening bundle sheath and vascular bundles. Furthermore, DHNs were also found scattered along the epidermal and parenchymatous cells. Recovery of sugarcane from heat stress manifested a gradual disappearance of DHNs in both the clones, being quicker in sensitive clone (HSF-240). Results suggested specific implications for DHNs synthesis. Their synthesis in epidermis appears to protect the mesophyll tissues from heat injury. When associated to vascular tissue, they tend to ensure the normal photoassimilate loading into the sieve element-companion cell complex. DHNs diminution during recovery suggested that their expression was transitory. However, prolonged retention of DHNs by tolerant clone appears to be an adaptive advantage of sugarcane to withstand heat stress.

  3. High Genetic Differentiation among European White Oak Species (Quercus spp.) at a Dehydrin Gene

    Directory of Open Access Journals (Sweden)

    Iacob CRĂCIUNESC

    2015-12-01

    Full Text Available Dehydrin genes are involved in plant response to environmental stress and may be useful to examine functional diversity in relation to adaptive variation. Recently, a dehydrin gene (DHN3) was isolated in Quercus petraea and showed little differentiation between populations of the same species in an altitudinal transect. In the present study, inter- and intraspecific differentiation patterns in closely related and interfertile oaks were investigated for the first time at the DHN3 locus. A four-oak-species stand (Quercus frainetto Ten., Q. petraea (Matt.) Liebl., Q. pubescens Willd., Q. robur L.) and two populations for each of five white oak species (Q. frainetto Ten., Q. petraea (Matt.) Liebl., Q. pubescens Willd., Q. robur L. and Q. pedunculiflora K. Koch) were analyzed. Three alleles shared by all five oak species were observed. However, only two alleles were present in each population, with different frequencies according to the species. At population level, all interspecific pairs of populations showed significant differentiation, except for pure Q. robur and Q. pedunculiflora populations. In contrast, no significant differentiation (p > 0.05) was found among conspecific populations. The DHN3 locus proved to be very useful to differentiate Q. frainetto and Q. pubescens from Q. pedunculiflora (FST = 0.914 and 0.660, respectively) and Q. robur (FST = 0.858 and 0.633, respectively). As expected, the lowest level of differentiation was detected between the most closely related species, Q. robur and Q. pedunculiflora (FST = 0.020). Our results suggest that DHN3 can be an important genetic marker for differentiating among European white oak species.
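
    The FST values quoted in this abstract measure how much of the total allelic diversity at a locus lies between populations. A minimal sketch of a heterozygosity-based calculation, FST = (HT - HS) / HT, for equally weighted populations is shown below. The allele frequencies are invented for illustration, and this simple form is an assumption: the abstract does not say which estimator the authors used, and published estimators usually add sample-size corrections.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(populations):
    """Heterozygosity-based FST = (HT - HS) / HT for equally weighted
    populations, each given as a list of allele frequencies at one locus."""
    n = len(populations)
    # HS: mean expected heterozygosity within populations.
    hs = sum(expected_heterozygosity(p) for p in populations) / n
    # HT: expected heterozygosity of the pooled (averaged) frequencies.
    mean_freqs = [sum(col) / n for col in zip(*populations)]
    ht = expected_heterozygosity(mean_freqs)
    return 0.0 if ht == 0 else (ht - hs) / ht


# Hypothetical frequencies for three alleles at one locus. Each population
# carries only two of the three alleles, echoing the pattern in the abstract.
species_a = [0.9, 0.1, 0.0]
species_b = [0.1, 0.9, 0.0]

print(round(fst([species_a, species_b]), 3))   # strongly differentiated pair
print(round(fst([species_a, species_a]), 3))   # identical populations
```

    Values near 0 indicate that the populations share nearly the same allele frequencies, and values near 1 indicate that they are close to fixed for different alleles, which is the sense in which the abstract contrasts 0.020 with 0.914.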

  4. Activity and isoenzyme spectrum of peroxidases and dehydrins of some plant species, growing on the shores of lake Baikal, under abiotic stress

    Directory of Open Access Journals (Sweden)

    M.A. Zhivet’ev

    2010-11-01

    Full Text Available The thermostability and optimal pH of soluble peroxidases and of peroxidases weakly associated with the plant cell wall were shown to change with natural conditions and season. Peroxidase activity also varied during the vegetation period. Dehydrin expression was followed by a spike in peroxidase activity (and, a priori, an increase in hydrogen peroxide concentration).

  5. Dehydrins in cold-acclimated apices of birch (Betula pubescens Ehr.) : production, localization and potential role in rescuing enzyme function during dehydration

    NARCIS (Netherlands)

    Rinne, P.L.H.; Kaikuranta, P.L.M.; Plas, van der L.H.W.; Schoot, van der C.

    1999-01-01

    Dehydrins accumulate in various plant tissues during dehydration. Their physiological role is not well understood, but it is commonly assumed that they assist cells in tolerating dehydration. Since in perennials the ability of the shoot apex to withstand dehydration is pivotal for survival through w

  6. Polyamine regulates tolerance to water stress in leaves of white clover associated with antioxidant defense and dehydrin genes via involvement in calcium messenger system and hydrogen peroxide signaling

    Directory of Open Access Journals (Sweden)

    Zhou eLi

    2015-10-01

    Full Text Available Endogenous polyamine (PA) may play a critical role in tolerance to water stress in plants, acting as a signaling molecule activator. Water stress caused increases in endogenous PA content in leaves, including putrescine (Put), spermidine (Spd), and spermine (Spm). Exogenous application of Spd could induce an instantaneous H2O2 burst and accumulation of cytosolic free Ca2+, and activate NADPH oxidase and CDPK gene expression in cells. To a great extent, a PA biosynthetic inhibitor reduced the water stress-induced H2O2 accumulation, free cytosolic Ca2+ release, antioxidant enzyme activities and gene expression, aggravating water stress-induced oxidative damage, while these suppressing effects were alleviated by the addition of exogenous Spd, indicating that PA was involved in water stress-induced H2O2 and cytosolic free Ca2+ production as well as stress tolerance. Dehydrin genes (Y2SK, Y2K, and SK2) were shown to be highly responsive to exogenous Spd. PA-induced antioxidant defense and dehydrin gene expression could be blocked by the scavenger of H2O2 and the inhibitors of H2O2 generation or Ca2+ channel blockers, a calmodulin antagonist, as well as the inhibitor of CDPK. These findings suggested that PA regulated tolerance to water stress in white clover associated with antioxidant defenses and dehydrins via involvement in the calcium messenger system and H2O2 signaling pathways. PA-induced H2O2 production required Ca2+ release, while PA-induced Ca2+ release was also essential for H2O2 production, suggesting an interaction between PA-induced H2O2 and Ca2+ signaling.

  7. Insights on germinability and desiccation tolerance in developing neem seeds (Azadirachta indica): Role of AOS, antioxidative enzymes and dehydrin-like protein.

    Science.gov (United States)

    Sahu, Balram; Sahu, Alok Kumar; Chennareddy, Srinivasa Rao; Soni, Avinash; Naithani, Subhash Chandra

    2017-03-01

    The germinability and desiccation tolerance (DT) in developing seed are regulated by cellular metabolism involving active oxygen species (AOS) and protective proteins during maturation drying. The aim of the present investigation was to unravel the functions of AOS (superoxide, H2O2 and OH-radical), antioxidative enzymes (SOD, CAT and APX) and dehydrin-like proteins in regulating the germinability and DT in undried and artificially desiccated developing neem seeds. Germination was first observed in seeds of 8 weeks after anthesis (waa) whereas DT was noticed from 9 waa. High levels of superoxide in undried and artificially desiccated seeds of 9 waa declined rapidly up to 15 waa, with a simultaneous increase in levels of SOD (quantitative and isoenzymes), which dismutates superoxide with corresponding formation and accumulation of H2O2. Activities and isoenzymes of APX and CAT were promoted in seeds from 9 to 12 waa. Intensity of dehydrin-like proteins increased as development progressed in seeds, with higher intensities in slow dried (SD) seeds. Desiccation modulated the metabolism for the acquisition of germinability and DT in the developing neem seeds from 8 to 15 waa by altering the levels of superoxide, H2O2 and OH-radical, which possibly act as signalling molecules for reprogramming protective proteins. Desiccation mediated the expression of new bands of SOD and APX in undried as well as SD seeds during 9-12 waa, but the bands were more intense in SD seeds. The superoxide and H2O2-regulated intensity of dehydrin-like protein in SD seeds further validated our conclusion. Copyright © 2016 Elsevier Masson SAS. All rights reserved.

  8. The Tebuconazole-based Protectant of Seeds “Bunker” Induces the Synthesis of Dehydrins During Cold Hardening and Increases the Frost Resistance of Wheat Seedlings

    Directory of Open Access Journals (Sweden)

    A.V. Korsukova

    2015-12-01

    Full Text Available Triazole derivatives are widely used in agriculture as seed protectants for cereals against seed and soil infection. Triazole derivatives can affect the biochemical and physiological functions of plants. The tebuconazole-based seed protectant «Bunker» (tebuconazole content 60 g/L) is a systemic fungicide with preventive and therapeutic action. The effect of seed treatment with the «Bunker» preparation on shoot growth and coleoptile cell viability, on the synthesis of dehydrins in shoots, and on the frost resistance of etiolated winter and spring wheat seedlings has been studied. Treatment of winter and spring wheat seed with «Bunker» induced a similar concentration-dependent inhibition of coleoptile length. At the recommended dose (0.5 L per tonne of seeds, L/t) growth inhibition was 28-30%; at 1 L/t, 33-36%; at 1.5 L/t, 40-42%; at 3 L/t, 43-47%; at 4 L/t, 48-51%; and at 5 L/t, 53-56%. The treatment had no phytotoxic effect on coleoptile cells at any of the studied concentrations; on the contrary, cell viability, measured by reduction of 2,3,5-triphenyltetrazolium chloride, increased with increasing concentration of the preparation. We can assume that tebuconazole, having retardant properties, not only inhibits plant growth but also delays aging. Seed treatment with the protectant at 1.5 L/t induced synthesis of dehydrins with molecular masses of about 19, 21, 22, 25 and 27 kD in winter wheat shoots and 18.6, 27 and 28.5 kD in spring wheat shoots during cold hardening. Among the identified dehydrins, the 27 kD dehydrin was the most strongly induced in both winter and spring wheat. Treatment with the seed protectant «Bunker» at the same concentration increased the frost resistance of winter and spring wheat

  9. Deciphering the Role of CBF/DREB Transcription Factors and Dehydrins in Maintaining the Quality of Table Grapes cv. Autumn Royal Treated with High CO2 Levels and Stored at 0°C

    Directory of Open Access Journals (Sweden)

    Maria Vazquez-Hernandez

    2017-09-01

    Full Text Available C-repeat/dehydration-responsive element binding factors (CBF/DREB) are transcription factors which play a role in improving plant cold stress resistance and recognize the DRE/CRT element in the promoter of a set of cold regulated genes. Dehydrins (DHNs) are proteins that accumulate in plants in response to cold stress, which present, in some cases, CBF/DREB recognition sequences in their promoters and are activated by members of this transcription factor family. The application of a 3-day gaseous treatment with 20 kPa CO2 at 0°C to table grapes cv. Autumn Royal maintained the quality of the bunches during postharvest storage at 0°C, reducing weight loss and rachis browning. In order to determine the role of CBF/DREB genes in the beneficial effect of the gaseous treatment by regulating DHNs, we have analyzed the gene expression pattern of three VviDREBA1s (VviDREBA1-1, VviDREBA1-6, and VviDREBA1-7) as well as three VviDHNs (VviDHN1a, VviDHN2, and VviDHN4), in both alternative splicing forms. Results showed that the differences in VviDREBA1s expression were tissue and atmosphere composition dependent, although the application of high levels of CO2 caused a greater increase of VviDREBA1-1 in the skin, VviDREBA1-6 in the pulp and VviDREBA1-7 in the skin and pulp. Likewise, the application of high levels of CO2 regulated the retention of introns in the transcripts of the dehydrins studied in the different tissues analyzed. The DHNs promoter analysis showed that VviDHN2 presented the cis-acting DRE and CRT elements, whereas VviDHN1a presented only the DRE motif. Our electrophoretic mobility shift assays (EMSA) showed that VviDREBA1-1 was the only transcription factor that had in vitro binding capacity to the CRT element of the VviDHN2 promoter region, indicating that the transcriptional regulation of VviDHN1a and VviDHN4 would be carried out by activating other independent routes of these transcription factors. Our results suggest that the application of

  10. Significant relationships among frost tolerance and net photosynthetic rate, water use efficiency and dehydrin accumulation in cold-treated winter oilseed rapes.

    Science.gov (United States)

    Urban, Milan Oldřich; Klíma, Miroslav; Vítámvás, Pavel; Vašek, Jakub; Hilgert-Delgado, Alois Albert; Kučera, Vratislav

    2013-12-15

    Five winter oilseed rape cultivars (Benefit, Californium, Cortes, Ladoga, Navajo) were subjected to 30 days of cold treatment (4 °C) to examine the effect of cold on acquired frost tolerance (FT), dehydrin (DHN) content, and photosynthesis-related parameters. The main aim of this study was to determine whether there are relationships between FT (expressed as LT50 values) and the other parameters measured in the cultivars. While the cultivar Benefit accumulated two types of DHNs (D45 and D35), the other cultivars accumulated three additional DHNs (D97, D47, and D37). The similar-sized DHNs (D45 and D47) were the most abundant; the others exhibited significantly lower accumulations. The highest correlations were detected between LT50 and DHN accumulation (r=-0.815), intrinsic water use efficiency (WUEi; r=-0.643), net photosynthetic rate (r=-0.628), stomatal conductance (r=0.511), and intracellular/intercellular CO2 concentration (r=0.505). Those cultivars that exhibited higher Pn rate in cold (and further a significant increase in WUEi) had higher levels of DHNs and also higher FT. No significant correlation was observed between LT50 and E, PRI, or NDVI. Overall, we have shown the selected physiological parameters to be able to distinguish different FT cultivars of winter oilseed rape. Copyright © 2013 Elsevier GmbH. All rights reserved.

  11. Cloning and Expression Characteristics of a Novel Dehydrin Gene from Hazelnut (Corylus heterophylla Fisch.)

    Institute of Scientific and Technical Information of China (English)

    陈新; 梁丽松; 马庆华; 赵天田; 刘庆忠; 王贵禧

    2013-01-01

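    Using flower buds of hazelnut (Corylus heterophylla Fisch.) as material, a cDNA homologous to dehydrin genes was cloned by RT-PCR and RACE and designated ChDHN (GenBank accession No. HM228389). Its full length is 639 bp, with a putative coding region of 504 bp encoding a polypeptide of 167 amino acids. It carries the characteristic peptide sequences of LEA family members and belongs to the Y4SK2-type DHN genes. The predicted molecular mass of the ChDHN protein is 18.03 kD and its predicted theoretical isoelectric point is 7.28. The spatiotemporal expression of ChDHN was studied. With Actin as the internal reference, the expression pattern of ChDHN under cold shock at 4 ℃ (0, 2, 4, 8, 24 and 48 h) was examined: after cold shock, ChDHN expression was gradually up-regulated, peaked at 24 h, and decreased at 48 h, suggesting that ChDHN is a responsive gene in the plant cold-acclimation regulatory network. Quantitative RT-PCR analysis of ChDHN expression in different organs showed high abundance in seeds, followed by male inflorescences and flower buds, with the lowest expression in bark. PCR, restriction digestion and sequencing confirmed the successful construction of the recombinant expression vector pET-32a(+)-DHN. The verified recombinant plasmid was transformed into Escherichia coli BL21(DE3); SDS-PAGE analysis and Western blotting showed that, after IPTG induction, the recombinant protein was highly expressed as a fusion protein about 4 kD larger than the predicted 18.03 kD. % A cDNA encoding the dehydrin-like gene homologue was isolated from hazelnut (Corylus heterophylla Fisch.) by RACE-PCR and designated ChDHN (GenBank accession No. HM228389). Sequence analysis showed that the cDNA of ChDHN was 639 bp long and contained a single open reading frame. The predicted ChDHN protein has 167 amino acids with an estimated molecular mass of 18.03 kD and an isoelectric point of 7.28. qRT-PCR analysis showed that the expression of ChDHN was induced by low temperature and peaked at 24 h after exposure to a low temperature of 4 ℃. The transcripts of ChDHN appeared in many hazelnut tissues including male inflorescence, bark, flower bud and seeds, but mostly accumulated in seeds. The prokaryotic expression plasmid of pET-32a(+)-DHN was sequenced, digested by restricted endonuclease enzyme of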

  12. Lysine-Rich Proteins in High-Lysine Hordeum Vulgare Grain

    DEFF Research Database (Denmark)

    Ingversen, J.; Køie, B.

    1973-01-01

    The salt-soluble proteins in barley grain selected for high-lysine content (Hiproly, CI 7115 and the mutants 29 and 86) and of a control (Carlsberg II) with normal lysine content, contain identical major proteins as determined by MW and electrophoretic mobility. The concentration of a protein group...

  13. Lysine Rich Proteins in the Salt-Soluble Protein Fraction of Barley

    DEFF Research Database (Denmark)

    Ingversen, J.; Køie, B.

    1973-01-01

    Fractionation of the protein complex from Emir barley showed that the salt-soluble fraction accounts for 44% of the total lysine content but only for 2....

  14. The strong anti-glioblastoma capacity of the plasma-stimulated lysine-rich medium

    Science.gov (United States)

    Yan, Dayun; Nourmohammadi, Niki; Talbot, Annie; Sherman, Jonathan H.; Keidar, Michael

    2016-07-01

    Plasma-stimulated medium (PSM) shows a remarkable anti-cancer capacity as strong as the direct cold atmospheric plasma (CAP) treatment of cancer cells. PSM is able to effectively resist the growth of several cancer cell lines. To date, the sole approach to strengthen the anti-cancer capacity of PSM is extending the plasma treatment time. In this study, we demonstrated that the anti-glioblastoma capacity of PSM could be significantly increased by adding 20 mM lysine in Dulbecco’s modified Eagle’s medium (DMEM). This study provides clear evidence that the anti-glioblastoma capacity of PSM could be noticeably enhanced by modifying the composition of medium without increasing the CAP treatment time.

  15. Deep sequencing of ESTs from nacreous and prismatic layer producing tissues and a screen for novel shell formation-related genes in the pearl oyster.

    Directory of Open Access Journals (Sweden)

    Shigeharu Kinoshita

    Full Text Available BACKGROUND: Despite its economic importance, we have a limited understanding of the molecular mechanisms underlying shell formation in pearl oysters, wherein the calcium carbonate crystals, nacre and prism, are formed in a highly controlled manner. We constructed comprehensive expressed gene profiles in the shell-forming tissues of the pearl oyster Pinctada fucata and identified novel shell formation-related genes candidates. PRINCIPAL FINDINGS: We employed the GS FLX 454 system and constructed transcriptome data sets from pallial mantle and pearl sac, which form the nacreous layer, and from the mantle edge, which forms the prismatic layer in P. fucata. We sequenced 260477 reads and obtained 29682 unique sequences. We also screened novel nacreous and prismatic gene candidates by a combined analysis of sequence and expression data sets, and identified various genes encoding lectin, protease, protease inhibitors, lysine-rich matrix protein, and secreting calcium-binding proteins. We also examined the expression of known nacreous and prismatic genes in our EST library and identified novel isoforms with tissue-specific expressions. CONCLUSIONS: We constructed EST data sets from the nacre- and prism-producing tissues in P. fucata and found 29682 unique sequences containing novel gene candidates for nacreous and prismatic layer formation. This is the first report of deep sequencing of ESTs in the shell-forming tissues of P. fucata and our data provide a powerful tool for a comprehensive understanding of the molecular mechanisms of molluscan biomineralization.

  16. Development of four phylogenetically-arrayed BAC libraries and sequence of the APA locus in Phaseolus vulgaris.

    Science.gov (United States)

    Kami, James; Poncet, Valérie; Geffroy, Valérie; Gepts, Paul

    2006-04-01

    The APA family of seed proteins consists of three subfamilies, in evolutionary order of hypothesized appearance: phytohaemagglutinins (PHA), alpha-amylase inhibitors (alphaAI), and arcelins (ARL). The APA family plays a defensive role against mammalian and insect seed predation in common bean (Phaseolus vulgaris L.). The main locus (APA) for this gene family is situated on linkage group B4. In order to elucidate the pattern of duplication and diversification at this locus, we developed a BAC library in each of four different Phaseolus genotypes that represent presumptive steps in the evolutionary diversification of the APA family. Specifically, BAC libraries were established in one P. lunatus (cv. 'Henderson': PHA+ alphaAI- ARL-) and three P. vulgaris accessions (presumed ancestral wild G21245 from northern Peru: PHA+ alphaAI+ ARL-; Mesoamerican wild G02771: PHA+ alphaAI+ ARL+; and Mesoamerican breeding line BAT93: PHA+ alphaAI+ ARL-). The libraries were constructed after HindIII digestion of high molecular weight DNA, obtained with a novel nuclei isolation procedure. The frequency of empty or cpDNA-sequence-containing clones in all libraries is low (generally APA gene family, including members of the three subfamilies, as hypothesized. The different subfamilies were interspersed with retrotransposon sequences. In addition, other sequences were identified with similarity to chloroplast DNA, a dehydrin gene, and the Arabidopsis flowering D locus. Linkage between the dehydrin gene and the D1711 RFLP marker identifies a potential syntenic region between parts of common bean linkage group B4 and cowpea linkage group 2.

  17. Modeling of the Ebola Virus Delta Peptide Reveals a Potential Lytic Sequence Motif

    Directory of Open Access Journals (Sweden)

    William R. Gallaher

    2015-01-01

    Full Text Available Filoviruses, such as Ebola and Marburg viruses, cause severe outbreaks of human infection, including the extensive epidemic of Ebola virus disease (EVD) in West Africa in 2014. In the course of examining mutations in the glycoprotein gene associated with 2014 Ebola virus (EBOV) sequences, a differential level of conservation was noted between the soluble form of glycoprotein (sGP) and the full length glycoprotein (GP), which are both encoded by the GP gene via RNA editing. In the region of the proteins encoded after the RNA editing site, sGP was more conserved than the overlapping region of GP when compared to a distant outlier species, Tai Forest ebolavirus. Half of the amino acids comprising the “delta peptide”, a 40 amino acid carboxy-terminal fragment of sGP, were identical between otherwise widely divergent species. A lysine-rich amphipathic peptide motif was noted at the carboxyl terminus of delta peptide with high structural relatedness to the cytolytic peptide of the non-structural protein 4 (NSP4) of rotavirus. EBOV delta peptide is a candidate viroporin, a cationic pore-forming peptide, and may contribute to EBOV pathogenesis.

  18. Sequence evolution and expression regulation of stress-responsive genes in natural populations of wild tomato.

    Science.gov (United States)

    Fischer, Iris; Steige, Kim A; Stephan, Wolfgang; Mboup, Mamadou

    2013-01-01

    The wild tomato species Solanum chilense and S. peruvianum are a valuable non-model system for studying plant adaptation since they grow in diverse environments facing many abiotic constraints. Here we investigate the sequence evolution of regulatory regions of drought and cold responsive genes and their expression regulation. The coding regions of these genes were previously shown to exhibit signatures of positive selection. Expression profiles and sequence evolution of regulatory regions of members of the Asr (ABA/water stress/ripening induced) gene family and the dehydrin gene pLC30-15 were analyzed in wild tomato populations from contrasting environments. For S. chilense, we found that Asr4 and pLC30-15 appear to respond much faster to drought conditions in accessions from very dry environments than accessions from more mesic locations. Sequence analysis suggests that the promoter of Asr2 and the downstream region of pLC30-15 are under positive selection in some local populations of S. chilense. By investigating gene expression differences at the population level we provide further support of our previous conclusions that Asr2, Asr4, and pLC30-15 are promising candidates for functional studies of adaptation. Our analysis also demonstrates the power of the candidate gene approach in evolutionary biology research and highlights the importance of wild Solanum species as a genetic resource for their cultivated relatives.

  19. Sequence evolution and expression regulation of stress-responsive genes in natural populations of wild tomato.

    Directory of Open Access Journals (Sweden)

    Iris Fischer

    Full Text Available The wild tomato species Solanum chilense and S. peruvianum are a valuable non-model system for studying plant adaptation since they grow in diverse environments facing many abiotic constraints. Here we investigate the sequence evolution of regulatory regions of drought and cold responsive genes and their expression regulation. The coding regions of these genes were previously shown to exhibit signatures of positive selection. Expression profiles and sequence evolution of regulatory regions of members of the Asr (ABA/water stress/ripening induced) gene family and the dehydrin gene pLC30-15 were analyzed in wild tomato populations from contrasting environments. For S. chilense, we found that Asr4 and pLC30-15 appear to respond much faster to drought conditions in accessions from very dry environments than accessions from more mesic locations. Sequence analysis suggests that the promoter of Asr2 and the downstream region of pLC30-15 are under positive selection in some local populations of S. chilense. By investigating gene expression differences at the population level we provide further support of our previous conclusions that Asr2, Asr4, and pLC30-15 are promising candidates for functional studies of adaptation. Our analysis also demonstrates the power of the candidate gene approach in evolutionary biology research and highlights the importance of wild Solanum species as a genetic resource for their cultivated relatives.

  20. An analysis of sequence variability in eight genes putatively involved in drought response in sunflower (Helianthus annuus L.).

    Science.gov (United States)

    Giordani, T; Buti, M; Natali, L; Pugliesi, C; Cattonaro, F; Morgante, M; Cavallini, A

    2011-04-01

    With the aim to study variability in genes involved in ecological adaptations, we have analysed sequence polymorphisms of eight unique genes putatively involved in drought response by isolation and analysis of allelic sequences in eight inbred lines of sunflower of different origin and phenotypic characters and showing different drought response in terms of leaf relative water content (RWC). First, gene sequences were amplified by PCR on genomic DNA from a highly inbred line and their products were directly sequenced. In the absence of single nucleotide polymorphisms, the gene was considered as unique. Then, the same PCR reaction was performed on genomic DNAs of eight inbred lines to isolate allelic variants to be compared. The eight selected genes encode a dehydrin, a heat shock protein, a non-specific lipid transfer protein, a z-carotene desaturase, a drought-responsive-element-binding protein, a NAC-domain transcription regulator, an auxin-binding protein, and an ABA responsive-C5 protein. Nucleotide diversity per synonymous and non-synonymous sites was calculated for each gene sequence. The π (a)/π (s) ratio range was usually very low, indicating strong purifying selection, though with locus-to-locus differences. As far as non-coding regions, the intron showed a larger variability than the other regions only in the case of the dehydrin gene. In the other genes tested, in which one or more introns occur, variability in the introns was similar or even lower than in the other regions. On the contrary, 3'-UTRs were usually more variable than the coding regions. Linkage disequilibrium in the selected genes decayed on average within 1,000 bp, with large variation among genes. A pairwise comparison between genetic distances calculated on the eight genes and the difference in RWC showed a significant correlation in the first phases of drought stress. The results are discussed in relation to the function of analysed genes, i.e. involved in gene regulation and signal

  1. Cell surface binding and uptake of arginine- and lysine-rich penetratin peptides in absence and presence of proteoglycans

    KAUST Repository

    Åmand, Helene L.

    2012-11-01

    Cell surface proteoglycans (PGs) appear to promote uptake of arginine-rich cell-penetrating peptides (CPPs), but their exact functions are unclear. To address if there is specificity in the interactions of arginines and PGs leading to improved internalization, we used flow cytometry to examine uptake in relation to cell surface binding for penetratin and two arginine/lysine substituted variants (PenArg and PenLys) in wildtype CHO-K1 and PG-deficient A745 cells. All peptides were more efficiently internalized into CHO-K1 than into A745, but their cell surface binding was independent of cell type. Thus, PGs promote internalization of cationic peptides, irrespective of the chemical nature of their positive charges. Uptake of each peptide was linearly dependent on its cell surface binding, and affinity is thus important for efficiency. However, the gradients of these linear dependencies varied significantly. Thus each peptide's ability to stimulate uptake once bound to the cell surface is reliant on formation of specific uptake-promoting interactions. Heparin affinity chromatography and clustering experiments showed that penetratin and PenArg binding to sulfated sugars is stabilized by hydrophobic interactions and results in clustering, whereas PenLys only interacts through electrostatic attraction. This may have implications for the molecular mechanisms behind arginine-specific uptake stimulation, as penetratin and PenArg are more efficiently internalized than PenLys upon interaction with PGs. However, PenArg is also least affected by removal of PGs. This indicates that an increased arginine content not only improves PG-dependent uptake but also that PenArg is more adaptable, as it can use several portals of entry into the cell. © 2012 Elsevier B.V.

  2. Differential mode of antimicrobial actions of arginine-rich and lysine-rich histones against Gram-positive Staphylococcus aureus.

    Science.gov (United States)

    Morita, Shuu; Tagai, Chihiro; Shiraishi, Takayuki; Miyaji, Kazuyuki; Iwamuro, Shawichi

    2013-10-01

    We previously reported the activities and modes of action of arginine (Arg)-rich histones H3 and H4 against Gram-negative bacteria. In the present study, we investigated the properties of the Arg-rich histones against Gram-positive bacteria in comparison with those of lysine (Lys)-rich histone H2B. In a standard microdilution assay, calf thymus histones H2B, H3, and H4 showed growth inhibitory activity against Staphylococcus aureus with minimum effective concentration values of 4.0, 4.0, and 5.6 μM, respectively. Laser confocal microscopic analyses revealed that both the Arg-rich and Lys-rich histones associated with the surface of S. aureus. However, while the morphology of S. aureus treated with histone H2B appeared intact, those treated with the histones H3 and H4 closely resembled each other, and the cells were blurred. Electrophoretic mobility shift assay results revealed these histones have binding affinity to lipoteichoic acid (LTA), one of major cell surface components of Gram-positive bacteria. Scanning electron microscopic analyses demonstrated that while histone H2B elicited no obvious changes in cell morphology, histones H3 and H4 disrupted the cell membrane structure with bleb formation in a manner similar to general antimicrobial peptides. Consequently, our results suggest that bacterial cell surface LTA initially attracts both the Arg- and Lys-rich histones, but the modes of antimicrobial action of these histones are different; the former involves cell membrane disruption and the latter involves the cell integrity disruption. Copyright © 2013 Elsevier Inc. All rights reserved.

  3. Automatic sequences

    CERN Document Server

    Haeseler, Friedrich

    2003-01-01

    Automatic sequences are sequences which are produced by a finite automaton. Although they are not random, they may look random. They are complicated in the sense of not being ultimately periodic, and they may look rather complicated in the sense that it may not be easy to name the rule by which the sequence is generated; however, there exists a rule which generates the sequence. The concept of automatic sequences has special applications in algebra, number theory, finite automata and formal languages, and combinatorics on words. The text deals with different aspects of automatic sequences, in particular: a general introduction to automatic sequences; the basic (combinatorial) properties of automatic sequences; the algebraic approach to automatic sequences; and geometric objects related to automatic sequences.
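
    As an illustration of "produced by a finite automaton", the sketch below generates the Thue-Morse sequence, a standard example of an automatic sequence that the record itself does not name. It assumes the usual convention that the n-th term is obtained by feeding the base-2 digits of n to a finite automaton and reading the output attached to the final state; the function and table names are hypothetical.

```python
def automatic_term(n, transitions, outputs, start=0, base=2):
    """Return the n-th term of the automatic sequence defined by a DFA.

    The base-`base` digits of n are read most significant digit first;
    the term is the output attached to the state reached at the end.
    """
    digits = []
    while n > 0:
        digits.append(n % base)
        n //= base
    state = start
    for digit in reversed(digits):
        state = transitions[state][digit]
    return outputs[state]


# Thue-Morse automaton (illustrative choice): two states, and reading
# the digit 1 flips the state, so the state tracks the parity of 1-bits.
THUE_MORSE_TRANSITIONS = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
THUE_MORSE_OUTPUTS = {0: 0, 1: 1}


def thue_morse_prefix(length):
    """First `length` terms of the Thue-Morse sequence."""
    return [
        automatic_term(n, THUE_MORSE_TRANSITIONS, THUE_MORSE_OUTPUTS)
        for n in range(length)
    ]


print(thue_morse_prefix(8))  # [0, 1, 1, 0, 1, 0, 0, 1]
```

    The sequence is fully determined by the two small tables, yet it never becomes periodic, which matches the record's description of sequences that are rule-generated but look irregular.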

  4. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data remain unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies...

  5. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  6. DNA Sequencing

    Science.gov (United States)

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.

  7. Heterologous expression of an SK2-type dehydrin gene (MrDHN3) from Medicago ruthenica enhances Escherichia coli tolerance under salt and high temperature stress

    Institute of Scientific and Technical Information of China (English)

    沈迎芳; 马超; 吴小培; 张业猛; 王海庆

    2016-01-01

    Medicago ruthenica is an excellent forage legume of high-altitude, cold regions and is highly resistant to drought, cold and high salinity. Dehydrins (DHNs) are stress proteins involved in plant protective reactions against environmental stress. Based on our previous RNA-seq data, a DHN gene, MrDHN3, was cloned from young seedlings of M. ruthenica. Sequence analysis showed that the MrDHN3 gene contains a 666 bp open reading frame, putatively translated to 221 amino acids, and encodes an SK2-type acidic DHN. Amino acid sequence alignment showed that MrDHN3 shared the highest similarity (83%) with TrDHN3 and MtDHN3, from the legumes white clover and Medicago truncatula. Quantitative RT-PCR analysis showed that expression of MrDHN3 was induced by dehydration, cold, high salinity and abscisic acid (ABA), which suggests that MrDHN3 is involved in abiotic stress responses in M. ruthenica. A prokaryotic expression vector was constructed to overexpress the MrDHN3 protein in Escherichia coli, and the growth and survival of the recombinant strain under salt and high-temperature stress were examined. Under high-salt stress (0.5 mol/L NaCl and 0.5 mol/L KCl), the survival rate of the recombinant E. coli was markedly higher than that of the control strain; under high-temperature stress at 55 °C, the transformed E. coli grew markedly better than the control. These results indicate that MrDHN3 protects cells against damage caused by salt and high temperature, and they provide useful information for future research on the genetic improvement of crop stress resistance.
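
    The relation between the 666 bp open reading frame and the 221 amino acids reported in this abstract can be checked with a short calculation. The sketch below assumes that the stated ORF length includes the terminating stop codon, which the abstract does not say explicitly; the function name `orf_to_protein_length` is hypothetical.

```python
def orf_to_protein_length(orf_bp: int, includes_stop_codon: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("an ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but encodes no residue (assumption:
    # the reported ORF length counts it).
    return codons - 1 if includes_stop_codon else codons


print(orf_to_protein_length(666))  # 221
```

    Under that assumption, 666 bp gives 222 codons, one of which is the stop codon, leaving 221 residues.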

  8. Main: Sequences [KOME

    Lifescience Database Archive (English)

    Full Text Available Sequences Nucleotide Sequence Nucleotide sequence of full length cDNA (trimmed sequence) kome_ine_full_seq...uence_db.fasta.zip kome_ine_full_sequence_db.zip kome_ine_full_sequence_db ...

  9. Antimicrobial properties of arginine- and lysine-rich histones and involvement of bacterial outer membrane protease T in their differential mode of actions.

    Science.gov (United States)

    Tagai, Chihiro; Morita, Shuu; Shiraishi, Takayuki; Miyaji, Kazuyuki; Iwamuro, Shawichi

    2011-10-01

    There is growing evidence of the antimicrobial properties of histones and histone-derived peptides; however, most of them are specific to lysine (Lys)-rich histones (H1, H2A, and H2B). In the present study, we focused on arginine (Arg)-rich histones (H3 and H4) and investigated their antimicrobial properties in comparison with those of histone H2B. In a standard microdilution assay, calf thymus histones H2B, H3, and H4 showed growth inhibitory activity against the bacterial outer membrane protease T (OmpT) gene-expressing Escherichia coli strain JCM5491 with calculated 50% growth inhibitory concentrations of 3.8, 10, and 12.7 μM, respectively. A lysate prepared from the JCM5491 cells was capable of strongly, moderately, and slightly fragmenting histones H2B, H3, and H4, respectively. While the lysate prepared from the cells of the ompT-deleted E. coli strain BL21(DE3) did not digest these histones, the ompT-transformed BL21(DE3), termed BL21/OmpT(+), cell lysate digested the histones more strongly than the JCM5491 cell lysate. Laser confocal and scanning electron microscopic analyses demonstrated that while histone H2B penetrated the cell membrane of JCM5491 or BL21/OmpT(+) cells, histones H3 and H4 remained on the cell surface and subsequently disrupted the cell membrane structure with bleb formation in a manner similar to general antimicrobial peptides. The BL21(DE3) cells treated with each histone showed no bleb formation, but cell integrity was affected and the cell surface was corrugated. Consequently, it is suggested that OmpT is involved in the antimicrobial properties of Arg- and Lys-rich histones and that the modes of antimicrobial action of these histones are different.

  10. The Concentration and Yield of Hordein and some Lysine-Rich Proteins as Influenced by the lys gene of Hiproly Barley

    DEFF Research Database (Denmark)

    Balasaraswathi, R.; Køie, B; Doll, Hans

    1984-01-01

    the corresponding normal gene. The lys lines had 16% lower grain yield and 13% lower single seed weight than the corresponding normal lines. The concentration of hordein in the protein and the yield of this storage protein were strongly reduced in the lys lines. On the contrary, the concentration of protein Z, β...

  11. Controlling noncovalent interactions between a lysine-rich α-helical peptide and self-assembled monolayers of alkanethiols on Au through functional group diversity

    Science.gov (United States)

    Raigoza, Annette F.; Onyirioha, Kristeen; Webb, Lauren J.

    2017-02-01

    Reliably attaching a structured biomolecule to an inorganic substrate would enable the preparation of surfaces that incorporate both biological and inorganic functions and structures. To this end, we have previously developed a procedure using the copper(I)-catalyzed click reaction to tether synthetic α-helical peptides carrying two alkyne groups to well-ordered alkanethiol self-assembled monolayers (SAM) on a Au(111) surface, in which the SAM is composed of a mixture of methyl and azide termination. Proteins, however, are composed of many diverse functional groups, and this composition directly affects protein structure, interactions, and reactivity. Here, we explore the utility of mixed SAMs with alternative terminating functional groups to tune and direct the reactivity of the surface through noncovalent peptide-surface interactions. We study both polar surfaces (OH-terminated) and charged surfaces (COOH- and NH3-terminated, which are negatively and positively charged, respectively, under our reaction conditions). Surfaces were functionalized with a bipolar peptide composed of Lys and Leu residues that could express different interactions through either hydrophilic and/or charge (Lys) or hydrophobic (Leu) influences. X-ray photoelectron spectroscopy (XPS) and surface infrared spectroscopy were used to characterize surfaces at all stages of the peptide functionalization procedure. This strategy resulted in a high density of surface-bound α-helices without aggregation. Mixed SAMs that included a positively charged alkanethiol along with the azide-terminated thiol resulted in a more efficient reaction and better alignment of the peptide with the azide on the surface. Negatively charged surfaces increased physisorption of the peptide, which was then removed during sample rinsing. This work demonstrates that varying easily controlled chemical inputs during the functionalization steps allows the reaction conditions to be balanced for the chemical needs of a particular biomolecule or substrate.

  12. Role of early glycation Amadori products of lysine-rich proteins in the production of autoantibodies in diabetes type 2 patients.

    Science.gov (United States)

    Ansari, Nadeem Ahmad; Moinuddin; Mir, Abdul Rouf; Habib, Safia; Alam, Khursheed; Ali, Asif; Khan, Rizwan Hasan

    2014-11-01

    In diabetes, protein glycation mostly occurs at intrachain lysine residues, resulting in the formation of early-stage Amadori products, which are finally converted to advanced glycation end products (AGEs). Several studies have reported autoantibodies against AGEs in diabetes, but few data are available on Amadori products. In this study, poly-L-lysine (PLL) was glycated with 50 mM glucose and the resultant Amadori products were estimated by fructosamine or nitroblue tetrazolium assay. We report a high content of Amadori products in PLL upon glycation. Glycated PLL showed marked hyperchromicity in the UV spectrum, ellipticity changes in CD spectroscopy, and variations in the ε-methylene proton shifts in NMR. It was better recognized than native PLL by autoantibodies in type 2 diabetics. Induced antibodies against glycated PLL were successfully used to probe early glycation in the IgG isolated from type 2 diabetes patients. The role of Amadori products of glycated proteins in the induction of autoantibodies in type 2 diabetes, as well as in associated secondary complications, is discussed.

  13. Classifying Genomic Sequences by Sequence Feature Analysis

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that the functional regions of the genome can be classified based on sequence features and discriminant analysis.
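
    Two of the alignment-free features named in this abstract, word frequency and dinucleotide relative abundance, can be sketched as follows. The abstract gives no formulas, so this uses the commonly cited single-strand definition rho_XY = f_XY / (f_X f_Y) as an assumption; the function names and the toy input are hypothetical.

```python
from collections import Counter
from itertools import product


def word_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every overlapping k-letter word in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def dinucleotide_relative_abundance(seq: str) -> dict[str, float]:
    """rho_XY = f_XY / (f_X * f_Y) for each dinucleotide whose bases both occur.

    Assumption: single-strand definition; the study may have used a
    strand-symmetrized variant.
    """
    mono = word_frequencies(seq, 1)
    di = word_frequencies(seq, 2)
    rho = {}
    for x, y in product("ACGT", repeat=2):
        if mono.get(x, 0.0) > 0 and mono.get(y, 0.0) > 0:
            rho[x + y] = di.get(x + y, 0.0) / (mono[x] * mono[y])
    return rho


toy = "ACGTTGCAACGGTCA"  # made-up sequence, not from the study
for pair, value in sorted(dinucleotide_relative_abundance(toy).items()):
    print(pair, round(value, 2))
```

    Values near 1 mean a dinucleotide occurs about as often as its base composition predicts; a vector of such values per region is the kind of feature that could then be passed to principal component or discriminant analysis.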

  14. Main: Sequences [KOME

    Lifescience Database Archive (English)

    Full Text Available Sequences Amino Acid Sequence Amino Acid sequence of full length cDNA (Longest ORF) kome_ine_full_seq...uence_amino_db.fasta.zip kome_ine_full_sequence_amino_db.zip kome_ine_full_sequence_amino_db ...

  15. Shotgun protein sequencing.

    Energy Technology Data Exchange (ETDEWEB)

    Faulon, Jean-Loup Michel; Heffelfinger, Grant S.

    2009-06-01

    A novel experimental and computational technique based on multiple enzymatic digestion of a protein or protein mixture that reconstructs protein sequences from sequences of overlapping peptides is described in this SAND report. This approach, analogous to shotgun sequencing of DNA, is to be used to sequence alternatively spliced proteins, to identify post-translational modifications, and to sequence genetically engineered proteins.
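
    The idea of rebuilding a protein sequence from overlapping peptides can be illustrated with a greedy overlap merge. This is a generic sketch of overlap-based assembly, not the algorithm of the report, which the abstract does not describe; the peptides, the minimum overlap of 3 residues and all function names are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(peptides: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    # Drop duplicates and fragments fully contained in another fragment.
    frags = [p for p in set(peptides)
             if not any(p != q and p in q for q in peptides)]
    while True:
        best_size, best_a, best_b = 0, None, None
        for a in frags:
            for b in frags:
                if a != b:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_a, best_b = size, a, b
        if best_size == 0:
            return sorted(frags)  # nothing left to merge
        frags.remove(best_a)
        frags.remove(best_b)
        frags.append(best_a + best_b[best_size:])


# Hypothetical peptides from two different digests of the same toy protein.
digest_1 = ["MKTAYIAK", "QRQISFVK"]
digest_2 = ["MKTAY", "IAKQRQIS", "FVK"]
print(greedy_assemble(digest_1 + digest_2))  # ['MKTAYIAKQRQISFVK']
```

    Using more than one digest matters because peptides from a single digest do not overlap each other; only fragments cut at different positions bridge the gaps.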

  16. Sequence Read Archive (SRA)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Sequence Read Archive (SRA) stores raw sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome...

  17. Multimodal sequence learning.

    Science.gov (United States)

    Kemény, Ferenc; Meier, Beat

    2016-02-01

    While sequence learning research models complex phenomena, previous studies have mostly focused on unimodal sequences. The goal of the current experiment is to put implicit sequence learning into a multimodal context: to test whether it can operate across different modalities. We used the Task Sequence Learning paradigm to test whether sequence learning varies across modalities, and whether participants are able to learn multimodal sequences. Our results show that implicit sequence learning is very similar regardless of the source modality. However, the presence of correlated task and response sequences was required for learning to take place. The experiment provides new evidence for implicit sequence learning of abstract conceptual representations. In general, the results suggest that correlated sequences are necessary for implicit sequence learning to occur. Moreover, they show that elements from different modalities can be automatically integrated into one unitary multimodal sequence.

  18. Coordinate cytokine regulatory sequences

    Science.gov (United States)

    Frazer, Kelly A.; Rubin, Edward M.; Loots, Gabriela G.

    2005-05-10

    The present invention provides CNS sequences that regulate cytokine gene expression, expression cassettes and vectors comprising or lacking the CNS sequences, and host cells and non-human transgenic animals comprising or lacking the CNS sequences. The present invention also provides methods for identifying compounds that modulate the functions of CNS sequences as well as methods for diagnosing defects in the CNS sequences of patients.

  19. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different lengths (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in the analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Although automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
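
    To make the two error rates quoted in this abstract concrete, the sketch below converts them into expected error counts for a genome of a given size. The 5 Mbp genome size is a hypothetical value chosen for illustration, and the function name is an assumption; only the two rates come from the abstract.

```python
def expected_errors(genome_bp: int, bp_per_error: int) -> float:
    """Expected number of consensus errors at a rate of one error per bp_per_error bases."""
    return genome_bp / bp_per_error


GENOME_BP = 5_000_000  # hypothetical microbial genome size (assumption)

draft = expected_errors(GENOME_BP, 2_000)      # draft rate: about 1 error / 2000 bp
finished = expected_errors(GENOME_BP, 10_000)  # finished rate: about 1 error / 10,000 bp

print(draft, finished)  # 2500.0 500.0
```

    For a genome of this assumed size, finishing would reduce the expected number of consensus errors fivefold, from about 2500 to about 500.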

  20. Contamination of sequence databases with adaptor sequences

    Energy Technology Data Exchange (ETDEWEB)

    Yoshikawa, Takeo; Sanders, A.R.; Detera-Wadleigh, S.D. [National Institute of Mental Health, Bethesda, MD (United States)

    1997-02-01

    Because of the exponential increase in the amount of DNA sequences being added to the public databases on a daily basis, it has become imperative to identify sources of contamination rapidly. Previously, contaminations of sequence databases have been reported to alert the scientific community to the problem. These contaminations can be divided into two categories. The first category comprises host sequences that have been difficult for submitters to manage or control. Examples include anomalous sequences derived from Escherichia coli, which are inserted into the chromosomes (and plasmids) of the bacterial hosts. Insertion sequences are highly mobile and are capable of transposing themselves into plasmids during cloning manipulation. Another example of the first category is the infection with yeast genomic DNA or with bacterial DNA of some commercially available cDNA libraries from Clontech. The second category of database contamination is due to the inadvertent inclusion of nonhost sequences. This category includes incorporation of cloning-vector sequences and multicloning sites in the database submission. M13-derived artifacts have been common, since M13-based vectors have been widely used for subcloning DNA fragments. Recognizing this problem, the National Center for Biotechnology Information (NCBI) started to screen, in April 1994, all sequences directly submitted to GenBank, against a set of vector data retrieved from GenBank by use of key-word searches, such as "vector." In this report, we present evidence for another sequence artifact that is widespread but that, to our knowledge, has not yet been reported.

  1. Automated DNA Sequencing System

    Energy Technology Data Exchange (ETDEWEB)

    Armstrong, G.A.; Ekkebus, C.P.; Hauser, L.J.; Kress, R.L.; Mural, R.J.

    1999-04-25

    Oak Ridge National Laboratory (ORNL) is developing a core DNA sequencing facility to support biological research endeavors at ORNL and to conduct basic sequencing automation research. This facility is novel because its development is based on existing standard biology laboratory equipment; thus, the development process is of interest to the many small laboratories trying to use automation to control costs and increase throughput. Before automation, biology laboratory personnel purified DNA, completed cycle sequencing, and prepared 96-well sample plates with commercially available hardware designed specifically for each step in the process. Following purification and thermal cycling, an automated sequencing machine was used for the sequencing. A technician handled all movement of the 96-well sample plates between machines. To automate the process, ORNL is adding a CRS Robotics A-465 arm, ABI 377 sequencing machine, automated centrifuge, automated refrigerator, and possibly an automated SpeedVac. The entire system will be integrated with one central controller that will direct each machine and the robot. The goal of this system is to completely automate the sequencing procedure from bacterial cell samples through ready-to-be-sequenced DNA and ultimately to completed sequence. The system will be flexible and will accommodate different chemistries than existing automated sequencing lines. The system will be expanded in the future to include colony picking and/or actual sequencing. This discrete event, DNA sequencing system will demonstrate that smaller sequencing labs can achieve cost-effective automation as the laboratory grows.

  2. Anomaly Detection in Sequences

    Data.gov (United States)

    National Aeronautics and Space Administration — We present a set of novel algorithms which we call sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences that...

  3. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  4. Roles of repetitive sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological functions, but also because they provide information on genome organization, evolution and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences will have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences, notably minisatellite sequences and telomere sequences as examples of tandem repeats without and with known function, respectively, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  6. sequenceMiner algorithm

    Data.gov (United States)

    National Aeronautics and Space Administration — Detecting and describing anomalies in large repositories of discrete symbol sequences. sequenceMiner has been open-sourced! Download the file below to try it out....

  7. Enhanced virome sequencing using targeted sequence capture.

    Science.gov (United States)

    Wylie, Todd N; Wylie, Kristine M; Herter, Brandi N; Storch, Gregory A

    2015-12-01

    Metagenomic shotgun sequencing (MSS) is an important tool for characterizing viral populations. It is culture independent, requires no a priori knowledge of the viruses in the sample, and may provide useful genomic information. However, MSS can lack sensitivity and may yield insufficient data for detailed analysis. We have created a targeted sequence capture panel, ViroCap, designed to enrich nucleic acid from DNA and RNA viruses from 34 families that infect vertebrate hosts. A computational approach condensed ∼1 billion bp of viral reference sequence into <200 million bp of unique, representative sequence suitable for targeted sequence capture. We compared the effectiveness of detecting viruses in standard MSS versus MSS following targeted sequence capture. First, we analyzed two sets of samples, one derived from samples submitted to a diagnostic virology laboratory and one derived from samples collected in a study of fever in children. We detected 14 and 18 viruses in the two sets, comprising 19 genera from 10 families, with dramatic enhancement of genome representation following capture enrichment. The median fold-increases in percentage viral reads post-capture were 674 and 296. Median breadth of coverage increased from 2.1% to 83.2% post-capture in the first set and from 2.0% to 75.6% in the second set. Next, we analyzed samples containing a set of diverse anellovirus sequences and demonstrated that ViroCap could be used to detect viral sequences with up to 58% variation from the references used to select capture probes. ViroCap substantially enhances MSS for a comprehensive set of viruses and has utility for research and clinical applications.

  8. DNA sequences encoding erythropoietin

    Energy Technology Data Exchange (ETDEWEB)

    Lin, F.K.

    1987-10-27

    A purified and isolated DNA sequence is described consisting essentially of a DNA sequence encoding a polypeptide having an amino acid sequence sufficiently duplicative of that of erythropoietin to allow possession of the biological property of causing bone marrow cells to increase production of reticulocytes and red blood cells, and to increase hemoglobin synthesis or iron uptake.

  9. Low autocorrelation binary sequences

    Science.gov (United States)

    Packebusch, Tom; Mertens, Stephan

    2016-04-01

    Binary sequences with minimal autocorrelations have applications in communication engineering, mathematics and computer science. In statistical physics they appear as ground states of the Bernasconi model. Finding these sequences is a notoriously hard problem that so far can be solved only by exhaustive search. We review recent algorithms and present a new algorithm that finds optimal sequences of length $N$ in time $O(N\,1.73^N)$. We computed all optimal sequences for $N \leq 66$ and all optimal skewsymmetric sequences for $N \leq 119$.
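
    The quantity being minimized can be written down in a few lines. The sketch below uses the usual definitions from the low-autocorrelation literature, which the abstract does not spell out and which are therefore assumptions here: aperiodic autocorrelations C_k, the energy E as the sum of C_k squared over k = 1..N-1, and the merit factor F = N^2 / (2E). The exhaustive search shown is the naive one over all 2^N sequences, not the faster algorithm of the paper.

```python
from itertools import product


def autocorrelation(s, k):
    """Aperiodic autocorrelation C_k of a +1/-1 sequence s at shift k."""
    return sum(s[i] * s[i + k] for i in range(len(s) - k))


def energy(s):
    """E(s): sum of squared autocorrelations over all shifts k = 1..N-1."""
    return sum(autocorrelation(s, k) ** 2 for k in range(1, len(s)))


def merit_factor(s):
    """F(s) = N^2 / (2 E(s)); larger is better."""
    return len(s) ** 2 / (2 * energy(s))


def exhaustive_minimum(n):
    """Naive search over all 2^n sequences; only practical for small n."""
    return min(product((1, -1), repeat=n), key=energy)


barker13 = (1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1)
print(energy(barker13), round(merit_factor(barker13), 3))  # 6 14.083
print(energy(exhaustive_minimum(5)))  # 2
```

    The naive search doubles in cost with every added element, which is why lengths such as those reported in the abstract need a much better algorithm than this one.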

  10. Repdigits in k-Lucas Sequences

    Indian Academy of Sciences (India)

    Jhon J J Bravo; Florian Luca

    2014-05-01

    For an integer $k \geq 2$, let $(L_n^{(k)})_n$ be the $k$-Lucas sequence, which starts with $0,\ldots,0,2,1$ ($k$ terms) and in which each subsequent term is the sum of the $k$ preceding terms. In 2000, Luca (Port. Math. 57(2) 2000 243-254) proved that 11 is the largest number with only one distinct digit (a so-called repdigit) in the sequence $(L_n^{(2)})_n$. In this paper, we address a similar problem in the family of $k$-Lucas sequences. We also show that the $k$-Lucas sequences have similar properties to those of $k$-Fibonacci sequences and occur in formulae simultaneously with the latter.
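
    The recurrence defining $(L_n^{(k)})_n$ is easy to experiment with. The sketch below builds the sequence from the $k$ initial terms $0,\ldots,0,2,1$ stated in the abstract and tests terms for the repdigit property; the function names are hypothetical, the list simply starts at the first of the $k$ initial terms without claiming any particular indexing convention, and a finite search like this illustrates the cited result without proving it.

```python
def k_lucas(k: int, count: int) -> list[int]:
    """First `count` terms, starting from the k initial terms 0, ..., 0, 2, 1."""
    if k < 2:
        raise ValueError("k must be at least 2")
    terms = [0] * (k - 2) + [2, 1]
    while len(terms) < count:
        terms.append(sum(terms[-k:]))  # sum of the k preceding terms
    return terms[:count]


def is_repdigit(n: int) -> bool:
    """True if the decimal expansion of n > 0 uses a single distinct digit."""
    return n > 0 and len(set(str(n))) == 1


print(k_lucas(2, 8))  # [2, 1, 3, 4, 7, 11, 18, 29]
print(k_lucas(3, 7))  # [0, 2, 1, 3, 6, 10, 19]
print(max(t for t in k_lucas(2, 60) if is_repdigit(t)))  # 11
```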

  11. On Maximal Green Sequences

    CERN Document Server

    Brüstle, Thomas; Pérotin, Matthieu

    2012-01-01

    Maximal green sequences are particular sequences of quiver mutations which were introduced by Keller in the context of quantum dilogarithm identities and independently by Cecotti-Cordova-Vafa in the context of supersymmetric gauge theory. Our aim is to initiate a systematic study of these sequences from a combinatorial point of view. Interpreting maximal green sequences as paths in various natural posets arising in representation theory, we prove the finiteness of the number of maximal green sequences for cluster finite quivers, affine quivers and acyclic quivers with at most three vertices. We also give results concerning the possible numbers and lengths of these maximal green sequences. Finally we describe an algorithm for computing maximal green sequences for arbitrary valued quivers which we used to obtain numerous explicit examples that we present.

  12. Next-generation sequencing

    DEFF Research Database (Denmark)

    Rieneck, Klaus; Bak, Mads; Jønson, Lars

    2013-01-01

    the feasibility of predicting the fetal KEL1 phenotype using next-generation sequencing (NGS) technology. STUDY DESIGN AND METHODS: The KEL1/2 single-nucleotide polymorphism was polymerase chain reaction (PCR) amplified with one adjoining base, and the PCR product was sequenced using a genome analyzer (GAIIx......, Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence...

  13. Hardware bitstream sequence recognizer

    OpenAIRE

    Karpin, Oleksandr; Sokil, Volodymyr

    2009-01-01

    This paper describes how to implement a bitstream sequence recognizer in hardware using the PSoC Pseudo Random Sequence Generator (PRS) User Module. The PRS can be used in digital communication systems with a serial data interface for automatic preamble detection and extraction, control word selection, etc.

  14. Cosmetology: Scope and Sequence.

    Science.gov (United States)

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  15. DNA sequencing by CE.

    Science.gov (United States)

    Karger, Barry L; Guttman, András

    2009-06-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA-sequencing methods have evolved from the labor-intensive slab gel electrophoresis, through automated multiCE systems using fluorophore labeling with multispectral imaging, to the "next-generation" technologies of cyclic-array, hybridization-based, nanopore and single-molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes were only possible with the advent of modern sequencing technologies that resulted from step-by-step advances, with contributions from academics, medical personnel and instrument companies. While next-generation sequencing is moving ahead at breakneck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this perspective, we give an overview of the role of CE in DNA sequencing, based in part on several of our articles in this journal.

  16. Sequencing the maize genome.

    Science.gov (United States)

    Martienssen, Robert A; Rabinowicz, Pablo D; O'Shaughnessy, Andrew; McCombie, W Richard

    2004-04-01

    Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.

  17. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    OpenAIRE

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto seque...

  18. HIV Sequence Compendium 2015

    Energy Technology Data Exchange (ETDEWEB)

    Foley, Brian Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas Kenneth [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Cristian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Pennsylvania, Philadelphia, PA (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette Tina Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  19. Phylogenetic Trees From Sequences

    Science.gov (United States)

    Ryvkin, Paul; Wang, Li-San

    In this chapter, we review important concepts and approaches for phylogeny reconstruction from sequence data. We first cover some basic definitions and properties of phylogenetics, and briefly explain how scientists model sequence evolution and measure sequence divergence. We then discuss three major approaches for phylogenetic reconstruction: distance-based phylogenetic reconstruction, maximum parsimony, and maximum likelihood. In the third part of the chapter, we review how multiple phylogenies are compared by consensus methods and how to assess confidence using bootstrapping. At the end of the chapter are two sections that list popular software packages and additional reading.
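
    As a rough illustration of the distance-based approach mentioned in this abstract, the sketch below computes a pairwise divergence under the Jukes-Cantor model, one common way to correct the observed fraction of mismatches for multiple substitutions at the same site. The abstract does not say which substitution models the chapter covers; the choice of Jukes-Cantor and the function name jukes_cantor_distance are assumptions made for this example.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Estimate substitutions per site between two aligned, gap-free DNA sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    # p is the observed proportion of sites that differ.
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = mismatches / len(seq_a)
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    # Jukes-Cantor correction: d = -(3/4) * ln(1 - (4/3) * p)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

    A matrix of such pairwise distances is the usual input to distance-based tree builders such as neighbor joining.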

  20. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  1. Scope and Sequence.

    Science.gov (United States)

    Callison, Daniel

    2002-01-01

    Discusses scope and sequence plans for curriculum coordination in elementary and secondary education related to school libraries. Highlights include library skills; levels of learning objectives; technology skills; media literacy skills; and information inquiry skills across disciplines by grade level. (LRW)

  2. Evolution of DNA sequencing

    National Research Council Canada - National Science Library

    Tipu, Hamid Nawaz; Shabbir, Ambreen

    2015-01-01

    Sanger and coworkers introduced DNA sequencing for the first time in the 1970s. It principally relied on termination of the growing nucleotide chain when a dideoxythymidine triphosphate (ddTTP) was inserted...

  3. Pierre Robin sequence

    Science.gov (United States)

    Pierre Robin syndrome; Pierre Robin complex; Pierre Robin anomaly ... The exact causes of Pierre Robin sequence are unknown. It may be part of many genetic syndromes. The lower jaw develops slowly before birth, but may grow ...

  4. In Favor of Sequencing?

    NARCIS (Netherlands)

    van der Borgh, G.J.C.

    2014-01-01

    This short article is a contribution to an online discussion about political sequencing and stability. It argues that despite all the risks of democratization in fragile states, a more gradual approach should be preferred.

  5. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  6. Text Mining: (Asynchronous Sequences)

    Directory of Open Access Journals (Sweden)

    Sheema Khan

    2014-12-01

    Full Text Available In this paper we try to correlate text sequences that share common topics, which serve as semantic clues. We propose a two-step method for asynchronous text mining. Step one checks for the common topics in the sequences and isolates them with their timestamps. Step two takes each topic and tries to assign a timestamp to the text document. After multiple repetitions of step two, we obtain an optimum result.

  7. Malaria Genome Sequencing Project

    Science.gov (United States)

    2004-01-01

    million cases and up to 2.7 million deaths from malaria each year. The mortality levels are greatest in sub-Saharan Africa... A whole chromosome shotgun sequencing strategy was used to determine the genome sequence of P. falciparum clone 3D7. This...aminolevulinic acid dehydratase. Curr. Genet. 40, 391-398 (2002). 15. Lasonder, E. et al. Analysis of the Plasmodium falciparum proteome by high-accuracy mass

  8. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  9. Genome sequencing conference II

    Energy Technology Data Exchange (ETDEWEB)

    1990-01-01

    Genome Sequencing Conference 2 was held September 30 to October 30, 1990. 26 speaker abstracts and 33 poster presentations were included in the program report. New and improved methods for DNA sequencing and genetic mapping were presented. Many of the papers were concerned with accuracy and speed of acquisition of data with computers and automation playing an increasing role. Individual papers have been processed separately for inclusion on the database.

  10. HIV Sequence Compendium 2010

    Energy Technology Data Exchange (ETDEWEB)

    Kuiken, Carla [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Foley, Brian [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Leitner, Thomas [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Apetrei, Christian [Univ. of Pittsburgh, PA (United States); Hahn, Beatrice [Univ. of Alabama, Tuscaloosa, AL (United States); Mizrachi, Ilene [National Center for Biotechnology Information, Bethesda, MD (United States); Mullins, James [Univ. of Washington, Seattle, WA (United States); Rambaut, Andrew [Univ. of Edinburgh, Scotland (United Kingdom); Wolinsky, Steven [Northwestern Univ., Evanston, IL (United States); Korber, Bette [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  11. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed Affan

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
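
    The partitioning step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define how the splitting ratio is computed, so here it is taken to be a hypothetical cpu_fraction, the share of database sequences (by count) sent to the CPU, and the function name split_by_length is likewise invented for the example. The alignment scoring itself is not shown.

```python
from typing import List, Tuple


def split_by_length(database: List[str], cpu_fraction: float) -> Tuple[List[str], List[str]]:
    """Return (cpu_part, gpu_part) so that the CPU gets the shorter sequences."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must lie between 0 and 1")
    # Sort by length so every CPU sequence is no longer than any GPU sequence.
    ordered = sorted(database, key=len)
    # Assumed meaning of the ratio: fraction of sequences, by count, for the CPU.
    cut = int(len(ordered) * cpu_fraction)
    return ordered[:cut], ordered[cut:]


cpu_part, gpu_part = split_by_length(["ACGT", "A", "ACGTACGT", "AC"], 0.5)
```

    With the four toy sequences above and a fraction of 0.5, the two shortest sequences land in cpu_part and the two longest in gpu_part.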

  12. Controlled processing during sequencing.

    Science.gov (United States)

    Thothathiri, Malathi; Rattinger, Michelle

    2015-01-01

    Longstanding evidence has identified a role for the frontal cortex in sequencing within both linguistic and non-linguistic domains. More recently, neuropsychological studies have suggested a specific role for the left premotor-prefrontal junction (BA 44/6) in selection between competing alternatives during sequencing. In this study, we used neuroimaging with healthy adults to confirm and extend knowledge about the neural correlates of sequencing. Participants reproduced visually presented sequences of syllables and words using manual button presses. Items in the sequence were presented either consecutively or concurrently. Concurrent presentation is known to trigger the planning of multiple responses, which might compete with one another. Therefore, we hypothesized that regions involved in controlled processing would show greater recruitment during the concurrent than the consecutive condition. Whole-brain analysis showed concurrent > consecutive activation in sensory, motor and somatosensory cortices and notably also in rostral-dorsal anterior cingulate cortex. Region of interest analyses showed increased activation within left BA 44/6 and correlation between this region's activation and behavioral response times. Functional connectivity analysis revealed increased connectivity between left BA 44/6 and the posterior lobe of the cerebellum during the concurrent than the consecutive condition. These results corroborate recent evidence and demonstrate the involvement of BA 44/6 and other control regions when ordering co-activated representations.

  13. Controlled processing during sequencing

    Directory of Open Access Journals (Sweden)

    Malathi Thothathiri

    2015-10-01

    Full Text Available Longstanding evidence has identified a role for the frontal cortex in sequencing within both linguistic and non-linguistic domains. More recently, neuropsychological studies have suggested a specific role for the left premotor-prefrontal junction (BA 44/6) in selection between competing alternatives during sequencing. In this study, we used neuroimaging with healthy adults to confirm and extend knowledge about the neural correlates of sequencing. Participants reproduced visually presented sequences of syllables and words using manual button presses. Items in the sequence were presented either consecutively or concurrently. Concurrent presentation is known to trigger the planning of multiple responses, which might compete with one another. Therefore, we hypothesized that regions involved in controlled processing would show greater recruitment during the concurrent than the consecutive condition. Whole-brain analysis showed concurrent > consecutive activation in sensory, motor and somatosensory cortices and notably also in rostral-dorsal anterior cingulate cortex (ACC). Region of interest analyses showed increased activation within left BA 44/6 and correlation between this region’s activation and behavioral response times. Functional connectivity analysis revealed increased connectivity between left BA 44/6 and the posterior lobe of the cerebellum during the concurrent than the consecutive condition. These results corroborate recent evidence and demonstrate the involvement of BA 44/6 and other control regions when ordering co-activated representations.

  14. Program Synthesizes UML Sequence Diagrams

    Science.gov (United States)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas previously, the construction of sequence diagrams was a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.

  15. Sequencing BPS Spectra

    CERN Document Server

    Gukov, Sergei; Saberi, Ingmar; Stosic, Marko; Sulkowski, Piotr

    2015-01-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular $S$-matrix. This leads to the identifi...

  16. Next-generation sequencing

    DEFF Research Database (Denmark)

    Rieneck, Klaus; Bak, Mads; Jønson, Lars

    2013-01-01

    information obtained allows well for statistical analysis of the data. This general approach can be integrated into current laboratory practice and has numerous applications. Besides DNA-based predictions of blood group phenotypes, platelet phenotypes, or sickle cell anemia, and the determination of zygosity......, Illumina); several millions of PCR sequences were analyzed. RESULTS: The results demonstrated the feasibility of diagnosing the fetal KEL1 or KEL2 blood group from cell-free DNA purified from maternal plasma. CONCLUSION: This method requires only one primer pair, and the large amount of sequence...

  17. Family Sequencing and Cooperation

    NARCIS (Netherlands)

    Grundel, S.; Ciftci, B.B.; Borm, P.E.M.; Hamers, H.J.M.

    2012-01-01

    To analyze the allocation problem of the maximal cost savings of the whole group of jobs, we define and analyze a so-called corresponding cooperative family sequencing game which explicitly takes into account the maximal cost savings for any coalition of jobs. Using nonstandard techniques we prove t

  18. Twin anemia polycythemia sequence

    NARCIS (Netherlands)

    Slaghekke, Femke

    2014-01-01

    In this thesis we describe that Twin Anemia Polycythemia Sequence (TAPS) is a form of chronic feto-fetal transfusion in monochorionic (identical) twins based on a small amount of blood transfusion through very small anastomoses. For the antenatal diagnosis of TAPS, Middle Cerebral Artery – Peak Syst

  19. Twin anemia polycythemia sequence

    NARCIS (Netherlands)

    Slaghekke, Femke

    2014-01-01

    In this thesis we describe that Twin Anemia Polycythemia Sequence (TAPS) is a form of chronic feto-fetal transfusion in monochorionic (identical) twins based on a small amount of blood transfusion through very small anastomoses. For the antenatal diagnosis of TAPS, Middle Cerebral Artery – Peak

  20. Biological sequence analysis

    DEFF Research Database (Denmark)

    Durbin, Richard; Eddy, Sean; Krogh, Anders Stærmose

    This book provides an up-to-date and tutorial-level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Discussed methods include pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogene...

  1. Allele Re-sequencing Technologies

    DEFF Research Database (Denmark)

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    2013-01-01

    The development of next-generation sequencing technologies has made sequencing an affordable approach for detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing still remains too high to be feasible for many plant species with large...... alternative to whole genome re-sequencing to identify causative genetic variations in plants. One challenge, however, will be efficient bioinformatics strategies for data handling and analysis from the increasing amount of sequence information....

  2. Rapid-Sequence Intubation

    Directory of Open Access Journals (Sweden)

    Evangelina Dávila Cabo de Villa

    2015-09-01

    Full Text Available In medical practice there are several situations that require immediate intervention of the airway in some patients, in order to ensure proper entrance and exit of gases into and out of the lungs and prevent aspiration. Rapid-sequence intubation has been considered as the administration of a hypnotic agent and a neuromuscular relaxant consecutively (virtually simultaneously to facilitate orotracheal intubation in critically ill patients and minimize the risk of aspiration. This paper aims to collect elements that promote a successful medical management according to the situation presented, since there is no single way of proceeding in case of rapid-sequence intubation. The elements to consider include: knowing the anatomy of the upper respiratory tract, having a group of drugs to choose from, receiving adequate training and having an alternative plan for the difficulties that may arise.

  3. Sequence Classification: 885394 [

    Lifescience Database Archive (English)

    Full Text Available 703); The expression pattern of this gene is described in PMID:12000842; possible frameshift detected when compared...Non-TMB TMH Non-TMB Non-TMB Non-TMB Non-TMB >gi|23619146|ref|NP_705108.1| Slight di...fference exist when compared to the published sequence of EBL-1 from Dd2 strain of P. falciparum (PMID:10613

  4. Sequencing of aromatase inhibitors

    OpenAIRE

    2005-01-01

    Since the development of the third-generation aromatase inhibitors (AIs), anastrozole, letrozole and exemestane, these agents have been the subject of intensive research to determine their optimal use in advanced breast cancer. Not only have they replaced progestins in second-line therapy and challenged the role of tamoxifen in first-line, but there is also evidence for a lack of cross-resistance between the steroidal and nonsteroidal AIs, meaning that they may be used in sequence to obtain p...

  5. Properties of Semijoin Sequences

    Institute of Scientific and Technical Information of China (English)

    Beng C. Ooi; B. Srinivasan

    1989-01-01

    The problem of finding the optimum semijoin sequence of an arbitrary query under a linear cost function for the transmission cost is NP-hard. Hence heuristic algorithms with desirable properties are explored. In this paper four properties of semijoin programs for distributed query processing are identified. The use of these properties in constructing a semijoin sequence is justified. An existing algorithm is modified to incorporate these properties. Empirical comparison with existing algorithms shows the superiority of the proposed algorithm.

  6. Learning Sequence Neighbourhood Metrics

    CERN Document Server

    Bayer, Justin; van der Smagt, Patrick

    2011-01-01

    Recurrent neural networks (RNNs) in combination with a pooling operator and the neighbourhood components analysis (NCA) objective function are able to detect the characterizing dynamics of sequences and embed them into a fixed-length vector space of arbitrary dimensionality. Subsequently, the resulting features are meaningful and can be used for visualization or nearest neighbour classification in linear time. This kind of metric learning for sequential data enables the use of algorithms tailored towards fixed length vector spaces such as R^n.

  7. Sequencing BPS spectra

    Science.gov (United States)

    Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; Stošić, Marko; Sułkowski, Piotr

    2016-03-01

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  8. Image sequence analysis

    CERN Document Server

    1981-01-01

    The processing of image sequences has a broad spectrum of important applications including target tracking, robot navigation, bandwidth compression of TV conferencing video signals, studying the motion of biological cells using microcinematography, cloud tracking, and highway traffic monitoring. Image sequence processing involves a large amount of data. However, because of the progress in computer, LSI, and VLSI technologies, we have now reached a stage when many useful processing tasks can be done in a reasonable amount of time. As a result, research and development activities in image sequence analysis have recently been growing at a rapid pace. An IEEE Computer Society Workshop on Computer Analysis of Time-Varying Imagery was held in Philadelphia, April 5-6, 1979. A related special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence was published in November 1980. The IEEE Computer magazine has also published a special issue on the subject in 1981. The purpose of this book ...

  9. The Galaxy End Sequence

    Science.gov (United States)

    Eales, Stephen; de Vis, Pieter; Smith, Matthew W. L.; Appah, Kiran; Ciesla, Laure; Duffield, Chris; Schofield, Simon

    2017-03-01

    A common assumption is that galaxies fall in two distinct regions of a plot of specific star formation rate (SSFR) versus galaxy stellar mass: a star-forming galaxy main sequence (GMS) and a separate region of 'passive' or 'red and dead galaxies'. Starting from a volume-limited sample of nearby galaxies designed to contain most of the stellar mass in this volume, and thus representing the end-point of ≃12 billion years of galaxy evolution, we investigate the distribution of galaxies in this diagram today. We show that galaxies follow a strongly curved extended GMS with a steep negative slope at high galaxy stellar masses. There is a gradual change in the morphologies of the galaxies along this distribution, but there is no clear break between early-type and late-type galaxies. Examining the other evidence that there are two distinct populations, we argue that the 'red sequence' is the result of the colours of galaxies changing very little below a critical value of the SSFR, rather than implying a distinct population of galaxies. Herschel observations, which show at least half of early-type galaxies contain a cool interstellar medium, also imply continuity between early-type and late-type galaxies. This picture of a unitary population of galaxies requires more gradual evolutionary processes than the rapid quenching process needed to explain two distinct populations. We challenge theorists to predict quantitatively the properties of this 'Galaxy End Sequence'.

  10. Sequencing BPS spectra

    Energy Technology Data Exchange (ETDEWEB)

    Gukov, Sergei [Walter Burke Institute for Theoretical Physics, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125 (United States); Max-Planck-Institut für Mathematik, Vivatsgasse 7, D-53111 Bonn (Germany); Nawata, Satoshi [Walter Burke Institute for Theoretical Physics, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125 (United States); Centre for Quantum Geometry of Moduli Spaces, University of Aarhus, Nordre Ringgade 1, DK-8000 (Denmark); Saberi, Ingmar [Walter Burke Institute for Theoretical Physics, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125 (United States); Stošić, Marko [CAMGSD, Departamento de Matemática, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisbon (Portugal); Mathematical Institute SANU, Knez Mihajlova 36, 11000 Belgrade (Serbia); Sułkowski, Piotr [Walter Burke Institute for Theoretical Physics, California Institute of Technology, 1200 E California Blvd, Pasadena, CA 91125 (United States); Faculty of Physics, University of Warsaw, ul. Pasteura 5, 02-093 Warsaw (Poland)

    2016-03-02

    This paper provides both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel “sliding” property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N=2 theory via the 3d/3d correspondence. Lastly, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.

  11. Information Theory of DNA Sequencing

    CERN Document Server

    Motahari, Abolfazl; Tse, David

    2012-01-01

    DNA sequencing is the basic workhorse of modern day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and provides a fundamental limit to the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process. Using this framework, we also study the impact of noise in the read process on the sequencing capacity.

  12. A vision for ubiquitous sequencing.

    Science.gov (United States)

    Erlich, Yaniv

    2015-10-01

    Genomics has recently celebrated reaching the $1000 genome milestone, making affordable DNA sequencing a reality. With this goal successfully completed, the next goal of the sequencing revolution can be sequencing sensors--miniaturized sequencing devices that are manufactured for real-time applications and deployed in large quantities at low costs. The first part of this manuscript envisions applications that will benefit from moving the sequencers to the samples in a range of domains. In the second part, the manuscript outlines the critical barriers that need to be addressed in order to reach the goal of ubiquitous sequencing sensors.

  13. Psychoacoustic Properties of Fibonacci Sequences

    Directory of Open Access Journals (Sweden)

    J. Sokoll

    2008-01-01

    Full Text Available In 1202, Fibonacci set up one of the most interesting sequences in number theory. This sequence can be represented by so-called Fibonacci Numbers, and by a binary sequence of zeros and ones. If such a binary Fibonacci Sequence is played back as an audio file, a very dissonant sound results. This is caused by the “almost-periodic”, “self-similar” property of the binary sequence. The ratio of zeros and ones converges to the golden ratio, as do the primary and secondary spectral components in their frequencies and amplitudes. These Fibonacci Sequences will be characterized using listening tests and psychoacoustic analyses.
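
    To make the binary sequence in this abstract concrete, here is a minimal sketch that builds a binary Fibonacci word and checks the ratio of zeros to ones against the golden ratio. It assumes the standard substitution rule (0 becomes 01, 1 becomes 0); the abstract does not state which construction the paper uses, and the function name fibonacci_word is chosen only for this example.

```python
def fibonacci_word(iterations: int) -> str:
    """Apply the substitution 0 -> 01, 1 -> 0 repeatedly, starting from '0'."""
    word = "0"
    for _ in range(iterations):
        word = "".join("01" if symbol == "0" else "0" for symbol in word)
    return word


GOLDEN_RATIO = (1 + 5 ** 0.5) / 2

word = fibonacci_word(20)
# The counts of zeros and ones are consecutive Fibonacci numbers,
# so their ratio approaches the golden ratio as the word grows.
ratio = word.count("0") / word.count("1")
```

    Each iteration produces a word whose length is the next Fibonacci number, which is one way to see the self-similar structure the abstract refers to.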

  14. Infinite sequences and series

    CERN Document Server

    Knopp, Konrad

    1956-01-01

    One of the finest expositors in the field of modern mathematics, Dr. Konrad Knopp here concentrates on a topic that is of particular interest to 20th-century mathematicians and students. He develops the theory of infinite sequences and series from its beginnings to a point where the reader will be in a position to investigate more advanced stages on his own. The foundations of the theory are therefore presented with special care, while the developmental aspects are limited by the scope and purpose of the book. All definitions are clearly stated; all theorems are proved with enough detail to ma

  15. Next-Generation Sequencing Platforms

    Science.gov (United States)

    Mardis, Elaine R.

    2013-06-01

    Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.

  16. Spaces of Ideal Convergent Sequences

    Directory of Open Access Journals (Sweden)

    M. Mursaleen

    2014-01-01

    Full Text Available In the present paper, we introduce some sequence spaces using ideal convergence and Musielak-Orlicz function ℳ = (M_k). We also examine some topological properties of the resulting sequence spaces.

  17. Cultivation of marker-free transgenic rice plants carrying a lysine-rich protein gene

    Institute of Scientific and Technical Information of China (English)

    段忠卫; 李希臣; 张军; 刘琦; 夏善勇; 刘建新; 李杰

    2013-01-01

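    Using Agrobacterium-mediated genetic transformation, the double T-DNA plant expression vectors pTTBUG2, pTTBUG8, pTTBGG2, and pTTBGG8 were transferred into rice. These vectors carry the GsHLP2/GsHLP8 genes, regulated by the constitutive Ubi promoter or the rice seed-specific GluB-4 promoter, with the Bar gene as the selectable marker. A total of 124 transformed seedlings were obtained. PCR results showed that the lysine-rich protein genes GsHLP2/GsHLP8 and the Bar gene had been co-transformed into the rice genome, with a co-transformation rate of 61.1%. The transgenic progeny were tested with a Bialaphos resistance assay, which showed that resistance to Bialaphos segregated among the T1 seeds. PCR testing of the T1 plants then identified plants that contain only the target gene and not the selectable marker gene, providing material for breeding marker-free rice transformed with the lysine-rich protein gene.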

  18. Sequence Handling by Sequence Analysis Toolbox v1.0

    DEFF Research Database (Denmark)

    Ingrell, Christian Ravnsborg; Matthiesen, Rune; Jensen, Ole Nørregaard

    2006-01-01

    The fact that mass spectrometry has become a high-throughput method calls for bioinformatic tools for automated sequence handling and prediction. For efficient use of bioinformatic tools, it is important that these tools are integrated or interfaced with each other. The purpose of sequence...... analysis toolbox v1.0 was to have a general purpose sequence analyzing tool that can import sequences obtained by high-throughput sequencing methods. The program includes algorithms for calculation or prediction of isoelectric point, hydropathicity index, transmembrane segments, and glycosylphosphatidyl...

  19. The Galaxy End Sequence

    CERN Document Server

    Eales, Stephen; Smith, Matthew; Appah, Kiran; Ciesla, Laure; Duffield, Chris; Schofield, Simon

    2016-01-01

    A common assumption is that galaxies fall in two distinct regions on a plot of specific star-formation rate (SSFR) versus galaxy stellar mass: a star-forming Galaxy Main Sequence (GMS) and a separate region of `passive' or `red and dead galaxies'. Starting from a volume-limited sample of nearby galaxies designed to contain most of the stellar mass in this volume, and thus being a fair representation of the Universe at the end of 12 billion years of galaxy evolution, we investigate the distribution of galaxies in this diagram today. We show that galaxies follow a strongly curved extended GMS with a steep negative slope at high galaxy stellar masses. There is a gradual change in the morphologies of the galaxies along this distribution, but there is no clear break between early-type and late-type galaxies. Examining the other evidence that there are two distinct populations, we argue that the `red sequence' is the result of the colours of galaxies changing very little below a critical value of the SSFR, rather t...

  20. Rapid Polymer Sequencer

    Science.gov (United States)

    Stolc, Viktor (Inventor); Brock, Matthew W (Inventor)

    2013-01-01

    Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component. The nanopore preferably has a pore inner diameter of no more than about 40 nm and is prepared by heating and pulling a very small section of a glass tubing.

  1. Novel sequences propel familiar folds.

    Science.gov (United States)

    Jawad, Zahra; Paoli, Massimo

    2002-04-01

    Recent structure determinations have made new additions to a set of strikingly different sequences that give rise to the same topology. Proteins with a beta propeller fold are characterized by extreme sequence diversity despite the similarity in their three-dimensional structures. Several fold predictions, based in part on sequence repeats thought to match modular beta sheets, have been proved correct.

  2. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...... the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56...... MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types...

  3. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    Science.gov (United States)

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  4. RIKEN integrated sequence analysis (RISA) system--384-format sequencing pipeline with 384 multicapillary sequencer.

    Science.gov (United States)

    Shibata, K; Itoh, M; Aizawa, K; Nagaoka, S; Sasaki, N; Carninci, P; Konno, H; Akiyama, J; Nishi, K; Kitsunai, T; Tashiro, H; Itoh, M; Sumi, N; Ishii, Y; Nakamura, S; Hazama, M; Nishine, T; Harada, A; Yamamoto, R; Matsumoto, H; Sakaguchi, S; Ikegami, T; Kashiwagi, K; Fujiwake, S; Inoue, K; Togawa, Y

    2000-11-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3' end and 5' end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be

  5. Musical Sequences in Comics

    Directory of Open Access Journals (Sweden)

    Kieron Michael Brown

    2013-11-01

    Full Text Available Critical attention paid to the media of music and comics has historically focused on parallels between the temporal rhythm and pacing of music and the implied rhythm and temporality of comics (Eisner 2008, Godek 2007). Recent attention has begun to focus on both comics’ potential to represent the character of music (Whitted 2011) and the effects of musical images and themes on comics’ narratology (Peters 2013). I suggest that analyses of comics that combine the traditional interplay of image and word with the use of elements of musical notation are able to shed further light on each of these areas, via the connotations and conventions of symbols pulled exclusively from the realms of music, and their integration with the other elements of the page in sequence.

  6. Solid phase sequencing of biopolymers

    Energy Technology Data Exchange (ETDEWEB)

    Cantor, Charles (Del Mar, CA); Koster, Hubert (La Jolla, CA)

    2010-09-28

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  7. Solid phase sequencing of biopolymers

    Science.gov (United States)

    Cantor, Charles R.; Hubert, Koster

    2014-06-24

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Probes may be affixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  8. Graphene nanodevices for DNA sequencing

    Science.gov (United States)

    Heerema, Stephanie J.; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology.

  9. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov,Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by
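
    The comparison against random sequences of the same nucleotide composition can be illustrated with a small calculation. The sketch below is not the authors' pipeline: it assumes an independent-nucleotide background computed analytically from the sequence's own base frequencies, and the function names are hypothetical.

```python
from collections import Counter
from math import prod


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def enrichment(seq: str, k: int) -> dict:
    """Observed/expected ratio for each k-mer that occurs in seq.

    Assumption: the expected count treats nucleotides as independent draws
    with the sequence's own base frequencies (a composition-matched model).
    """
    n = len(seq)
    freq = {base: count / n for base, count in Counter(seq).items()}
    windows = n - k + 1
    result = {}
    for kmer, observed in kmer_counts(seq, k).items():
        expected = windows * prod(freq[base] for base in kmer)
        result[kmer] = observed / expected
    return result


if __name__ == "__main__":
    toy = "ACGT" * 5  # toy input, not real genomic data
    for kmer, ratio in sorted(enrichment(toy, 2).items()):
        print(kmer, round(ratio, 3))
```

    A ratio well above 1 marks a motif that is more frequent than base composition alone would predict, which is the sense of "overrepresented" used in the abstract.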

  10. Nonlinear analysis of biological sequences

    Energy Technology Data Exchange (ETDEWEB)

    Torney, D.C.; Bruno, W.; Detours, V. [and others

    1998-11-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The main objectives of this project involved deriving new capabilities for analyzing biological sequences. The authors focused on tabulating the statistical properties exhibited by Human coding DNA sequences and on techniques of inferring the phylogenetic relationships among protein sequences related by descent.

  11. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  12. Blazar Sequence in Fermi Era

    Indian Academy of Sciences (India)

    Liang Chen

    2014-09-01

    In this paper, we review the latest research results on the topic of blazar sequence. It seems that the blazar sequence is phenomenologically ruled out, while the theoretical blazar sequence still holds. We point out that black hole mass is a dominant parameter accounting for high-power-high-synchrotron-peaked and low-power-low-synchrotron-peaked blazars. Because most blazars have similar size of emission region, theoretical blazar sequence implies that the break of Spectral Energy Distribution (SED) is a cooling break in nature.

  13. ABS: Sequence alignment by scanning

    KAUST Repository

    Bonny, Mohamed Talal

    2011-08-01

    Sequence alignment is an essential tool in almost any computational biology research. It processes large database sequences and is considered a heavy consumer of computation time. Heuristic algorithms are used to get approximate but fast results. We introduce a fast alignment algorithm, called Alignment By Scanning (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with well-known alignment algorithms, FASTA (which is heuristic) and Needleman-Wunsch (which is optimal). The proposed algorithm achieves up to 76% enhancement in alignment score when compared with the FASTA algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.
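
    For orientation, the optimal baseline named above (Needleman-Wunsch) can be sketched in a few lines of dynamic programming. This is a minimal illustration of that classical algorithm, not of ABS, whose details the abstract does not give; the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score with a linear gap penalty.

    Only two rows of the dynamic-programming table are kept, so memory
    use is proportional to len(b) while time is len(a) * len(b).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

    The quadratic running time of this exact method is what motivates heuristic alternatives for long sequences.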

  14. Assembly sequencing with toleranced parts

    Energy Technology Data Exchange (ETDEWEB)

    Latombe, J.C. [Stanford Univ., CA (United States). Robotics Lab.; Wilson, R.H. [Sandia National Labs., Albuquerque, NM (United States). Intelligent Systems and Robotics Center

    1995-02-21

    The goal of assembly sequencing is to plan a feasible series of operations to construct a product from its individual parts. Previous research has thoroughly investigated assembly sequencing under the assumption that parts have nominal geometry. This paper considers the case where parts have toleranced geometry. Its main contribution is an efficient procedure that decides if a product admits an assembly sequence with infinite translations that is feasible for all possible instances of the components within the specified tolerances. If the product admits one such sequence, the procedure can also generate it. For the cases where there exists no such assembly sequence, another procedure is proposed which generates assembly sequences that are feasible only for some values of the toleranced dimensions. If this procedure produces no such sequence, then no instance of the product is assemblable. Finally, this paper analyzes the relation between assembly and disassembly sequences in the presence of toleranced parts. This work assumes a simple, but non-trivial tolerance language that falls short of capturing all imperfections of a manufacturing process. Hence, it is only one step toward assembly sequencing with toleranced parts.

  15. SNMR pulse sequence phase cycling

    Science.gov (United States)

    Walsh, David O; Grunewald, Elliot D

    2013-11-12

    Technologies applicable to SNMR pulse sequence phase cycling are disclosed, including SNMR acquisition apparatus and methods, SNMR processing apparatus and methods, and combinations thereof. SNMR acquisition may include transmitting two or more SNMR pulse sequences and applying a phase shift to a pulse in at least one of the pulse sequences, according to any of a variety of cycling techniques. SNMR processing may include combining SNMR data from a plurality of pulse sequences comprising pulses of different phases, so that desired signals are preserved and undesired signals are canceled.
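
    The general idea of phase cycling can be shown with a toy simulation. The sketch below is a generic two-step illustration and not the scheme claimed in the patent: it assumes the desired signal inverts when the pulse phase is shifted by π while the artifact does not depend on the pulse phase, and every waveform and parameter is hypothetical.

```python
import math

N_SAMPLES = 200  # hypothetical record length


def desired_signal(k: int, pulse_phase: float) -> float:
    """Toy decaying cosine whose phase follows the transmit pulse phase."""
    t = k / N_SAMPLES
    return math.exp(-3 * t) * math.cos(2 * math.pi * 10 * t + pulse_phase)


def artifact(k: int) -> float:
    """Toy interference plus offset, independent of the pulse phase."""
    t = k / N_SAMPLES
    return 0.5 * math.sin(2 * math.pi * 3 * t) + 0.2


def acquire(pulse_phase: float) -> list:
    """Simulated record: desired signal plus phase-independent artifact."""
    return [desired_signal(k, pulse_phase) + artifact(k) for k in range(N_SAMPLES)]


def combine(record_0: list, record_pi: list) -> list:
    """Subtract the two records: the artifact cancels, the signal survives."""
    return [(a - b) / 2 for a, b in zip(record_0, record_pi)]


if __name__ == "__main__":
    cleaned = combine(acquire(0.0), acquire(math.pi))
    print(max(abs(c - desired_signal(k, 0.0)) for k, c in enumerate(cleaned)))
```

    Because cos(x + π) = -cos(x), the subtraction doubles the desired component before the division by two, while any term common to both records drops out.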

  16. The ontology of biological sequences

    Directory of Open Access Journals (Sweden)

    Kelso Janet

    2009-11-01

    Full Text Available Background: Biological sequences play a major role in molecular and computational biology. They are studied as information-bearing entities that make up DNA, RNA or proteins. The Sequence Ontology, which is part of the OBO Foundry, contains descriptions and definitions of sequences and their properties. Yet the most basic question about sequences remains unanswered: what kind of entity is a biological sequence? An answer to this question benefits formal ontologies that use the notion of biological sequences and analyses in computational biology alike. Results: We provide both an ontological analysis of biological sequences and a formal representation that can be used in knowledge-based applications and other ontologies. We distinguish three distinct kinds of entities that can be referred to as "biological sequence": chains of molecules, syntactic representations such as those in biological databases, and the abstract information-bearing entities. For use in knowledge-based applications and inclusion in biomedical ontologies, we implemented the developed axiom system for use in automated theorem proving. Conclusion: Axioms are necessary to achieve the main goal of ontologies: to formally specify the meaning of terms used within a domain. The axiom system for the ontology of biological sequences is the first elaborate axiom system for an OBO Foundry ontology and can serve as starting point for the development of more formal ontologies and ultimately of knowledge-based applications.

  17. Fast global sequence alignment technique

    KAUST Repository

    Bonny, Mohamed Talal

    2011-11-01

    Bioinformatics databases are growing exponentially in size. Processing these large amounts of data may take hours even if supercomputers are used. One of the most important processing tools in bioinformatics is sequence alignment. We introduce a fast alignment algorithm, called 'Alignment By Scanning' (ABS), to provide an approximate alignment of two DNA sequences. We compare our algorithm with well-known sequence alignment algorithms, 'GAP' (which is heuristic) and 'Needleman-Wunsch' (which is optimal). The proposed algorithm achieves up to 51% enhancement in alignment score when compared with the GAP algorithm. The evaluations are conducted using different lengths of DNA sequences. © 2011 IEEE.

  18. Sequence Algebra, Sequence Decision Diagrams and Dynamic Fault Trees

    Energy Technology Data Exchange (ETDEWEB)

    Rauzy, Antoine B., E-mail: Antoine.Rauzy@lix.polytechnique.f [LIX-CNRS, Computer Science, Ecole Polytechnique, 91128 Palaiseau Cedex (France)

    2011-07-15

    Much attention has been focused on Dynamic Fault Trees in the past few years. By adding new gates to static (regular) Fault Trees, Dynamic Fault Trees aim to take into account dependencies among events. Merle et al. recently proposed an algebraic framework to give a formal interpretation to these gates. In this article, we extend Merle et al.'s work by adopting a slightly different perspective. We introduce Sequence Algebras that can be seen as Algebras of Basic Events, representing failures of non-repairable components. We show how to interpret Dynamic Fault Trees within this framework. Finally, we propose a new data structure to encode sets of sequences of Basic Events: Sequence Decision Diagrams. Sequence Decision Diagrams are very much inspired by Minato's Zero-Suppressed Binary Decision Diagrams. We show that all operations of Sequence Algebras can be performed on this data structure.

  19. Three-Distance Sequences with Three Symbols

    OpenAIRE

    SAKAMOTO, Kuniko

    2003-01-01

    We will show that every $3$ dimensional cutting sequence is a three-distance sequence, and there are uncountably many periodic or aperiodic three-distance sequences (with $3$-symbols) which are not $3$ dimensional cutting sequences.

  20. NSIT: novel sequence identification tool.

    Directory of Open Access Journals (Sweden)

    Benjarath Pupacdi

    Full Text Available Novel sequences are DNA sequences present in an individual's genome but absent in the human reference assembly. They are predicted to be biologically important, both individual and population specific, and consistent with the known human migration paths. Recent work has shown that an average person harbors 2-5 Mb of such sequences and estimated that the human pan-genome contains as much as 19-40 Mb of novel sequences. To identify them in a de novo genome assembly, some existing sequence aligners have been used but no computational method has been specifically proposed for this task. In this work, we developed NSIT (Novel Sequence Identification Tool), software that can accurately and efficiently identify novel sequences in an individual's de novo whole genome assembly. We identified and characterized 1.1 Mb, 1.2 Mb, and 1.0 Mb of novel sequences in NA18507 (African), YH (Asian), and NA12878 (European) de novo genome assemblies, respectively. Our results show very high concordance with the previous work using the respective reference assembly. In addition, our results using the latest human reference assembly suggest that the amount of novel sequences per individual may not be as high as previously reported. We additionally developed a graphical viewer for comparisons of novel sequence contents. The viewer also helped in identifying sequence contamination; we found 130 kb of Epstein-Barr virus sequence in the previously published NA18507 novel sequences as well as 287 kb of zebrafish repeats in NA12878 de novo assembly. NSIT requires [Formula: see text]2GB of RAM and 1.5-2 hrs on a commodity desktop. The program is applicable to input assemblies with varying contig/scaffold sizes, ranging from 100 bp to as high as 50 Mb. It works in both 32-bit and 64-bit systems and outperforms, by large margins, other fast sequence aligners previously applied to this task. To our knowledge, NSIT is the first software designed specifically for novel sequence

  1. PERIODIC COMPLEMENTARY BINARY SEQUENCE PAIRS

    Institute of Scientific and Technical Information of China (English)

    Xu Chengqian; Zhao Xiaoqun

    2002-01-01

    A new set of binary sequences, the Periodic Complementary Binary Sequence Pair (PCSP), is proposed. A new class of block design, the Difference Family Pair (DFP), is also proposed. The relationship between PCSP and DFP, the properties and existence conditions of PCSP, and the recursive constructions for PCSP are given.

  2. PERIODIC COMPLEMENTARY BINARY SEQUENCE PAIRS

    Institute of Scientific and Technical Information of China (English)

    Xu Chengqian; Zhao Xiaoqun

    2002-01-01

    A new set of binary sequences, the Periodic Complementary Binary Sequence Pair (PCSP), is proposed. A new class of block design, the Difference Family Pair (DFP), is also proposed. The relationship between PCSP and DFP, the properties and existence conditions of PCSP, and the recursive constructions for PCSP are given.

  3. DNA Sequencing Sensors: An Overview

    Directory of Open Access Journals (Sweden)

    Jose Antonio Garrido-Cardenas

    2017-03-01

    Full Text Available The first sequencing of a complete genome was published forty years ago by the double Nobel Prize in Chemistry winner Frederick Sanger. That corresponded to the small sized genome of a bacteriophage, but since then there have been many complex organisms whose DNA has been sequenced. This was possible thanks to continuous advances in the fields of biochemistry and molecular genetics, but also in other areas such as nanotechnology and computing. Nowadays, sequencing sensors based on genetic material have little to do with those used by Sanger. The emergence of mass sequencing sensors, or new generation sequencing (NGS), meant a quantitative leap both in the volume of genetic material that was able to be sequenced in each trial, as well as in the time per run and its cost. One can envisage that incoming technologies, already known as fourth generation sequencing, will continue to cheapen the trials by increasing DNA reading lengths in each run. All of this would be impossible without sensors and detection systems becoming smaller and more precise. This article provides a comprehensive overview on sensors for DNA sequencing developed within the last 40 years.

  4. Gambling strategies for random sequences

    OpenAIRE

    George Davie

    2010-01-01

    There is a general consensus that it is not possible to gamble successfully against a random sequence. This consensus is based on results from probability theory that all gambling systems are in some sense futile and the idea that at any stage of the sequence, the next outcome is entirely unpredictable.

  5. Sequence conserved for subcellular localization

    Science.gov (United States)

    Nair, Rajesh; Rost, Burkhard

    2002-01-01

    The more proteins diverged in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we presented the first large-scale analysis of the relation between sequence similarity and identity in subcellular localization. Three results stood out: (1) The subcellular compartment is generally more conserved than what might have been expected given that short sequence motifs like nuclear localization signals can alter the native compartment; (2) the sequence conservation of localization is similar between different compartments; and (3) it is similar to the conservation of structure and enzymatic activity. In particular, we found the transition between the regions of conserved and nonconserved localization to be very sharp, although the thresholds for conservation were less well defined than for structure and enzymatic activity. We found that a simple measure for sequence similarity accounting for pairwise sequence identity and alignment length, the HSSP distance, distinguished accurately between protein pairs of identical and different localizations. In fact, BLAST expectation values outperformed the HSSP distance only for alignments in the subtwilight zone. We succeeded in slightly improving the accuracy of inferring localization through homology by fine tuning the thresholds. Finally, we applied our results to the entire SWISS-PROT database and five entirely sequenced eukaryotes. PMID:12441382

  6. Bayesian analysis of binary sequences

    Science.gov (United States)

    Torney, David C.

    2005-03-01

    This manuscript details Bayesian methodology for "learning by example", with binary n-sequences encoding the objects under consideration. Priors prove influential; conformable priors are described. Laplace approximation of Bayes integrals yields posterior likelihoods for all n-sequences. This involves the optimization of a definite function over a convex domain--efficiently effectuated by the sequential application of the quadratic program.

  7. Chameleon sequences in neurodegenerative diseases

    Energy Technology Data Exchange (ETDEWEB)

    Bahramali, Golnaz [Institute of Biochemistry and Biophysics, University of Tehran, Tehran (Iran, Islamic Republic of); Goliaei, Bahram, E-mail: goliaei@ut.ac.ir [Institute of Biochemistry and Biophysics, University of Tehran, Tehran (Iran, Islamic Republic of); Minuchehr, Zarrin, E-mail: minuchehr@nigeb.ac.ir [Department of Systems Biotechnology, National Institute of Genetic Engineering and Biotechnology, (NIGEB), Tehran (Iran, Islamic Republic of); Salari, Ali [Department of Systems Biotechnology, National Institute of Genetic Engineering and Biotechnology, (NIGEB), Tehran (Iran, Islamic Republic of)

    2016-03-25

    Chameleon sequences can adopt an alpha helix, a sheet, or a coil conformation. Defining chameleon sequences in the PDB (Protein Data Bank) may yield insight into the peptides and proteins responsible for neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on chameleons, for which we developed an algorithm to extract peptide segments with identical sequences but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity of less than 25%. Our data were classified into “helix to strand (HE)”, “helix to coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the largest number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe are the major amino acids in chameleons. We also found that proteins such as Insulin Degrading Enzyme (IDE) and GTP-binding nuclear protein Ran (RAN) have the largest number of chameleons (640 and 405, respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFPs can serve as key proteins in neurodegeneration, and a study of them can shed light on curing and preventing neurodegenerative diseases.

  8. Rapid Diagnostics of Onboard Sequences

    Science.gov (United States)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database, while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF. The SCMF is then indexed by a hash value that is automatically included in all command

  9. Spatiotemporal correlations of aftershock sequences

    CERN Document Server

    Peixoto, Tiago P; Davidsen, Jörn

    2010-01-01

    Aftershock sequences are of particular interest in seismic research since they may condition seismic activity in a given region over long time spans. While they are typically identified with periods of enhanced seismic activity after a large earthquake as characterized by the Omori law, our knowledge of the spatiotemporal correlations between events in an aftershock sequence is limited. Here, we study the spatiotemporal correlations of two aftershock sequences from California (Parkfield and Hector Mine) using the recently introduced concept of "recurrent" events. We find that both sequences have very similar properties and that most of them are captured by the space-time epidemic-type aftershock sequence (ETAS) model if one takes into account catalog incompleteness. However, the stochastic model does not capture the spatiotemporal correlations leading to the observed structure of seismicity on small spatial scales.

  10. Quantum Exchangeable Sequences of Algebras

    CERN Document Server

    Curran, Stephen

    2008-01-01

    We extend the notion of quantum exchangeability, introduced by Köstler and Speicher in arXiv:0807.0677, to sequences $(\rho_1,\rho_2,\ldots)$ of homomorphisms from an algebra C into a noncommutative probability space $(A,\phi)$, and prove a free de Finetti theorem: an infinite quantum exchangeable sequence $(\rho_1,\rho_2,\ldots)$ is freely independent and identically distributed with respect to a conditional expectation. As a corollary we obtain a free analogue of the Hewitt-Savage zero-one law. As in the classical case, the theorem fails for finite sequences. We give a characterization of finite quantum exchangeable sequences, which can be viewed as a noncommutative analogue of sampling without replacement. We then give an approximation to how far a finite quantum exchangeable sequence is from being freely independent with amalgamation.

  11. Ossification sequence heterochrony among amphibians.

    Science.gov (United States)

    Harrington, Sean M; Harrison, Luke B; Sheil, Christopher A

    2013-01-01

    Heterochrony is an important mechanism in the evolution of amphibians. Although studies have centered on the relationship between size and shape and the rates of development, ossification sequence heterochrony also may have been important. Rigorous, phylogenetic methods for assessing sequence heterochrony are relatively new, and a comprehensive study of the relative timing of ossification of skeletal elements has not been used to identify instances of sequence heterochrony across Amphibia. In this study, a new version of the program Parsimov-based genetic inference (PGi) was used to identify shifts in ossification sequences across all extant orders of amphibians, for all major structural units of the skeleton. PGi identified a number of heterochronic sequence shifts in all analyses, the most interesting of which seem to be tied to differences in metamorphic patterns among major clades. Early ossification of the vomer, premaxilla, and dentary is retained by Apateon caducus and members of Gymnophiona and Urodela, which lack the strongly biphasic development seen in anurans. In contrast, bones associated with the jaws and face were identified as shifting late in the ancestor of Anura. The bones that do not shift late, and thereby occupy the earliest positions in the anuran cranial sequence, are those in regions of the skull that undergo the least restructuring throughout anuran metamorphosis. Additionally, within Anura, bones of the hind limb and pelvic girdle were also identified as shifting early in the sequence of ossification, which may be a result of functional constraints imposed by the drastic metamorphosis of most anurans.

  12. A Criterion for Regular Sequences

    Indian Academy of Sciences (India)

    D P Patil; U Storch; J Stückrad

    2004-05-01

    Let $R$ be a commutative noetherian ring and $f_1,\ldots,f_r \in R$. In this article we give (cf. the Theorem in §2) a criterion for $f_1,\ldots,f_r$ to be a regular sequence for a finitely generated module over $R$ which strengthens and generalises a result in [2]. As an immediate consequence we deduce that if $V(g_1,\ldots,g_r) \subseteq V(f_1,\ldots,f_r)$ in Spec $R$ and if $f_1,\ldots,f_r$ is a regular sequence in $R$, then $g_1,\ldots,g_r$ is also a regular sequence in $R$.

  13. Weak disorder in Fibonacci sequences

    Energy Technology Data Exchange (ETDEWEB)

    Ben-Naim, E [Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545 (United States); Krapivsky, P L [Department of Physics and Center for Molecular Cybernetics, Boston University, Boston, MA 02215 (United States)

    2006-05-19

    We study how weak disorder affects the growth of the Fibonacci series. We introduce a family of stochastic sequences that grow by the normal Fibonacci recursion with probability $1 - \epsilon$, but follow a different recursion rule with a small probability $\epsilon$. We focus on the weak disorder limit and obtain the Lyapunov exponent that characterizes the typical growth of the sequence elements, using perturbation theory. The limiting distribution for the ratio of consecutive sequence elements is obtained as well. A number of variations to the basic Fibonacci recursion including shift, doubling and copying are considered. (letter to the editor)
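
    The growth rate described above can be explored numerically. The following is a minimal sketch, not the authors' method: it estimates the Lyapunov exponent as (1/n) ln x_n by direct simulation rather than perturbation theory. The abstract does not define the alternative recursion, so the rule x_n = x_{n-1} + x_{n-3} used here is a hypothetical stand-in, and the function name and parameters are illustrative only.

```python
import math
import random


def lyapunov_estimate(eps, n_steps=2000, seed=0):
    """Estimate the growth rate (1/n) * ln(x_n) of a weakly disordered Fibonacci sequence.

    With probability 1 - eps the usual rule x_n = x_{n-1} + x_{n-2} is applied.
    With probability eps a HYPOTHETICAL alternative rule x_n = x_{n-1} + x_{n-3}
    is applied (an assumption; the abstract does not specify the rule).
    """
    rng = random.Random(seed)
    # Keep only the last three terms; Python integers do not overflow.
    older, old, new = 1, 1, 1
    for _ in range(n_steps):
        if rng.random() < eps:
            nxt = new + older   # assumed "disordered" step
        else:
            nxt = new + old     # normal Fibonacci step
        older, old, new = old, new, nxt
    return math.log(new) / n_steps


if __name__ == "__main__":
    golden = math.log((1 + math.sqrt(5)) / 2)  # ln(phi), about 0.4812
    for eps in (0.0, 0.01, 0.1):
        print(f"eps={eps:<5} estimate={lyapunov_estimate(eps):.4f}  ln(phi)={golden:.4f}")
```

    With eps = 0 the estimate approaches ln(phi), the growth rate of the ordinary Fibonacci numbers; under the assumed alternative rule, any eps > 0 can only slow the growth.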

  14. ISIS Individualized Support In Sequencing

    NARCIS (Netherlands)

    Drachsler, Hendrik; Hummel, Hans

    2007-01-01

    Drachsler, H., & Hummel, H. G. K. (2007). ISIS Individualized Support In Sequencing. Presentation given during the PIP meeting on March 22, 2007. Open University of the Netherlands: Heerlen, The Netherlands.

  15. Molecular beacon sequence design algorithm.

    Science.gov (United States)

    Monroe, W Todd; Haselton, Frederick R

    2003-01-01

    A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.

  16. Classification of Base Sequences BS(n+1,n)

    Directory of Open Access Journals (Sweden)

    Dragomir Ž. Ðoković

    2010-01-01

    Full Text Available Base sequences BS(n+1,n) are quadruples of {±1}-sequences (A;B;C;D), with A and B of length n+1 and C and D of length n, such that the sum of their nonperiodic autocorrelation functions is a δ-function. The base sequence conjecture, asserting that BS(n+1,n) exist for all n, is stronger than the famous Hadamard matrix conjecture. We introduce a new definition of equivalence for base sequences BS(n+1,n) and construct a canonical form. By using this canonical form, we have enumerated the equivalence classes of BS(n+1,n) for n≤30. As the number of equivalence classes grows rapidly (but not monotonically) with n, the tables in the paper cover only the cases n≤13.
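
    The defining property of base sequences can be checked mechanically. The sketch below, with hypothetical function names, tests whether four ±1 sequences A, B (length n+1) and C, D (length n) have nonperiodic autocorrelation functions that sum to zero at every nonzero shift. It only verifies a given quadruple; it does not reproduce the paper's equivalence classes or canonical form.

```python
def nonperiodic_autocorrelation(seq, shift):
    """Sum of seq[i] * seq[i + shift] over all i for which both indices exist."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def is_base_sequence(a, b, c, d):
    """Return True if (a; b; c; d) satisfies the base-sequence condition.

    a and b must have length n + 1, c and d length n, all entries +1 or -1,
    and the four autocorrelations must cancel at every shift 1..n.
    """
    n = len(c)
    if not (len(a) == len(b) == n + 1 and len(d) == n):
        return False
    if any(x not in (1, -1) for s in (a, b, c, d) for x in s):
        return False
    for shift in range(1, n + 1):
        total = sum(nonperiodic_autocorrelation(s, shift) for s in (a, b, c, d))
        if total != 0:
            return False
    return True


if __name__ == "__main__":
    # Smallest case n = 1: A = (+,+), B = (+,-), C = (+), D = (+).
    print(is_base_sequence([1, 1], [1, -1], [1], [1]))  # True
    print(is_base_sequence([1, 1], [1, 1], [1], [1]))   # False
```

    At shift 0 the sum is always 2(n+1) + 2n = 4n+2, so only the nonzero shifts need to be tested.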

  17. DNA Sequencing Using capillary Electrophoresis

    Energy Technology Data Exchange (ETDEWEB)

    Dr. Barry Karger

    2011-05-09

    The overall goal of this program was to develop capillary electrophoresis as the tool to be used to sequence for the first time the Human Genome. Our program was part of the Human Genome Project. In this work, we were highly successful and the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBase multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and success. We began our work by separating by capillary electrophoresis double strand oligonucleotides using cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, preparation of such cross-linked gel capillaries was difficult with poor reproducibility, and even more important, the columns were not very stable. We improved stability by using non-cross linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g. sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of the linear polyacrylamide so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, while testing many linear polymers, we selected linear polyacrylamide as the best matrix as it was the most hydrophilic polymer available. Under our DOE program, we demonstrated initially the success of the linear polyacrylamide to separate double strand DNA. We note that the method is used even today to assay purity of double stranded DNA fragments. Our focus, of course, was on the separation of single stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other

  18. Pythagorean Triples from Harmonic Sequences.

    Science.gov (United States)

    DiDomenico, Angelo S.; Tanner, Randy J.

    2001-01-01

    Shows how all primitive Pythagorean triples can be generated from harmonic sequences. Uses inductive and deductive reasoning to explore how Pythagorean triples are connected with another area of mathematics. (KHR)

  19. Overview of Sequence Data Formats.

    Science.gov (United States)

    Zhang, Hongen

    2016-01-01

    A next-generation sequencing experiment can generate billions of short reads for each sample, and processing of the raw reads adds more information. Various file formats have been introduced or developed in order to store and manipulate this information. This chapter presents an overview of the file formats, including FASTQ, FASTA, SAM/BAM, GFF/GTF, BED, and VCF, that are commonly used in analysis of next-generation sequencing data.
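
    As an illustration of one of the formats named above, here is a minimal sketch of reading FASTQ, assuming the common layout of four lines per record (header starting with "@", sequence, "+" separator, quality string) and Phred+33 quality encoding. The function names are hypothetical, multi-line records are not handled, and production work would normally use an established library instead.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII!!\n")
    for name, seq, qual in read_fastq(example):
        print(name, seq, phred_scores(qual))
```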

  20. Structural Complexity of DNA Sequence

    Directory of Open Access Journals (Sweden)

    Cheng-Yuan Liou

    2013-01-01

    Full Text Available In modern bioinformatics, finding an efficient way to allocate sequence fragments with biological functions is an important issue. This paper presents a structural approach based on context-free grammars extracted from original DNA or protein sequences. This approach is radically different from all those statistical methods. Furthermore, this approach is compared with a topological entropy-based method for consistency and difference of the complexity results.

  1. Nanogrid rolling circle DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Church, George M.; Porreca, Gregory J.; Shendure, Jay; Rosenbaum, Abraham Meir

    2017-04-18

    The present invention relates to methods for sequencing a polynucleotide immobilized on an array having a plurality of specific regions each having a defined diameter size, including synthesizing a concatemer of a polynucleotide by rolling circle amplification, wherein the concatemer has a cross-sectional diameter greater than the diameter of a specific region, immobilizing the concatemer to the specific region to make an immobilized concatemer, and sequencing the immobilized concatemer.

  2. Pig genome sequence - analysis and publication strategy

    NARCIS (Netherlands)

    Archibald, A.L.; Bolund, L.; Churcher, C.; Fredholm, M.; Groenen, M.A.M.; Harlizius, B.

    2010-01-01

    Background - The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. Results - Assemblies of the B

  3. Long-range barcode labeling-sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Feng; Zhang, Tao; Singh, Kanwar K.; Pennacchio, Len A.; Froula, Jeff L.; Eng, Kevin S.

    2016-10-18

    Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.

  4. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    Science.gov (United States)

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  5. ARC Code TI: sequenceMiner

    Data.gov (United States)

    National Aeronautics and Space Administration — The sequenceMiner was developed to address the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. sequenceMiner works...

  6. Sequencing Needs for Viral Diagnostics

    Energy Technology Data Exchange (ETDEWEB)

    Gardner, S N; Lam, M; Mulakken, N J; Torres, C L; Smith, J R; Slezak, T

    2004-01-26

    We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives ("near neighbors") that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. SARS and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near neighbor sequences are urgently needed. Our results also indicate that double stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.

  7. Sequence-dependent nucleosome positioning.

    Science.gov (United States)

    Chung, Ho-Ryun; Vingron, Martin

    2009-03-13

    Eukaryotic DNA is organized into a macromolecular structure called chromatin. The basic repeating unit of chromatin is the nucleosome, which consists of two copies of each of the four core histones and DNA. The nucleosomal organization and the positions of nucleosomes have profound effects on all DNA-dependent processes. Understanding the factors that influence nucleosome positioning is therefore of general interest. Among the many determinants of nucleosome positioning, the DNA sequence has been proposed to have a major role. Here, we analyzed more than 860,000 nucleosomal DNA sequences to identify sequence features that guide the formation of nucleosomes in vivo. We found that both a periodic enrichment of AT base pairs and an out-of-phase oscillating enrichment of GC base pairs as well as the overall preference for GC base pairs are determinants of nucleosome positioning. The preference for GC pairs can be related to a lower energetic cost required for deformation of the DNA to wrap around the histones. In line with this idea, we found that only incorporation of both signal components into a sequence model for nucleosome formation results in maximal predictive performance on a genome-wide scale. In this manner, one achieves greater predictive power than published approaches. Our results confirm the hypothesis that the DNA sequence has a major role in nucleosome positioning in vivo.

  8. Explaining the harmonic sequence paradox.

    Science.gov (United States)

    Schmidt, Ulrich; Zimper, Alexander

    2012-05-01

    According to the harmonic sequence paradox, an expected utility decision maker's willingness to pay for a gamble whose expected payoffs evolve according to the harmonic series is finite if and only if his marginal utility of additional income becomes zero for rather low payoff levels. Since the assumption of zero marginal utility is implausible for finite payoff levels, expected utility theory - as well as its standard generalizations such as cumulative prospect theory - are apparently unable to explain a finite willingness to pay. This paper presents first an experimental study of the harmonic sequence paradox. Additionally, it demonstrates that the theoretical argument of the harmonic sequence paradox only applies to time-patient decision makers, whereas the paradox is easily avoided if time-impatience is introduced.

  9. Transgressive Surface as Sequence Boundary

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    Analysis of the four cases of the sequence boundary (SB)-transgressive surface (TS) relation in nature shows that applying transgressive surfaces as sequence boundaries has the following merits: it improves the methodology of stratigraphic subdivision; the position of transgressive surface in a sea level curve is relatively fixed; the transgressive surface is a transforming surface of the stratal structure; in platforms or ramps, the transgressive surface is the only choice for determining the sequence boundary; the transgressive surface is a readily recognized physical surface reflected by seismic records in seismostratigraphy. The paper reaches a conclusion that to delineate a SB in terms of the TS is theoretically and practically better than to delineate it between highstand and lowstand sediments as has been done traditionally.

  10. On the base sequence conjecture

    CERN Document Server

    Djokovic, Dragomir Z

    2010-01-01

    Let BS(m,n) denote the set of base sequences (A;B;C;D), with A and B of length m and C and D of length n. The base sequence conjecture (BSC) asserts that BS(n+1,n) exist (i.e., are non-empty) for all n. This is known to be true for n <= 36 and when n is a Golay number. We show that it is also true for n=37 and n=38. It is worth pointing out that BSC is stronger than the famous Hadamard matrix conjecture. In order to demonstrate the abundance of base sequences, we have previously attached to BS(n+1,n) a graph Gamma_n and computed the Gamma_n for n <= 27. We now extend these computations and determine the Gamma_n for n=28,...,35. We also propose a conjecture describing these graphs in general.

  11. Comparative analysis of sequences from PT 2013

    DEFF Research Database (Denmark)

    Mikkelsen, Susie Sommer

    . All but one sequence mapped to the MCP gene while the last sequence mapped to the Neurofilament gene. Approx. half of the sequences contained no errors while the rest differed with 88-99 percent similarity with most having 99% similarity. One sequence, when BLASTed, showed most similarity to European...... Sheatfish and not EHNV. Generally, mistakes occurred at the ends of the sequences. This can be due to several factors. One is that the sequence has not been trimmed of the sequence primer sites. Another is the lack of quality control of the chromatogram. Finally, sequencing in just one direction can result...

  12. Sequence Patterns of Identity Authentication Protocols

    Institute of Scientific and Technical Information of China (English)

    Tao Hongcai; He Dake

    2006-01-01

    From the viewpoint of protocol sequence, the sequence patterns of possible identity authentication protocols are analysed under two cases: with or without a trusted third party (TTP). Ten feasible sequence patterns of authentication protocols with TTP and 5 sequence patterns without TTP are obtained. These sequence patterns meet the requirements for identity authentication, and cover almost all the authentication protocols with TTP and without TTP at present. All of the sequence patterns obtained are classified into unilateral or bilateral authentication. Then, according to the sequence symmetry, several good sequence patterns with TTP are evaluated. The accomplished results can provide a reference for the design of new identity authentication protocols.

  13. KERNEL WORDS AND GAP SEQUENCE OF THE TRIBONACCI SEQUENCE

    Institute of Scientific and Technical Information of China (English)

    Yuke HUANG; Zhiying WEN

    2016-01-01

    In this paper, we investigate the factor properties and gap sequence of the Tribonacci sequence, the fixed point of the substitution σ(a, b, c) = (ab, ac, a). Let ω_p be the p-th occurrence of ω and G_p(ω) be the gap between ω_p and ω_{p+1}. We introduce a notion of kernel for each factor ω, and then give the decomposition of the factor ω with respect to its kernel. Using the kernel and the decomposition, we prove the main result of this paper: for each factor ω, the gap sequence {G_p(ω)}_{p≥1} is the Tribonacci sequence over the alphabet {G_1(ω), G_2(ω), G_4(ω)}, and the expressions of gaps are determined completely. As an application, for each factor ω and p∈N, we determine the position of ω_p. Finally we introduce a notion of spectrum for studying some typical combinatorial properties, such as power, overlap and separate of factors.
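
    The main result can be observed on a small example. The sketch below builds a prefix of the Tribonacci sequence by iterating the substitution a → ab, b → ac, c → a, collects the words lying between consecutive occurrences of a factor, and relabels them by order of first appearance. It is restricted, as a simplifying assumption, to factors whose occurrences do not overlap (the single letter "a" is used here); the paper's kernel decomposition is not implemented, and all function names are hypothetical.

```python
SIGMA = {"a": "ab", "b": "ac", "c": "a"}  # the substitution from the abstract


def tribonacci_word(iterations):
    """Return sigma^iterations(a), a prefix of the Tribonacci sequence."""
    word = "a"
    for _ in range(iterations):
        word = "".join(SIGMA[ch] for ch in word)
    return word


def gap_words(word, factor):
    """Words strictly between consecutive occurrences of factor.

    Assumes occurrences of factor do not overlap; overlapping cases are
    treated differently in the paper and are not covered by this sketch.
    """
    starts = [i for i in range(len(word) - len(factor) + 1)
              if word.startswith(factor, i)]
    return [word[p + len(factor):q] for p, q in zip(starts, starts[1:])]


def relabel_by_first_appearance(items, alphabet="abc"):
    """Rename distinct items to a, b, c in the order they first appear."""
    labels = {}
    out = []
    for item in items:
        if item not in labels:
            labels[item] = alphabet[len(labels)]
        out.append(labels[item])
    return "".join(out)


if __name__ == "__main__":
    word = tribonacci_word(6)
    gaps = gap_words(word, "a")
    relabeled = relabel_by_first_appearance(gaps)
    print(word)
    print(relabeled)
    print(word.startswith(relabeled))  # the gap sequence is again Tribonacci
```

    For the factor "a" this is easy to see directly: every letter's image under the substitution begins with its only "a", so the gap after each "a" is "b", "c", or the empty word according to whether the corresponding letter was a, b, or c.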

  14. Sequences and series involving the sequence of composite numbers

    Directory of Open Access Journals (Sweden)

    Panayiotis Vlamos

    2002-01-01

    Full Text Available Denoting by $p_n$ and $c_n$ the nth prime number and the nth composite number, respectively, we prove that both the sequence $(x_n)_{n\ge 1}$, defined by $x_n=\sum_{k=1}^{n} \frac{c_{k+1}-c_k}{k} - \frac{p_n}{n}$, and the series $\sum_{n=1}^{\infty} \frac{p_{c_n}-c_{p_n}}{n\,p_n}$ are convergent.
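
    The sequence x_n, read here as the sum over k = 1..n of (c_{k+1} - c_k)/k minus p_n/n, can be tabulated numerically. The sketch below uses a simple sieve and hypothetical function names; it only prints values for inspection and makes no claim about the limit, which the abstract does not state.

```python
def primes_and_composites(limit):
    """Return (primes, composites) up to limit, via the sieve of Eratosthenes."""
    is_prime = [False, False] + [True] * (limit - 1)
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    primes = [i for i in range(2, limit + 1) if is_prime[i]]
    composites = [i for i in range(4, limit + 1) if not is_prime[i]]
    return primes, composites


def x(n, primes, composites):
    """x_n = sum_{k=1}^{n} (c_{k+1} - c_k) / k - p_n / n, with 1-based c_k and p_n."""
    total = sum((composites[k] - composites[k - 1]) / k for k in range(1, n + 1))
    return total - primes[n - 1] / n


if __name__ == "__main__":
    primes, composites = primes_and_composites(200_000)
    for n in (1, 2, 10, 100, 1000, 10000):
        print(n, round(x(n, primes, composites), 6))
```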

  15. Integrated sequence analysis. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Andersson, K.; Pyy, P

    1998-02-01

    The NKS/RAK subproject 3 "integrated sequence analysis" (ISA) was formulated with the overall objective to develop and to test integrated methodologies in order to evaluate event sequences with significant human action contribution. The term "methodology" denotes not only technical tools but also methods for integration of different scientific disciplines. In this report, we first discuss the background of ISA and the surveys made to map methods in different application fields, such as man machine system simulation software, human reliability analysis (HRA) and expert judgement. Specific event sequences were, after the surveys, selected for application and testing of a number of ISA methods. The event sequences discussed in the report were cold overpressure of BWR, shutdown LOCA of BWR, steam generator tube rupture of a PWR and BWR disturbed signal view in the control room after an external event. Different teams analysed these sequences by using different ISA and HRA methods. Two kinds of results were obtained from the ISA project: sequence specific and more general findings. The sequence specific results are discussed together with each sequence description. The general lessons are discussed under a separate chapter by using comparisons of different case studies. These lessons include areas ranging from plant safety management (design, procedures, instrumentation, operations, maintenance and safety practices) to methodological findings (ISA methodology, PSA, HRA, physical analyses, behavioural analyses and uncertainty assessment). Finally, a discussion about the project follows and conclusions are presented. An interdisciplinary study of complex phenomena is a natural way to produce valuable and innovative results. This project came up with structured ways to perform ISA and managed to apply them in practice. The project also highlighted some areas where more work is needed. In the HRA work, development is required for the use of simulators and expert judgement as

  16. Network motifs in music sequences

    CERN Document Server

    Zanette, Damian H

    2010-01-01

    In this note, I summarize ongoing research on motif distribution in networks built up out of symbolic sequences of Western musical origin. Their motif significance profiles exhibit remarkable consistency over different styles and periods, and define a class that cannot be identified with any of the four "superfamilies" to which most real networks seem to belong. Networks from music sequences possess an unusual abundance of bidirectional connections, due to the inherent reversibility of short musical note patterns. This property contributes to motif significance from both local and large-scale features of musical structure.

  17. Convergence of Fuzzy Set Sequences

    Institute of Scientific and Technical Information of China (English)

    FENG Yu-hu

    2002-01-01

    There is more than one mode of convergence for fuzzy set sequences. In this paper, six common modes of convergence and their relationships are discussed. These six modes are convergence in uniform metric D, convergence in separable metric Dp or D*p, 1 ≤ p < ∞, convergence in level set, strong convergence in level set and weak convergence. Suitable counterexamples are given. The necessary and sufficient conditions of convergence in uniform metric D are described. Some comments on the convergence of LR fuzzy number sequences are presented.

  18. DNA Sequencing Using capillary Electrophoresis

    Energy Technology Data Exchange (ETDEWEB)

    Dr. Barry Karger

    2011-05-09

    The overall goal of this program was to develop capillary electrophoresis as the tool to be used to sequence for the first time the Human Genome. Our program was part of the Human Genome Project. In this work, we were highly successful and the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBase multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and success. We began our work by separating double strand oligonucleotides by capillary electrophoresis using cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, preparation of such cross-linked gel capillaries was difficult, with poor reproducibility, and, even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g. sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of the linear polyacrylamide, so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, while testing many linear polymers, we selected linear polyacrylamide as the best matrix as it was the most hydrophilic polymer available. Under our DOE program, we demonstrated initially the success of the linear polyacrylamide to separate double strand DNA. We note that the method is used even today to assay purity of double stranded DNA fragments. Our focus, of course, was on the separation of single stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other

  19. The origin of biased sequence depth in sequence-independent nucleic acid amplification and optimization for efficient massive parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Toon Rosseel

    Full Text Available Sequence Independent Single Primer Amplification is one of the most widely used random amplification approaches in virology for sequencing template preparation. This technique relies on oligonucleotides consisting of a 3' random part used to prime complementary DNA synthesis and a 5' defined tag sequence for subsequent amplification. Recently, this amplification method was combined with next generation sequencing to obtain viral sequences. However, these studies showed a biased distribution of the resulting sequence reads over the analyzed genomes. The aim of this study was to elucidate the mechanisms that lead to biased sequence depth when using random amplification. Avian paramyxovirus type 8 was used as a model RNA virus to investigate these mechanisms. We showed, based on in silico analysis of the sequence depth in relation to GC-content, predicted RNA secondary structure and sequence complementarity to the 3' part of the tag sequence, that the tag sequence has the main contribution to the observed bias in sequence depth. We confirmed this finding experimentally using both fragmented and non-fragmented viral RNAs as well as primers differing in random oligomer length (6 or 12 nucleotides) and in the sequence of the amplification tag. The observed oligonucleotide annealing bias can be reduced by extending the random oligomer sequence and by in silico combining sequence data from SISPA experiments using different 5' defined tag sequences. These findings contribute to the optimization of random nucleic acid amplification protocols that are currently required for downstream applications such as viral metagenomics and microarray analysis.

  20. Sequences in language and text

    CERN Document Server

    Mikros, George K

    2015-01-01

    The aim of this volume is to present the diverse but highly interesting area of the quantitative analysis of the sequence of various linguistic structures. The collected articles present a wide spectrum of quantitative analyses of linguistic syntagmatic structures and explore novel sequential linguistic entities. This volume will be interesting to all researchers studying linguistics using quantitative methods.

  1. Instruction Sequences for Computer Science

    NARCIS (Netherlands)

    Bergstra, J.A.; Middelburg, C.A.

    2012-01-01

    This book demonstrates that the concept of an instruction sequence offers a novel and useful viewpoint on issues relating to diverse subjects in computer science. Selected issues relating to well-known subjects from the theory of computation and the area of computer architecture are rigorously

  2. Single-primer fluorescent sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Ruth, J.L.; Morgan, C.A.; Middendorf, L.R.; Grone, D.L.; Brumbaugh, J.A.

    1987-05-01

    Modified linker arm oligonucleotides complementary to standard M13 priming sites were synthesized, labelled with either one, two, or three fluoresceins, and purified by reverse-phase HPLC. When used as primers in standard dideoxy M13 sequencing with /sup 32/P-dNTPs, normal autoradiographic patterns were obtained. To eliminate the radioactivity, direct on-line fluorescence detection was achieved by the use of a scanning 10 mW Argon laser emitting 488 nm light. Fluorescent bands were detected directly in standard 0.2 or 0.35 mm thick polyacrylamide gels at a distance of 24 cm from the loading wells by a photomultiplier tube filtered at 520 nm. Horizontal and temporal location of each band was displayed by computer as a band in real time, providing visual appearance similar to normal 4-lane autoradiograms. Using a single primer labelled with two fluoresceins, sequences of between 500 and 600 bases have been read in a single loading with better than 98% accuracy; up to 400 bases can be read reproducibly with no errors. More than 50 sequences have been determined by this method. This approach requires only 1-2 ug of cloned template, and produces continuous sequence data at about one band per minute.

  3. Multifractal analyses of music sequences

    Science.gov (United States)

    Su, Zhi-Yuan; Wu, Tzuyin

    2006-09-01

    Multifractal analysis is applied to study the fractal property of music. In this paper, a method is proposed to transform both the melody and rhythm of a music piece into individual sets of distributed points along a one-dimensional line. The structure of the musical composition is thus manifested and characterized by the local clustering pattern of these sequences of points. Specifically, the local Hölder exponent and the multifractal spectrum are calculated for the transformed music sequences according to the multifractal formalism. The observed fluctuations of the Hölder exponent along the music sequences confirm the non-uniformity feature in the structures of melodic and rhythmic motions of music. Our present result suggests that the shape and opening width of the multifractal spectrum plot can be used to distinguish different styles of music. In addition, a characteristic curve is constructed by mapping the point sequences converted from the melody and rhythm of a musical work into a two-dimensional graph. Each piece of music has its own unique characteristic curve. This characteristic curve, which also exhibits a fractal trait, unveils the intrinsic structure of music.

  4. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance, and even in biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how to distinguish coding from noncoding sequences, and how to classify organisms and infer their evolutionary relationships, are key problems in bioinformatics. Although much research has been carried out by taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other people, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view) to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  5. Farey Sequences and Resistor Networks

    Indian Academy of Sciences (India)

    Sameen Ahmed Khan

    2012-05-01

    In this article, we employ the Farey sequence and Fibonacci numbers to establish strict upper and lower bounds for the order of the set of equivalent resistances for a circuit constructed from equal resistors combined in series and in parallel. The method is applicable for networks involving bridge and non-planar circuits.
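
    For readers unfamiliar with the object the abstract relies on: the Farey sequence of order n is the ascending list of reduced fractions in [0, 1] whose denominators do not exceed n. The sketch below generates it with the standard next-term recurrence; the function name `farey` is only an illustrative choice, and the code does not reproduce the article's resistance bounds.

```python
def farey(n):
    """Return the Farey sequence of order n as (numerator, denominator) pairs."""
    a, b, c, d = 0, 1, 1, n          # first two terms: 0/1 and 1/n
    terms = [(a, b)]
    while c <= n:
        k = (n + b) // d             # largest k keeping the next denominator <= n
        a, b, c, d = c, d, k * c - a, k * d - b
        terms.append((a, b))
    return terms


if __name__ == "__main__":
    print(farey(5))
```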

  6. The Toothpick Sequence and Other Sequences from Cellular Automata

    CERN Document Server

    Applegate, David; Sloane, N J A

    2010-01-01

    A two-dimensional arrangement of toothpicks is constructed by the following iterative procedure. At stage 1, place a single toothpick of length 1 on a square grid, aligned with the y-axis. At each subsequent stage, for every exposed toothpick end, place an orthogonal toothpick centered at that end. The resulting structure has a fractal-like appearance. We will analyze the toothpick sequence, which gives the total number of toothpicks after n steps. We also study several related sequences that arise from enumerating active cells in cellular automata. Some unusual recurrences appear: a typical example is that instead of the Fibonacci recurrence, which we may write as a(2+i) = a(i) + a(i+1), we set n = 2^k+i (0 <= i < 2^k), and then a(n) = a(2^k+i) = 2a(i) + a(i+1). The corresponding generating functions look like Prod_{k >= 0} (1+x^{2^k-1}+2x^{2^k}) and variations thereof.
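
    The iterative procedure in the abstract can be simulated directly. The following is a minimal sketch, not the authors' code: it assumes toothpicks are rescaled to length 2 so that ends and midpoints fall on integer grid points, and it treats an end as "exposed" when no other toothpick touches that point. The names `toothpick_totals` and `touch` are illustrative.

```python
from collections import Counter


def toothpick_totals(stages):
    """Total number of toothpicks after each of the first `stages` stages."""
    touch = Counter()                 # grid point -> number of toothpicks touching it
    to_place = [((0, 0), True)]       # (centre, is_vertical) of toothpicks for this stage
    total, totals = 0, []
    for _ in range(stages):
        new_ends = []
        for (cx, cy), vertical in to_place:
            dx, dy = (0, 1) if vertical else (1, 0)
            ends = [(cx - dx, cy - dy), (cx + dx, cy + dy)]
            for point in ends + [(cx, cy)]:
                touch[point] += 1
            new_ends.extend((end, vertical) for end in ends)
        total += len(to_place)
        totals.append(total)
        # An end stays exposed only if its own toothpick is the sole one touching it;
        # each exposed end receives an orthogonal toothpick at the next stage.
        to_place = [(end, not vertical) for end, vertical in new_ends if touch[end] == 1]
    return totals


if __name__ == "__main__":
    print(toothpick_totals(10))
```

    Exposure is evaluated only after every toothpick of a stage has been placed, because two new toothpicks can meet end to end and cover each other.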

  7. Algebraic divisibility sequences over function fields

    CERN Document Server

    Ingram, Patrick; Silverman, Joseph H; Stange, Katherine E; Streng, Marco

    2011-01-01

    We study the existence of primes and of primitive divisors in classical divisibility sequences defined over function fields. Under various hypotheses, we prove that Lucas sequences and elliptic divisibility sequences over function fields defined over number fields contain infinitely many irreducible elements. We also prove that an elliptic divisibility sequence over a function field has only finitely many terms lacking a primitive divisor.

  8. Stream cipher based on GSS sequences

    Institute of Scientific and Technical Information of China (English)

    HU Yupu; XIAO Guozhen

    2004-01-01

    Generalized self-shrinking sequences, simply named the GSS sequences, are novel periodic sequences that have many advantages in cryptography. In this paper, we give several results about the application of GSS sequences to cryptography. First, we give a simple method for selecting those GSS sequences whose least periods reach the maximum. Second, we give a method for describing and computing the auto-correlation coefficients of GSS sequences. Finally, we point out that some GSS sequences, when used as stream ciphers, have a security weakness.

  9. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol;

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  10. Sequence-structure relations of biopolymers

    CERN Document Server

    Barrett, Christopher; Reidys, Christian M

    2015-01-01

    Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we pose the question to what extent sequence- and structure-information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded "patterns" in DNA and RNA sequences. Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence-structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for the three PDB-structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold ...

  11. Expressing stochastic filters via number sequences

    OpenAIRE

    Capponi, A.; Farina, A; Pilotto, C.

    2010-01-01

    We generalize the results presented in [1] regarding the relation between the Kalman filter and the Fibonacci sequence. We consider more general filtering models and relate the finite dimensional Kalman and Benes filters to the Fibonacci sequence and to the Golden Section. We also prove that Fibonacci numbers may be expressed as the convolution of the Fibonacci and Padovan sequence, thus extending the connection between stochastic filtering and Fibonacci sequence to the Padovan sequence.
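
    The abstract does not spell out the filtering model, so the following is only an illustration under stated assumptions: a scalar random walk observed in noise, with process and measurement noise variances both equal to 1 and initial error variance 1. Under those assumptions the posterior variances of the Kalman filter are ratios of Fibonacci numbers and approach the reciprocal of the golden ratio. The function name and defaults are hypothetical.

```python
from fractions import Fraction


def posterior_variances(steps, p0=Fraction(1), q=Fraction(1), r=Fraction(1)):
    """Posterior error variances of a scalar Kalman filter for a random walk.

    Assumed model: x[k+1] = x[k] + w, y[k] = x[k] + v, Var(w) = q, Var(v) = r.
    """
    p, history = p0, []
    for _ in range(steps):
        prior = p + q                 # prediction step inflates the variance
        gain = prior / (prior + r)    # Kalman gain
        p = (1 - gain) * prior        # measurement update
        history.append(p)
    return history


if __name__ == "__main__":
    for p in posterior_variances(6):
        print(p, float(p))            # 2/3, 5/8, 13/21, ... tending to about 0.618
```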

  12. Cassini Mission Sequence Subsystem (MSS)

    Science.gov (United States)

    Alland, Robert

    2011-01-01

    This paper describes my work with the Cassini Mission Sequence Subsystem (MSS) team during the summer of 2011. It gives some background on the motivation for this project and describes the expected benefit to the Cassini program. It then introduces the two tasks that I worked on - an automatic system auditing tool and a series of corrections to the Cassini Sequence Generator (SEQ_GEN) - and the specific objectives these tasks were to accomplish. Next, it details the approach I took to meet these objectives and the results of this approach, followed by a discussion of how the outcome of the project compares with my initial expectations. The paper concludes with a summary of my experience working on this project, lists what the next steps are, and acknowledges the help of my Cassini colleagues.

  13. Nonparametric Inference for Periodic Sequences

    KAUST Repository

    Sun, Ying

    2012-02-01

    This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.

  14. Genome Sequence of Mycobacteriophage Momo.

    Science.gov (United States)

    Pope, Welkin H; Bina, Elizabeth A; Brahme, Indraneel S; Hill, Amy B; Himmelstein, Philip H; Hunsicker, Sara M; Ish, Amanda R; Le, Tinh S; Martin, Mary M; Moscinski, Catherine N; Shetty, Sameer A; Swierzewski, Tomasz; Iyengar, Varun B; Kim, Hannah; Schafer, Claire E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-06-18

    Momo is a newly discovered phage of Mycobacterium smegmatis mc(2)155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. Copyright © 2015 Pope et al.

  15. Multiplicative LSTM for sequence modelling

    OpenAIRE

    Krause, Ben; Lu, Liang; Murray, Iain; Renals, Steve

    2016-01-01

    This paper introduces multiplicative LSTM, a novel hybrid recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures. Multiplicative LSTM is motivated by its flexibility to have very different recurrent transition functions for each possible input, which we argue helps make it more expressive in autoregressive density estimation. We show empirically that multiplicative LSTM outperforms ...

  16. Multineuronal Spike Sequences Repeat with Millisecond Precision

    Directory of Open Access Journals (Sweden)

    Koki Matsumoto

    2013-06-01

    Full Text Available Cortical microcircuits are nonrandomly wired by neurons. As a natural consequence, spikes emitted by microcircuits are also nonrandomly patterned in time and space. One of the prominent spike organizations is a repetition of fixed patterns of spike series across multiple neurons. However, several questions remain unsolved, including how precisely spike sequences repeat, how the sequences are spatially organized, how many neurons participate in sequences, and how different sequences are functionally linked. To address these questions, we monitored spontaneous spikes of hippocampal CA3 neurons ex vivo using a high-speed functional multineuron calcium imaging technique that allowed us to monitor spikes with millisecond resolution and to record the location of spiking and nonspiking neurons. Multineuronal spike sequences were overrepresented in spontaneous activity compared to the statistical chance level. Approximately 75% of neurons participated in at least one sequence during our observation period. The participants were sparsely dispersed and did not show specific spatial organization. The number of sequences relative to the chance level decreased when larger time frames were used to detect sequences. Thus, sequences were precise at the millisecond level. Sequences often shared common spikes with other sequences; parts of sequences were subsequently relayed by following sequences, generating complex chains of multiple sequences.

  17. Method and apparatus for biological sequence comparison

    Science.gov (United States)

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  18. Static multiplicities in heterogeneous azeotropic distillation sequences

    DEFF Research Database (Denmark)

    Esbjerg, Klavs; Andersen, Torben Ravn; Jørgensen, Sten Bay

    1998-01-01

    In this paper the results of a bifurcation analysis on heterogeneous azeotropic distillation sequences are given. Two sequences suitable for ethanol dehydration are compared: the 'direct' and the 'indirect' sequence. It is shown that the two sequences, despite their similarities, exhibit very...... different static behavior. The method of Petlyuk and Avet'yan (1971) and Bekiaris et al. (1993), which assumes infinite reflux and an infinite number of stages, is extended and applied to heterogeneous azeotropic distillation sequences. The predictions are substantiated through simulations. The static sequence...

  19. On Inclusion Relations between Some Sequence Spaces

    Directory of Open Access Journals (Sweden)

    R. Çolak

    2016-01-01

    Full Text Available We determine the relations between the classes $\hat{S}_\lambda$ of almost λ-statistically convergent sequences and the relations between the classes $[\hat{V},\lambda]$ of strongly almost (V,λ)-summable sequences for various sequences λ, μ in the class Λ. Furthermore, we also give the relations between the classes $\hat{S}_\lambda$ of almost λ-statistically convergent sequences and the classes $[\hat{V},\lambda]$ of strongly almost (V,λ)-summable sequences for various sequences λ, μ ∈ Λ.

  20. Bernoulli measure of complex admissible kneading sequences

    CERN Document Server

    Bruin, Henk

    2012-01-01

    Iterated quadratic polynomials give rise to a rich collection of different dynamical systems that are parametrized by a simple complex parameter $c$. The different dynamical features are encoded by the \emph{kneading sequence}, which is an infinite sequence over $\{0,1\}$. Not every such sequence actually occurs in complex dynamics. The set of admissible kneading sequences was described by Milnor and Thurston for real quadratic polynomials, and by the authors in the complex case. We prove that the set of admissible kneading sequences has positive Bernoulli measure within the set of sequences over $\{0,1\}$.

  1. Blind sequence-length estimation of low-SNR cyclostationary sequences

    CSIR Research Space (South Africa)

    Vlok, JD

    2014-06-01

    Full Text Available Several existing direct-sequence spread spectrum (DSSS) detection and estimation algorithms assume prior knowledge of the symbol period or sequence length, although very few sequence-length estimation techniques are available in the literature...

  2. Modified Genetic Algorithm for DNA Sequence Assembly by Shotgun and Hybridization Sequencing Techniques

    Directory of Open Access Journals (Sweden)

    Prof. Narayan Kumar Sahu

    2012-09-01

    Full Text Available Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly [1].

  3. Biomolecule Sequencer: Nanopore Sequencing Technology for In-Situ Environmental Monitoring and Astrobiology

    Science.gov (United States)

    John, K. K.; Botkin, D. J.; Burton, A. S.; Castro-Wallace, S. L.; Chaput, J. D.; Dworkin, J. P.; Lupisella, M. L.; Mason, C. E.; Rubins, K. H.; Smith, D. J.; Stahl, S.; Switzer, C.

    2016-10-01

    Biomolecule Sequencer will demonstrate, for the first time, that DNA sequencing is feasible as a tool for in-situ environmental monitoring and astrobiology. A space-based sequencer could identify microbes, diseases, and help detect DNA-based life.

  4. Integration of retinal image sequences

    Science.gov (United States)

    Ballerini, Lucia

    1998-10-01

    In this paper a method for noise reduction in ocular fundus image sequences is described. The eye is the only part of the human body where the capillary network can be observed along with the arterial and venous circulation using a noninvasive technique. The study of the retinal vessels is very important both for the study of the local pathology (retinal disease) and for the large amount of information it offers on systemic haemodynamics, such as hypertension, arteriosclerosis, and diabetes. In this paper a method for image integration of ocular fundus image sequences is described. The procedure can be divided into two steps: registration and fusion. First we describe an automatic alignment algorithm for registration of ocular fundus images. In order to enhance vessel structures, we used a spatially oriented bank of filters designed to match the properties of the objects of interest. To evaluate interframe misalignment we adopted a fast cross-correlation algorithm. The performance of the alignment method has been estimated by simulating shifts between image pairs and by using a cross-validation approach. Then we propose a temporal integration technique of image sequences so as to compute enhanced pictures of the overall capillary network. Image registration is combined with image enhancement by fusing subsequent frames of the same region. To evaluate the attainable results, the signal-to-noise ratio was estimated before and after integration. Experimental results on synthetic images of vessel-like structures with different kinds of Gaussian additive noise as well as on real fundus images are reported.

  5. Discrete low-discrepancy sequences

    CERN Document Server

    Angel, Omer; Martin, James B; Propp, James

    2009-01-01

    Holroyd and Propp used Hall's marriage theorem to show that, given a probability distribution pi on a finite set S, there exists an infinite sequence s_1,s_2,... in S such that for all integers k >= 1 and all s in S, the number of i in [1,k] with s_i = s differs from k pi(s) by at most 1. We prove a generalization of this result using a simple explicit algorithm. A special case of this algorithm yields an extension of Holroyd and Propp's result to the case of discrete probability distributions on infinite sets.

  6. Infinite matrices and sequence spaces

    CERN Document Server

    Cooke, Richard G

    2014-01-01

    This clear and correct summation of basic results from a specialized field focuses on the behavior of infinite matrices in general, rather than on properties of special matrices. Three introductory chapters guide students to the manipulation of infinite matrices, covering definitions and preliminary ideas, reciprocals of infinite matrices, and linear equations involving infinite matrices. From the fourth chapter onward, the author treats the application of infinite matrices to the summability of divergent sequences and series from various points of view. Topics include consistency, mutual consi

  7. Differential correlation for sequencing data.

    Science.gov (United States)

    Siska, Charlotte; Kechris, Katerina

    2017-01-19

    Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from -omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of counts, often modeled with a negative binomial distribution making it difficult to apply standard correlation metrics. We have developed an R package for identifying DC called Discordant which uses mixture models for correlations between features and the Expectation Maximization (EM) algorithm for fitting parameters of the mixture model. Several correlation metrics for sequencing data are provided and tested using simulations. Other extensions in the Discordant package include additional modeling for different types of differential correlation, and faster implementation, using a subsampling routine to reduce run-time and address the assumption of independence between molecular feature pairs. With simulations and breast cancer miRNA-Seq and RNA-Seq data, we find that Spearman's correlation has the best performance among the tested correlation methods for identifying differential correlation. Application of Spearman's correlation in the Discordant method demonstrated the most power in ROC curves and sensitivity/specificity plots, and improved ability to identify experimentally validated breast cancer miRNA. We also considered including additional types of differential correlation, which showed a slight reduction in power due to the additional parameters that need to be estimated, but more versatility in applications. Finally, subsampling within the EM algorithm considerably decreased run-time with negligible effect on performance. A new method and R package called Discordant is presented for identifying differential correlation with sequencing data. 
    Based on comparisons with different correlation metrics, this study suggests Spearman's correlation is appropriate for sequencing data.
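
    Spearman's correlation is the Pearson correlation computed on ranks, which is why it does not depend on count data being Gaussian-like. The sketch below shows only that metric in plain Python, with tied values given their average rank; it is not the Discordant package or its mixture model, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                    # extend the run of tied values
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical read counts for two features across five samples.
    print(spearman([0, 3, 10, 250, 7], [1, 2, 40, 900, 5]))
```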

  8. Asymptotics of Lagged Fibonacci Sequences

    CERN Document Server

    Mertens, Stephan

    2009-01-01

    Consider "lagged" Fibonacci sequences $a(n) = a(n-1)+a(\lfloor n/k\rfloor)$ for $k > 1$. We show that $\lim_{n\to\infty} a(kn)/a(n)\cdot\ln n/n = k\ln k$ and we demonstrate the slow numerical convergence to this limit and how to deal with this slow convergence. We also discuss the connection between two classical results of N.G. de Bruijn and K. Mahler on the asymptotics of $a(n)$.
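
    The recurrence is easy to tabulate, which gives a feel for how slowly the stated limit is approached. This is a minimal sketch; the abstract gives no initial condition, so the starting value a(0) = 1 is an assumption, and the helper names are illustrative.

```python
import math


def lagged_fibonacci(n_max, k, a0=1):
    """Values a(0..n_max) of a(n) = a(n-1) + a(floor(n/k)), with assumed a(0) = a0."""
    a = [a0]
    for n in range(1, n_max + 1):
        a.append(a[n - 1] + a[n // k])
    return a


def scaled_ratio(n, k):
    """The quantity a(kn)/a(n) * ln(n)/n, whose limit the abstract states is k ln k."""
    a = lagged_fibonacci(k * n, k)
    return a[k * n] / a[n] * math.log(n) / n


if __name__ == "__main__":
    k = 2
    for n in (10, 100, 1000, 10000):
        print(n, scaled_ratio(n, k), "limit:", k * math.log(k))
```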

  9. FOGSAA: Fast Optimal Global Sequence Alignment Algorithm

    Science.gov (United States)

    Chakraborty, Angana; Bandyopadhyay, Sanghamitra

    2013-04-01

    In this article we propose a Fast Optimal Global Sequence Alignment Algorithm, FOGSAA, which aligns a pair of nucleotide/protein sequences faster than any optimal global alignment method including the widely used Needleman-Wunsch (NW) algorithm. FOGSAA is applicable for all types of sequences, with any scoring scheme, and with or without affine gap penalty. Compared to NW, FOGSAA achieves a time gain of (70-90)% for highly similar nucleotide sequences (> 80% similarity), and (54-70)% for sequences having (30-80)% similarity. For other sequences, it terminates with an approximate score. For protein sequences, the average time gain is between (25-40)%. Compared to three heuristic global alignment methods, the quality of alignment is improved by about 23%-53%. FOGSAA is, in general, suitable for aligning any two sequences defined over a finite alphabet set, where the quality of the global alignment is of supreme importance.
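
    For context, the Needleman-Wunsch baseline that the abstract compares against fills a dynamic-programming table over all prefix pairs. The sketch below computes only the optimal global alignment score with a linear gap penalty; the scoring values are assumed defaults, and it is not an implementation of FOGSAA.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (linear gap penalty)."""
    previous = [j * gap for j in range(len(b) + 1)]   # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        current = [i * gap]                           # aligning a[:i] against nothing
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current.append(max(diagonal,              # align a[i-1] with b[j-1]
                               previous[j] + gap,     # gap in b
                               current[j - 1] + gap)) # gap in a
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATCACA"))
```

    Keeping only two rows is enough for the score; recovering the alignment itself would require the full table or a traceback structure.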

  10. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria

    OpenAIRE

    Brown, Pamela J.B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-01-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  11. Genome sequences of eight morphologically diverse Alphaproteobacteria.

    Science.gov (United States)

    Brown, Pamela J B; Kysela, David T; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-09-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  12. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria

    Science.gov (United States)

    Brown, Pamela J. B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V.

    2011-01-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium. PMID:21705585

  13. The Art of Gymnastics: Creating Sequences.

    Science.gov (United States)

    Rovegno, Inez

    1988-01-01

    Offering students opportunities for creating movement sequences in gymnastics allows them to understand the essence of gymnastics, have creative experiences, and learn about themselves. The process of creating sequences is described. (MT)

  14. CONSONANT SEQUENCE REDUCTION IN CHILD PHONOLOGY

    African Journals Online (AJOL)

    Dr Kola ADENIYI

    in children's developmental difficulties, it is apparent that their vocal tracts are not ... proper articulation of certain complex sounds or sound sequences. ...... the nasal-oral consonant sequence is even limited in Yoruba, it means the frequency of.

  15. On topological spaces possessing uniformly distributed sequences

    CERN Document Server

    Bogachev, V I

    2007-01-01

    Two classes of topological spaces are introduced on which every probability Radon measure possesses a uniformly distributed sequence or a uniformly tight uniformly distributed sequence. It is shown that these classes are stable under multiplication by completely regular Souslin spaces.

  16. On General Fibonacci Sequences in Groups

    OpenAIRE

    Özkan, Engin

    2003-01-01

    In this paper, we have constituted 3-step general Fibonacci sequences in a nilpotent group with exponent p (p is a prime number) and nilpotency class 4 and given formulas to find the a term of the sequence.

  17. On Paranorm Zweier -Convergent Sequence Spaces

    Directory of Open Access Journals (Sweden)

    Vakeel A. Khan

    2013-01-01

    Full Text Available In this paper, we introduce the paranorm Zweier -convergent sequence spaces , , and , a sequence of positive real numbers. We study some topological properties, prove the decomposition theorem, and study some inclusion relations on these spaces.

  18. Strong sequences and independent sets

    Directory of Open Access Journals (Sweden)

    Joanna Jureczko

    2016-05-01

    Full Text Available A family $\mathcal{S} \subseteq \mathcal{P}(\omega)$ is an independent family if, for each pair $\mathcal{A}, \mathcal{B}$ of disjoint finite subsets of $\mathcal{S}$, the set $\bigcap \mathcal{A} \cap (\omega \setminus \bigcup \mathcal{B})$ is nonempty. The existence of an independent family on $\omega$ of size continuum was proved by Fichtenholz and Kantorowicz in \cite{FK}. If we replace $\mathcal{P}(\omega)$ by a set $(X, r)$ with an arbitrary relation $r$, it is natural to ask about the existence and length of an independent set on $(X, r)$. In this paper special assumptions for such existence are considered. On the other hand, in the 1960s the strong sequences method was introduced by Efimov, who used it to prove some famous theorems in dyadic spaces, such as the Marczewski theorem on cellularity, the Shanin theorem on a calibre, the Esenin-Volpin theorem and others. This paper considers the length of strong sequences, the length of independent sets and other well-known cardinal invariants, and examines inequalities among them.

  19. Sequence Analysis in Demographic Research

    Directory of Open Access Journals (Sweden)

    Billari, Francesco C.

    2001-01-01

    Full Text Available This paper examines the salient features of sequence analysis in demographic research. The new approach allows a holistic perspective on life course analysis and is based on a representation of lives as sequences of states. Some of the methods for analyzing such data are sketched, from complex description to optimal matching to monothetic divisive algorithms. After a short illustration of a demographically relevant example, the needs in terms of data collection and the opportunities of applying the same approach to synthetic data are discussed.

  20. Information Analysis of DNA Sequences

    CERN Document Server

    Mohammed, Riyazuddin

    2010-01-01

    The problem of differentiating the informational content of coding (exons) and non-coding (introns) regions of a DNA sequence is one of the central problems of genomics. The introns are estimated to be nearly 95% of the DNA and since they do not seem to participate in the process of transcription of amino-acids, they have been termed "junk DNA." Although it is believed that the non-coding regions in genomes have no role in cell growth and evolution, demonstration that these regions carry useful information would tend to falsify this belief. In this paper, we consider entropy as a measure of information by modifying the entropy expression to take into account the varying length of these sequences. Exons are usually much shorter in length than introns; therefore the comparison of the entropy values needs to be normalized. A length correction strategy was employed using randomly generated nucleonic base strings built out of the alphabet of the same size as the exons under question. Our analysis shows that intron...
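
    The length problem the abstract raises can be seen with a small experiment: the empirical Shannon entropy of a short string is biased downward, so short exons and long introns are not directly comparable. The sketch below divides a sequence's entropy by the mean entropy of random strings of the same length. This is only one plausible reading of the "length correction strategy"; the function names, the uniform ACGT alphabet, the number of trials, and the seed are all assumptions.

```python
import random
from collections import Counter
from math import log2

def shannon_entropy(seq):
    """Empirical Shannon entropy of the symbols in seq, in bits per symbol."""
    n = len(seq)
    return sum((c / n) * log2(n / c) for c in Counter(seq).values())

def random_baseline(length, trials=200, alphabet="ACGT", seed=0):
    """Mean entropy of uniformly random strings of the given length."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += shannon_entropy(rng.choices(alphabet, k=length))
    return total / trials

def length_corrected_entropy(seq, **baseline_options):
    """Entropy of seq relative to random strings of the same length (assumed scheme)."""
    baseline = random_baseline(len(seq), **baseline_options)
    if baseline == 0:
        raise ValueError("sequence too short for a meaningful baseline")
    return shannon_entropy(seq) / baseline

for length in (8, 32, 128, 512):
    print(length, random_baseline(length))
```

    The printed baselines approach 2 bits only as the length grows, which is why raw entropy values of sequences with very different lengths need some normalization before they are compared.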

  1. Movement sequencing in Huntington disease.

    Science.gov (United States)

    Georgiou-Karistianis, Nellie; Long, Jeffrey D; Lourens, Spencer G; Stout, Julie C; Mills, James A; Paulsen, Jane S

    2014-08-01

    To examine longitudinal changes in movement sequencing in prodromal Huntington's disease (HD) participants (795 prodromal HD; 225 controls) from the PREDICT-HD study. Prodromal HD participants were tested over seven annual visits and were stratified into three groups (low, medium, high) based on their CAG-Age Product (CAP) score, which indicates likely increasing proximity to diagnosis. A cued movement sequence task assessed the impact of advance cueing on response initiation and execution via three levels of advance information. Compared to controls, all CAP groups showed longer initiation and movement times across all conditions at baseline, demonstrating a disease gradient for the majority of outcomes. Across all conditions, the high CAP group had the highest mean for baseline testing, but also demonstrated an increase in movement time across the study. For initiation time, the high CAP group showed the highest mean baseline time across all conditions, but also faster decreasing rates of change over time. With progress to diagnosis, participants may increasingly use compensatory strategies, as evidenced by faster initiation. However, this occurred in conjunction with slowed execution times, suggesting a decline in effectively accessing control processes required to translate movement into effective execution.

  2. Effects of Sequence Partitioning on Compression Rate

    CERN Document Server

    Alagoz, B Baykant

    2010-01-01

    This paper studies theoretically the effects of splitting a data sequence into packs of data sets. We prove that a partitioning of the data sequence can be found such that the entropy rate of each subsequence is lower than the entropy rate of the source. The effects of sequence partitioning on the overall compression rate are discussed on the basis of partitioning statistics, and an optimization problem for an optimal partition is then defined to improve the overall compression rate of a sequence.

  3. Cross-correlation properties of cyclotomic sequences

    CERN Document Server

    Cai, Kai; Zheng, Zhiming

    2009-01-01

    Sequences with good correlation properties are widely used in engineering applications, especially in the area of communications. Among the known sequences, cyclotomic families have the optimal autocorrelation property. In this paper, we completely determine the cross-correlation function of the known cyclotomic sequences. Moreover, to get our results, the relations between the multiplier group and the decimations of the characteristic sequence are also established for an arbitrary difference set.
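
    A small concrete case may help. The Legendre sequence, built from the quadratic residues modulo a prime (cyclotomic classes of order 2), is a familiar example of a sequence with very flat periodic autocorrelation. The sketch below computes periodic correlations directly from the definition; the choice of +1 at index 0, the decimation factor, and the function names are assumptions for illustration and are not taken from the paper.

```python
def legendre_sequence(p):
    """Length-p sequence of +1/-1 for an odd prime p: +1 at index 0 and at quadratic residues."""
    residues = {(x * x) % p for x in range(1, p)}
    return [1 if (i == 0 or i in residues) else -1 for i in range(p)]

def periodic_correlation(a, b):
    """Periodic cross-correlation of two equal-length sequences at every cyclic shift."""
    n = len(a)
    return [sum(a[i] * b[(i + shift) % n] for i in range(n)) for shift in range(n)]

def decimate(seq, d):
    """d-decimation: take every d-th element cyclically (d coprime to the length)."""
    n = len(seq)
    return [seq[(d * i) % n] for i in range(n)]

s = legendre_sequence(7)
print("sequence:        ", s)
print("autocorrelation: ", periodic_correlation(s, s))
print("cross-correlation with 3-decimation:", periodic_correlation(s, decimate(s, 3)))
```

    For p = 7 every out-of-phase autocorrelation value equals -1, the flat profile that makes such sequences attractive; the cross-correlation with a decimated copy is the kind of quantity the abstract refers to.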

  4. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.;

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  5. Incidental Sequence Learning across the Lifespan

    Science.gov (United States)

    Weiermann, Brigitte; Meier, Beat

    2012-01-01

    The purpose of the present study was to investigate incidental sequence learning across the lifespan. We tested 50 children (aged 7-16), 50 young adults (aged 20-30), and 50 older adults (aged >65) with a sequence learning paradigm that involved both a task and a response sequence. After several blocks of practice, all age groups slowed down…

  6. Hardware Acceleration of Bioinformatics Sequence Alignment Applications

    NARCIS (Netherlands)

    Hasan, L.

    2011-01-01

    Biological sequence alignment is an important and challenging task in bioinformatics. Alignment may be defined as an arrangement of two or more DNA or protein sequences to highlight the regions of their similarity. Sequence alignment is used to infer the evolutionary relationship between a set of pr

  7. Disease gene identification strategies for exome sequencing

    NARCIS (Netherlands)

    Gilissen, C.; Hoischen, A.; Brunner, H.G.; Veltman, J.A.

    2012-01-01

    Next generation sequencing can be used to search for Mendelian disease genes in an unbiased manner by sequencing the entire protein-coding sequence, known as the exome, or even the entire human genome. Identifying the pathogenic mutation amongst thousands to millions of genomic variants is a major c

  8. PacBio Sequencing and Its Applications

    Institute of Scientific and Technical Information of China (English)

    Anthony Rhoads; Kin Fai Au

    2015-01-01

    Single-molecule, real-time sequencing developed by Pacific BioSciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio’s sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable, especially for small-size laboratories, than using PacBio sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone.

  9. Joint Sequence Analysis: Association and Clustering

    Science.gov (United States)

    Piccarreta, Raffaella

    2017-01-01

    In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…

  10. RNAome sequencing delineates the complete RNA landscape

    NARCIS (Netherlands)

    K.W.J. Derks (Kasper); J. Pothof (Joris)

    2015-01-01

    Standard RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species. For example, small and large RNAs from the same sample cannot be sequenced in a single sequence run. We designed RNAome sequencing, which is a strand-specif

  11. The recurrence sequence via the Fibonacci groups

    Science.gov (United States)

    Aküzüm, Yeşim; Deveci, Ömür

    2016-04-01

    This work develops properties of the recurrence sequence defined with the aid of the relation matrix of the Fibonacci groups. The study of this sequence modulo m yields cyclic groups and semigroups from the generating matrix. Finally, we extend the sequence defined to groups and then obtain its period in the Fibonacci groups.
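
    The idea of studying a recurrence modulo m and asking for its period can be illustrated with the ordinary Fibonacci sequence. This is only an analogy: the paper's sequence comes from the relation matrix of the Fibonacci groups, which the abstract does not spell out, so the sketch below uses the standard recurrence F(n) = F(n-1) + F(n-2) and a hypothetical function name.

```python
def fibonacci_period_mod(m):
    """Length of the period of the ordinary Fibonacci sequence reduced modulo m (m >= 2)."""
    if m < 2:
        raise ValueError("m must be at least 2")
    a, b = 0, 1
    period = 0
    while True:
        # Advance one step; the sequence is periodic because there are
        # only m * m possible pairs and the step is invertible.
        a, b = b, (a + b) % m
        period += 1
        if (a, b) == (0, 1):
            return period

for m in (2, 3, 5, 10):
    print(m, fibonacci_period_mod(m))
```

    The same pair-tracking loop works for any linear recurrence of fixed order once the state tuple is widened accordingly.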

  12. Tidying up international nucleotide sequence databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi.

    Directory of Open Access Journals (Sweden)

    Leho Tedersoo

    Full Text Available Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source was supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

  13. Exome sequencing: what clinicians need to know

    Directory of Open Access Journals (Sweden)

    Sastre L

    2014-03-01

    Full Text Available Leandro Sastre. Instituto de Investigaciones Biomédicas, CSIC/UAM, C/Arturo Duperier 4, Madrid, Spain; Terapias Experimentales y Biomarcadores en Cáncer, IdiPaz, Madrid, Spain; CIBER de Enfermedades Raras, CIBERER, Valencia, Spain. Abstract: The recent development of high throughput methods of deoxyribonucleic acid (DNA) sequencing has made it possible to determine individual genome sequences and their specific variations. A region of particular interest is the protein-coding part of the genome, or exome, which is composed of gene exons. The principles of exome purification and sequencing will be described in this review, as well as analyses of the data generated. Results will be discussed in terms of their possible functional and clinical significance. The advantages and limitations of exome sequencing will be compared to those of other massive sequencing approaches such as whole-genome sequencing, ribonucleic acid sequencing or selected DNA sequencing. Exome sequencing has been used recently in the study of various diseases. Monogenic diseases with Mendelian inheritance are among these, but studies have also been carried out on genetic variations that represent risk factors for complex diseases. Cancer is another intensive area for exome sequencing studies. Several examples of the use of exome sequencing in the diagnosis, prognosis, and treatment of these diseases will be described. Finally, remaining challenges and some practical and ethical considerations for the clinical application of exome sequencing will be discussed. Keywords: massively parallel sequencing, RNA sequencing, whole-genome sequencing, genetic variants, molecular diagnosis, pharmacogenomics, personalized medicine, NGS, SGS, SNP, SNV

  14. Chip-based sequencing nucleic acids

    Energy Technology Data Exchange (ETDEWEB)

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  15. MatrixPlot: visualizing sequence constraints

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Stærfeldt, Hans Henrik; Lund, Ole

    1999-01-01

    Summary: MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information about the sequences (e.g. a sequence logo profile) along the edges of the plot, as well as zoom in on any region in the plot. Availability: MatrixPlot can be obtained on request, and can also be accessed online at http://www.cbs.dtu.dk/services/MatrixPlot. Contact: gorodkin@cbs.dtu.dk...

  16. Chip-based sequencing nucleic acids

    Science.gov (United States)

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  18. Permutation Entropy for Random Binary Sequences

    Directory of Open Access Journals (Sweden)

    Lingfeng Liu

    2015-12-01

    Full Text Available In this paper, we generalize the permutation entropy (PE measure to binary sequences, which is based on Shannon’s entropy, and theoretically analyze this measure for random binary sequences. We deduce the theoretical value of PE for random binary sequences, which can be used to measure the randomness of binary sequences. We also reveal the relationship between this PE measure with other randomness measures, such as Shannon’s entropy and Lempel–Ziv complexity. The results show that PE is consistent with these two measures. Furthermore, we use PE as one of the randomness measures to evaluate the randomness of chaotic binary sequences.
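
    For orientation, the standard (Bandt-Pompe) permutation entropy can be sketched in a few lines. This is the usual definition, not the paper's generalization to binary sequences: binary data produce many tied values, and here ties are simply broken by position, which is an assumption of this sketch. The function names are also assumed.

```python
from collections import Counter
from math import log2

def ordinal_pattern(window):
    """Indices of the window sorted by value; ties are broken by position (stable sort)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))

def permutation_entropy(series, order=3):
    """Shannon entropy (bits) of the ordinal patterns of length `order` in the series."""
    if order < 2 or len(series) < order:
        raise ValueError("need order >= 2 and at least `order` samples")
    patterns = Counter(
        ordinal_pattern(series[i:i + order])
        for i in range(len(series) - order + 1)
    )
    total = sum(patterns.values())
    return sum((c / total) * log2(total / c) for c in patterns.values())

print(permutation_entropy([1, 2, 3, 4, 5, 6], order=3))  # a monotone series has a single pattern
print(permutation_entropy([0, 1, 0, 1, 0], order=2))     # two patterns, equally frequent
```

    With this tie-breaking rule a constant binary run looks like an increasing run, which hints at why a dedicated treatment of binary sequences is needed.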

  19. Computing with Hereditarily Finite Sequences

    CERN Document Server

    Tarau, Paul

    2011-01-01

    We use Prolog as a flexible meta-language to provide executable specifications of some fundamental mathematical objects and their transformations. In the process, isomorphisms are unraveled between natural numbers and combinatorial objects (rooted ordered trees representing hereditarily finite sequences and rooted ordered binary trees representing Gödel's System T types). This paper focuses on an application that can be seen as an unexpected "paradigm shift": we provide recursive definitions showing that the resulting representations are directly usable to perform symbolically arbitrary-length integer computations. Besides the theoretically interesting fact of "breaking the arithmetic/symbolic barrier", the arithmetic operations performed with symbolic objects like trees or types turn out to be genuinely efficient -- we derive implementations with asymptotic performance comparable to ordinary bitstring implementations of arbitrary-length integer arithmetic. The source code of the paper, organized as a ...

  20. [DNA sequencing technology and automatization of it].

    Science.gov (United States)

    Kraev, A S

    1991-01-01

    Precise manipulations with genetic material, typical for modern experiments in molecular biology and in new biotechnology, require a capability to determine DNA base sequence. This capability now makes it possible to exploit specific genetic knowledge for the dissection of complex cell processes and for modulation of cell metabolism in transgenic organisms. The review focuses on DNA sequencing technologies that are widespread in general laboratory practice. With the availability of commercial reagents, they can safely be called industrial techniques. Modern DNA sequencing requires recurrent breakdown of large genomic DNA into smaller pieces that are then amplified and sequenced, after which the initial long stretch is reconstructed via overlap of the small pieces. The DNA sequencing process has several steps: a DNA fragment is obtained in sufficient quantity and purity; it is converted to a form suitable for a particular sequencing method; a sequencing reaction is performed and its products fractionated; and finally the resultant data are interpreted (i.e. an autoradiograph is read into a computer memory) and a long sequence is reconstructed via overlap of short stretches. These steps are considered in separate parts, with emphasis on sequencing strategies with respect to their biological task. In the last part, possibilities for automation of the sequencing experiment are considered, followed by a discussion of domestic problems in DNA sequencing.

  1. Experimental investigation of an RNA sequence space

    Science.gov (United States)

    Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.

    1993-12-01

    Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs. This approach will allow direct study of the constraints governing RNA evolution and allow inquiry into how the last common ancestor of extant life apparently came to have very complex ribosomal RNAs that subsequently were very conserved.

  2. Discovering novel sequence motifs with MEME.

    Science.gov (United States)

    Bailey, Timothy L

    2002-11-01

    This unit illustrates how to use MEME to discover motifs in a group of related nucleotide or peptide sequences. A MEME motif is a sequence pattern that occurs repeatedly in one or more sequences in the input group. MEME can be used to discover novel patterns because it bases its discoveries only on the input sequences, not on any prior knowledge (such as databases of known motifs). The input to MEME is a set of unaligned sequences of the same type (peptide or nucleotide). For each motif it discovers, MEME reports the occurrences (sites), consensus sequence, and the level of conservation (information content) at each position in the pattern. MEME also produces block diagrams showing where all of the discovered motifs occur in the training set sequences. MEME's hypertext (HTML) output also contains buttons that allow for the convenient use of the motifs in other searches.
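
    Two of the quantities MEME reports, the consensus sequence and the per-position information content, can be illustrated on a set of already aligned sites. The sketch below is not MEME's algorithm (MEME discovers the sites itself and uses its own background model); it assumes a uniform background over ACGT, and the function names and toy sites are hypothetical.

```python
from collections import Counter
from math import log2

def information_content(sites, alphabet="ACGT"):
    """Per-position information content in bits for equal-length aligned sites.

    Assumes a uniform background, so each column scores
    log2(len(alphabet)) minus its Shannon entropy.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    max_bits = log2(len(alphabet))
    n = len(sites)
    profile = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        entropy = sum((c / n) * log2(n / c) for c in counts.values())
        profile.append(max_bits - entropy)
    return profile

def consensus(sites):
    """Most frequent letter in each column of the aligned sites."""
    length = len(sites[0])
    return "".join(
        Counter(site[pos] for site in sites).most_common(1)[0][0]
        for pos in range(length)
    )

toy_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]  # hypothetical motif occurrences
print(consensus(toy_sites))
print([round(bits, 2) for bits in information_content(toy_sites)])
```

    Fully conserved columns reach the 2-bit maximum for DNA, while columns with mixed letters score lower, which is the "level of conservation" the abstract refers to.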

  3. Design of Digital Hybrid Chaotic Sequence Generator

    Institute of Scientific and Technical Information of China (English)

    RAO Nini; ZENG Dong

    2004-01-01

    The feasibility of hybrid chaotic sequences as the spreading codes in a code division multiple access (CDMA) system is analyzed. The design and realization of the digital hybrid chaotic sequence generator in very high speed integrated circuit hardware description language (VHDL) are described. A valid hazard-cancelling method is presented. Computer simulations show that stable digital sequence waveforms can be produced. The correlations of the digital hybrid chaotic sequences are compared with those of m-sequences. The results show that the correlations of the digital hybrid chaotic sequences are almost as good as those of m-sequences. The work in this paper opens a path toward practical applications of chaos.

  4. Sequencing technologies for animal cell culture research.

    Science.gov (United States)

    Kremkow, Benjamin G; Lee, Kelvin H

    2015-01-01

    Over the last 10 years, 2nd and 3rd generation sequencing technologies have made the use of genomic sequencing within the animal cell culture community increasingly commonplace. Each technology's defining characteristics are unique, including the cost, time, sequence read length, daily throughput, and occurrence of sequence errors. Given each sequencing technology's intrinsic advantages and disadvantages, the optimal technology for a given experiment depends on the particular experiment's objective. This review discusses the current characteristics of six next-generation sequencing technologies, compares the differences between them, and characterizes their relevance to the animal cell culture community. These technologies are continually improving, as evidenced by the recent achievement of the field's benchmark goal: sequencing a human genome for less than $1,000.

  5. Assembly Sequence Planning for Mechanical Products

    Institute of Scientific and Technical Information of China (English)

    1999-01-01

    A method for assembly sequence planning is proposed in this paper. First, two approaches to assembly sequence planning, the indirect method and the direct method, are compared. Then, the limits of previous assembly planning systems are pointed out. On the basis of the indirect method, an improved method for assembly sequence planning is put forward. This method is composed of four parts: assembly modeling for products, assembly sequence representation, assembly sequence planning, and evaluation and optimization. The assembly model is established by human-machine interaction and contains the components' information and the assembly relations among the components. The assembly sequence planning is based on the breaking up of the assembly model. An and/or graph is used to represent the assembly sequence set. Every component which satisfies the disassembly condition is recorded as a node of an and/or graph. After the disassembly sequence and/or graph is generated, a heuristic algorithm, the AO* algorithm, is used to search the graph, and the optimum assembly sequence planning is realized. This method is proved to be effective in a prototype system which is a sub-project of a state 863/CIMS research project of China, ‘Concurrent Engineering’.

  6. Randomness in Sequence Evolution Increases over Time.

    Directory of Open Access Journals (Sweden)

    Guangyu Wang

    Full Text Available The second law of thermodynamics states that entropy, as a measure of randomness in a system, increases over time. Although studies have investigated biological sequence randomness from different aspects, it remains unknown whether sequence randomness changes over time and whether this change is consistent with the second law of thermodynamics. To capture the dynamics of randomness in molecular sequence evolution, here we detect sequence randomness based on a collection of eight statistical randomness tests and investigate the randomness variation of coding sequences with an application to Escherichia coli. Given that core/essential genes are more ancient than specific/non-essential genes, our results clearly show that core/essential genes are more random than specific/non-essential genes and accordingly indicate that sequence randomness indeed increases over time, consistent with the second law of thermodynamics. We further find that an increase in sequence randomness leads to increasing randomness of GC content and longer sequence length. Taken together, our study presents an important finding, for the first time, that sequence randomness increases over time, which may provide profound insights for unveiling the underlying mechanisms of molecular sequence evolution.

  7. Effects of an Additional Sequence of Color Stimuli on Visuomotor Sequence Learning

    Science.gov (United States)

    Tanaka, Kanji; Watanabe, Katsumi

    2017-01-01

    Through practice, people are able to integrate a secondary sequence (e.g., a stimulus-based sequence) into a primary sequence (e.g., a response-based sequence), but it is still controversial whether the integrated sequences lead to better learning than only the primary sequence. In the present study, we aimed to investigate the effects of a sequence that integrated space and color sequences on early and late learning phases (corresponding to effector-independent and effector-dependent learning, respectively) and how the effects differed in the integrated and primary sequences in each learning phase. In the task, the participants were required to learn a sequence of button presses using trial-and-error and to perform the sequence successfully for 20 trials (m × n task). First, in the baseline task, all participants learned a non-colored sequence, in which the response button always turned red. Then, in the learning task, the participants were assigned to two groups: a colored sequence group (i.e., space and color) or a non-colored sequence group (i.e., space). In the colored sequence, the response button turned a pre-determined color and the participants were instructed to attend to the sequences of both location and color as much as they could. The results showed that the participants who performed the colored sequence acquired the correct button presses of the sequence earlier, but showed a slower mean performance time than those who performed the non-colored sequence. Moreover, the slower performance time in the colored sequence group remained in a subsequent transfer task in which the spatial configurations of the buttons were vertically mirrored from the learning task. These results indicated that if participants explicitly attended to both the spatial response sequence and color stimulus sequence at the same time, they could develop their spatial representations of the sequence earlier (i.e., early development of the effector-independent learning), but might

  8. Comparison of sequence reads obtained from three next-generation sequencing platforms.

    Directory of Open Access Journals (Sweden)

    Shingo Suzuki

    Full Text Available Next-generation sequencing technologies enable the rapid cost-effective production of sequence data. To evaluate the performance of these sequencing technologies, investigation of the quality of sequence reads obtained from these methods is important. In this study, we analyzed the quality of sequence reads and SNP detection performance using three commercially available next-generation sequencers, i.e., Roche Genome Sequencer FLX System (FLX), Illumina Genome Analyzer (GA), and Applied Biosystems SOLiD system (SOLiD). A common genomic DNA sample obtained from Escherichia coli strain DH1 was applied to these sequencers. The obtained sequence reads were aligned to the complete genome sequence of E. coli DH1, to evaluate the accuracy and sequence bias of these sequence methods. We found that the fraction of "junk" data, which could not be aligned to the reference genome, was largest in the data set of SOLiD, in which about half of reads could not be aligned. Among data sets after alignment to the reference, sequence accuracy was poorest in GA data sets, suggesting relatively low fidelity of the elongation reaction in the GA method. Furthermore, by aligning the sequence reads to the E. coli strain W3110, we screened sequence differences between two E. coli strains using data sets of three different next-generation platforms. The results revealed that the detected sequence differences were similar among these three methods, while the sequence coverage required for the detection was significantly small in the FLX data set. These results provided valuable information on the quality of short sequence reads and the performance of SNP detection in three next-generation sequencing platforms.

  9. Continued fractions and heavy sequences

    CERN Document Server

    Boshernitzan, Michael

    2009-01-01

    We initiate the study of the sets $H(c)$, $0=x-[x]$ stands for the fractional part of $x\in \mathbb R$. We prove that, for rational $c$, the sets $H(c)$ are of positive Hausdorff dimension and, in particular, are uncountable. For integers $m\geq1$, we obtain a surprising characterization of the numbers $\alpha\in H_m= H(\frac1m)$ in terms of their continued fraction expansions: The odd entries (partial quotients) of these expansions are divisible by $m$. The characterization implies that $x\in H_m$ if and only if $\frac 1{mx} \in H_m$, for $x>0$. We are unaware of a direct proof of this equivalence, without making a use of the mentioned characterization of the sets $H_m$. We also introduce the dual sets $\hat H_m$ of reals $y$ for which the sequence of integers $\big([ky]\big)_{k\geq1}$ consistently hits the set $m\mathbb Z$ with the at least expected frequency $\frac1m$ and establish the connection with the sets $H_m$: If $xy=m$ for $x,y>0$, then $x\in H_m$ if and only if $y\in \hat H_m$. The motivatio...

  10. Pareto optimal pairwise sequence alignment.

    Science.gov (United States)

    DeRonne, Kevin W; Karypis, George

    2013-01-01

    Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
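    As a rough illustration of what "Pareto optimal" means for alignments scored by several objective functions, the sketch below filters a list of score tuples down to the non-dominated ones. It is not the authors' dynamic-programming algorithm; the function names and the toy scores are assumptions made for this example, and higher scores are assumed to be better on every objective.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates (brute force, O(n^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective 1, objective 2) scores for five candidate alignments.
scores = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0), (2.0, 1.5)]
frontier = pareto_frontier(scores)
print(frontier)
```

    Choosing a single alignment from the frontier still requires an extra criterion, which is the difficulty the abstract points to.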

  11. Comparison of next-generation sequencing systems.

    Science.gov (United States)

    Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

    2012-01-01

    With the fast development and wide application of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve quality of life. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience is summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. Finally, applications of NGS are summarized.

  12. The genome sequence of parrot bornavirus 5.

    Science.gov (United States)

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence is often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a Palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74 % with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on the genomic regions demonstrated this new isolate is an isolated branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornaviruses, we support the proposal that PaBV-5 be assigned to a new bornavirus species: Psittaciform 2 bornavirus.

  13. Application of ecostratigraphy to sequence stratigraphy

    Institute of Scientific and Technical Information of China (English)

    殷鸿福; 童金南; 张克信; 吴顺宝

    1997-01-01

    The results of ecostratigraphy can directly serve sequence stratigraphy. The habitat type curve is useful not only in the analysis of sequences and parasequences, but also in demonstration of the process of regional sea level change. The various biological surfaces usually coincide with or relate to the boundaries of sequences or system tracts. The ecostratigraphic framework composed of coenozones, community sequences and ecotracts with good timing completely corresponds to the sequence stratigraphic framework of the sedimentary basin. Therefore, through establishment of the habitat type curve in individual section, recognition of the various biological surfaces, regional ecostratigraphic correlation and the formation of an ecostratigraphic framework of the sedimentary basin, ecostratigraphy plays an important role in the study of sequence stratigraphy and the reconstruction of regional and even global sea level changes.

  14. Sequencing intractable DNA to close microbial genomes.

    Science.gov (United States)

    Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  15. Some fundamental problems in outcrop sequence stratigraphy

    Institute of Scientific and Technical Information of China (English)

    王训练

    1999-01-01

    Some fundamental problems in outcrop sequence stratigraphy are discussed, and the following ideas are obtained: (i) Detailed sedimentary facies analysis and study of the stacking pattern of parasequences, careful and accurate study of biostratigraphy, and stratigraphical correlation of different facies areas are the essential conditions for proper identification of sequences. (ii) The first flooding surface may be an ideal sequence boundary in outcrop sequence stratigraphy, where the most distinct palaeontological and sedimentary changes take place and make the surface readily recognizable in outcrop. (iii) The distribution in space, especially in different facies belts, is regarded as an important criterion for defining and recognizing the various orders of sequences. The third-order sequence is probably global in nature, which may be discerned in various depositional facies belts at least on one continental margin, and can be correlated over long distances, sometimes worldwide. (iv) The first flooding surf

  16. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  17. Evolutionarily conserved sequences on human chromosome 21

    Energy Technology Data Exchange (ETDEWEB)

    Frazer, Kelly A.; Sheehan, John B.; Stokowski, Renee P.; Chen, Xiyin; Hosseini, Roya; Cheng, Jan-Fang; Fodor, Stephen P.A.; Cox, David R.; Patil, Nila

    2001-09-01

    Comparison of human sequences with the DNA of other mammals is an excellent means of identifying functional elements in the human genome. Here we describe the utility of high-density oligonucleotide arrays as a rapid approach for comparing human sequences with the DNA of multiple species whose sequences are not presently available. High-density arrays representing approximately 22.5 Mb of nonrepetitive human chromosome 21 sequence were synthesized and then hybridized with mouse and dog DNA to identify sequences conserved between humans and mice (human-mouse elements) and between humans and dogs (human-dog elements). Our data show that sequence comparison of multiple species provides a powerful empiric method for identifying actively conserved elements in the human genome. A large fraction of these evolutionarily conserved elements are present in regions on chromosome 21 that do not encode known genes.

  18. Predicting Contextual Sequences via Submodular Function Maximization

    CERN Document Server

    Dey, Debadeepta; Hebert, Martial; Bagnell, J Andrew

    2012-01-01

    Sequence optimization, where the items in a list are ordered to maximize some reward has many applications such as web advertisement placement, search, and control libraries in robotics. Previous work in sequence optimization produces a static ordering that does not take any features of the item or context of the problem into account. In this work, we propose a general approach to order the items within the sequence based on the context (e.g., perceptual information, environment description, and goals). We take a simple, efficient, reduction-based approach where the choice and order of the items is established by repeatedly learning simple classifiers or regressors for each "slot" in the sequence. Our approach leverages recent work on submodular function maximization to provide a formal regret reduction from submodular sequence optimization to simple cost-sensitive prediction. We apply our contextual sequence prediction algorithm to optimize control libraries and demonstrate results on two robotics problems: ...

  19. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

    DEFF Research Database (Denmark)

    de Souza, S J; Camargo, A A; Briones, M R;

    2000-01-01

    by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48......Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central...... coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes...

  20. Preparing DNA libraries for multiplexed paired-end deep sequencing for Illumina GA sequencers.

    Science.gov (United States)

    Son, Mike S; Taylor, Ronald K

    2011-02-01

    Whole-genome sequencing, also known as deep sequencing, is becoming a more affordable and efficient way to identify SNP mutations, deletions, and insertions in DNA sequences across several different strains. Two major obstacles preventing the widespread use of deep sequencers are the costs involved in services used to prepare DNA libraries for sequencing and the overall accuracy of the sequencing data. This unit describes the preparation of DNA libraries for multiplexed paired-end sequencing using the Illumina GA series sequencer. Self-preparation of DNA libraries can help reduce overall expenses, especially if optimization is required for the different samples, and use of the Illumina GA Sequencer can improve the quality of the data.

  1. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Full Text Available Abstract Background The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that generates sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. Under equal constraints, the presented exhaustive algorithm generates larger sets of sequences than previous software. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.

  2. WEAK CONVERGENCE OF HENSTOCK INTEGRABLE SEQUENCES

    Institute of Scientific and Technical Information of China (English)

    LuisaDiPiazza

    1994-01-01

    Some relationships between pointwise and weak convergence of a sequence of Henstock integrable functions are studied. In particular, an example is given of a sequence of Henstock integrable functions whose pointwise limit differs from its weak limit. By introducing an asymptotic version of the notion of Henstock equiintegrability, a necessary and sufficient condition is given for a pointwise convergent sequence of Henstock integrable functions to converge weakly to its pointwise limit.

  3. Exome sequencing and genetic testing for MODY.

    Directory of Open Access Journals (Sweden)

    Stefan Johansson

    Full Text Available CONTEXT: Genetic testing for monogenic diabetes is important for patient care. Given the extensive genetic and clinical heterogeneity of diabetes, exome sequencing might provide additional diagnostic potential when standard Sanger sequencing-based diagnostics is inconclusive. OBJECTIVE: The aim of the study was to examine the performance of exome sequencing for a molecular diagnosis of MODY in patients who have undergone conventional diagnostic sequencing of candidate genes with negative results. RESEARCH DESIGN AND METHODS: We performed exome enrichment followed by high-throughput sequencing in nine patients with suspected MODY. They were Sanger sequencing-negative for mutations in the HNF1A, HNF4A, GCK, HNF1B and INS genes. We excluded common, non-coding and synonymous gene variants, and performed in-depth analysis on filtered sequence variants in a pre-defined set of 111 genes implicated in glucose metabolism. RESULTS: On average, we obtained 45 X median coverage of the entire targeted exome and found 199 rare coding variants per individual. We identified 0-4 rare non-synonymous and nonsense variants per individual in our a priori list of 111 candidate genes. Three of the variants were considered pathogenic (in ABCC8, HNF4A and PPARG, respectively); thus exome sequencing led to a genetic diagnosis in at least three of the nine patients. Approximately 91% of known heterozygous SNPs in the target exomes were detected, but we also found low coverage in some key diabetes genes using our current exome sequencing approach. Novel variants in the genes ARAP1, GLIS3, MADD, NOTCH2 and WFS1 need further investigation to reveal their possible role in diabetes. CONCLUSION: Our results demonstrate that exome sequencing can improve molecular diagnostics of MODY when used as a complement to Sanger sequencing. However, improvements will be needed, especially concerning coverage, before the full potential of exome sequencing can be realized.

  4. cis sequence effects on gene expression

    Directory of Open Access Journals (Sweden)

    Jacobs Kevin

    2007-08-01

    Full Text Available Abstract Background Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects) in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature. Results We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning) to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p cis sequence effects in our study, respectively. Conclusion Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.

  5. Some properties of generalized Fibonacci sequence

    Science.gov (United States)

    Chong, Chin-Yoon; Ho, C. K.

    2015-12-01

    For all non-negative integers n and real constants a, b, p and q, the generalized Fibonacci sequence $\{U_n\}$ is defined by $U_{n+2} = pU_{n+1} + qU_n$ with the initial values $U_0 = a$ and $U_1 = b$. Throughout the paper, we study some properties of the generalized Fibonacci sequence. Our results will motivate some new research problems concerning the contribution of the generalized sequence.
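    The recurrence in this abstract can be tried out directly. The following is a minimal sketch that generates the first terms for given a, b, p and q; the function name and the sample parameter choices are assumptions for illustration and do not come from the paper.

```python
def generalized_fibonacci(a, b, p, q, count):
    """Return the first `count` terms of U(n+2) = p*U(n+1) + q*U(n), with U0 = a and U1 = b."""
    terms = []
    current, following = a, b
    for _ in range(count):
        terms.append(current)
        # Advance one step: the new "following" term uses the two previous terms.
        current, following = following, p * following + q * current
    return terms


fibonacci = generalized_fibonacci(0, 1, 1, 1, 10)  # classical Fibonacci numbers
lucas = generalized_fibonacci(2, 1, 1, 1, 6)       # Lucas numbers
pell = generalized_fibonacci(0, 1, 2, 1, 6)        # Pell numbers
print(fibonacci, lucas, pell)
```

    Setting a = 0, b = 1, p = q = 1 recovers the classical Fibonacci sequence, which gives a quick sanity check on any identity stated for the general case.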

  6. Hardware Accelerated Sequence Alignment with Traceback

    Directory of Open Access Journals (Sweden)

    Scott Lloyd

    2009-01-01

    in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop computer is demonstrated on sequence lengths of 16000. For greater performance, the architecture is scalable to more processing elements.
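    For readers unfamiliar with the two phases mentioned here, the sketch below shows a plain software version of global alignment: a forward scan that fills a score matrix, followed by a traceback that recovers the aligned strings. It is a textbook Needleman-Wunsch formulation with quadratic memory, not the space-efficient hardware architecture of the paper; the scoring values and function name are assumptions chosen for this example.

```python
def global_align(s, t, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    # Forward scan: score[i][j] is the best score for aligning s[:i] with t[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback: walk from the bottom-right corner back to the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row_s, row_t = global_align("GATTACA", "GCATGCU")
print(best)
print(row_s)
print(row_t)
```

    Storing the full matrix is what makes traceback simple here; it is also the memory cost that space-efficient designs aim to avoid.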

  7. Recursive sequences in first-year calculus

    Science.gov (United States)

    Krainer, Thomas

    2016-02-01

    This article provides ready-to-use supplementary material on recursive sequences for a second-semester calculus class. It equips first-year calculus students with a basic methodical procedure based on which they can conduct a rigorous convergence or divergence analysis of many simple recursive sequences on their own without the need to invoke inductive arguments as is typically required in calculus textbooks. The sequences that are accessible to this kind of analysis are predominantly (eventually) monotonic, but also certain recursive sequences that alternate around their limit point as they converge can be considered.

  8. Maize genome sequencing by methylation filtration.

    Science.gov (United States)

    Palmer, Lance E; Rabinowicz, Pablo D; O'Shaughnessy, Andrew L; Balija, Vivekanand S; Nascimento, Lidia U; Dike, Sujit; de la Bastide, Melissa; Martienssen, Robert A; McCombie, W Richard

    2003-12-19

    Gene enrichment strategies offer an alternative to sequencing large and repetitive genomes such as that of maize. We report the generation and analysis of nearly 100,000 undermethylated (or methylation filtration) maize sequences. Comparison with the rice genome reveals that methylation filtration results in a more comprehensive representation of maize genes than those that result from expressed sequence tags or transposon insertion sites sequences. About 7% of the repetitive DNA is unmethylated and thus selected in our libraries, but potentially active transposons and unmethylated organelle genomes can be identified. Reverse transcription polymerase chain reaction can be used to finish the maize transcriptome.

  9. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, JamesR.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  10. Detecting Emotions from Connected Action Sequences

    Science.gov (United States)

    Bernhardt, Daniel; Robinson, Peter

    In this paper we deal with the problem of detecting emotions from the body movements produced by naturally connected action sequences. Although action sequences are one of the most common forms of body motions in everyday scenarios their potential for emotion recognition has not been explored in the past. We show that there are fundamental differences between actions recorded in isolation and in natural sequences and demonstrate a number of techniques which allow us to correctly label action sequences with one of four emotions up to 86% of the time. Our results bring us an important step closer to recognizing emotions from body movements in natural scenarios.

  11. Researches on Sequence of Plant Cystatin: Phytocystatin

    Institute of Scientific and Technical Information of China (English)

    QINQingfeng; HEWei; LIANGJun; ZHANGXingyao

    2005-01-01

    Plant cystatins, or phytocystatins, are cysteine proteinase inhibitors that exist widely in different plant species. Because they can kill insects by inhibiting the digestive function of cysteine proteinases in the gut, they are believed to play an important role in plant defense against pests. Phytocystatins contain the conserved QXVXG motif and show some sequence features that differ from those of animal cystatins. Through direct protein sequencing and cDNA cloning, a large number of plant cystatins have been characterized. A multiple alignment with BLAST software and a detailed analysis of 38 phytocystatins show that phytocystatins possess a specific conserved amino acid sequence, [LRVI]-[AGT]-[RQKE]-[FY]-[AS]-[VI]-X-[EGHDQV]-[HYFQ]-N, different from the conserved sequence demonstrated by Margis in 1998. This conserved sequence is sufficient to detect phytocystatin sequences exclusively in protein data banks. These phytocystatins are classified into 3 groups according to the features of their amino acid sequences, and group I can be further divided into 3 subgroups based on features of their amino acid and genomic sequences. Using the CLUSTALX software, the most conserved nucleotide sequences of phytocystatins were found, which could be used to design degenerate primers to search for new phytocystatins by PCR.
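    The conserved pattern quoted in this abstract is written in a PROSITE-like notation, which translates readily into a regular expression. The sketch below shows one way to scan a protein string for it; the helper names are assumptions, X is taken to mean "any residue", and the test sequence is a made-up toy string rather than a real phytocystatin.

```python
import re

# Pattern as quoted in the abstract; X stands for any amino acid.
MOTIF = "[LRVI]-[AGT]-[RQKE]-[FY]-[AS]-[VI]-X-[EGHDQV]-[HYFQ]-N"


def motif_to_regex(motif):
    """Convert a dash-separated PROSITE-like pattern into a compiled regex."""
    parts = motif.split("-")
    return re.compile("".join("[A-Z]" if part == "X" else part for part in parts))


PATTERN = motif_to_regex(MOTIF)


def find_motif(protein):
    """Return (start index, matched text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in PATTERN.finditer(protein.upper())]


toy_sequence = "MKLARFAVDEHNKK"  # invented for illustration only
hits = find_motif(toy_sequence)
print(PATTERN.pattern)
print(hits)
```

    A scan of this kind only flags candidates; whether a hit is a genuine phytocystatin would still need to be checked against the full sequence.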

  12. Massively parallel sequencing of forensic STRs

    DEFF Research Database (Denmark)

    Parson, Walther; Ballard, David; Budowle, Bruce

    2016-01-01

    The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data...... accessible genome assembly, and in place before the uptake of MPS by the general forensic community starts to generate sequence data on a large scale. While the established nomenclature for CE-based STR analysis will remain unchanged in the future, the nomenclature of sequence-based STR genotypes will need...

  13. Visible periodicity of strong nucleosome DNA sequences.

    Science.gov (United States)

    Salih, Bilal; Tripathi, Vijay; Trifonov, Edward N

    2015-01-01

    Fifteen years ago, Lowary and Widom assembled nucleosomes on synthetic random sequence DNA molecules, selected the strongest nucleosomes and discovered that the TA dinucleotides in these strong nucleosome sequences often appear at 10-11 bases from one another or at distances which are multiples of this period. We repeated this experiment computationally, on large ensembles of natural genomic sequences, by selecting the strongest nucleosomes, i.e. those with such distances between like-named dinucleotides, multiples of 10.4 bases, the structural and sequence period of nucleosome DNA. The analysis confirmed the periodicity of TA dinucleotides in the strong nucleosomes, and also revealed other periodic sequence elements, notably classical AA and TT dinucleotides. The matrices of DNA bendability and their simple linear forms, the nucleosome positioning motifs, are calculated from the strong nucleosome DNA sequences. The motifs are in full accord with nucleosome positioning sequences derived earlier, thus confirming that the new technique does indeed detect strong nucleosomes. Species- and isochore-specific variations of the matrices and of the positioning motifs are demonstrated. The strong nucleosome DNA sequences manifest the strongest nucleosome positioning sequence signals observed to date, showing the dinucleotide periodicities in directly observable rather than hidden form.
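
    The kind of periodicity described here can be made visible with a histogram of distances between occurrences of the same dinucleotide. The following is a minimal sketch of that idea only, not the authors' selection procedure; the function name, the `max_distance` cut-off and the toy sequence are assumptions made for illustration.

```python
from collections import Counter


def dinucleotide_distance_histogram(seq, dinucleotide="TA", max_distance=40):
    """Count distances between all pairs of occurrences of one dinucleotide."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]
    histogram = Counter()
    for index, first in enumerate(positions):
        for second in positions[index + 1:]:
            distance = second - first
            if distance > max_distance:
                break  # positions are sorted, so later ones are farther away
            histogram[distance] += 1
    return histogram


# Hypothetical toy sequence with a TA placed exactly every 10 bases.
toy = ("TA" + "G" * 8) * 5
histogram = dinucleotide_distance_histogram(toy)
for distance in sorted(histogram):
    print(distance, histogram[distance])
```

    On genomic data the peaks would be smeared around multiples of roughly 10.4 bases rather than falling on exact multiples of 10 as in this toy input.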

  14. Multiplexed microsatellite recovery using massively parallel sequencing.

    Science.gov (United States)

    Jennings, T N; Knaus, B J; Mullins, T D; Haig, S M; Cronn, R C

    2011-11-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356,958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5 M (USD).
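
    Screening reads for di- or trinucleotide microsatellites, as described in this abstract, can be approximated with back-referencing regular expressions. This is a rough sketch rather than the authors' pipeline: the minimum of five repeats, the function name and the toy read are assumptions chosen for illustration.

```python
import re


def find_microsatellites(read, min_repeats=5):
    """Return sorted (start, unit, repeat_count) for di-/trinucleotide repeats."""
    read = read.upper()
    hits = []
    for unit_length in (2, 3):
        # One captured unit followed by at least (min_repeats - 1) copies of it.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (unit_length, min_repeats - 1)
        )
        for match in pattern.finditer(read):
            unit = match.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAAAA
            hits.append((match.start(), unit, len(match.group()) // unit_length))
    return sorted(hits)


# Hypothetical toy read: (AC)x6 and (CAG)x5 separated by short spacers.
toy_read = "GGT" + "AC" * 6 + "TTA" + "CAG" * 5 + "T"
microsatellites = find_microsatellites(toy_read)
print(microsatellites)
```

    Turning such hits into polymorphic markers additionally needs flanking sequence for primer design and genotyping across individuals, which this sketch does not attempt.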

  15. Locomotor sequence learning in visually guided walking.

    Science.gov (United States)

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-04-01

    Voluntary limb modifications must be integrated with basic walking patterns during visually guided walking. In this study we tested whether voluntary gait modifications can become more automatic with practice. We challenged walking control by presenting visual stepping targets that instructed subjects to modify step length from one trial to the next. Our sequence learning paradigm is derived from the serial reaction-time (SRT) task that has been used in upper limb studies. Both random and ordered sequences of step lengths were used to measure sequence-specific and sequence-nonspecific learning during walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 yr, n = 20) could learn a specific sequence of step lengths over 300 training steps. Younger children (age 6-10 yr, n = 8) had lower baseline performance, but their magnitude and rate of sequence learning were the same compared with those of older children (11-16 yr, n = 10) and healthy adults. In addition, learning capacity may be more limited at faster walking speeds. To our knowledge, this is the first study to demonstrate that spatial sequence learning can be integrated with a highly automatic task such as walking. These findings suggest that adults and children use implicit knowledge about the sequence to plan and execute leg movement during visually guided walking.

  16. A measurement of disorder in binary sequences

    Science.gov (United States)

    Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

    2015-03-01

    We propose a complex quantity, A_L, to characterize the degree of disorder of binary symbolic sequences of length L. As examples, we apply it to typical random and deterministic sequences. One kind of random sequence is generated from a periodic binary sequence and the other from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. For the sequences analyzed, we find that the modulus of A_L, denoted |A_L|, is (statistically) equivalent to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can serve as an index for discerning which of two sequences is more disordered. Moreover, a single value of |A_L| captures the overall disorder characteristics. It has an extremely low computational cost and can easily be realized experimentally. For these reasons, we believe that the proposed measure of disorder is a valuable complement to existing measures for symbolic sequences.
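
    The abstract does not define A_L, so it cannot be reproduced here. As a stand-in, the sketch below generates the two deterministic sequences it names and computes an ordinary Shannon block entropy, one of the conventional quantities the abstract compares against. All function names are hypothetical.

```python
import math
from collections import Counter


def thue_morse(length):
    """Thue-Morse sequence: term n is the parity of the 1-bits of n."""
    return [bin(n).count("1") % 2 for n in range(length)]


def fibonacci_word(length):
    """Fibonacci word built by the substitution 0 -> 01, 1 -> 0."""
    word = [0]
    while len(word) < length:
        word = [s for symbol in word for s in ((0, 1) if symbol == 0 else (0,))]
    return word[:length]


def block_entropy(seq, block):
    """Shannon entropy (in bits) of the overlapping blocks of a given size."""
    counts = Counter(
        tuple(seq[i:i + block]) for i in range(len(seq) - block + 1)
    )
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


print(thue_morse(8))
print(fibonacci_word(8))
print(block_entropy(thue_morse(1024), 3), block_entropy(fibonacci_word(1024), 3))
```

    Block entropy is used here only because it is standard and easy to verify; it is not claimed to agree numerically with |A_L|.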

  17. Strebel differentials and Hamilton sequences

    Institute of Scientific and Technical Information of China (English)

    LI Zhong

    2001-01-01

    [1] Strebel, K., Point shift differentials and extremal quasiconformal mappings, Annales Acad. Sci. Fenn. Math., 1998, 23: 475-494.
    [2] Gardiner, F. P., Approximation of infinite dimensional Teichmüller space, Trans. Amer. Math. Soc., 1999, 282: 367-383.
    [3] Lakic, N., The Strebel points, Contemp. Math., 1997, 211: 417-431.
    [4] Wu Shengjian, Hamilton sequences for extremal quasiconformal mappings of the unit disc, Science in China, Ser. A, 1999, 42(10): 1033-1042.
    [5] Li Zhong, Qi Yi, A note on point shift differentials, Science in China, Ser. A, 1999, 42(5): 449-455.
    [6] Hamilton, R. S., Extremal quasiconformal mappings with prescribed boundary values, Trans. Amer. Math. Soc., 1969, 138: 399-406.
    [7] Krushkal, S., Extremal quasiconformal mappings, Sibirsk. Mat. Zh., 1969, 10: 573-583.
    [8] Reich, E., Strebel, K., Extremal quasiconformal mappings with given boundary values, Contributions to Analysis, New York: Academic Press, 1974, 375-391.
    [9] Strebel, K., On quasiconformal mappings of open Riemann surfaces, Comment. Math. Helv., 1978, 53: 301-321.
    [10] Earle, C., Li Zhong, Extremal quasiconformal mappings in plane domains, Quasiconformal Mappings and Analysis: A Collection of Papers Honoring F. W. Gehring, New York: Springer-Verlag, 1998, 141-158.
    [11] Strebel, K., On quadratic differentials and extremal quasiconformal mappings, in Proc. of the Intern. Congress of Math., Vancouver, 1974.
    [12] Li Zhong, Some new results on the geometry of infinite dimensional Teichmüller space, in Proceedings of the 3rd International Colloquium on Finite or Infinite Dimensional Complex Analysis, 1995, 369-378.

  18. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Full Text Available Abstract. Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
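
    The remark that general-purpose Lempel-Ziv compression does poorly on DNA can be illustrated with a small experiment. The sketch below is not coil and does not implement edit-tree coding; it only compares DEFLATE (the algorithm behind gzip, via Python's `zlib`) with naive 2-bit packing on a synthetic random sequence. The sequence, the seed and the helper name are assumptions for illustration.

```python
import random
import zlib


def two_bit_pack(seq):
    """Pack A/C/G/T into 2 bits per base (4 bases per byte)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    packed = bytearray()
    for start in range(0, len(seq), 4):
        chunk = seq[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | code[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        packed.append(byte)
    return bytes(packed)


rng = random.Random(0)  # fixed seed so the run is repeatable
sequence = "".join(rng.choice("ACGT") for _ in range(100_000))

raw = sequence.encode("ascii")
deflated = zlib.compress(raw, 9)
packed = two_bit_pack(sequence)

print("raw bytes:     ", len(raw))
print("DEFLATE bytes: ", len(deflated))
print("2-bit packing: ", len(packed))
```

    Real sequence databases are not random, and entries such as ESTs share long substrings, which is the redundancy a database-level compressor can exploit and this toy comparison cannot show.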

  19. Timing-Sequence Testing of Parallel Programs

    Institute of Scientific and Technical Information of China (English)

    LIANG Yu; LI Shu; ZHANG Hui; HAN Chengde

    2000-01-01

    Testing of parallel programs involves two parts: testing of control flow within the processes and testing of timing-sequence. This paper focuses on the latter, particularly on the timing-sequence of message-passing paradigms. First, the coarse-grained SYN-sequence model is built to describe the execution of distributed programs; all topics discussed in this paper are based on it. The most direct way to test a program is to run it. A fault-free parallel program should produce both correct computing results and a proper SYN-sequence. In order to analyze the validity of an observed SYN-sequence, this paper presents a formal specification (in Backus Normal Form) of the valid SYN-sequence. So far there has been little work on testing coverage for distributed programs. Calculating the number of valid SYN-sequences is the key to the coverage problem, but this number is extremely large and the combination law among SYN-events is very hard to obtain. To resolve this problem, this paper proposes an efficient testing strategy, atomic SYN-event testing, which first linearizes the SYN-sequence (so that it consists only of serial atomic SYN-events) and then tests each atomic SYN-event independently. In particular, this paper provides a formula for the number of valid SYN-sequences of tree-topology atomic SYN-events (broadcast and combine). Furthermore, the number of valid SYN-sequences also, to some degree, mirrors the testability of parallel programs. Taking the tree-topology atomic SYN-event as an example, this paper demonstrates the testability and communication speed of the tree-topology atomic SYN-event under different numbers of branches in order to achieve a more satisfactory tradeoff between testability and communication efficiency.

  20. Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding.

    Directory of Open Access Journals (Sweden)

    Marta Brozynska

    Full Text Available Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.

  1. Generalized Identities of Companion Fibonacci-Like Sequences

    OpenAIRE

    Shikha Bhatnagar; Bijendra Singh; Omprakash Sikhwal

    2013-01-01

    The Fibonacci sequence, Lucas sequence, Pell sequence, Pell-Lucas sequence, Jacobsthal sequence and Jacobsthal-Lucas sequence are the most prominent examples of second-order recursive sequences. In this paper, we deal with two companion Fibonacci-Like sequences, which are generalizations of the Fibonacci-Like sequence. Further, we obtain some generalized identities among the terms of companion Fibonacci-Like sequences, Jacobsthal and Jacobsthal-Lucas sequences through Binet's formulae.
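
    The six classical sequences named in this abstract all follow a recurrence of the form a(n) = p·a(n-1) + q·a(n-2) and differ only in p, q and the two starting values. The sketch below uses the standard textbook definitions; the companion Fibonacci-Like sequences of the paper are not defined in the abstract and are therefore not reproduced. The function names are hypothetical.

```python
def second_order(a0, a1, p, q, count):
    """First `count` terms of a(n) = p*a(n-1) + q*a(n-2)."""
    terms = [a0, a1]
    while len(terms) < count:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms[:count]


# (a0, a1, p, q) for each classical sequence.
SEQUENCES = {
    "Fibonacci": (0, 1, 1, 1),
    "Lucas": (2, 1, 1, 1),
    "Pell": (0, 1, 2, 1),
    "Pell-Lucas": (2, 2, 2, 1),
    "Jacobsthal": (0, 1, 1, 2),
    "Jacobsthal-Lucas": (2, 1, 1, 2),
}


def jacobsthal_binet(n):
    """Binet-type closed form J(n) = (2**n - (-1)**n) / 3."""
    return (2 ** n - (-1) ** n) // 3


def jacobsthal_lucas_binet(n):
    """Binet-type closed form j(n) = 2**n + (-1)**n."""
    return 2 ** n + (-1) ** n


for name, (a0, a1, p, q) in SEQUENCES.items():
    print(name, second_order(a0, a1, p, q, 8))
```

    The Jacobsthal closed forms are convenient to check because the characteristic roots 2 and -1 are integers, so no floating-point arithmetic is involved.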

  2. Sequencing the Cotton Genomes-Gossypium spp.

    Institute of Scientific and Technical Information of China (English)

    PATERSON Andrew H

    2008-01-01

    The genomes of most major crops, including cotton, will be fully sequenced in the next few years. Cotton is unusual, although not unique, in that we will need to sequence not only cultivated (tetraploid) genotypes but their diploid progenitors, to understand how elite cottons have surpassed the productivity and quality of their progenitors.

  3. Convergence of a Linear Recursive Sequence

    Science.gov (United States)

    Tay, E. G.; Toh, T. L.; Dong, F. M.; Lee, T. Y.

    2004-01-01

    A necessary and sufficient condition is found for a linear recursive sequence to be convergent, no matter what initial values are given. Its limit is also obtained when the sequence is convergent. Methods from various areas of mathematics are used to obtain the results.

  4. Novel algorithms for protein sequence analysis

    NARCIS (Netherlands)

    Ye, Kai

    2008-01-01

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1

  5. Sequencing for the cream of the crop

    Science.gov (United States)

    In this invited commentary, we discuss how next-generation sequencing methods are beginning to find their way into plant genetics, promising substantial improvements in crop yields over the coming decades. Next-generation sequencing facilitates the construction of high-resolution variation maps, whi...

  6. Sequencing Events: Exploring Art and Art Jobs.

    Science.gov (United States)

    Stephens, Pamela Geiger; Shaddix, Robin K.

    2000-01-01

    Presents an activity for upper-elementary students that correlates the actions of archaeologists, patrons, and artists with the sequencing of events in a logical order. Features ancient Egyptian art images. Discusses the preparation of materials, motivation, a pre-writing activity, and writing a story in sequence. (CMK)

  7. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  8. Sequence Comparison: Close and Open problems

    NARCIS (Netherlands)

    Lenzini, Gabriele; Cerrai, P.; Freguglia, P.

    1997-01-01

    Comparing sequences is a very important activity both in computer science and in many other areas as well. For example, thanks to text editors, everyone knows the particular instance of a sequence comparison problem known as the ``string matching problem''. It consists in searching a given work eventual

  9. Stochastic modelling of daily rainfall sequences

    NARCIS (Netherlands)

    Buishand, T.A.

    1977-01-01

    Rainfall series of different climatic regions were analysed with the aim of generating daily rainfall sequences. A survey of the data is given in I, 1. When analysing daily rainfall sequences one must be aware of the following points:
    a. Seasonality. Because of seasonal variation

  10. Genome Sequence of Lactobacillus rhamnosus ATCC 8530

    OpenAIRE

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R.; Ziola, Barry

    2012-01-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  11. SEQUENCE IN LEARNING--FACT OR FICTION.

    Science.gov (United States)

    MIEL, ALICE

    SEQUENCE IN LEARNING IS USEFUL ONLY AS IT CONTRIBUTES TO THE CONTINUITY OF A CHILD'S OVERALL DEVELOPMENT. CHILDREN MAY NOT GO THROUGH THE SAME SEQUENCE TO ARRIVE AT A SIMILAR POINT OF UNDERSTANDING. EDUCATIONAL PROGRESS IS INDICATED BY A CHILD'S GROWTH IN THE DEVELOPMENT OF STRATEGIC CONCEPTS, IN WAYS OF PROCESSING INFORMATION, AND IN WAYS OF…

  12. Controlling monomer-sequence using supramolecular templates

    OpenAIRE

    ten Brummelhuis, Niels

    2014-01-01

    The transcription and translation of information contained in nucleic acids, which has been perfected by nature, serves as inspiration for chemists to devise strategies for the creation of polymers with well-defined monomer sequences. In this review the various approaches in which templates (either biopolymers or synthetic ones) are used to influence the monomer sequence are discussed.

  13. What's Next? Judging Sequences of Binary Events

    Science.gov (United States)

    Oskarsson, An T.; Van Boven, Leaf; McClelland, Gary H.; Hastie, Reid

    2009-01-01

    The authors review research on judgments of random and nonrandom sequences involving binary events with a focus on studies documenting gambler's fallacy and hot hand beliefs. The domains of judgment include random devices, births, lotteries, sports performances, stock prices, and others. After discussing existing theories of sequence judgments,…

  14. SPARSE SEQUENCE CONSTRUCTION OF LDPC CODES

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    This letter proposes a novel and simple construction of regular Low-Density Parity-Check (LDPC) codes using sparse binary sequences. It utilizes the cyclic cross correlation function of sparse sequences to generate codes with girth 8. The new codes perform well using sum-product decoding. Low encoding complexity can also be achieved due to the inherent quasi-cyclic structure of the codes.

  15. Sequence Comparison: Close and Open problems

    NARCIS (Netherlands)

    Lenzini, Gabriele; Cerrai, P.; Freguglia, P.

    Comparing sequences is a very important activity both in computer science and in many other areas as well. For example, thanks to text editors, everyone knows the particular instance of a sequence comparison problem known as the ``string matching problem''. It consists in searching a given work

  16. Archaebacterial rhodopsin sequences: Implications for evolution

    Science.gov (United States)

    Lanyi, J. K.

    1991-01-01

    It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.

  17. Bonobos extract meaning from call sequences.

    Directory of Open Access Journals (Sweden)

    Zanna Clay

    Full Text Available Studies on language-trained bonobos have revealed their remarkable abilities in representational and communication tasks. Surprisingly, however, corresponding research into their natural communication has largely been neglected. We address this issue with a first playback study on the natural vocal behaviour of bonobos. Bonobos produce five acoustically distinct call types when finding food, which they regularly mix together into longer call sequences. We found that individual call types were relatively poor indicators of food quality, while context specificity was much greater at the call sequence level. We therefore investigated whether receivers could extract meaning about the quality of food encountered by the caller by integrating across different call sequences. We first trained four captive individuals to find two types of foods, kiwi (preferred) and apples (less preferred), at two different locations. We then conducted naturalistic playback experiments during which we broadcasted sequences of four calls, originally produced by a familiar individual responding to either kiwi or apples. All sequences contained the same number of calls but varied in the composition of call types. Following playbacks, we found that subjects devoted significantly more search effort to the field indicated by the call sequence. Rather than attending to individual calls, bonobos attended to the entire sequences to make inferences about the food encountered by a caller. These results provide the first empirical evidence that bonobos are able to extract information about external events by attending to vocal sequences of other individuals and highlight the importance of call combinations in their natural communication system.

  19. Wolbachia Sequence Typing in Butterflies Using Pyrosequencing.

    Science.gov (United States)

    Choi, Sungmi; Shin, Su-Kyoung; Jeong, Gilsang; Yi, Hana

    2015-09-01

    Wolbachia is an obligate symbiotic bacteria that is ubiquitous in arthropods, with 25-70% of insect species estimated to be infected. Wolbachia species can interact with their insect hosts in a mutualistic or parasitic manner. Sequence types (ST) of Wolbachia are determined by multilocus sequence typing (MLST) of housekeeping genes. However, there are some limitations to MLST with respect to the generation of clone libraries and the Sanger sequencing method when a host is infected with multiple STs of Wolbachia. To assess the feasibility of massive parallel sequencing, also known as next-generation sequencing, we used pyrosequencing for sequence typing of Wolbachia in butterflies. We collected three species of butterflies (Eurema hecabe, Eurema laeta, and Tongeia fischeri) common to Korea and screened them for Wolbachia STs. We found that T. fischeri was infected with a single ST of Wolbachia, ST41. In contrast, E. hecabe and E. laeta were each infected with two STs of Wolbachia, ST41 and ST40. Our results clearly demonstrate that pyrosequencing-based MLST has a higher sensitivity than cloning and Sanger sequencing methods for the detection of minor alleles. Considering the high prevalence of infection with multiple Wolbachia STs, next-generation sequencing with improved analysis would assist with scaling up approaches to Wolbachia MLST.

  20. Value of a newly sequenced bacterial genome

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Aburjaile, Flavia F; Ramos, Rommel Tj

    2014-01-01

    and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses...

  1. Discrepancy of LS-sequences of partitions

    CERN Document Server

    Carbone, Ingrid

    2010-01-01

    In this paper we give a precise estimate of the discrepancy of a class of uniformly distributed sequences of partitions. Among them we found a large class having low discrepancy (which means of order 1/N). One of them is the Kakutani-Fibonacci sequence.

  2. Some identities of generalized Fibonacci sequence

    Science.gov (United States)

    Chong, Chin-Yoon; Cheah, C. L.; Ho, C. K.

    2014-07-01

    We introduced the generalized Fibonacci sequence $\{U_n\}$ defined by $U_0 = 0$, $U_1 = 1$, and $U_{n+2} = pU_{n+1} + qU_n$ for all $p, q \in \mathbb{Z}^+$ and for all non-negative integers $n$. In this paper, we obtained some recursive formulas of the sequence.

  3. On the sum of generalized Fibonacci sequence

    Science.gov (United States)

    Chong, Chin-Yoon; Ho, C. K.

    2014-06-01

    We consider the generalized Fibonacci sequence $\{U_n\}$ defined by $U_0 = 0$, $U_1 = 1$, and $U_{n+2} = pU_{n+1} + qU_n$ for all $n \in \mathbb{Z}_0^+$ and $p, q \in \mathbb{Z}^+$. In this paper, we derived various sums of the generalized Fibonacci sequence from their recursive relations.
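
    As an illustration of how a sum can be derived from the recurrence, summing $U_{k+2} = pU_{k+1} + qU_k$ over $k = 0, \dots, n-1$ gives $(p + q - 1)\sum_{k=0}^{n} U_k = U_{n+1} + qU_n - 1$. This identity follows directly from the definition above and is not claimed to be one of the sums in the paper. The sketch below checks it numerically; the function names are hypothetical.

```python
def generalized_fibonacci(p, q, count):
    """First `count` terms of U(0)=0, U(1)=1, U(n+2) = p*U(n+1) + q*U(n)."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms[:count]


def partial_sum_closed_form(p, q, n):
    """Sum of U(0)..U(n) from (U(n+1) + q*U(n) - 1) / (p + q - 1)."""
    terms = generalized_fibonacci(p, q, n + 2)
    numerator = terms[n + 1] + q * terms[n] - 1
    # p and q are positive integers, so p + q - 1 >= 1 and division is safe.
    quotient, remainder = divmod(numerator, p + q - 1)
    if remainder != 0:
        raise ArithmeticError("identity violated; check the inputs")
    return quotient


for p, q in [(1, 1), (2, 1), (2, 3), (5, 4)]:
    for n in range(10):
        direct = sum(generalized_fibonacci(p, q, n + 1))
        assert direct == partial_sum_closed_form(p, q, n)
print("identity holds for all tested (p, q, n)")
```

    For p = q = 1 the identity reduces to the familiar statement that the first n Fibonacci numbers sum to F(n+2) - 1.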

  4. Regular Pentagons and the Fibonacci Sequence.

    Science.gov (United States)

    French, Doug

    1989-01-01

    Illustrates how to draw a regular pentagon. Shows the sequence of a succession of regular pentagons formed by extending the sides. Calculates the general formula of the Lucas and Fibonacci sequences. Presents a regular icosahedron as an example of the golden ratio. (YP)

  5. GENE SEQUENCE HOMOLOGY OF CHEMOKINES ACROSS SPECIES

    Science.gov (United States)

    The abundance of expressed gene and protein sequences available in the biological information databases facilitates comparison of protein homologies. A high degree of sequence similarity typically implies homology regarding structure and function and may provide clues to antibody cross-react...

  6. Concept For Generation Of Long Pseudorandom Sequences

    Science.gov (United States)

    Wang, C. C.

    1990-01-01

    Conceptual very-large-scale integrated (VLSI) digital circuit performs exponentiation in finite field. Algorithm that generates unusually long sequences of pseudorandom numbers executed by digital processor that includes such circuits. Concepts particularly advantageous for such applications as spread-spectrum communications, cryptography, and generation of ranging codes, synthetic noise, and test data, where usually desirable to make pseudorandom sequences as long as possible.

  7. Using Conventional Sequences in L2 French

    Science.gov (United States)

    Forsberg, Fanny

    2010-01-01

    By means of a phraseological identification method, this study provides a general description of the use of conventional sequences (CSs) in interviews at four different levels of spoken L2 French as well as in interviews with native speakers. Use of conventional sequences is studied with regard to overall quantity, category distribution and type…

  8. Fibonacci-triple sequences and some fundamental properties

    Directory of Open Access Journals (Sweden)

    Bijendra Singh

    2010-12-01

    Full Text Available Fibonacci sequence stands as a kind of super sequence with fabulous properties. This note presents Fibonacci-Triple sequences that may also be called 3-F sequences. This is the explosive development in the region of Fibonacci sequence. Our purpose of this paper is to demonstrate fundamental properties of Fibonacci-Triple sequence.

  9. Fibonacci-triple sequences and some fundamental properties

    OpenAIRE

    Bijendra Singh; Omprakash Sikhwal

    2010-01-01

    Fibonacci sequence stands as a kind of super sequence with fabulous properties. This note presents Fibonacci-Triple sequences that may also be called 3-F sequences. This is the explosive development in the region of Fibonacci sequence. Our purpose of this paper is to demonstrate fundamental properties of Fibonacci-Triple sequence.

  10. Conserved Sequence Processing in Primate Frontal Cortex.

    Science.gov (United States)

    Wilson, Benjamin; Marslen-Wilson, William D; Petkov, Christopher I

    2017-02-01

    An important aspect of animal perception and cognition is learning to recognize relationships between environmental events that predict others in time, a form of relational knowledge that can be assessed using sequence-learning paradigms. Humans are exquisitely sensitive to sequencing relationships, and their combinatorial capacities, most saliently in the domain of language, are unparalleled. Recent comparative research in human and nonhuman primates has obtained behavioral and neuroimaging evidence for evolutionarily conserved substrates involved in sequence processing. The findings carry implications for the origins of domain-general capacities underlying core language functions in humans. Here, we synthesize this research into a 'ventrodorsal gradient' model, where frontal cortex engagement along this axis depends on sequencing complexity, mapping onto the sequencing capacities of different species.

  11. Sequencing and comparing whole mitochondrial genomes of animals

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  12. NGS-based deep bisulfite sequencing.

    Science.gov (United States)

    Lee, Suman; Kim, Joomyeong

    2016-01-01

    We have developed an NGS-based deep bisulfite sequencing protocol for the DNA methylation analysis of genomes. This approach allows the rapid and efficient construction of NGS-ready libraries with a large number of PCR products that have been individually amplified from bisulfite-converted DNA. This approach also employs a bioinformatics strategy to sort the raw sequence reads generated from NGS platforms and subsequently to derive DNA methylation levels for individual loci. The results demonstrated that this NGS-based deep bisulfite sequencing approach provides not only DNA methylation levels but also informative DNA methylation patterns that have not been seen through other existing methods.
    • This protocol provides an efficient method for generating NGS-ready libraries from individually amplified PCR products.
    • This protocol provides a bioinformatics strategy for sorting NGS-derived raw sequence reads.
    • This protocol provides deep bisulfite sequencing results that can measure DNA methylation levels and patterns of individual loci.

  13. Locomotor sequence learning in visually guided walking

    DEFF Research Database (Denmark)

    Choi, Julia T; Jensen, Peter; Nielsen, Jens Bo

    2016-01-01

    walking. In addition, we determined how age (i.e., healthy young adults vs. children) and biomechanical factors (i.e., walking speed) affected the rate and magnitude of locomotor sequence learning. The results showed that healthy young adults (age 24 ± 5 years, N = 20) could learn a specific sequence...... of step lengths over 300 training steps. Younger children (age 6-10 years, N = 8) have lower baseline performance, but their magnitude and rate of sequence learning was the same compared to older children (11-16 years, N = 10) and healthy adults. In addition, learning capacity may be more limited...... at faster walking speeds. To our knowledge, this is the first study to demonstrate that spatial sequence learning can be integrated with a highly automatic task like walking. These findings suggest that adults and children use implicit knowledge about the sequence to plan and execute leg movement during...

  14. Spiking neuron model for temporal sequence recognition.

    Science.gov (United States)

    Byrnes, Sean; Burkitt, Anthony N; Grayden, David B; Meffin, Hamish

    2010-01-01

    A biologically inspired neuronal network that stores and recognizes temporal sequences of symbols is described. Each symbol is represented by excitatory input to distinct groups of neurons (symbol pools). Unambiguous storage of multiple sequences with common subsequences is ensured by partitioning each symbol pool into subpools that respond only when the current symbol has been preceded by a particular sequence of symbols. We describe synaptic structure and neural dynamics that permit the selective activation of subpools by the correct sequence. Symbols may have varying durations of the order of hundreds of milliseconds. Physiologically plausible plasticity mechanisms operate on a time scale of tens of milliseconds; an interaction of the excitatory input with periodic global inhibition bridges this gap so that neural events representing successive symbols occur on this much faster timescale. The network is shown to store multiple overlapping sequences of events. It is robust to variation in symbol duration, it is scalable, and its performance degrades gracefully with perturbation of its parameters.

  15. Aligning Sequences by Minimum Description Length

    Directory of Open Access Journals (Sweden)

    John S. Conery

    2008-01-01

    Full Text Available This paper presents a new information theoretic framework for aligning sequences in bioinformatics. A transmitter compresses a set of sequences by constructing a regular expression that describes the regions of similarity in the sequences. To retrieve the original set of sequences, a receiver generates all strings that match the expression. An alignment algorithm uses minimum description length to encode and explore alternative expressions; the expression with the shortest encoding provides the best overall alignment. When two substrings contain letters that are similar according to a substitution matrix, a code length function based on conditional probabilities defined by the matrix will encode the substrings with fewer bits. In one experiment, alignments produced with this new method were found to be comparable to alignments from CLUSTALW. A second experiment measured the accuracy of the new method on pairwise alignments of sequences from the BAliBASE alignment benchmark.

  16. Metagenomics using next-generation sequencing.

    Science.gov (United States)

    Bragg, Lauren; Tyson, Gene W

    2014-01-01

    Traditionally, microbial genome sequencing has been restricted to the small number of species that can be grown in pure culture. The progressive development of culture-independent methods over the last 15 years now allows researchers to sequence microbial communities directly from environmental samples. This approach is commonly referred to as "metagenomics" or "community genomics". However, the term metagenomics is applied liberally in the literature to describe any culture-independent analysis of microbial communities. Here, we define metagenomics as shotgun ("random") sequencing of the genomic DNA of a sample taken directly from the environment. The metagenome can be thought of as a sampling of the collective genome of the microbial community. We outline the considerations and analyses that should be undertaken to ensure the success of a metagenomic sequencing project, including the choice of sequencing platform and methods for assembly, binning, annotation, and comparative analysis.

  17. Strategies for complete plastid genome sequencing.

    Science.gov (United States)

    Twyford, Alex D; Ness, Rob W

    2016-10-28

    Plastid sequencing is an essential tool in the study of plant evolution. This high-copy organelle is one of the most technically accessible regions of the genome, and its sequence conservation makes it a valuable region for comparative genome evolution, phylogenetic analysis and population studies. Here, we discuss recent innovations and approaches for de novo plastid assembly that harness genomic tools. We focus on technical developments including low-cost sequence library preparation approaches for genome skimming, enrichment via hybrid baits and methylation-sensitive capture, sequence platforms with higher read outputs and longer read lengths, and automated tools for assembly. These developments allow for a much more streamlined assembly than via conventional short-range PCR. Although newer methods make complete plastid sequencing possible for any land plant or green alga, there are still challenges for producing finished plastomes particularly from herbarium material or from structurally divergent plastids such as those of parasitic plants.

  18. Reading biological processes from nucleotide sequences

    Science.gov (United States)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. 
    Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical

  19. GASP: Gapped Ancestral Sequence Prediction for proteins

    Directory of Open Access Journals (Sweden)

    Shields Denis C

    2004-09-01

    Full Text Available Background: The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results: Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions: GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike.

  20. GASP: Gapped Ancestral Sequence Prediction for proteins.

    Science.gov (United States)

    Edwards, Richard J; Shields, Denis C

    2004-09-06

    The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike.

  1. On some difference sequence spaces defined by a sequence of Orlicz functions

    Institute of Scientific and Technical Information of China (English)

    ASMA BEKTAŞ Çiğdem

    2006-01-01

    The idea of difference sequence spaces was introduced in (Kizmaz, 1981) and this concept was generalized in (Et and Colak, 1995). In this paper we define some difference sequence spaces by a sequence of Orlicz functions and establish some inclusion relations.

  2. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    Science.gov (United States)

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  3. Finding Common Sequence and Structure Motifs in a set of RNA sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Heyer, Laurie J.; Stormo, Gary D.

    1997-01-01

    We present a computational scheme to search for the most common motif, composed of a combination of sequence and structure constraints, among a collection of RNA sequences. The method uses a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences...

  4. SOME GEOMETRIC PROPERTIES OF A NEW DIFFERENCE SEQUENCE SPACE INVOLVING LACUNARY SEQUENCES

    Institute of Scientific and Technical Information of China (English)

    Murat KARAKAŞ; Mikail ET; Vatan KARAKAYA

    2013-01-01

    In this paper, we define a new generalized difference sequence space involving a lacunary sequence. We then examine the k-NUC property and property (β) for this space, and also show that it is not rotund, where $p=(p_r)$ is a bounded sequence of positive real numbers with $p_r \ge 1$ for all $r \in \mathbb{N}$.

  5. Targeted next-generation sequencing can replace Sanger sequencing in clinical diagnostics

    NARCIS (Netherlands)

    Sikkema-Raddatz, B.; Johansson, L.F.; de Boer, E.N.; Almomani, R.; Boven, L.G.; van den Berg, M.P.; van Spaendonck-Zwarts, K.Y.; van Tintelen, J.P.; Sijmons, R.H.; Jongbloed, J.D.H.; Sinke, R.J.

    2013-01-01

    Mutation detection through exome sequencing allows simultaneous analysis of all coding sequences of genes. However, it cannot yet replace Sanger sequencing (SS) in diagnostics because of incomplete representation and coverage of exons leading to missing clinically relevant mutations. Targeted next-g

  6. Sequencing and Analysis of a Genomic Fragment Provide an Insight into the Dunaliella viridis Genomic Sequence

    Institute of Scientific and Technical Information of China (English)

    Xiao-Ming SUN; Yuan-Ping TANG; Xiang-Zong MENG; Wen-Wen ZHANG; Shan LI; Zhi-Rui DENG; Zheng-Kai XU; Ren-Tao SONG

    2006-01-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance studies. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large amounts of simple sequence repeats (microsatellites) were found, with a strong bias towards the (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation is needed to reveal the biological significance of these unique sequence features.

  7. Gelada vocal sequences follow Menzerath's linguistic law.

    Science.gov (United States)

    Gustison, Morgan L; Semple, Stuart; Ferrer-I-Cancho, Ramon; Bergman, Thore J

    2016-05-10

    Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath's law is a linguistic law that states that the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath's law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath's law reflects compression, the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language.
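
    The negative relationship between construct size and constituent size described above can be illustrated with a simple correlation check. The following is a minimal sketch only: the numbers are hypothetical illustration data, not values from the study, and the plain Pearson coefficient (the helper name `pearson_r` is our own) is an assumption, since the abstract does not state which statistic the authors used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data (NOT from the study): sequence size in number of calls,
# and mean call duration in seconds for sequences of that size.
sequence_sizes = [1, 2, 3, 4, 5, 6]
mean_call_durations = [0.90, 0.82, 0.75, 0.71, 0.64, 0.60]

r = pearson_r(sequence_sizes, mean_call_durations)
print(f"r = {r:.3f}")  # a negative r is the pattern Menzerath's law predicts
```

    A negative coefficient on real data would be consistent with the law; it would not by itself establish the compression argument the abstract mentions.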

  8. De novo peptide sequencing by deep learning.

    Science.gov (United States)

    Tran, Ngoc Hieu; Zhang, Xianglilan; Xin, Lei; Shan, Baozhen; Li, Ming

    2017-07-18

    De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state of the art methods, achieving 7.7-22.9% higher accuracy at the amino acid level and 38.1-64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5-100% coverage and 97.2-99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any sources of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming.

  9. Fungal genome sequencing: basic biology to biotechnology.

    Science.gov (United States)

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research.

  10. CATEGORIZATION OF EVENT SEQUENCES FOR LICENSE APPLICATION

    Energy Technology Data Exchange (ETDEWEB)

    G.E. Ragan; P. Mecheret; D. Dexheimer

    2005-04-14

    The purposes of this analysis are: (1) Categorize (as Category 1, Category 2, or Beyond Category 2) internal event sequences that may occur before permanent closure of the repository at Yucca Mountain. (2) Categorize external event sequences that may occur before permanent closure of the repository at Yucca Mountain. This includes examining DBGM-1 seismic classifications and upgrading to DBGM-2, if appropriate, to ensure Beyond Category 2 categorization. (3) State the design and operational requirements that are invoked to make the categorization assignments valid. (4) Indicate the amount of material put at risk by Category 1 and Category 2 event sequences. (5) Estimate frequencies of Category 1 event sequences at the maximum capacity and receipt rate of the repository. (6) Distinguish occurrences associated with normal operations from event sequences. It is beyond the scope of the analysis to propose design requirements that may be required to control radiological exposure associated with normal operations. (7) Provide a convenient compilation of the results of the analysis in tabular form. The results of this analysis are used as inputs to the consequence analyses in an iterative design process that is depicted in Figure 1. Categorization of event sequences for permanent retrieval of waste from the repository is beyond the scope of this analysis. Cleanup activities that take place after an event sequence and other responses to abnormal events are also beyond the scope of the analysis.

  11. Exploration of noncoding sequences in metagenomes.

    Directory of Open Access Journals (Sweden)

    Fabián Tobar-Tosse

    Full Text Available Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.
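
    Two of the sequence patterns named above, (G+C) content and trinucleotide usage, are straightforward to compute. The sketch below is illustrative only: the function names are our own, the example sequence is hypothetical, and counting trinucleotides over overlapping windows is an assumption, since the abstract does not say how the authors defined usage.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def trinucleotide_usage(seq):
    """Relative frequency of each trinucleotide, using overlapping windows.

    Windows containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


example = "ATGCGCGCATTACGCGCA"  # hypothetical sequence, for illustration only
print(gc_content(example))
print(trinucleotide_usage(example))
```

    Because neither function depends on ORF prediction, both can be applied to coding and noncoding regions alike, which is the property the abstract relies on.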

  12. Value of a newly sequenced bacterial genome

    Institute of Scientific and Technical Information of China (English)

    Eudes GV Barbosa; Flavia F Aburjaile; Rommel TJ Ramos; Adriana R Carneiro; Yves Le Loir; Jan Baumbach; Anderson Miyoshi; Artur Silva; Vasco Azevedo

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information.

  13. Revisiting Mendelian disorders through exome sequencing.

    Science.gov (United States)

    Ku, Chee-Seng; Naidoo, Nasheen; Pawitan, Yudi

    2011-04-01

    Over the past several years, more focus has been placed on dissecting the genetic basis of complex diseases and traits through genome-wide association studies. In contrast, Mendelian disorders have received little attention mainly due to the lack of newer and more powerful methods to study these disorders. Linkage studies have previously been the main tool to elucidate the genetics of Mendelian disorders; however, extremely rare disorders or sporadic cases caused by de novo variants are not amenable to this study design. Exome sequencing has now become technically feasible and more cost-effective due to the recent advances in high-throughput sequence capture methods and next-generation sequencing technologies, which have offered new opportunities for Mendelian disorder research. Exome sequencing has been swiftly applied to the discovery of new causal variants and candidate genes for a number of Mendelian disorders such as Kabuki syndrome, Miller syndrome and Fowler syndrome. In addition, de novo variants were also identified for sporadic cases, which would not have been possible without exome sequencing. Although exome sequencing has been proven to be a promising approach to study Mendelian disorders, several shortcomings of this method must be noted, such as the inability to capture regulatory or evolutionarily conserved sequences in non-coding regions and the incomplete capturing of all exons.

  14. Identification of ancient remains through genomic sequencing

    Science.gov (United States)

    Blow, Matthew J.; Zhang, Tao; Woyke, Tanja; Speller, Camilla F.; Krivoshapkin, Andrei; Yang, Dongya Y.; Derevianko, Anatoly; Rubin, Edward M.

    2008-01-01

    Studies of ancient DNA have been hindered by the preciousness of remains, the small quantities of undamaged DNA accessible, and the limitations associated with conventional PCR amplification. In these studies, we developed and applied a genomewide adapter-mediated emulsion PCR amplification protocol for ancient mammalian samples estimated to be between 45,000 and 69,000 yr old. Using 454 Life Sciences (Roche) and Illumina sequencing (formerly Solexa sequencing) technologies, we examined over 100 megabases of DNA from amplified extracts, revealing unbiased sequence coverage with substantial amounts of nonredundant nuclear sequences from the sample sources and negligible levels of human contamination. We consistently recorded over 500-fold increases, such that nanogram quantities of starting material could be amplified to microgram quantities. Application of our protocol to a 50,000-yr-old uncharacterized bone sample that was unsuccessful in mitochondrial PCR provided sufficient nuclear sequences for comparison with extant mammals and subsequent phylogenetic classification of the remains. The combined use of emulsion PCR amplification and high-throughput sequencing allows for the generation of large quantities of DNA sequence data from ancient remains. Using such techniques, even small amounts of ancient remains with low levels of endogenous DNA preservation may yield substantial quantities of nuclear DNA, enabling novel applications of ancient DNA genomics to the investigation of extinct phyla. PMID:18426903

  15. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    Directory of Open Access Journals (Sweden)

    Arabi E. Keshk

    2014-05-01

    Full Text Available The merging of biology and computer science has created a new field called computational biology, or bioinformatics, which explores the capacity of computers to gain knowledge from biological data. Computational biology is rooted in the life sciences as well as in computer science, information science, and technology. The main problem in computational biology is sequence alignment, a way of arranging DNA, RNA or protein sequences to identify regions of similarity and relationships between sequences. This paper introduces an enhancement of the dynamic algorithm for genome sequence alignment, called EDAGSA. It fills only the three main diagonals instead of filling the entire matrix with unused data. It obtains the optimal solution while decreasing the execution time, and performance is therefore increased. To illustrate the effectiveness of the proposed algorithm, it is compared with traditional methods such as the Needleman-Wunsch, Smith-Waterman and longest common subsequence algorithms. In addition, a database is implemented so that the algorithm can be used in multi-sequence alignments to search for the optimal sequence that matches a given sequence.
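
    The idea of filling only the three main diagonals can be illustrated with a banded variant of Needleman-Wunsch. The sketch below is not the EDAGSA implementation, which the abstract does not specify; the function name, the scoring values and the `band` parameter are assumptions chosen for illustration. With `band=1` only the main diagonal and its two neighbours are computed, so the score is optimal only when the best alignment stays inside that band.

```python
def banded_global_score(a, b, band=1, match=1, mismatch=-1, gap=-1):
    """Global alignment score using only cells with |i - j| <= band.

    Illustrative scoring scheme (assumed): +1 match, -1 mismatch, -1 gap.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("length difference exceeds the band width")
    neg_inf = float("-inf")
    # Only cells inside the band are stored, keyed by (i, j).
    score = {(0, 0): 0}
    for i in range(n + 1):
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, score.get((i - 1, j - 1), neg_inf) + s)
            if i > 0:  # gap in b
                best = max(best, score.get((i - 1, j), neg_inf) + gap)
            if j > 0:  # gap in a
                best = max(best, score.get((i, j - 1), neg_inf) + gap)
            score[(i, j)] = best
    return score[(n, m)]
```

    For two sequences of length n the band holds roughly 3n cells instead of n squared, which is where the saving in execution time would come from.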

  16. Large Zero Autocorrelation Zone of Golay Sequences and $4^q$-QAM Golay Complementary Sequences

    CERN Document Server

    Gong, Guang; Yang, Yang

    2011-01-01

    Sequences with good correlation properties have been widely adopted in modern communications, radar and sonar applications. In this paper, we present our new findings on some constructions of single $H$-ary Golay sequence and $4^q$-QAM Golay complementary sequence with a large zero autocorrelation zone, where $H\ge 2$ is an arbitrary even integer and $q\ge 2$ is an arbitrary integer. Those new results on Golay sequences and QAM Golay complementary sequences can be explored during synchronization and detection at the receiver end and thus improve the performance of the communication system.

  17. Identification of 10 882 porcine microsatellite sequences and virtual mapping of 4528 of these sequences

    DEFF Research Database (Denmark)

    Karlskov-Mortensen, Peter; Hu, Z.L.; Gorodkin, Jan

    2007-01-01

    A total of 10 882 porcine microsatellite repeats were identified in genomic shotgun sequences from the Sino-Danish Pig Genome Sequencing Consortium (http://piggenome.dk). Of these, 4528 microsatellites were placed on a pig-human comparative map by BLAST analysis of porcine sequences against the human genome (BLAST cut-off threshold = 1 x 10^-5). All microsatellite sequences placed on the comparative map are accessible at http://www.animalgenome.org/QTLdb/pig.html . These sequences increase the number of identified microsatellites in the porcine genome by several orders of magnitude.

  18. Robot Sequencing and Visualization Program (RSVP)

    Science.gov (United States)

    Cooper, Brian K.; Maxwell, Scott A.; Hartman, Frank R.; Wright, John R.; Yen, Jeng; Toole, Nicholas T.; Gorjian, Zareh; Morrison, Jack C

    2013-01-01

    The Robot Sequencing and Visualization Program (RSVP) is being used in the Mars Science Laboratory (MSL) mission for downlink data visualization and command sequence generation. RSVP reads and writes downlink data products from the operations data server (ODS) and writes uplink data products to the ODS. The primary users of RSVP are members of the Rover Planner team (part of the Integrated Planning and Execution Team (IPE)), who use it to perform traversability/articulation analyses, take activity plan input from the Science and Mission Planning teams, and create a set of rover sequences to be sent to the rover every sol. The primary inputs to RSVP are downlink data products and activity plans in the ODS database. The primary outputs are command sequences to be placed in the ODS for further processing prior to uplink to each rover. RSVP is composed of two main subsystems. The first, called the Robot Sequence Editor (RoSE), understands the MSL activity and command dictionaries and takes care of converting incoming activity level inputs into command sequences. The Rover Planners use the RoSE component of RSVP to put together command sequences and to view and manage command level resources like time, power, temperature, etc. (via a transparent realtime connection to SEQGEN). The second component of RSVP is called HyperDrive, a set of high-fidelity computer graphics displays of the Martian surface in 3D and in stereo. The Rover Planners can explore the environment around the rover, create commands related to motion of all kinds, and see the simulated result of those commands via its underlying tight coupling with flight navigation, motor, and arm software. This software is the evolutionary replacement for the Rover Sequencing and Visualization software used to create command sequences (and visualize the Martian surface) for the Mars Exploration Rover mission.

  19. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2012-11-01

    Full Text Available Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.

  20. Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality.

    Science.gov (United States)

    Churchill, Jennifer D; King, Jonathan L; Chakraborty, Ranajit; Budowle, Bruce

    2016-09-01

    Massively parallel sequencing (MPS) offers substantial improvements over current forensic DNA typing methodologies such as increased resolution, scalability, and throughput. The Ion PGM™ is a promising MPS platform for analysis of forensic biological evidence. The system employs a sequencing-by-synthesis chemistry on a semiconductor chip that measures a pH change due to the release of hydrogen ions as nucleotides are incorporated into the growing DNA strands. However, implementation of MPS into forensic laboratories requires a robust chemistry. Ion Torrent's Hi-Q™ Sequencing Chemistry was evaluated to determine if it could improve on the quality of the generated sequence data in association with selected genetic marker targets. The whole mitochondrial genome and the HID-Ion STR 10-plex panel were sequenced on the Ion PGM™ system with the Ion PGM™ Sequencing 400 Kit and the Ion PGM™ Hi-Q™ Sequencing Kit. Concordance, coverage, strand balance, noise, and deletion ratios were assessed in evaluating the performance of the Ion PGM™ Hi-Q™ Sequencing Kit. The results indicate that reliable, accurate data are generated and that sequencing through homopolymeric regions can be improved with the use of Ion Torrent's Hi-Q™ Sequencing Chemistry. Overall, the quality of the generated sequencing data supports the potential for use of the Ion PGM™ in forensic genetic laboratories.

  1. Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

    Science.gov (United States)

    Wallace, G. R.; Weathers, G. D.; Graf, E. R.

    1973-01-01

    The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
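
    As a rough illustration of how a hybrid-sum sequence is formed, the sketch below generates two maximum-length sequences with linear feedback shift registers and adds them modulo two. The register lengths, the characteristic polynomials (x^3 + x + 1 and x^5 + x^2 + 1) and all function names are assumptions chosen for the example, not taken from the report, and the filtering step and the statistical quality factor are not modelled.

```python
from itertools import islice

def lfsr(taps, state):
    """Yield bits of the recurrence s[n+k] = XOR of s[n+t] for t in taps.

    `state` holds the k initial bits s[0..k-1] and must not be all zero.
    """
    state = list(state)
    while True:
        yield state[0]
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + [new_bit]

def hybrid_sum(generators, length):
    """Modulo-two sum, bit by bit, of several bit generators."""
    out = []
    for _ in range(length):
        bit = 0
        for g in generators:
            bit ^= next(g)
        out.append(bit)
    return out

def period(bits):
    """Smallest shift p under which the sample repeats itself."""
    n = len(bits)
    for p in range(1, n):
        if all(bits[i] == bits[i + p] for i in range(n - p)):
            return p
    return n

# Component sequences: periods 2**3 - 1 = 7 and 2**5 - 1 = 31.
m3 = list(islice(lfsr((0, 1), [1, 0, 0]), 21))
m5 = list(islice(lfsr((0, 2), [1, 0, 0, 0, 0]), 93))
# Hybrid-sum sequence: the periods are coprime, so the sum repeats after 7 * 31 bits.
hybrid = hybrid_sum([lfsr((0, 1), [1, 0, 0]), lfsr((0, 2), [1, 0, 0, 0, 0])], 651)
```

    The sample lengths are three times the expected periods so that `period` sees several full repetitions.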

  2. How Long is an Aftershock Sequence?

    Science.gov (United States)

    Godano, Cataldo; Tramelli, Anna

    2016-07-01

    The occurrence of a mainshock is always followed by aftershocks spatially distributed within the fault area. The decay of the aftershock rate with time is described by the empirical Omori law, which was inferred from catalogue analysis. Discriminating sequences within catalogues is not a straightforward operation, especially for low-magnitude mainshocks. Here, we describe the rate decay of the Omori law obtained using different sequence discrimination tools and we find that, when the background seismicity is excluded, the sequences tend to last for the temporal extension of the catalogue.

  3. Nanopore-CMOS Interfaces for DNA Sequencing.

    Science.gov (United States)

    Magierowski, Sebastian; Huang, Yiyun; Wang, Chengjie; Ghafar-Zadeh, Ebrahim

    2016-08-06

    DNA sequencers based on nanopore sensors present an opportunity for a significant break from the template-based incumbents of the last forty years. Key advantages ushered by nanopore technology include a simplified chemistry and the ability to interface to CMOS technology. The latter opportunity offers substantial promise for improvement in sequencing speed, size and cost. This paper reviews existing and emerging means of interfacing nanopores to CMOS technology with an emphasis on massively-arrayed structures. It presents this in the context of incumbent DNA sequencing techniques, reviews and quantifies nanopore characteristics and models and presents CMOS circuit methods for the amplification of low-current nanopore signals in such interfaces.

  4. Initial retrieval sequence and blending strategy

    Energy Technology Data Exchange (ETDEWEB)

    Pemwell, D.L.; Grenard, C.E.

    1996-09-01

    This report documents the initial retrieval sequence and the methodology used to select it. Waste retrieval, storage, pretreatment and vitrification were modeled for candidate single-shell tank retrieval sequences. Performance of the sequences was measured by a set of metrics (for example, high-level waste glass volume, relative risk and schedule). Computer models were used to evaluate estimated glass volumes, process rates, retrieval dates, and blending strategy effects. The models were based on estimates of component inventories and concentrations, sludge wash factors and timing, retrieval annex limitations, etc.

  5. The double main sequence of Omega Centauri

    CERN Document Server

    Bedin, L R

    2004-01-01

    Recent, high precision photometry of Omega Centauri, the biggest Galactic globular cluster, has been obtained with Hubble Space Telescope. The color magnitude diagram reveals an unexpected bifurcation of colors in the main sequence (MS). The newly found double MS, the multiple turnoffs and subgiant branches, and other sequences discovered in the past along the red giant branch of this cluster add up to a fascinating but frustrating puzzle. Among the possible explanations for the blue main sequence an anomalous overabundance of helium is suggested. The hypothesis will be tested with a set of FLAMES@VLT data we have recently obtained (ESO DDT program), and with forthcoming ACS@HST images.

  6. Association Claims in the Sequencing Era

    Directory of Open Access Journals (Sweden)

    Sara L. Pulit

    2014-03-01

    Full Text Available Since the completion of the Human Genome Project, the field of human genetics has been in great flux, largely due to technological advances in studying DNA sequence variation. Although community-wide adoption of statistical standards was key to the success of genome-wide association studies, similar standards have not yet been globally applied to the processing and interpretation of sequencing data. It has proven particularly challenging to pinpoint unequivocally disease variants in sequencing studies of polygenic traits. Here, we comment on a number of factors that may contribute to irreproducible claims of association in scientific literature and discuss possible steps that we can take towards cultural change.

  7. Persistence and NIP in the characteristic sequence

    CERN Document Server

    Malliaris, M E

    2009-01-01

    For a first-order formula $\phi(x;y)$ we introduce and study the characteristic sequence $\langle P_n : n < \omega \rangle$ of hypergraphs defined by $P_n(y_1,...,y_n) := (\exists x) \bigwedge_{i \leq n} \phi(x;y_i)$. We show that combinatorial and classification theoretic properties of the characteristic sequence reflect classification theoretic properties of $\phi$ and vice versa. Specifically, we show that some tree properties are detected by the presence of certain combinatorial configurations in the characteristic sequence while other properties such as instability and the independence property manifest themselves in the persistence of complicated configurations under localization.

  8. Scale-PC shielding analysis sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bowman, S.M.

    1996-05-01

    The SCALE computational system is a modular code system for analyses of nuclear fuel facility and package designs. With the release of SCALE-PC Version 4.3, the radiation shielding analysis community now has the capability to execute the SCALE shielding analysis sequences contained in the control modules SAS1, SAS2, SAS3, and SAS4 on an MS-DOS personal computer (PC). In addition, SCALE-PC includes two new sequences, QADS and ORIGEN-ARP. The capabilities of each sequence are presented, along with example applications.

  9. Spreadsheet macros for coloring sequence alignments.

    Science.gov (United States)

    Haygood, M G

    1993-12-01

    This article describes a set of Microsoft Excel macros designed to color amino acid and nucleotide sequence alignments for review and preparation of visual aids. The colored alignments can then be modified to emphasize features of interest. Procedures for importing and coloring sequences are described. The macro file adds a new menu to the menu bar containing sequence-related commands to enable users unfamiliar with Excel to use the macros more readily. The macros were designed for use with Macintosh computers but will also run with the DOS version of Excel.

  10. Microbial genomics: from sequence to function.

    OpenAIRE

    Schwartz, I

    2000-01-01

    The era of genomics (the study of genes and their function) began a scant dozen years ago with a suggestion by James Watson that the complete DNA sequence of the human genome be determined. Since that time, the human genome project has attracted a great deal of attention in the scientific world and the general media; the scope of the sequencing effort, and the extraordinary value that it will provide, has served to mask the enormous progress in sequencing other genomes. Microbial genome seque...

  11. Deep-sequencing protocols influence the results obtained in small-RNA sequencing.

    Directory of Open Access Journals (Sweden)

    Joern Toedling

    Full Text Available Second-generation sequencing is a powerful method for identifying and quantifying small-RNA components of cells. However, little attention has been paid to the effects of the choice of sequencing platform and library preparation protocol on the results obtained. We present a thorough comparison of small-RNA sequencing libraries generated from the same embryonic stem cell lines, using different sequencing platforms, which represent the three major second-generation sequencing technologies, and protocols. We have analysed and compared the expression of microRNAs, as well as populations of small RNAs derived from repetitive elements. Despite the fact that different libraries display a good correlation between sequencing platforms, qualitative and quantitative variations in the results were found, depending on the protocol used. Thus, when comparing libraries from different biological samples, it is strongly recommended to use the same sequencing platform and protocol in order to ensure the biological relevance of the comparisons.

  12. DNA Sequence Determination by Hybridization: A Strategy for Efficient Large-Scale Sequencing

    Science.gov (United States)

    Drmanac, R.; Drmanac, S.; Strezoska, Z.; Paunesku, T.; Labat, I.; Zeremski, M.; Snoddy, J.; Funkhouser, W. K.; Koop, B.; Hood, L.; Crkvenjakov, R.

    1993-06-01

    The concept of sequencing by hybridization (SBH) makes use of an array of all possible n-nucleotide oligomers (n-mers) to identify n-mers present in an unknown DNA sequence. Computational approaches can then be used to assemble the complete sequence. As a validation of this concept, the sequences of three DNA fragments, 343 base pairs in length, were determined with octamer oligonucleotides. Possible applications of SBH include physical mapping (ordering) of overlapping DNA clones, sequence checking, DNA fingerprinting comparisons of normal and disease-causing genes, and the identification of DNA fragments with particular sequence motifs in complementary DNA and genomic libraries. The SBH techniques may accelerate the mapping and sequencing phases of the human genome project.
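
    The two steps described above, reading out the set of n-mers and assembling them computationally, can be sketched as follows. This is a toy illustration under strong assumptions that are not from the paper: hybridization is error-free, every n-mer occurs once, and the chain of overlaps is unambiguous. The function names and the example sequence are hypothetical.

```python
def spectrum(seq, n):
    """Set of all n-mers in seq, i.e. what an ideal n-mer array would report."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}

def reconstruct(nmers):
    """Chain n-mers through (n-1)-base overlaps.

    Returns the assembled sequence, or None when the start or any
    extension step is ambiguous.
    """
    nmers = set(nmers)
    suffixes = {m[1:] for m in nmers}
    # The first n-mer is the one whose prefix is not the suffix of any n-mer.
    starts = [m for m in nmers if m[:-1] not in suffixes]
    if len(starts) != 1:
        return None
    current = starts[0]
    result = current
    unused = nmers - {current}
    while unused:
        candidates = [m for m in unused if m[:-1] == current[1:]]
        if len(candidates) != 1:
            return None
        current = candidates[0]
        result += current[-1]
        unused.remove(current)
    return result

target = "ATGGCGTGCA"
from_4mers = reconstruct(spectrum(target, 4))  # unambiguous chain
from_3mers = reconstruct(spectrum(target, 3))  # "TG" is followed by both G and C
```

    In this example the 4-mers determine the sequence, whereas the 3-mers do not, which hints at why longer probes such as octamers are needed for longer fragments.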

  13. Optimization of a sequence of reactors

    DEFF Research Database (Denmark)

    Vidal, Rene Victor Valqui

    1991-01-01

    Concerns the optimal production of sulphuric acid in a sequence of reactors. Using a suitable approximation to the objective function, this problem can easily be solved using the maximum principle. A numerical example documents the applicability of the suggested approach...

  14. Network of tRNA Gene Sequences

    Institute of Scientific and Technical Information of China (English)

    WEI Fang-ping; LI Sheng; MA Hong-ru

    2008-01-01

    A network of 3719 tRNA gene sequences was constructed using simplest alignment. Its topology, degree distribution and clustering coefficient were studied. The behaviors of the network shift from fluctuated distribution to scale-free distribution when the similarity degree of the tRNA gene sequences increases. The tRNA gene sequences with the same anticodon identity are more self-organized than those with different anticodon identities and form local clusters in the network. Some vertices of the local cluster have a high connection with other local clusters, and the probable reason was given. Moreover, a network constructed by the same number of random tRNA sequences was used to make comparisons. The relationships between the properties of the tRNA similarity network and the characters of tRNA evolutionary history were discussed.
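
    A similarity network of this kind, with its degree and clustering coefficient, can be sketched in a few lines. The abstract does not define the alignment or the similarity degree, so the measure used here (fraction of identical positions without gaps) is an assumption, as are the function names, the threshold and the toy sequences.

```python
from itertools import combinations

def similarity(s, t):
    """Fraction of identical positions, compared without gaps (assumed measure)."""
    same = sum(1 for x, y in zip(s, t) if x == y)
    return same / max(len(s), len(t))

def build_network(seqs, threshold):
    """Connect two sequences when their similarity reaches the threshold."""
    adj = {i: set() for i in range(len(seqs))}
    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def clustering_coefficient(adj, v):
    """Share of pairs of neighbours of v that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

toy = ["ACGTACGT", "ACGTACGA", "ACGTACCA", "TTTTTTTT"]
net = build_network(toy, threshold=0.75)
degrees = {v: len(nbrs) for v, nbrs in net.items()}
```

    Raising the threshold removes edges, so the degree distribution can be recomputed at several thresholds to observe how the topology changes.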

  15. Female-specific DNA sequences in geese.

    Science.gov (United States)

    Huang, M C; Lin, W C; Horng, Y M; Rouvier, R; Huang, C W

    2003-07-01

    1. The OPAE random primers (Operon Technologies, Inc., CA) were used for random amplified polymorphic DNA (RAPD) fingerprinting in Chinese, White Roman and Landaise geese. One of these primers, OPAE-06, produced a 938-bp sex-specific fragment in all females and in no males of Chinese geese only. 2. A novel female-specific DNA sequence in Chinese goose was cloned and sequenced. Two primers, CGSex-F and CGSex-R, were designed in order to amplify a 912-bp sex-specific polymerase chain reaction (PCR) fragment on genomic DNA from female geese. 3. It was shown that a simple and effective PCR-based sexing technique could be used in the three goose breeds studied. 4. Nucleotide sequencing of the sex-specific fragments in White Roman and Landaise geese was performed and sequence differences were observed among these three breeds.

  16. New stopping criteria for segmenting DNA sequences

    CERN Document Server

    Li, W

    2001-01-01

    We propose a solution on the stopping criterion in segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on Bayesian Information Criterion (BIC) in the model selection framework. When this stopping criterion is applied to a left telomere sequence of yeast Saccharomyces cerevisiae and the complete genome sequence of bacterium Escherichia coli, borders of biologically meaningful units were identified (e.g. subtelomeric units, replication origin, and replication terminus), and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.
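
    A BIC-based stopping rule for one segmentation step might look like the sketch below. It assumes an independent base-composition model for each segment and counts the extra parameters of a split as the new base frequencies plus the breakpoint; both choices, and all names, are assumptions made for illustration rather than details confirmed by the abstract.

```python
import math
from collections import Counter

def log_likelihood(seq):
    """Maximum log-likelihood of seq under an i.i.d. base-composition model."""
    n = len(seq)
    return sum(c * math.log(c / n) for c in Counter(seq).values())

def best_split(seq):
    """Return (position, gain) for the split with the largest likelihood gain."""
    whole = log_likelihood(seq)
    best_pos, best_gain = None, 0.0
    for i in range(1, len(seq)):
        gain = log_likelihood(seq[:i]) + log_likelihood(seq[i:]) - whole
        if gain > best_gain:
            best_pos, best_gain = i, gain
    return best_pos, best_gain

def accept_split(seq, alphabet_size=4):
    """Split only if BIC decreases: 2 * gain > (extra parameters) * ln N."""
    pos, gain = best_split(seq)
    extra_params = (alphabet_size - 1) + 1  # new frequencies plus the breakpoint
    if pos is not None and 2 * gain > extra_params * math.log(len(seq)):
        return pos
    return None
```

    Applied recursively to each accepted segment, the rule stops by itself once no split lowers the BIC. The quadratic scan in `best_split` is only suitable for short sequences.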

  17. Sequence Matching Analysis for Curriculum Development

    National Research Council Canada - National Science Library

    Liem Yenny Bendatu; Bernardo Nugroho Yahya

    2015-01-01

    .... This study attempts to develop a sequence matching analysis. Considering conformance checking as the basis of this approach, this proposed approach utilizes the current control flow technique in process mining domain...

  18. Supervised Sequence Labelling with Recurrent Neural Networks

    CERN Document Server

    Graves, Alex

    2012-01-01

    Supervised sequence labelling is a vital area of machine learning, encompassing tasks such as speech, handwriting and gesture recognition, protein secondary structure prediction and part-of-speech tagging. Recurrent neural networks are powerful sequence learning tools—robust to input noise and distortion, able to exploit long-range contextual information—that would seem ideally suited to such problems. However their role in large-scale sequence labelling systems has so far been auxiliary.    The goal of this book is a complete framework for classifying and transcribing sequential data with recurrent neural networks only. Three main innovations are introduced in order to realise this goal. Firstly, the connectionist temporal classification output layer allows the framework to be trained with unsegmented target sequences, such as phoneme-level speech transcriptions; this is in contrast to previous connectionist approaches, which were dependent on error-prone prior segmentation. Secondly, multidimensional...

  19. Characterizing leader sequences of CRISPR loci

    DEFF Research Database (Denmark)

    Alkhnbashi, Omer; Shah, Shiraz Ali; Garrett, Roger Antony

    2016-01-01

    The CRISPR-Cas system is an adaptive immune system in many archaea and bacteria, which provides resistance against invading genetic elements. The first phase of CRISPR-Cas immunity is called adaptation, in which small DNA fragments are excised from genetic elements and are inserted into a CRISPR...... array generally adjacent to its so called leader sequence at one end of the array. It has been shown that transcription initiation and adaptation signals of the CRISPR array are located within the leader. However, apart from promoters, there is very little knowledge of sequence or structural motifs...... sequences by focusing on the consensus repeat of the adjacent CRISPR array and weak upstream conservation signals. We applied our tool to the analysis of a comprehensive genomic database and identified several characteristic properties of leader sequences specific to archaea and bacteria, ranging from...

  20. Nanopore DNA sequencing using kinetic proofreading

    Science.gov (United States)

    Ling, Xinsheng

    We propose a method of DNA sequencing by combining the physical method of nanopore electrical measurements and Southern's sequencing-by-hybridization. The new key ingredient, essential to both lowering the costs and increasing the precision, is an asymmetric nanopore sandwich device capable of measuring the DNA hybridization probe twice separated by a designed waiting time. Those incorrect probes appearing only once in nanopore ionic current traces are discriminated from the correct ones that appear twice. This method of discrimination is similar to the principle of kinetic proofreading proposed by Hopfield and Ninio in gene transcription and translation processes. An error analysis of this nanopore kinetic proofreading (nKP) technique for DNA sequencing is carried out in comparison with the most precise 3' dideoxy termination method developed by Sanger.

  1. Improved polynomial remainder sequences for Ore polynomials.

    Science.gov (United States)

    Jaroschek, Maximilian

    2013-11-01

    Polynomial remainder sequences contain the intermediate results of the Euclidean algorithm when applied to (non-)commutative polynomials. The running time of the algorithm is dependent on the size of the coefficients of the remainders. Different ways have been studied to make these as small as possible. The subresultant sequence of two polynomials is a polynomial remainder sequence in which the size of the coefficients is optimal in the generic case, but when taking the input from applications, the coefficients are often larger than necessary. We generalize two improvements of the subresultant sequence to Ore polynomials and derive a new bound for the minimal coefficient size. Our approach also yields a new proof for the results in the commutative case, providing a new point of view on the origin of the extraneous factors of the coefficients.
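
    The growth of coefficients in a remainder sequence is easy to observe in the ordinary commutative case. The sketch below runs the plain Euclidean algorithm over the rationals on a classic textbook pair of integer polynomials; it does not implement subresultants or Ore polynomials, and the function names are assumptions for illustration.

```python
from fractions import Fraction

def poly_rem(a, b):
    """Remainder of a divided by b; coefficients listed from highest degree.

    b must be non-empty with a non-zero leading coefficient.
    """
    a = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)  # the leading term has been cancelled
    while a and a[0] == 0:
        a.pop(0)  # strip leading zeros so the degree is explicit
    return a

def remainder_sequence(a, b):
    """Euclidean remainder sequence a, b, a mod b, ... up to the last non-zero remainder."""
    seq = [[Fraction(c) for c in a], [Fraction(c) for c in b]]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append(r)

# x^8 + x^6 - 3x^4 - 3x^3 + 8x^2 + 2x - 5  and  3x^6 + 5x^4 - 4x^2 - 9x + 21
prs = remainder_sequence([1, 0, 1, 0, -3, -3, 8, 2, -5], [3, 0, 5, 0, -4, -9, 21])
```

    Printing `prs` shows the numerators and denominators growing from one remainder to the next even though the inputs have single-digit coefficients, which is the effect that subresultant-style sequences are designed to control.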

  2. Generalized locally Toeplitz sequences theory and applications

    CERN Document Server

    Garoni, Carlo

    2017-01-01

    Based on their research experience, the authors propose a reference textbook in two volumes on the theory of generalized locally Toeplitz sequences and their applications. This first volume focuses on the univariate version of the theory and the related applications in the unidimensional setting, while the second volume, which addresses the multivariate case, is mainly devoted to concrete PDE applications. This book systematically develops the theory of generalized locally Toeplitz (GLT) sequences and presents some of its main applications, with a particular focus on the numerical discretization of differential equations (DEs). It is the first book to address the relatively new field of GLT sequences, which occur in numerous scientific applications and are especially dominant in the context of DE discretizations. Written for applied mathematicians, engineers, physicists, and scientists who (perhaps unknowingly) encounter GLT sequences in their research, it is also of interest to those working in the fields of...

  3. Extracting biological knowledge from DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    De La Vega, F.M. [CINVESTAV-IPN (Mexico); Thieffry, D. [Universite Libre de Bruxelles, Rhode-Saint-Genese (Belgium)]|[Universidad Nacional Autonoma de Mexico, Morelos (Mexico); Collado-Vides, J. [Universidad Nacional Autonoma de Mexico, Morelos (Mexico)

    1996-12-31

    This session describes the elucidation of information from DNA sequences and what challenges computational biologists face in their task of summarizing and deciphering the human genome. Techniques discussed include methods from statistics, information theory, artificial intelligence and linguistics. 1 ref.

  4. Genetics Home Reference: isolated lissencephaly sequence

    Science.gov (United States)


  5. GASSST: global alignment short sequence search tool

    National Research Council Canada - National Science Library

    Rizk, Guillaume; Lavenier, Dominique

    2010-01-01

    .... Our goal with our new aligner GASSST (Global Alignment Short Sequence Search Tool) is thus 2-fold-achieving high performance with no restrictions on the number of indels with a design that is still effective on long reads...

  6. Long range correlations in DNA sequences

    CERN Document Server

    Mohanty, A K

    2002-01-01

    The so-called long-range correlation properties of DNA sequences are studied using the variance analyses of the density distribution of a single or a group of nucleotides in a model-independent way. This new method, which was suggested earlier, has been applied to extract slope parameters that characterize the correlation properties for several intron-containing and intronless DNA sequences. An important aspect of all the DNA sequences is the property of complementarity, by virtue of which any two complementary distributions (like GA is complementary to TC or G is complementary to ATC) have identical fluctuations at all scales although their distribution functions need not be identical. Due to this complementarity, the famous DNA walk representation, whose statistical interpretation is still unresolved, is shown to be a special case of the present formalism with a density distribution corresponding to a purine or a pyrimidine group. Another interesting aspect of most of the DNA sequences is that the factorial m...

  7. Glycome mapping on DNA sequencing equipment.

    Science.gov (United States)

    Laroy, Wouter; Contreras, Roland; Callewaert, Nico

    2006-01-01

    Here we provide a detailed protocol for the analysis of protein-linked glycans on DNA sequencing equipment. This protocol satisfies the glyco-analytical needs of many projects and can form the basis of 'glycomics' studies, in which robustness, high throughput, high sensitivity and reliable quantification are of paramount importance. The protocol routinely resolves isobaric glycan stereoisomers, which is much more difficult by mass spectrometry (MS). Earlier methods made use of polyacrylamide gel-based sequencers, but we have now adapted the technique to multicapillary DNA sequencers, which represent the state of the art today. In addition, we have integrated an option for HPLC-based fractionation of highly anionic 8-amino-1,3,6-pyrenetrisulfonic acid (APTS)-labeled glycans before rapid capillary electrophoretic profiling. This option facilitates either two-dimensional profiling of complex glycan mixtures and exoglycosidase sequencing, or MS analysis of particular compounds of interest rather than of the total pool of glycans in a sample.

  8. Simultaneous sensorimotor adaptation and sequence learning.

    Science.gov (United States)

    Overduin, Simon A; Richardson, Andrew G; Bizzi, Emilio; Press, Daniel Z

    2008-01-01

    Sensorimotor adaptation and sequence learning have often been treated as distinct forms of motor learning. But frequently the motor system must acquire both types of experience simultaneously. Here, we investigated the interaction of these two forms of motor learning by having subjects adapt to predictable forces imposed by a robotic manipulandum while simultaneously reaching to an implicit sequence of targets. We show that adaptation to novel dynamics and learning of a sequence of movements can occur simultaneously and without significant interference or facilitation. When both conditions were presented simultaneously to subjects, their trajectory error and reaction time decreased to the same extent as those of subjects who experienced the force field or sequence independently.

  9. Sequencing Information Management System (SIMS). Final report

    Energy Technology Data Exchange (ETDEWEB)

    Fields, C.

    1996-02-15

    A feasibility study to develop a requirements analysis and functional specification for a data management system for large-scale DNA sequencing laboratories resulted in a functional specification for a Sequencing Information Management System (SIMS). This document reports the results of this feasibility study, and includes a functional specification for a SIMS relational schema. The SIMS is an integrated information management system that supports data acquisition, management, analysis, and distribution for DNA sequencing laboratories. The SIMS provides ad hoc query access to information on the sequencing process and its results, and partially automates the transfer of data between laboratory instruments, analysis programs, technical personnel, and managers. The SIMS user interfaces are designed for use by laboratory technicians, laboratory managers, and scientists. The SIMS is designed to run in a heterogeneous, multiplatform environment in a client/server mode. The SIMS communicates with external computational and data resources via the internet.

  10. The DWPF Melter proposed heat up sequence

    Energy Technology Data Exchange (ETDEWEB)

    Smith, M.E.

    1989-08-11

    Per the request of DWPT supervision, a proposed heatup sequence for the DWPF Melter has been documented in this report. DWPF personnel will use this report as a guide to write the detailed DWPF Melter startup plan. 6 refs.

  11. Identifying driver mutations in sequenced cancer genomes

    DEFF Research Database (Denmark)

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein...

  12. A disruptive sequencer meets disruptive publishing.

    Science.gov (United States)

    Loman, Nick; Goodwin, Sarah; Jansen, Hans; Loose, Matt

    2015-01-01

    Nanopore sequencing was recently made available to users in the form of the Oxford Nanopore MinION. Released to users through an early access programme, the MinION is made unique by its tiny form factor and ability to generate very long sequences from single DNA molecules. The platform is undergoing rapid evolution with three distinct nanopore types and five updates to library preparation chemistry in the last 18 months. To keep pace with the rapid evolution of this sequencing platform, and to provide a space where new analysis methods can be openly discussed, we present a new F1000Research channel devoted to updates to and analysis of nanopore sequence data.

  13. "X"-tending the Fibonacci Sequence.

    Science.gov (United States)

    Moran, Glenn T.

    2002-01-01

    Outlines a lesson on the Fibonacci and Lucas sequences that captures student interest by presenting the opportunity for computation practice, mental mathematics, and proof for algebra students. Discusses an extension for solving simultaneous equations. (YDS)
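    As a small illustration of the computation practice such a lesson might involve (a sketch only; the function name and the choice of identity are assumptions, not taken from the article), both sequences can be generated from the same recurrence with different starting values, and the well-known identity L(n) = F(n-1) + F(n+1) can be checked numerically:

```python
def linear_recurrence(first, second, count):
    """Return the first `count` terms of x(n) = x(n-1) + x(n-2)."""
    terms = [first, second]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


# fibonacci[i] holds F(i+1); lucas[i] holds L(i+1).
fibonacci = linear_recurrence(1, 1, 10)
lucas = linear_recurrence(1, 3, 10)

# Check L(n) = F(n-1) + F(n+1) for n = 2..9.
identity_holds = all(
    lucas[i] == fibonacci[i - 1] + fibonacci[i + 1] for i in range(1, 9)
)

print(fibonacci)
print(lucas)
print(identity_holds)
```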

  14. Fibonacci Sequence and Supramolecular Structure of DNA.

    Science.gov (United States)

    Shabalkin, I P; Grigor'eva, E Yu; Gudkova, M V; Shabalkin, P I

    2016-05-01

    We propose a new model of supramolecular DNA structure. As in our previously developed model of primary DNA structure [11-15], the 3D structure of the DNA molecule is assembled according to a mathematical rule known as the Fibonacci sequence. Unlike the primary DNA structure, the supramolecular 3D structure is assembled from complex moieties, including a regular tetrahedron and a regular octahedron, that consist of monomers, the elements of the primary DNA structure. The moieties of the supramolecular DNA structure, which form fragments of a regular spatial lattice, are bound via linker (joint) sequences of the DNA chain. The lattice perceives and transmits information signals over a considerable distance without acoustic aberrations. Linker sequences expand the conformational space between lattice segments, allowing them to slide relative to each other under the action of external forces. In this case, sliding is provided by stretching of the stacked linker sequences.

  15. Sequence finishing and mapping of Drosophila melanogaster heterochromatin

    Energy Technology Data Exchange (ETDEWEB)

    Hoskins, Roger A.; Carlson, Joseph W.; Kennedy, Cameron; Acevedo, David; Evans-Holm, Martha; Frise, Erwin; Wan, Kenneth H.; Park, Soo; Mendez-Lago, Maria; Rossi, Fabrizio; Villasante, Alfredo; Dimitri, Patrizio; Karpen, Gary H.; Celniker, Susan E.

    2007-06-15

    Genome sequences for most metazoans are incomplete due to the presence of repeated DNA in the pericentromeric heterochromatin. The heterochromatic regions of D. melanogaster contain 20 Mb of sequence amenable to mapping, sequence assembly and finishing. Here we describe the generation of 15 Mb of finished or improved heterochromatic sequence using available clone resources and assembly and mapping methods. We also constructed a BAC-based physical map that spans approximately 13 Mb of the pericentromeric heterochromatin, and a cytogenetic map that positions approximately 11 Mb of BAC contigs and sequence scaffolds in specific chromosomal locations. The integrated sequence assembly and maps greatly improve our understanding of the structure and composition of this poorly understood fraction of a metazoan genome and provide a framework for functional analyses.

  16. The complete DNA sequence of vaccinia virus.

    Science.gov (United States)

    Goebel, S J; Johnson, G P; Perkus, M E; Davis, S W; Winslow, J P; Paoletti, E

    1990-11-01

    The complete DNA sequence of the genome of vaccinia virus has been determined. The genome consisted of 191,636 bp with a base composition of 66.6% A + T. We have identified 198 "major" protein-coding regions and 65 overlapping "minor" regions, for a total of 263 potential genes. Genes encoded by the virus were located by examination of DNA sequence characteristics and compared with existing vaccinia virus mapping analyses, sequence data, and transcription data. These genes were found to be compactly organized along the genome with relatively few regions of noncoding sequences. Whereas several similarities to proteins of known function were discerned, the function of the majority of proteins encoded by these open reading frames is as yet undetermined.

  17. Origins of the protein synthesis cycle

    Science.gov (United States)

    Fox, S. W.

    1981-01-01

    Largely derived from experiments in molecular evolution, a theory of protein synthesis cycles has been constructed. The sequence begins with ordered thermal proteins resulting from the self-sequencing of mixed amino acids. Ordered thermal proteins then aggregate to cell-like structures. When they contained proteinoids sufficiently rich in lysine, the structures were able to synthesize offspring peptides. Since lysine-rich proteinoid (LRP) also catalyzes the polymerization of nucleoside triphosphate to polynucleotides, the same microspheres containing LRP could have synthesized both original cellular proteins and cellular nucleic acids. The LRP within protocells would have provided proximity advantageous for the origin and evolution of the genetic code.

  18. ASAP: Amplification, sequencing & annotation of plastomes

    Directory of Open Access Journals (Sweden)

    Folta Kevin M

    2005-12-01

    Full Text Available Abstract Background: Availability of DNA sequence information is vital for pursuing structural, functional and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome requires sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures like chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA) is being used currently to amplify the chloroplast genome from purified chloroplast DNA and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal high-throughput, rapid PCR-based technique to amplify, sequence and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1 – 1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera. Results: 100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine and Equisetum. Sequence from the inverted repeat region of strawberry and peach plastome was obtained, annotated and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to tobacco plastome thus exhibiting the utility of this method for structural and

  19. Sequencing and Analysis of Neanderthal Genomic DNA

    OpenAIRE

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K; Rubin, Edward M.

    2006-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library a...

  20. Inconsistencies in Neanderthal genomic DNA sequences.

    Directory of Open Access Journals (Sweden)

    Jeffrey D Wall

    2007-10-01

    Full Text Available Two recently published papers describe nuclear DNA sequences that were obtained from the same Neanderthal fossil. Our reanalyses of the data from these studies show that they are not consistent with each other and point to serious problems with the data quality in one of the studies, possibly due to modern human DNA contaminants and/or a high rate of sequencing errors.

  1. Recursive Polynomial Remainder Sequence and its Subresultants

    OpenAIRE

    Terui, Akira

    2008-01-01

    We introduce concepts of "recursive polynomial remainder sequence (PRS)" and "recursive subresultant," along with investigation of their properties. A recursive PRS is defined as, if there exists the GCD (greatest common divisor) of initial polynomials, a sequence of PRSs calculated "recursively" for the GCD and its derivative until a constant is derived, and recursive subresultants are defined by determinants representing the coefficients in recursive PRS as functions of coefficients of init...

  2. Inferring phylogenies from RAD sequence data.

    Directory of Open Access Journals (Sweden)

    Benjamin E R Rubin

    Full Text Available Reduced-representation genome sequencing represents a new source of data for systematics, and its potential utility in interspecific phylogeny reconstruction has not yet been explored. One approach that seems especially promising is the use of inexpensive short-read technologies (e.g., Illumina, SOLiD) to sequence restriction-site associated DNA (RAD)--the regions of the genome that flank the recognition sites of restriction enzymes. In this study, we simulated the collection of RAD sequences from sequenced genomes of different taxa (Drosophila, mammals, and yeasts) and developed a proof-of-concept workflow to test whether informative data could be extracted and used to accurately reconstruct "known" phylogenies of species within each group. The workflow consists of three basic steps: first, sequences are clustered by similarity to estimate orthology; second, clusters are filtered by taxonomic coverage; and third, they are aligned and concatenated for "total evidence" phylogenetic analysis. We evaluated the performance of clustering and filtering parameters by comparing the resulting topologies with well-supported reference trees and we were able to identify conditions under which the reference tree was inferred with high support. For Drosophila, whole genome alignments allowed us to directly evaluate which parameters most consistently recovered orthologous sequences. For the parameter ranges explored, we recovered the best results at the low ends of sequence similarity and taxonomic representation of loci; these generated the largest supermatrices with the highest proportion of missing data. Applications of the method to mammals and yeasts were less successful, which we suggest may be due partly to their much deeper evolutionary divergence times compared to Drosophila (crown ages of approximately 100 and 300 versus 60 Mya, respectively). RAD sequences thus appear to hold promise for reconstructing phylogenetic relationships in younger clades in

  3. Inferring phylogenies from RAD sequence data.

    Science.gov (United States)

    Rubin, Benjamin E R; Ree, Richard H; Moreau, Corrie S

    2012-01-01

    Reduced-representation genome sequencing represents a new source of data for systematics, and its potential utility in interspecific phylogeny reconstruction has not yet been explored. One approach that seems especially promising is the use of inexpensive short-read technologies (e.g., Illumina, SOLiD) to sequence restriction-site associated DNA (RAD)--the regions of the genome that flank the recognition sites of restriction enzymes. In this study, we simulated the collection of RAD sequences from sequenced genomes of different taxa (Drosophila, mammals, and yeasts) and developed a proof-of-concept workflow to test whether informative data could be extracted and used to accurately reconstruct "known" phylogenies of species within each group. The workflow consists of three basic steps: first, sequences are clustered by similarity to estimate orthology; second, clusters are filtered by taxonomic coverage; and third, they are aligned and concatenated for "total evidence" phylogenetic analysis. We evaluated the performance of clustering and filtering parameters by comparing the resulting topologies with well-supported reference trees and we were able to identify conditions under which the reference tree was inferred with high support. For Drosophila, whole genome alignments allowed us to directly evaluate which parameters most consistently recovered orthologous sequences. For the parameter ranges explored, we recovered the best results at the low ends of sequence similarity and taxonomic representation of loci; these generated the largest supermatrices with the highest proportion of missing data. Applications of the method to mammals and yeasts were less successful, which we suggest may be due partly to their much deeper evolutionary divergence times compared to Drosophila (crown ages of approximately 100 and 300 versus 60 Mya, respectively). RAD sequences thus appear to hold promise for reconstructing phylogenetic relationships in younger clades in which sufficient
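    The filtering and concatenation steps of the workflow described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the function names, the `min_taxa` parameter, the `?` padding character and the tiny pre-aligned clusters are all hypothetical. It shows why a relaxed taxonomic-coverage threshold yields a larger supermatrix with more missing data.

```python
def filter_by_coverage(clusters, min_taxa):
    """Keep loci (locus -> {taxon: aligned sequence}) found in at least min_taxa taxa."""
    return {locus: seqs for locus, seqs in clusters.items() if len(seqs) >= min_taxa}


def concatenate(clusters, taxa):
    """Build a supermatrix; a taxon missing at a locus is padded with '?'."""
    matrix = {taxon: "" for taxon in taxa}
    for locus in sorted(clusters):
        seqs = clusters[locus]
        width = len(next(iter(seqs.values())))
        for taxon in taxa:
            matrix[taxon] += seqs.get(taxon, "?" * width)
    return matrix


def missing_fraction(matrix):
    """Fraction of supermatrix cells that are missing data."""
    total = sum(len(row) for row in matrix.values())
    return sum(row.count("?") for row in matrix.values()) / total


# Hypothetical, already-aligned clusters for four taxa.
taxa = ["A", "B", "C", "D"]
clusters = {
    "locus1": {"A": "ACGT", "B": "ACGA", "C": "ACGT", "D": "ATGT"},
    "locus2": {"A": "GGA", "B": "GGC"},
    "locus3": {"A": "TT"},
}

strict = concatenate(filter_by_coverage(clusters, min_taxa=4), taxa)
relaxed = concatenate(filter_by_coverage(clusters, min_taxa=2), taxa)

print(len(strict["A"]), missing_fraction(strict))
print(len(relaxed["A"]), missing_fraction(relaxed))
```

    In this toy data set the relaxed threshold keeps two loci instead of one, so the supermatrix grows from 4 to 7 columns while the share of missing cells rises from zero to 6 of 28.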

  4. Task sequencing for autonomous robotic vacuum cleaners

    Science.gov (United States)

    Gorbenko, Anna; Popov, Vladimir

    2017-07-01

    Various planning problems for robotic systems are of considerable interest. One of such problems is the problem of task sequencing. In this paper, we consider the problem of task sequencing for autonomous vacuum floor cleaning robots. We consider a graph model for the problem. We propose an efficient approach to solve the problem. In particular, we use an explicit reduction from the decision version of the problem to the satisfiability problem. We present the results of computational experiments for different satisfiability algorithms.

  5. Optimization of a sequence of reactors

    DEFF Research Database (Denmark)

    Vidal, Rene Victor Valqui

    1991-01-01

    Concerns the optimal production of sulphuric acid in a sequence of reactors. Using a suitable approximation to the objective function, this problem can easily be solved using the maximum principle. A numerical example documents the applicability of the suggested approach.

  6. Parallel motif extraction from very long sequences

    KAUST Repository

    Sahli, Majed

    2013-01-01

    Motifs are frequent patterns used to identify biological functionality in genomic sequences, periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that focuses on collections of many short sequences, modern applications require mining of motifs in one very long sequence (i.e., in the order of several gigabytes). For this case, there exist statistical approaches that are fast but inaccurate; or combinatorial methods that are sound and complete. Unfortunately, existing combinatorial methods are serial and very slow. Consequently, they are limited to very short sequences (i.e., a few megabytes), small alphabets (typically 4 symbols for DNA sequences), and restricted types of motifs. This paper presents ACME, a combinatorial method for extracting motifs from a single very long sequence. ACME arranges the search space in contiguous blocks that take advantage of the cache hierarchy in modern architectures, and achieves almost an order of magnitude performance gain in serial execution. It also decomposes the search space in a smart way that allows scalability to thousands of processors with more than 90% speedup. ACME is the only method that: (i) scales to gigabyte-long sequences; (ii) handles large alphabets; (iii) supports interesting types of motifs with minimal additional cost; and (iv) is optimized for a variety of architectures such as multi-core systems, clusters in the cloud, and supercomputers. ACME reduces the extraction time for an exact-length query from 4 hours to 7 minutes on a typical workstation; handles 3 orders of magnitude longer sequences; and scales up to 16,384 cores on a supercomputer.

  7. A Method to Construct Generalized Fibonacci Sequences

    Directory of Open Access Journals (Sweden)

    Adalberto García-Máynez

    2016-01-01

    Full Text Available The main purpose of this paper is to study the convergence properties of Generalized Fibonacci Sequences and the series of partial sums associated with them. When the proper values of an s×s real matrix A are real and different, we give a necessary and sufficient condition for the convergence of the matrix sequence A, A^2, A^3, … to a matrix B.

  8. A Method to Construct Generalized Fibonacci Sequences

    OpenAIRE

    Adalberto García-Máynez; Adolfo Pimienta Acosta

    2016-01-01

    The main purpose of this paper is to study the convergence properties of Generalized Fibonacci Sequences and the series of partial sums associated with them. When the proper values of an $s \times s$ real matrix $A$ are real and different, we give a necessary and sufficient condition for the convergence of the matrix sequence $A, A^{2}, A^{3}, \dots$ to a matrix $B$.
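    To make the setting concrete, here is the standard diagonalization argument for this situation. It is offered as a hedged illustration, not necessarily the criterion in the form the paper states it; the symbols $P$, $\lambda_i$ and $\delta_i$ are introduced here for the sketch. If the proper values (eigenvalues) $\lambda_1, \dots, \lambda_s$ of $A$ are real and distinct, then $A$ is diagonalizable, and the powers of $A$ converge exactly when every $\lambda_i^{n}$ converges:

```latex
A = P\,\mathrm{diag}(\lambda_1,\dots,\lambda_s)\,P^{-1}
\;\Longrightarrow\;
A^{n} = P\,\mathrm{diag}(\lambda_1^{n},\dots,\lambda_s^{n})\,P^{-1},
\qquad
\lim_{n\to\infty} A^{n} \text{ exists}
\iff
-1 < \lambda_i \le 1 \ \text{ for all } i,
\qquad
B = P\,\mathrm{diag}(\delta_1,\dots,\delta_s)\,P^{-1},
\quad
\delta_i =
\begin{cases}
1, & \lambda_i = 1,\\
0, & |\lambda_i| < 1.
\end{cases}
```

    Because the $\lambda_i$ are distinct, at most one of them equals 1, so under this argument the limit $B$ is either the zero matrix or a rank-one projection.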

  9. Cactus: Algorithms for genome multiple sequence alignment

    OpenAIRE

    Paten, Benedict; Earl, Dent; Nguyen, Ngan; Diekhans, Mark; Zerbino, Daniel; Haussler, David

    2011-01-01

    Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms...

  10. Fibonacci difference sequence spaces for modulus functions

    Directory of Open Access Journals (Sweden)

    Kuldip Raj

    2015-05-01

    Full Text Available In the present paper we introduce Fibonacci difference sequence spaces l(F, Ƒ, p, u) and l_∞(F, Ƒ, p, u) by using a sequence of modulus functions and a new band matrix F. We also make an effort to study some inclusion relations, topological and geometric properties of these spaces. Furthermore, the alpha, beta, gamma duals and matrix transformation of the space l(F, Ƒ, p, u) are determined.

  11. Determinant Representations of Sequences: A Survey

    Directory of Open Access Journals (Sweden)

    Moghaddamfar A. R.

    2014-01-01

    Full Text Available This is a survey of recent results concerning (integer) matrices whose leading principal minors are well-known sequences such as Fibonacci, Lucas, Jacobsthal and Pell (sub)sequences. There are different ways for constructing such matrices. Some of these matrices are constructed by homogeneous or nonhomogeneous recurrence relations, and others are constructed by convolution of two sequences. In this article, we will illustrate the idea of these methods by constructing some integer matrices of this type.
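    A classical instance of such a matrix, given here as a sketch rather than as one of the survey's own constructions, is the tridiagonal integer matrix with 1 on the diagonal and superdiagonal and -1 on the subdiagonal. Its leading principal minors satisfy D(n) = D(n-1) + D(n-2) with D(1) = 1 and D(2) = 2, so they run through the Fibonacci numbers. The helper names below are hypothetical; the determinant is computed exactly with rational arithmetic so that the check does not rely on the recurrence itself.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n = len(rows)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det  # a row swap flips the sign
        det *= rows[col][col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n):
                rows[r][c] -= factor * rows[col][c]
    return det


def fibonacci_tridiagonal(n):
    """n x n matrix: 1 on the diagonal and superdiagonal, -1 on the subdiagonal."""
    return [
        [1 if j in (i, i + 1) else -1 if j == i - 1 else 0 for j in range(n)]
        for i in range(n)
    ]


def leading_principal_minors(matrix):
    """Determinants of the top-left k x k blocks for k = 1..n."""
    return [
        int(determinant([row[:k] for row in matrix[:k]]))
        for k in range(1, len(matrix) + 1)
    ]


minors = leading_principal_minors(fibonacci_tridiagonal(8))
print(minors)
```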

  12. Probabilistic motor sequence yields greater offline and less online learning than fixed sequence

    Directory of Open Access Journals (Sweden)

    Yue Du

    2016-03-01

    Full Text Available It is well acknowledged that motor sequences can be learned quickly through online learning. Subsequently, the initial acquisition of a motor sequence is boosted or consolidated by offline learning. However, little is known whether offline learning can drive the fast learning of motor sequences (i.e., initial sequence learning in the first training session). To examine offline learning in the fast learning stage, we asked four groups of young adults to perform the serial reaction time (SRT) task with either a fixed or probabilistic sequence and with or without preliminary knowledge of the presence of a sequence. The sequence and instruction types were manipulated to emphasize either procedural (probabilistic sequence; no preliminary knowledge) or declarative (fixed sequence; with preliminary knowledge) memory that were found to either facilitate or inhibit offline learning. In the SRT task, there were six learning blocks with a two-minute break between each consecutive block. Throughout the session, stimuli followed the same fixed or probabilistic pattern except in Block 5, in which stimuli appeared in a random order. We found that preliminary knowledge facilitated the learning of a fixed sequence, but not a probabilistic sequence. In addition to overall learning measured by the mean reaction time (RT), we examined the progressive changes in RT within and between blocks (i.e., online and offline learning, respectively). It was found that the two groups who performed the fixed sequence, regardless of preliminary knowledge, showed greater online learning than the other two groups who performed the probabilistic sequence. The groups who performed the probabilistic sequence, regardless of preliminary knowledge, did not display online learning, as indicated by a decline in performance within the learning blocks. However, they did demonstrate remarkably greater offline improvement in RT, which suggests that they are learning the probabilistic sequence

  13. A Unified Theoretical Framework for Cognitive Sequencing.

    Science.gov (United States)

    Savalia, Tejas; Shukla, Anuj; Bapi, Raju S

    2016-01-01

    The capacity to sequence information is central to human performance. Sequencing ability forms the foundation stone for higher order cognition related to language and goal-directed planning. Information related to the order of items, their timing, chunking and hierarchical organization are important aspects in sequencing. Past research on sequencing has emphasized two distinct and independent dichotomies: implicit vs. explicit and goal-directed vs. habits. We propose a theoretical framework unifying these two streams. Our proposal relies on the brain's ability to implicitly extract statistical regularities from the stream of stimuli and with attentional engagement organizing sequences explicitly and hierarchically. Similarly, sequences that need to be assembled purposively to accomplish a goal require engagement of attentional processes. With repetition, these goal-directed plans become habits with concomitant disengagement of attention. Thus, attention and awareness play a crucial role in the implicit-to-explicit transition as well as in how goal-directed plans become automatic habits. Cortico-subcortical loops (basal ganglia-frontal cortex and hippocampus-frontal cortex loops) mediate the transition process. We show how the computational principles of model-free and model-based learning paradigms, along with a pivotal role for attention and awareness, offer a unifying framework for these two dichotomies. Based on this framework, we make testable predictions related to the potential influence of response-to-stimulus interval (RSI) on developing awareness in implicit learning tasks.

  14. A Unified Theoretical Framework for Cognitive Sequencing

    Directory of Open Access Journals (Sweden)

    Tejas Savalia

    2016-11-01

    Full Text Available The capacity to sequence information is central to human performance. Sequencing ability forms the foundation stone for higher order cognition related to language and goal-directed planning. Information related to the order of items, their timing, chunking and hierarchical organization are important aspects in sequencing. Past research on sequencing has emphasized two distinct and independent dichotomies: implicit versus explicit and goal-directed versus habits. We propose a theoretical framework unifying these two streams. Our proposal relies on the brain's ability to implicitly extract statistical regularities from the stream of stimuli and with attentional engagement organizing sequences explicitly and hierarchically. Similarly, sequences that need to be assembled purposively to accomplish a goal require engagement of attentional processes. With repetition, these goal-directed plans become habits with concomitant disengagement of attention. Thus attention and awareness play a crucial role in the implicit-to-explicit transition as well as in how goal-directed plans become automatic habits. Cortico-subcortical loops (basal ganglia-frontal cortex and hippocampus-frontal cortex loops) mediate the transition process. We show how the computational principles of model-free and model-based learning paradigms, along with a pivotal role for attention and awareness, offer a unifying framework for these two dichotomies. Based on this framework, we make testable predictions related to the potential influence of response-to-stimulus interval (RSI) on developing awareness in implicit learning tasks.

  15. Unlocking short read sequencing for metagenomics.

    Directory of Open Access Journals (Sweden)

    Sébastien Rodrigue

    Full Text Available BACKGROUND: Different high-throughput nucleic acid sequencing platforms are currently available but a trade-off currently exists between the cost and number of reads that can be generated versus the read length that can be achieved. METHODOLOGY/PRINCIPAL FINDINGS: We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read. CONCLUSIONS/SIGNIFICANCE: This strategy is broadly applicable to sequencing applications that benefit from low-cost high-throughput sequencing, but require longer read lengths. We demonstrate that our approach enables metagenomic analyses using the Illumina Genome Analyzer, with low error rates, and at a fraction of the cost of pyrosequencing.
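    The idea of joining overlapping mates into a longer composite read can be sketched as below. This is a deliberately naive illustration and not the SHERA algorithm: the function names, the `min_overlap` and `max_mismatch_rate` parameters and the test fragment are all assumptions, and quality scores, which a real tool would presumably use, are ignored.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string over the alphabet ACGTN."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=10, max_mismatch_rate=0.1):
    """Join a read pair into one composite read, or return None if no overlap is found."""
    mate = reverse_complement(reverse)  # put both reads on the same strand
    # Try the longest candidate overlap first: suffix of `forward` vs prefix of `mate`.
    for overlap in range(min(len(forward), len(mate)), min_overlap - 1, -1):
        tail, head = forward[-overlap:], mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            return forward + mate[overlap:]
    return None


# Hypothetical 40 bp insert sequenced with two 28 bp reads (16 bp overlap).
fragment = "ACGTTGCAAGGCTTAACCGGTTAGCTAGGATCCATGCAAT"
forward_read = fragment[:28]
reverse_read = reverse_complement(fragment[-28:])

composite = merge_pair(forward_read, reverse_read)
print(composite == fragment)
```

    The composite read is only longer than either mate when the insert size is chosen so that the two reads overlap, which is the point the abstract makes about appropriately sized library inserts.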

  16. Sequence determinants of human microsatellite variability

    Directory of Open Access Journals (Sweden)

    Jakobsson Mattias

    2009-12-01

    Full Text Available Abstract Background: Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database. Results: Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetra-nucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity. Conclusions: These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability.
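    The calibration described in the Results can be illustrated with a short calculation. The sketch below assumes, as the abstract does, that differences in PCR fragment length reflect whole repeat units; the function name and all numbers (a 200 bp reference amplicon carrying 12 tetranucleotide repeats, and three observed alleles) are hypothetical.

```python
def infer_repeat_number(fragment_length, ref_fragment_length, ref_repeats, unit_size):
    """Convert a PCR fragment length to a repeat count using a reference allele."""
    offset = fragment_length - ref_fragment_length  # length difference in bp
    return ref_repeats + offset / unit_size


# Hypothetical locus: reference amplicon of 200 bp with 12 repeats of a 4 bp unit.
observed_lengths = [192, 200, 208]
repeats = [infer_repeat_number(length, 200, 12, 4) for length in observed_lengths]
mean_repeats = sum(repeats) / len(repeats)

print(repeats)
print(mean_repeats)
```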

  17. Sequence characteristics of T4-like bacteriophage IME08 benome termini revealed by high throughput sequencing

    Directory of Open Access Journals (Sweden)

    An Xiaoping

    2011-04-01

    Full Text Available Abstract Background T4 phage is a model species that has contributed broadly to our understanding of molecular biology. T4 DNA replication and packaging share various mechanisms with human double-stranded DNA viruses such as herpes virus. The literature indicates that T4-like phage genomes have permuted terminal sequences, and are generated by a DNA terminase in a sequence-independent manner; Methods genomic DNA of T4-like bacteriophage IME08 was subjected to high throughput sequencing, and the read sequences with extraordinarily high occurrences were analyzed; Results we demonstrate that both the 5' and 3' termini of the IME08 genome start with base G or A. The presence of a consensus sequence TTGGA|G around the breakpoint of the high frequency read sequences suggests that the terminase cuts the branched pre-genome in a sequence-preferred manner. Our analysis also shows that terminal cleavage is asymmetric, with one end cut at a consensus sequence, and the other end generated randomly. The sequence-preferred cleavage may produce sticky-ends, but with each end being packaged with different efficiencies; Conclusions this study illustrates how high throughput sequencing can be used to probe replication and packaging mechanisms in bacteriophages and/or viruses.

  18. The DNA sequence specificity of bleomycin cleavage in a systematically altered DNA sequence.

    Science.gov (United States)

    Gautam, Shweta D; Chen, Jon K; Murray, Vincent

    2017-08-01

    Bleomycin is an anti-tumour agent that is clinically used to treat several types of cancers. Bleomycin cleaves DNA at specific DNA sequences and recent genome-wide DNA sequencing specificity data indicated that the sequence 5'-RTGT*AY (where T* is the site of bleomycin cleavage, R is G/A and Y is T/C) is preferentially cleaved by bleomycin in human cells. Based on this DNA sequence, we constructed a plasmid clone to explore this bleomycin cleavage preference. By systematic variation of single nucleotides in the 5'-RTGT*AY sequence, we were able to investigate the effect of nucleotide changes on bleomycin cleavage efficiency. We observed that the preferred consensus DNA sequence for bleomycin cleavage in the plasmid clone was 5'-YYGT*AW (where W is A/T). The most highly cleaved sequence was 5'-TCGT*AT and, in fact, the seven most highly cleaved sequences conformed to the consensus sequence 5'-YYGT*AW. A comparison with genome-wide results was also performed and while the core sequence was similar in both environments, the surrounding nucleotides were different.
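    The degenerate motifs quoted above (R = G/A, Y = T/C, W = A/T) can be searched for in a sequence with a small sketch like the one below. The helper names and the test sequence are made up for illustration; only the three ambiguity codes defined in the abstract are supported, and the cleaved T* is taken to be the fourth base of each six-base motif.

```python
import re

# Ambiguity codes as defined in the abstract, plus the four plain bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]"}


def motif_to_regex(motif: str) -> str:
    """Translate a degenerate motif such as 'RTGTAY' into a regex."""
    return "".join(IUPAC[base] for base in motif)


def find_sites(seq: str, motif: str, cleavage_offset: int = 3):
    """Return (cleavage_position, matched_sequence) for every motif hit.

    A lookahead is used so that overlapping hits are all reported.
    cleavage_offset is the 0-based index of T* within the motif.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + cleavage_offset, m.group(1))
            for m in pattern.finditer(seq)]


if __name__ == "__main__":
    seq = "CCATGTACGGTCGTATAA"  # hypothetical test sequence
    print(find_sites(seq, "RTGTAY"))
    print(find_sites(seq, "YYGTAW"))
```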

  19. Monitoring Genomic Sequences during SELEX Using High-Throughput Sequencing: Neutral SELEX

    Science.gov (United States)

    Chen, Doris; Lorenz, Christina; Schroeder, Renée

    2010-01-01

    Background SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. Methodology/Principal Findings To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. Conclusions/Significance Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers. PMID:20161784

  20. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Directory of Open Access Journals (Sweden)

    Bob Zimmermann

    Full Text Available BACKGROUND: SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. CONCLUSIONS/SIGNIFICANCE: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

  1. Generating matrix and sums of Fibonacci and Pell sequences

    Science.gov (United States)

    Ho, C. K.; Woon, H. S.; Chong, Chin-Yoon

    2014-07-01

    In this paper, we study the Fibonacci sequence and the Pell sequence and develop generating matrices for them. First we prove two results on the even sums of the Fibonacci sequence and the Pell sequence, using the generating matrix approach. We then deduce the odd sums, some identities and recursive formulas for these two sequences.
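    A generating matrix approach can be sketched numerically as below. The matrices used here are the standard textbook ones, [[1, 1], [1, 0]] for Fibonacci and [[2, 1], [1, 0]] for Pell, whose n-th powers carry the n-th term in the off-diagonal entry; whether the paper uses exactly these matrices is an assumption. The even-sum identities checked at the end are the well-known closed forms, not necessarily the formulation the authors prove.

```python
def mat_mul(a, b):
    """Multiply two 2x2 integer matrices."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def mat_pow(m, n):
    """Raise a 2x2 matrix to the n-th power by repeated squaring."""
    result = [[1, 0], [0, 1]]  # identity
    while n > 0:
        if n & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        n >>= 1
    return result


FIB = [[1, 1], [1, 0]]   # FIB**n  == [[F(n+1), F(n)], [F(n), F(n-1)]]
PELL = [[2, 1], [1, 0]]  # PELL**n == [[P(n+1), P(n)], [P(n), P(n-1)]]


def term(gen, n):
    """n-th term of the sequence generated by matrix gen (term 0 is 0)."""
    return mat_pow(gen, n)[0][1]


if __name__ == "__main__":
    for n in range(1, 8):
        even_fib = sum(term(FIB, 2 * k) for k in range(1, n + 1))
        even_pell = sum(term(PELL, 2 * k) for k in range(1, n + 1))
        # Even-indexed sums: F(2)+...+F(2n) = F(2n+1) - 1, P(2)+...+P(2n) = (P(2n+1) - 1) / 2
        assert even_fib == term(FIB, 2 * n + 1) - 1
        assert 2 * even_pell == term(PELL, 2 * n + 1) - 1
    print([term(FIB, n) for n in range(10)], [term(PELL, n) for n in range(8)])
```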

  2. Processing Aftershock Sequences Using Waveform Correlation

    Science.gov (United States)

    Resor, M. E.; Procopio, M. J.; Young, C. J.; Carr, D. B.

    2008-12-01

    For most event monitoring systems, the objective is to keep up with the flow of incoming data, producing a bulletin with some modest, relatively constant, time delay after present time, often a period of a few hours or less. Because the association problem scales exponentially and not linearly with the number of detections, a dramatic increase in seismicity due to an aftershock sequence can easily cause the bulletin delay time to increase dramatically. In some cases, the production of a bulletin may cease altogether, until the automatic system can catch up. For a nuclear monitoring system, the implications of such a delay could be dire. Given the expected similarity between a mainshock and aftershocks, it has been proposed that waveform correlation may provide a powerful means to simultaneously increase the efficiency of processing aftershock sequences, while also lowering the detection threshold and improving the quality of the event solutions. However, many questions remain unanswered. What are the key parameters for achieving the best correlations between waveforms (window length, filtering, etc.), and are they sequence-dependent? What is the overall percentage of similar events in an aftershock sequence, i.e. what is the maximum level of efficiency that a waveform correlation could be expected to achieve? Finally, how does this percentage of events vary among sequences? Using data from the aftershock sequence for the December 26, 2004 Mw 9.1 Sumatra event, we investigate these issues by building and testing a prototype waveform correlation event detection system that automatically expands its library of known events as new signatures are identified in the aftershock sequence (by traditional signal detection and event processing). Our system tests all incoming data against this dynamic library, thereby identifying any similar events before traditional processing takes place. In the region surrounding the Sumatra event, the NEIC EDR contains 4997 events in the 9
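    Testing incoming data against a library of known event waveforms can be illustrated with a sliding normalized cross-correlation. The sketch below is a generic single-channel version written for this listing, not the prototype described in the abstract; the function names, the 0.8 default threshold and the toy waveforms are assumptions.

```python
import math


def normalized_xcorr(template, trace):
    """Pearson correlation of the template with every same-length window of trace."""
    n = len(template)
    t_mean = sum(template) / n
    t = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t))
    out = []
    for start in range(len(trace) - n + 1):
        window = trace[start:start + n]
        w_mean = sum(window) / n
        w = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w))
        if t_norm == 0 or w_norm == 0:
            out.append(0.0)  # flat window or template: correlation undefined
        else:
            out.append(sum(a * b for a, b in zip(t, w)) / (t_norm * w_norm))
    return out


def detect(library, trace, threshold=0.8):
    """Return (template_index, sample_offset, correlation) for each detection."""
    hits = []
    for idx, template in enumerate(library):
        for offset, cc in enumerate(normalized_xcorr(template, trace)):
            if cc >= threshold:
                hits.append((idx, offset, cc))
    return hits


if __name__ == "__main__":
    template = [0, 1, 0, -1, 0, 2, -2, 0]  # toy "known event" signature
    trace = [0] * 5 + [3 * x for x in template] + [0] * 5  # scaled copy in quiet data
    print(detect([template], trace, threshold=0.95))
```

    In a dynamic-library setting, a signature found by conventional processing would simply be appended to `library` so that later data are tested against it as well.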

  3. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Full Text Available Abstract Background Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. 
Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  4. Artificial duplicate reads in sequencing data of 454 Genome Sequencer FLX System

    Institute of Scientific and Technical Information of China (English)

    Hui Dong; Yangyi Chen; Yan Shen; Shengyue Wang; Guoping Zhao; Weirong Jin

    2011-01-01

    The 454 Genome Sequencer (GS) FLX System is one of the next-generation sequencing systems, featuring long reads, high accuracy, and ultra-high throughput. Based on the mechanism of emulsion PCR, a unique DNA template would only generate a unique sequence read after being amplified and sequenced on GS FLX. However, biased amplification of DNA templates might occur in the process of emulsion PCR, which results in production of artificial duplicate reads. Under the condition that each DNA template is unique, 3.49%-18.14% of total reads in GS FLX sequencing data were found to be artificial duplicate reads. These duplicate reads may lead to misinterpretation of sequencing data, and special attention should be paid to the potential biases they introduce to the data.

  5. Compilation of tRNA sequences.

    Science.gov (United States)

    Sprinzl, M; Grueter, F; Spelzhaus, A; Gauss, D H

    1980-01-11

    This compilation presents in a small space the tRNA sequences so far published. The numbering of tRNAPhe from yeast is used following the rules proposed by the participants of the Cold Spring Harbor Meeting on tRNA 1978 (1,2;Fig. 1). This numbering allows comparisons with the three dimensional structure of tRNAPhe. The secondary structure of tRNAs is indicated by specific underlining. In the primary structure a nucleoside followed by a nucleoside in brackets or a modification in brackets denotes that both types of nucleosides can occupy this position. Part of a sequence in brackets designates a piece of sequence not unambiguously analyzed. Rare nucleosides are named according to the IUPAC-IUB rules (for complicated rare nucleosides and their identification see Table 1); those with lengthy names are given with the prefix x and specified in the footnotes. Footnotes are numbered according to the coordinates of the corresponding nucleoside and are indicated in the sequence by an asterisk. The references are restricted to the citation of the latest publication in those cases where several papers deal with one sequence. For additional information the reader is referred either to the original literature or to other tRNA sequence compilations (3-7). Mutant tRNAs are dealt with in a compilation by J. Celis (8). The compilers would welcome any information by the readers regarding missing material or erroneous presentation. On the basis of this numbering system computer printed compilations of tRNA sequences in a linear form and in cloverleaf form are in preparation.

  6. Inverted temperature sequences: role of deformation partitioning

    Science.gov (United States)

    Grujic, D.; Ashley, K. T.; Coble, M. A.; Coutand, I.; Kellett, D.; Whynot, N.

    2015-12-01

    The inverted metamorphism associated with the Main Central thrust zone in the Himalaya has been historically attributed to a number of tectonic processes. Here we show that there is actually a composite peak and deformation temperature sequence that formed in succession via different tectonic processes. Deformation partitioning seems to have played a key role, and the magnitude of each process has varied along strike of the orogen. To explain the formation of the inverted metamorphic sequence across the Lesser Himalayan Sequence (LHS) in eastern Bhutan, we used Raman spectroscopy of carbonaceous material (RSCM) to determine the peak metamorphic temperatures and Ti-in-quartz thermobarometry to determine the deformation temperatures combined with thermochronology including published apatite and zircon U-Th/He and fission-track data and new 40Ar/39Ar dating of muscovite. The dataset was inverted using 3D-thermal-kinematic modeling to constrain the ranges of geological parameters such as fault geometry and slip rates, location and rates of localized basal accretion, and thermal properties of the crust. RSCM results indicate that there are two peak temperature sequences separated by a major thrust within the LHS. The internal temperature sequence shows an inverted peak temperature gradient of 12 °C/km; in the external (southern) sequence, the peak temperatures are constant across the structural sequence. Thermo-kinematic modeling suggests that the thermochronologic and thermobarometric data are compatible with a two-stage scenario: an Early-Middle Miocene phase of fast overthrusting of a hot hanging wall over a downgoing footwall and inversion of the synkinematic isotherms, followed by the formation of the external duplex developed by dominant underthrusting and basal accretion.
To reconcile our observations with the experimental data, we suggest that pervasive ductile deformation within the upper LHS and along the Main Central thrust zone at its top stopped at

  7. Target enrichment sequencing in cultivated peanut (Arachis hypogaea L.) using probes designed from transcript sequences.

    Science.gov (United States)

    Peng, Ze; Fan, Wen; Wang, Liping; Paudel, Dev; Leventini, Dante; Tillman, Barry L; Wang, Jianping

    2017-05-10

    Enabled by the next generation sequencing, target enrichment sequencing (TES) is a powerful method to enrich genomic regions of interest and to identify sequence variations. The objective of this study was to explore the feasibility of probe design from transcript sequences for TES application in calling sequence variants in peanut, an important allotetraploid crop with a large genome size. In this study, we applied an in-solution hybridization method to enrich DNA sequences of seven peanut genotypes. Our results showed that it is feasible to apply TES with probes designed from transcript sequences in polyploid peanut. Using a set of 31,123 probes, a total of 5131 and 7521 genes were targeted in peanut A and B genomes, respectively. For each genotype used in this study, the probe target capture regions were efficiently covered with high depth. The average on-target rate of sequencing reads was 42.47%, with a significant amount of off-target reads coming from genomic regions homologous to target regions. In this study, when given predefined genomic regions of interest and the same amount of sequencing data, TES provided the highest coverage of target regions when compared to whole genome sequencing, RNA sequencing, and genotyping by sequencing. Single nucleotide polymorphism (SNP) calling and subsequent validation revealed a high validation rate (85.71%) of homozygous SNPs, providing valuable markers for peanut genotyping. This study demonstrated the success of applying TES for SNP identification in peanut, which shall provide valuable suggestions for TES application in other non-model species without a genome reference available.

  8. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Science.gov (United States)

    2011-01-01

    Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/. PMID:21385349

  9. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Directory of Open Access Journals (Sweden)

    Baldwin Stephen A

    2011-03-01

    Full Text Available Abstract Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  10. [A safe and easy method for building consensus HIV sequences from 454 massively parallel sequencing data].

    Science.gov (United States)

    Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico

    2016-10-03

    To show how to generate a consensus sequence from massively parallel sequencing data obtained from routine HIV anti-retroviral resistance studies, one that may be suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular evolutionary genetics analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap value of 88% (IQR 83.5-95.5). Association increased to 36/62 sequences, with a median bootstrap value of 94% (IQR 85.5-98), using a 15% threshold. Maximum association was at the 20% threshold, with 61/62 sequences associated, and a median bootstrap value of 99% (IQR 98-100). A safe method is presented to generate consensus sequences from HIV-NGS data at a 20% threshold, which will prove useful for molecular epidemiological studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
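    One common way to build a threshold-based consensus is sketched below. It assumes that "threshold" means the minimum per-position frequency a base needs in order to be represented, with positions that keep several bases written as IUPAC ambiguity codes. That interpretation, the function name and the toy reads are assumptions for illustration and do not describe the Mesquite workflow used in the study.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes keyed by the set of bases they stand for.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(aligned_reads, threshold=0.20):
    """Consensus of equal-length aligned reads.

    At each column, bases whose frequency is >= threshold are kept and
    encoded with an IUPAC code; gaps and other symbols are ignored.
    """
    out = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base in "ACGT")
        total = sum(counts.values())
        if total == 0:
            out.append("N")  # no usable base calls at this position
            continue
        kept = frozenset(b for b, c in counts.items() if c / total >= threshold)
        if not kept:
            # High thresholds can exclude every base; fall back to the majority call.
            kept = frozenset(counts.most_common(1)[0][0])
        out.append(IUPAC_CODES[kept])
    return "".join(out)


if __name__ == "__main__":
    reads = ["ACGT"] * 7 + ["ATGT"] * 3  # 30% minority variant at position 2
    print(consensus(reads, threshold=0.20))
    print(consensus(reads, threshold=0.35))
```

    Under this reading, a higher threshold suppresses low-frequency variants, which would make the consensus closer to what population (Sanger) sequencing reports.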

  11. Sequencing of BAC pools by different next generation sequencing platforms and strategies

    Directory of Open Access Journals (Sweden)

    Scholz Uwe

    2011-10-01

    Full Text Available Abstract Background Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. Results Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably fewer misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Conclusion Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.

  12. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Science.gov (United States)

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  13. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  14. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Directory of Open Access Journals (Sweden)

    Chun-Tien Chang

    2012-01-01

    Full Text Available The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  15. Mixed sequence reader: a program for analyzing DNA sequences with heterozygous base calling.

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  16. Criteria for defining and recognizing the various orders of sequences in outcrop sequence stratigraphy

    Institute of Scientific and Technical Information of China (English)

    WANG Xunlian

    2004-01-01

    The regional distribution in different depositional facies belts is here regarded as an important criterion for defining and recognizing the various orders of sequences. The third-order sequence is possibly global in nature; it may be discerned in different depositional facies belts in one continental margin and can be correlated over long distances, sometimes even worldwide. Commonly, correlation of subsequences (fourth-order sequences with a time interval of 0.5-1.5 Ma) is difficult in different facies belts, although some of them may also be worldwide in distribution. A subsequence should be discernible and correlatable within at least one facies belt. The higher-order sequences, including microsequence (fifth-order sequence) and minisequence (sixth-order sequence), are regional or local in distribution. They may reflect the longer and shorter Milankovitch cycles respectively. Sequence and subsequence are usually recognizable in different facies belts, while microsequence and minisequence may be distinguished only in shallow marine deposits, but not in slope and basin facies deposits. A brief discussion is made on the essential conditions for correct identification of sequences, useful methods of study, and problems meriting special attention in outcrop sequence stratigraphy.

  17. A survey of sequence alignment algorithms for next-generation sequencing.

    Science.gov (United States)

    Li, Heng; Homer, Nils

    2010-09-01

    Rapidly evolving sequencing technologies produce data on an unparalleled scale. A central challenge to the analysis of this data is sequence alignment, whereby sequence reads must be compared to a reference. A wide variety of alignment algorithms and software have been subsequently developed over the past two years. In this article, we will systematically review the current development of these algorithms and introduce their practical applications on different types of experimental data. We come to the conclusion that short-read alignment is no longer the bottleneck of data analyses. We also consider future development of alignment algorithms with respect to emerging long sequence reads and the prospect of cloud computing.

  18. Detection of M-Sequences from Spike Sequence in Neuronal Networks

    Directory of Open Access Journals (Sweden)

    Yoshi Nishitani

    2012-01-01

    Full Text Available In circuit theory, it is well known that a linear feedback shift register (LFSR) circuit generates pseudorandom bit sequences (PRBS), including an M-sequence with the maximum period length. In this study, we tried to detect M-sequences, known as pseudorandom sequences generated by the LFSR circuit, from time series patterns of stimulated action potentials. Stimulated action potentials were recorded from dissociated cultures of hippocampal neurons grown on a multielectrode array. We could find several M-sequences from a 3-stage LFSR circuit (M3). These results show the possibility of assembling LFSR circuits or their equivalents in a neuronal network. However, since the M3 pattern was composed of only four spike intervals, the possibility of an accidental detection was not zero. We therefore detected M-sequences from random spike sequences that were not generated from an LFSR circuit and compared the result with the number of M-sequences from the originally observed raster data. As a result, a significant difference was confirmed: a greater number of “0–1” reversed 3-stage M-sequences occurred than would have been detected accidentally. This result suggests that some LFSR equivalent circuits are assembled in neuronal networks.
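
To make the notion of a 3-stage M-sequence concrete, here is a minimal sketch of a Fibonacci-style LFSR. The tap positions (stages 2 and 3) and the seed are assumptions chosen for illustration; the abstract does not say which feedback configuration the authors used, and `lfsr_bits` is a hypothetical helper name.

```python
def lfsr_bits(taps, seed):
    """Return one full cycle of output bits from a Fibonacci LFSR.

    taps: 1-indexed stage numbers whose bits are XORed into the feedback bit
          (an assumed configuration, not taken from the paper).
    seed: initial register contents as a tuple of 0/1, not all zeros.
    """
    start = tuple(seed)
    state = start
    bits = []
    while True:
        bits.append(state[-1])            # output is taken from the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = (feedback,) + state[:-1]  # shift right, insert feedback bit
        if state == start:                # register has returned to the seed
            return bits


m3 = lfsr_bits(taps=(2, 3), seed=(1, 0, 0))
print(m3, len(m3))
```

With these assumed taps the register visits all 7 nonzero states before repeating, which is the maximum period (2^3 - 1) for three stages.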

  19. Improvement of SNR with Chaotic Spreading Sequences for CDMA

    CERN Document Server

    Umeno, Ken; Kitayama, Ken-ichi

    1999-01-01

    We show that chaotic spreading sequences generated by ergodic mappings of Chebyshev orthogonal polynomials have better correlation properties for CDMA (code division multiple access) than the optimal binary sequences (Gold sequences) in the sense of ensemble average.
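
As a rough illustration of what a Chebyshev-map spreading sequence looks like, the sketch below iterates the degree-k Chebyshev polynomial through the identity T_k(cos θ) = cos(kθ). The degree (k = 2), the seeds, the sequence length, and the helper names are all assumptions for illustration; this does not reproduce the ensemble-average analysis in the paper.

```python
import math


def chebyshev_sequence(k, x0, n):
    """Iterate x -> T_k(x) = cos(k * arccos(x)) n times, starting from x0."""
    seq = []
    x = x0
    for _ in range(n):
        x = max(-1.0, min(1.0, x))        # guard against rounding drift
        x = math.cos(k * math.acos(x))
        seq.append(x)
    return seq


def correlation(a, b):
    """Zero-lag correlation: mean of the element-wise product."""
    return sum(p * q for p, q in zip(a, b)) / len(a)


# Two hypothetical users, distinguished only by their seed values.
user_a = chebyshev_sequence(2, 0.1234, 1000)
user_b = chebyshev_sequence(2, 0.5678, 1000)
print(correlation(user_a, user_a), correlation(user_a, user_b))
```

Comparing the two printed values gives a quick feel for how the autocorrelation of one sequence relates to its cross-correlation with a sequence from a different seed.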

  20. Mining olive genome through library sequencing and bioinformatics ...

    African Journals Online (AJOL)

    African Journal of Biotechnology ... with SmaI and cloning the digestion products into pUC19 vector) and randomly picked 83 colonies were sequenced. ... about insert sequences with no hits to any sequence record with a described function.

  1. Fibonacci-like sequences and generalized Pascal's triangles

    Science.gov (United States)

    Vincenzi, G.; Siani, S.

    2014-05-01

    The properties of the diagonals of generalized Pascal's triangles are studied. Combinatorial relationships between Fibonacci-like sequences and the Fibonacci sequence itself are determined using the sequence of diagonals of a generalized Pascal's triangle.
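
The classical case behind this result can be checked in a few lines: summing the shallow diagonals of the ordinary Pascal's triangle yields the Fibonacci numbers. The sketch below covers only that classical identity, not the generalized triangles studied in the paper, and `diagonal_sum` is a name introduced here for illustration.

```python
from math import comb


def diagonal_sum(n):
    """Sum of the n-th shallow diagonal of Pascal's triangle: sum_k C(n-k, k)."""
    return sum(comb(n - k, k) for k in range(n // 2 + 1))


print([diagonal_sum(n) for n in range(10)])
# → [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```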

  2. SNAD: sequence name annotation-based designer

    Directory of Open Access Journals (Sweden)

    Gorbalenya Alexander E

    2009-08-01

    Full Text Available Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often substituted by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually, which may be a tedious exercise prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer), which mediates automatic conversion of sequence UIDs (associated with a multiple alignment or phylogenetic tree, or supplied as a plain text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit the wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between the quality of sequence annotation and the efficiency of communication and knowledge dissemination among researchers.

  3. RSAT 2015: Regulatory Sequence Analysis Tools.

    Science.gov (United States)

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/.

  4. Human Genome Sequencing in Health and Disease

    Science.gov (United States)

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  5. Protein sequence classification using feature hashing.

    Science.gov (United States)

    Caragea, Cornelia; Silvescu, Adrian; Mitra, Prasenjit

    2012-06-21

    Recent advances in next-generation sequencing technologies have resulted in an exponential increase in the rate at which protein sequence data are being acquired. The k-gram feature representation, commonly used for protein sequence classification, usually results in prohibitively high dimensional input spaces, for large values of k. Applying data mining algorithms to these input spaces may be intractable due to the large number of dimensions. Hence, using dimensionality reduction techniques can be crucial for the performance and the complexity of the learning algorithms. In this paper, we study the applicability of feature hashing to protein sequence classification, where the original high-dimensional space is "reduced" by hashing the features into a low-dimensional space, using a hash function, i.e., by mapping features into hash keys, where multiple features can be mapped (at random) to the same hash key, and "aggregating" their counts. We compare feature hashing with the "bag of k-grams" approach. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
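
The contrast between the "bag of k-grams" and feature hashing can be sketched as follows. The protein string, k = 3, the bucket count, and the use of MD5 as the hash function are all assumptions made for illustration; the abstract does not specify which hash function or parameters the authors used.

```python
import hashlib
from collections import Counter


def kgrams(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hashed_features(seq, k, n_buckets):
    """Map each k-gram to a bucket via a hash and aggregate the counts."""
    vec = [0] * n_buckets
    for gram in kgrams(seq, k):
        digest = hashlib.md5(gram.encode("ascii")).digest()
        bucket = int.from_bytes(digest[:8], "big") % n_buckets
        vec[bucket] += 1                  # colliding k-grams share a count
    return vec


protein = "MKVLAAGIVGLLLAQ"               # made-up sequence, not real data
bag = Counter(kgrams(protein, 3))         # explicit bag of 3-grams
vec = hashed_features(protein, 3, 16)     # fixed-size hashed representation
print(len(bag), sum(bag.values()), len(vec), sum(vec))
```

The hashed vector has a fixed length regardless of k, whereas the explicit bag can grow up to 20^k distinct keys for protein sequences; the total count is preserved because collisions only merge counts.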

  6. Sequence analysis by iterated maps, a review.

    Science.gov (United States)

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are at a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage now appears to be set for a recasting of IMs with a central role in processing nextgen sequencing results.

  7. BLEACHING EUCALYPTUS PULPS WITH SHORT SEQUENCES

    Directory of Open Access Journals (Sweden)

    Flaviana Reis Milagres

    2011-03-01

    Full Text Available Eucalyptus spp kraft pulp, due to its high content of hexenuronic acids, is quite easy to bleach. Therefore, investigations have been made attempting to decrease the number of stages in the bleaching process in order to minimize capital costs. This study focused on the evaluation of short ECF (Elemental Chlorine Free) and TCF (Totally Chlorine Free) sequences for bleaching oxygen delignified Eucalyptus spp kraft pulp to 90% ISO brightness: PMoDP (molybdenum catalyzed acid peroxide, chlorine dioxide and hydrogen peroxide), PMoD/P (molybdenum catalyzed acid peroxide, chlorine dioxide and hydrogen peroxide, without washing), PMoD(PO) (molybdenum catalyzed acid peroxide, chlorine dioxide and pressurized peroxide), D(EPO)DP (chlorine dioxide, oxidative extraction with oxygen and peroxide, chlorine dioxide and hydrogen peroxide), PMoQ(PO) (molybdenum catalyzed acid peroxide, DTPA and pressurized peroxide), and XPMoQ(PO) (enzyme, molybdenum catalyzed acid peroxide, DTPA and pressurized peroxide). Uncommon pulp treatments, such as molybdenum catalyzed acid peroxide (PMo) and xylanase (X) bleaching stages, were used. Among the ECF alternatives, the two-stage PMoD/P sequence proved highly cost-effective without affecting pulp quality in relation to the traditional D(EPO)DP sequence and produced better quality effluent in relation to the reference. However, a four-stage sequence, XPMoQ(PO), was required to achieve full brightness using the TCF technology. This sequence was highly cost-effective although it only produced pulp of acceptable quality.

  8. On type sequences and Arf rings

    Directory of Open Access Journals (Sweden)

    Dilip Premchand Patil

    2007-06-01

    Full Text Available In this article in Section 2 we give an explicit description to compute the type sequence $\mathrm{t}_1,\ldots,\mathrm{t}_{n}$ of a semigroup $\Gamma$ generated by an arithmetic sequence (see 2.7); we show that the $i$-th term $\mathrm{t}_i$ is equal to $1$ or to the type $\tau_\Gamma$, depending on its position. In Section 3, for analytically irreducible ring $R$ with the branch sequence $R=R_0 \subsetneq R_1 \subsetneq \ldots \subsetneq R_{m-1} \subsetneq R_{m} =\overline{R}$, starting from a result proved in [4] we give a characterization (see 3.6) of the "Arf" property using the type sequence of $R$ and of the rings $R_j$, $1\leq j\leq m-1$. Further, we prove (see 3.9, 3.10) some relations among the integers $\ell^*(R)$ and $\ell^*(R_j)$, $1\leq j\leq m-1$. These relations and a result of [6] allow us to obtain a new characterization (see 3.12) of semigroup rings of minimal multiplicity with $\ell^*(R)\leq \tau(R)$ in terms of the Arf property, type sequences and relations between $\ell^*(R)$ and $\ell^*(R_j)$, $1\leq j\leq m-1$.

  9. Progressive multiple sequence alignments from triplets

    Directory of Open Access Journals (Sweden)

    Stadler Peter F

    2007-07-01

    Full Text Available Abstract Background The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by step-wise replacement of triples by pairs instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments in general exceeds that of clustalw or other multiple alignment tools even though our software does not include heuristics for context-dependent (mis)match scores.

  10. Viral genome sequencing by random priming methods

    Directory of Open Access Journals (Sweden)

    Zhang Xinsheng

    2008-01-01

    Full Text Available Abstract Background Most emerging health threats are of zoonotic origin. For the overwhelming majority, their causative agents are RNA viruses which include but are not limited to HIV, Influenza, SARS, Ebola, Dengue, and Hantavirus. Of increasing importance therefore is a better understanding of global viral diversity to enable better surveillance and prediction of pandemic threats; this will require rapid and flexible methods for complete viral genome sequencing. Results We have adapted the SISPA methodology [1-3] to genome sequencing of RNA and DNA viruses. We have demonstrated the utility of the method on various types and sources of viruses, obtaining near complete genome sequence of viruses ranging in size from 3,000–15,000 kb with a median depth of coverage of 14.33. We used this technique to generate full viral genome sequence in the presence of host contaminants, using viral preparations from cell culture supernatant, allantoic fluid and fecal matter. Conclusion The method described is of great utility in generating whole genome assemblies for viruses with little or no available sequence information, viruses from greatly divergent families, previously uncharacterized viruses, or to more fully describe mixed viral infections.

  11. Probabilistic Sequence Learning in Mild Cognitive Impairment

    Directory of Open Access Journals (Sweden)

    Dezso Nemeth

    2013-07-01

    Full Text Available Mild Cognitive Impairment (MCI) causes slight but noticeable disruption in cognitive systems, primarily executive and memory functions. However, it is not clear if the development of sequence learning is affected by an impaired cognitive system and, if so, how. The goal of our study was to investigate the development of probabilistic sequence learning, from the initial acquisition to consolidation, in MCI and healthy elderly control groups. We used the Alternating Serial Reaction Time task (ASRT) to measure probabilistic sequence learning. Individuals with MCI showed weaker learning performance than the healthy elderly group. However, using the reaction times only from the second half of each learning block – after the reactivation phase – we found intact learning in MCI. Based on the assumption that the first part of each learning block is related to reactivation/recall processes, we suggest that these processes are affected in MCI. The 24-hour offline period showed no effect on sequence-specific learning in either group but did on general skill learning: the healthy elderly group showed offline improvement in general reaction times while individuals with MCI did not. Our findings deepen our understanding regarding the underlying mechanisms and time course of sequence acquisition and consolidation.

  12. Getting started in mapping-by-sequencing.

    Science.gov (United States)

    Candela, Héctor; Casanova-Sáez, Rubén; Micol, José Luis

    2015-07-01

    Next-generation sequencing (NGS) technologies allow the cost-effective sequencing of whole genomes and have expanded the scope of genomics to novel applications, such as the genome-wide characterization of intraspecific polymorphisms and the rapid mapping and identification of point mutations. Next-generation sequencing platforms, such as the Illumina HiSeq2000 platform, are now commercially available at affordable prices and routinely produce an enormous amount of sequence data, but their wide use is often hindered by a lack of knowledge on how to manipulate and process the information produced. In this review, we focus on the strategies that are available to geneticists who wish to incorporate these novel approaches into their research but who are not familiar with the necessary bioinformatic concepts and computational tools. In particular, we comprehensively summarize case studies where the use of NGS technologies has led to the identification of point mutations, a strategy that has been dubbed "mapping-by-sequencing", and review examples from plants and other model species such as Caenorhabditis elegans, Saccharomyces cerevisiae, and Drosophila melanogaster. As these technologies are becoming cheaper and more powerful, their use is also expanding to allow mutation identification in species with larger genomes, such as many crop plants. © 2014 Institute of Botany, Chinese Academy of Sciences.

  13. DNA sequencing by synthesis with degenerate primers

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The degenerate primer-based sequencing-by-synthesis method (DP-SBS) was developed for high-throughput DNA sequencing, in which a set of degenerate primers are hybridized on the arrayed DNA templates and extended by DNA polymerase on microarrays. In this method, a different set of degenerate primers containing a given number (n) of degenerate nucleotides at the 3'-ends were annealed to the sequenced templates that were immobilized on the solid surface. The nucleotides (n+1) on the template sequences were determined by detecting the incorporation of fluorescent labeled nucleotides. The fluorescent labeled nucleotide was incorporated into the primer in a base-specific manner after the enzymatic primer extension reactions, and nine-base lengths were read out accurately. The main advantage of DP-SBS is that the method uses only very conventional biochemical reagents and avoids the complicated special chemical reagents for removing the labeled nucleotides and reactivating the primer for further extension. From the present study, it is found that the DP-SBS method is reliable, simple, and cost-effective for laboratory sequencing of a large amount of short DNA fragments.

  14. OTU analysis using metagenomic shotgun sequencing data.

    Directory of Open Access Journals (Sweden)

    Xiaolin Hao

    Full Text Available Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of a microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm whether the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences: a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence using our proposed pipeline. This could serve as a good starting point for experimental design and make comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

  15. End Sequencing and Finger Printing of Human & Mouse BAC Libraries

    Energy Technology Data Exchange (ETDEWEB)

    Fraser, C

    2005-09-27

    This project provided for continued end sequencing of existing and new BAC libraries constructed to support human sequencing as well as to initiate BAC end sequencing from the mouse BAC libraries constructed to support mouse sequencing. The clones, the sequences, and the fingerprints are now an available resource for the community at large. Research and development of new methodologies for BAC end sequencing have reduced costs and increased throughput.

  16. Preparation of SELEX Samples for Next-Generation Sequencing.

    Science.gov (United States)

    Tolle, Fabian; Mayer, Günter

    2016-01-01

    Fuelled by massive whole genome sequencing projects such as the human genome project, enormous technological advancements and therefore tremendous price drops could be achieved, rendering next-generation sequencing very attractive for deep sequencing of SELEX libraries. Herein we describe the preparation of SELEX samples for Illumina sequencing, based on the already established whole genome sequencing workflow. We describe the addition of barcode sequences for multiplexing and the adapter ligation, avoiding associated pitfalls.

  17. Comparison study on k-word statistical measures for protein: From sequence to 'sequence space'

    Directory of Open Access Journals (Sweden)

    Wang Tianming

    2008-09-01

    Full Text Available Abstract Background Many proposed statistical measures can efficiently compare protein sequences to further infer protein structure, function and evolutionary information. They share the same idea of using k-word frequencies of protein sequences. Given a protein sequence, the information on its related protein sequences has not been used for protein sequence comparison until now. This paper proposed a scheme to construct a protein 'sequence space' associated with protein sequences related to the given protein, and the performances of statistical measures were compared when they did or did not explore the information in the protein 'sequence space'. This paper also presented two statistical measures for protein: gre.k (generalized relative entropy) and gsm.k (gapped similarity measure). Results We tested statistical measures, with and without the protein 'sequence space', on three data sets. This not only offers a systematic and quantitative experimental assessment of these statistical measures, but also naturally complements the available comparison of statistical measures based on protein sequence. Moreover, we compared our statistical measures with alignment-based measures and the existing statistical measures. The experiments were grouped into two sets. The first one, performed via ROC (Receiver Operating Characteristic) analysis, aims at assessing the intrinsic ability of the statistical measures to discriminate and classify protein sequences. The second set of experiments aims at assessing how well our measure does in phylogenetic analysis. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of protein 'sequence space' and statistical measures were obtained. Conclusion Alignment-based measures have a clear advantage when the data is highly redundant. The most efficient statistical measure is the novel gsm.k introduced in this article, followed by cos.k. When the data becomes less redundant, gre

  18. Generation of Physical Map Contig-Specific Sequences Useful for Whole Genome Sequence Scaffolding

    Science.gov (United States)

    Jiang, Yanliang; Ninwichian, Parichart; Liu, Shikai; Zhang, Jiaren; Kucuktas, Huseyin; Sun, Fanyue; Kaltenboeck, Ludmilla; Sun, Luyang; Bao, Lisui; Liu, Zhanjiang

    2013-01-01

    Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge. PMID:24205335

  19. Generation of physical map contig-specific sequences useful for whole genome sequence scaffolding.

    Directory of Open Access Journals (Sweden)

    Yanliang Jiang

    Full Text Available Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge.

  20. Degree sequences of k-multi-hypertournaments

    Institute of Scientific and Technical Information of China (English)

    Pirzada S

    2009-01-01

    Let n and k (n ≥ k > 1) be two non-negative integers. A k-multi-hypertournament on n vertices is a pair (V, A), where V is a set of vertices with |V|= n, and A is a set of k-tuples of vertices, called arcs, such that for any k-subset S of V, A contains at least one (at most k!) of the k! k-tuples whose entries belong to S. The necessary and sufficient conditions for a non-decreasing sequence of non-negative integers to be the out-degree sequence (in-degree sequence) of some k-multi-hypertournament are given.

  1. TRAP Sequence - An Interesting Entity in Twins

    Directory of Open Access Journals (Sweden)

    R H Srinivas Prasad

    2012-01-01

    Full Text Available Twin reversed arterial perfusion (TRAP) sequence is a rare malformation occurring in monozygotic multiple gestations. One well-developed normal (pump) twin and another twin with absent cardiac structure (acardiac), who is hemodynamically dependent on the normal (pump) twin, are characteristic of this syndrome. The acardiac twin develops multiple anomalies that make survival difficult. The prognosis of the pump twin is variable, with a mortality rate ranging from 50% to 70%. Complications that affect the prognosis of the pump twin include congestive cardiac failure due to increased cardiac demand, prematurity secondary to preterm delivery, and polyhydramnios. Because of these complications, prompt detection, follow-up, and treatment of this condition are very important. We report two cases of TRAP sequence that emphasize the importance of gray-scale and color Doppler imaging in diagnosis, detection of poor prognostic features, follow-up, and management of TRAP sequence.

  2. A statistical study of aftershock sequences

    Directory of Open Access Journals (Sweden)

    Giorgio Ranalli

    2010-02-01

    Full Text Available A comprehensive statistical study of the phenomenology of aftershock sequences is made in this paper. The spatial distribution of aftershocks indicates that they are mainly crustal events; however, deeper sequences also take place. The analysis of the distribution of aftershocks in 15 sequences with respect to time and magnitude leads to the statistical confirmation of a set of phenomenological laws describing the process, namely, the time-frequency law of hyperbolic decay of aftershock activity with time, the magnitude stability law, and the exponential magnitude-frequency distribution. The hypotheses involved are checked. The grouping of data and the statistical methods employed are chosen according to some basic well-confirmed assumptions regarding the nature of the process.

  3. Inferring interaction partners from protein sequences

    CERN Document Server

    Bitbol, Anne-Florence; Colwell, Lucy J; Wingreen, Ned S

    2016-01-01

    Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multi-protein complexes, and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners. Hence, the sequences of interacting partners are correlated. Here we exploit these correlations to accurately identify which proteins are specific interaction partners from sequence data alone. Our general approach, which employs a pairwise maximum entropy model to infer direct couplings between residues, has been successfully used to predict the three-dimensional structures of proteins from sequences. Building on this approach, we introduce an iterative algorithm to predict specific interaction partners from among the members of two protein families. We assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. The algorithm proves successful without any a pri...

  4. Microbial species delineation using whole genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.

  5. Finding Sequential Patterns from Large Sequence Data

    CERN Document Server

    Esmaeili, Mahdi

    2010-01-01

    Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining finds sets of data items that occur together frequently in some sequences. Sequential pattern mining, which extracts frequent subsequences from a sequence database, has attracted a great deal of interest in recent data mining research because it is the basis of many applications, such as web user analysis, stock trend prediction, DNA sequence analysis, finding language or linguistic patterns in natural language texts, and using the history of symptoms to predict certain kinds of disease. Given the diversity of these applications, it may not be possible to apply a single sequential pattern model to all these problems. Each application may require a unique model and solution. A number of research projects were established in recent years to develop meaningful sequential pattern...
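
    As a rough illustration of what "frequent subsequence" means in the abstract above, the following minimal Python sketch computes the support of a candidate pattern, i.e. the fraction of sequences in a database that contain the pattern's items in order (not necessarily contiguously). The function names and the toy database are hypothetical and are not taken from the cited work; practical miners use far more efficient search strategies than this direct check.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` occur in `sequence` in order."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        return 0.0
    hits = sum(1 for seq in database if is_subsequence(pattern, seq))
    return hits / len(database)


# Toy sequence database (hypothetical page-visit logs).
database = [
    ["home", "search", "cart"],
    ["home", "cart"],
    ["search", "home"],
]

home_then_cart = support(["home", "cart"], database)
home_then_search = support(["home", "search"], database)
```

    A pattern is then called frequent when its support meets a user-chosen minimum threshold.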

  6. Dynamics and Control of DNA Sequence Amplification

    CERN Document Server

    Marimuthu, Karthikeyan

    2014-01-01

    DNA amplification is the process of replication of a specified DNA sequence in vitro through time-dependent manipulation of its external environment. A theoretical framework for determination of the optimal dynamic operating conditions of DNA amplification reactions, for any specified amplification objective, is presented based on first-principles biophysical modeling and control theory. Amplification of DNA is formulated as a problem in control theory with optimal solutions that can differ considerably from strategies typically used in practice. Using the Polymerase Chain Reaction (PCR) as an example, sequence-dependent biophysical models for DNA amplification are cast as control systems, wherein the dynamics of the reaction are controlled by a manipulated input variable. Using these control systems, we demonstrate that there exists an optimal temperature cycling strategy for geometric amplification of any DNA sequence and formulate optimal control problems that can be used to derive the optimal tempe...

  7. Constructing circuit codes by permuting initial sequences

    CERN Document Server

    Wynn, Ed

    2012-01-01

    Two new constructions are presented for coils and snakes in the hypercube. Improvements are made on the best known results for snake-in-the-box coils of dimensions 9, 10 and 11, and for some other circuit codes of dimensions between 8 and 13. In the first construction, circuit codes are generated from permuted copies of an initial transition sequence; the multiple copies constrain the search, so that long codes can be found relatively efficiently. In the second construction, two lower-dimensional paths are joined together with only one or two changes in the highest dimension; this requires a search for a permutation of the second sequence to fit around the first. It is possible to investigate sequences of vertices of the hypercube, including circuit codes, by connecting the corresponding vertices in an extended graph related to the hypercube. As an example of this, invertible circuit codes are briefly discussed.

  8. Transforming clinical microbiology with bacterial genome sequencing.

    Science.gov (United States)

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  9. Bias of purine stretches in sequenced chromosomes

    DEFF Research Database (Denmark)

    Ussery, David; Soumpasis, Dikeos Mario; Brunak, Søren

    2002-01-01

    We examined more than 700 DNA sequences (full length chromosomes and plasmids) for stretches of purines (R) or pyrimidines (Y) and alternating YR stretches; such regions will likely adopt structures which are different from the canonical B-form. Since one turn of the DNA helix is roughly 10 bp, we...... measured the fraction of each genome which contains purine (or pyrimidine) tracts of lengths of 10 bp or longer (hereafter referred to as 'purine tracts'), as well as stretches of alternating pyrimidines/purine ('pyr/pur tracts') of the same length. Using these criteria, a random sequence would be expected......, in eukaryotes there is an abundance of long stretches of purines or alternating purine/pyrimidine tracts, which cannot be explained in this way; these sequences are likely to play an important role in eukaryotic chromosome organisation....
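
    The measurement described above (the fraction of a sequence lying in purine or pyrimidine tracts of 10 bp or longer) can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the authors' program: the function name, the regular expression, and the choice to count pyrimidine runs together with purine runs (a pyrimidine run on one strand is a purine run on the complementary strand) are assumptions.

```python
import re


def tract_fraction(seq, min_len=10):
    """Fraction of `seq` covered by runs of only purines (A/G) or only
    pyrimidines (C/T) that are at least `min_len` bases long."""
    seq = seq.upper()
    if not seq:
        return 0.0
    # Greedy matching returns each maximal run once; purine and pyrimidine
    # runs cannot overlap, so summing match lengths gives the coverage.
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq))
    return covered / len(seq)


# A 10-bp purine run followed by a 5-bp pyrimidine run (too short to count).
example = tract_fraction("AGGAGAAGGA" + "CTCTC")
```

    Alternating pyrimidine/purine tracts could be counted in the same way with a pattern such as `(?:[CT][AG]){5,}`, again as an assumption about how the tracts are defined.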

  10. DNA sequencing by nanopores: advances and challenges

    Science.gov (United States)

    Agah, Shaghayegh; Zheng, Ming; Pasquali, Matteo; Kolomeisky, Anatoly B.

    2016-10-01

    Developing inexpensive and simple DNA sequencing methods capable of detecting entire genomes in short periods of time could revolutionize the world of medicine and technology. It will also lead to major advances in our understanding of fundamental biological processes. It has been shown that nanopores have the ability of single-molecule sensing of various biological molecules rapidly and at a low cost. This has stimulated significant experimental efforts in developing DNA sequencing techniques by utilizing biological and artificial nanopores. In this review, we discuss recent progress in the nanopore sequencing field with a focus on the nature of nanopores and on sensing mechanisms during the translocation. Current challenges and alternative methods are also discussed.

  11. ANGSD: Analysis of Next Generation Sequencing Data.

    Science.gov (United States)

    Korneliussen, Thorfinn Sand; Albrechtsen, Anders; Nielsen, Rasmus

    2014-11-25

    High-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously. We present a multithreaded program suite called ANGSD. This program can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods. The open source C/C++ program ANGSD is available at http://www.popgen.dk/angsd . The program is tested and validated on GNU/Linux systems. The program facilitates multiple input formats including BAM and imputed beagle genotype probability files. The program allows the user to choose between combinations of existing methods and can perform analysis that is not implemented elsewhere.

  12. Nager syndrome and Pierre Robin sequence.

    Science.gov (United States)

    Rosa, Rafael Fabiano Machado; Guimarães, Victória Bernardes; Beltrão, Luciana Amorim; Trombetta, Júlia Santana; Lliguin, Karen Lizeth Puma; de Mattos, Vinicius Freitas; Zen, Paulo Ricardo Gazzola

    2015-04-01

    Nager syndrome is considered a rare genetic syndrome characterized by craniofacial and radial anomalies. Pierre Robin sequence is a triad that includes micrognathia, cleft palate and glossoptosis. The present patient had typical findings of Nager syndrome and Pierre Robin sequence. He progressed to severe respiratory distress, requiring mechanical ventilation and tracheostomy. At 1 year and 11 months, he had episodes of cardiorespiratory arrest and died. In the literature review, we identified the clinical description of 44 patients with Nager syndrome. Among them, 93.1% had micrognathia, 38.6% cleft palate and 11.3% glossoptosis. Only one (2.3%) had all three features, as observed in the present patient. Therefore, despite the fact that the features of Pierre Robin sequence are common, there are few patients who have the complete triad. It is noteworthy, however, that they may be associated with respiratory distress, which may put the patient's life at risk.

  13. Spectral sequences in smooth generalized cohomology

    CERN Document Server

    Grady, Daniel

    2016-01-01

    We consider spectral sequences in smooth generalized cohomology theories, including differential generalized cohomology theories. The main differential spectral sequences will be of the Atiyah-Hirzebruch (AHSS) type, where we provide a filtration by the Čech resolution of smooth manifolds. This allows for systematic study of torsion in differential cohomology. We apply this in detail to smooth Deligne cohomology, differential topological complex K-theory, and to a smooth extension of integral Morava K-theory that we introduce. In each case we explicitly identify the differentials in the corresponding spectral sequences, which exhibit an interesting and systematic interplay between (refinement of) classical cohomology operations, operations involving differential forms, and operations on cohomology with U(1) coefficients.

  14. The Recursion Theorem and Infinite Sequences

    CERN Document Server

    Miller, Arnold W

    2008-01-01

    In this paper we use the Recursion Theorem to show the existence of various infinite sequences and sets. Our main result is that there is an increasing sequence e_0, e_1, e_2, ... such that W_{e_n}={e_{n+1}} for every n. Similarly, we prove that there exists an increasing sequence such that W_{e_n}={e_{n+1},e_{n+2},...} for every n. We call a nonempty computably enumerable set A self-constructing if W_e=A for every e in A. We show that every nonempty computably enumerable set which is disjoint from an infinite computable set is one-one equivalent to a self-constructing set.

  15. Developmental sequence of Cambrian embryo Markuelia

    Institute of Scientific and Technical Information of China (English)

    DONG XiPing

    2007-01-01

    Based on more exquisitely preserved specimens of Markuelia hunanensis recently recovered from Middle and Upper Cambrian in western Hunan and in the light of Synchrotron radiation X-ray tomographic microscopy, the developmental sequence from cleavage through organogenesis to the pre-hatching of Cambrian embryo Markuelia, especially the developmental sequence during the pre-hatching stage, i.e. from the earliest period when the scalids and tail spines only took shape to the latest period (just about hatching), is established. This developmental sequence provides a pattern of embryonic development during the pre-hatching stage, which has not been established in the living scalidophorans (priapulids, loriciferans and kinorhynchs). Thus, it not only enriches our knowledge on the embryonic development of the extant descendants of Markuelia, but also opens a new window to the evolution and development of the animal.

  16. TWO ASPECTS OF A GENERALIZED FIBONACCI SEQUENCE

    Directory of Open Access Journals (Sweden)

    Johan Matheus Tuwankotta

    2015-05-01

    Full Text Available In this paper we study the so-called generalized Fibonacci sequence: $x_{n+2} = \alpha x_{n+1} + \beta x_n$, $n \in \mathbb{N}$. We derive an open domain around the origin of the parameter space where the sequence converges to $0$. The limiting behaviors on the boundary of this domain are: convergence to a nontrivial limit, $k$-periodic ($k \in \mathbb{N}$), or quasi-periodic. We use the ratio of two consecutive terms of the sequence to construct a rational approximation for algebraic numbers of the form $\sqrt{r}$, $r \in \mathbb{Q}$. Using a similar idea, we extend this to higher dimension to construct a rational approximation for $\sqrt[3]{a + b\sqrt{c}} + \sqrt[3]{a - b\sqrt{c}} + d$.
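
    One way to see how the ratio of consecutive terms can yield rational approximations of a square root is the following Python sketch. The parameter choice $\alpha = 2$, $\beta = r - 1$ is an assumption made here for illustration and may differ from the construction in the paper: with it, the characteristic equation $t^2 = 2t + (r - 1)$ has roots $1 \pm \sqrt{r}$, so for $r > 0$ the ratio $x_{n+1}/x_n$ tends to $1 + \sqrt{r}$. The function name and seed values are likewise hypothetical.

```python
from fractions import Fraction


def sqrt_approx(r, steps=20):
    """Rational approximation of sqrt(r), r > 0, from the recurrence
    x_{n+2} = alpha * x_{n+1} + beta * x_n with alpha = 2, beta = r - 1
    (an illustrative parameter choice, not necessarily the paper's)."""
    alpha, beta = Fraction(2), Fraction(r) - 1
    x_prev, x_curr = Fraction(0), Fraction(1)  # seeds x_0 = 0, x_1 = 1
    for _ in range(steps):
        x_prev, x_curr = x_curr, alpha * x_curr + beta * x_prev
    # The ratio converges to 1 + sqrt(r); subtract 1 to isolate sqrt(r).
    return x_curr / x_prev - 1


approx_sqrt2 = sqrt_approx(2)   # an exact Fraction close to sqrt(2)
approx_sqrt3 = sqrt_approx(3)
```

    For $r = 2$ the recurrence generates the Pell numbers, and the first few approximations are 1, 3/2 and 7/5.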

  17. Limits of zeros of polynomial sequences

    CERN Document Server

    Zhu, Xinyun

    2007-01-01

    In the present paper we consider $F_k(x)=x^{k}-\sum_{t=0}^{k-1}x^t,$ the characteristic polynomial of the $k$-th order Fibonacci sequence, the latter denoted $G(k,l).$ We determine the limits of the real roots of certain odd and even degree polynomials related to the derivatives and integrals of $F_k(x),$ that form infinite sequences of polynomials, of increasing degree. In particular, as $k \to \infty,$ the limiting values of the zeros are determined, for both odd and even cases. It is also shown, in both cases, that the convergence is monotone for sufficiently large degree. We give an upper bound for the modulus of the complex zeros of the polynomials for each sequence. This gives a general solution related to problems considered by Dubeau 1989, 1993, Miles 1960, Flores 1967, Miller 1971 and later by the second author in the present paper, and Narayan 1997.
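
    To make the polynomial $F_k(x)$ concrete, the short Python sketch below evaluates it and locates its positive real root by bisection. It concerns only $F_k$ itself, not the derivative- and integral-related polynomials studied in the paper, and the function names are hypothetical. It relies on two easily checked facts: $F_k(1) = 1 - k < 0$ for $k \ge 2$ and $F_k(2) = 1 > 0$, so the root lies in the interval (1, 2).

```python
def F(k, x):
    """F_k(x) = x^k - sum_{t=0}^{k-1} x^t."""
    return x ** k - sum(x ** t for t in range(k))


def positive_root(k, tol=1e-12):
    """Positive real root of F_k for k >= 2, found by bisection on [1, 2]."""
    lo, hi = 1.0, 2.0  # F_k(1) < 0 and F_k(2) > 0 when k >= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(k, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Roots for a few orders; k = 2 gives the golden ratio.
roots = {k: positive_root(k) for k in (2, 3, 5, 10)}
```

    The computed roots increase with $k$ and stay below 2, consistent with the well-known fact that the positive root of $F_k$ approaches 2 as $k \to \infty$.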

  18. Sequence-controlled supramolecular terpolymerization directed by specific molecular recognitions

    National Research Council Canada - National Science Library

    Takehiro Hirao; Hiroaki Kudo; Tomoko Amimoto; Takeharu Haino

    2017-01-01

    Nature precisely manipulates primary monomer sequences in biopolymers. In synthetic polymer sequences, this precision has been limited because of the lack of polymerization techniques for conventional polymer synthesis...

  19. Cladistic analysis of anuran POMC sequences.

    Science.gov (United States)

    Alrubaian, Jasem; Danielson, Phillip; Walker, David; Dores, Robert M

    2002-03-01

    Procedures for performing cladistic analyses can provide powerful tools for understanding the evolution of neuropeptide and polypeptide hormone coding genes. These analyses can be done on either amino acid data sets or nucleotide data sets and can utilize several different algorithms that are dependent on distinct sets of operating assumptions and constraints. In some cases, the results of these analyses can be used to gauge phylogenetic relationships between taxa. Selecting the proper cladistic analysis strategy is dependent on the taxonomic level of analysis and the rate of evolution within the orthologous genes being evaluated. For example, previous studies have shown that the amino acid sequence of proopiomelanocortin (POMC), the common precursor for the melanocortins and beta-endorphin, can be used to resolve phylogenetic relationships at the class and order level. This study tested the hypothesis that POMC sequences could be used to resolve phylogenetic relationships at the family taxonomic level. Cladistic analyses were performed on amphibian POMC sequences characterized from the marine toad, Bufo marinus (family Bufonidae; this study), the spadefoot toad, Spea multiplicatus (family Pelobatidae), the African clawed frog, Xenopus laevis (family Pipidae) and the laughing frog, Rana ridibunda (family Ranidae). In these analyses the sequence of Australian lungfish POMC was used as the outgroup. The analyses were done at the amino acid level using the maximum parsimony algorithm and at the nucleotide level using the maximum likelihood algorithm. For the anuran POMC genes, analysis at the nucleotide level using the maximum likelihood algorithm generated a cladogram with higher bootstrap values than the maximum parsimony analysis of the POMC amino acid data set. 
    For anuran POMC sequences, analysis of nucleotide sequences using the maximum likelihood algorithm would appear to be the preferred strategy for resolving phylogenetic relationships at the family taxonomic

  20. Dynamic programming algorithms for biological sequence comparison.

    Science.gov (United States)

    Pearson, W R; Miller, W

    1992-01-01

    Efficient dynamic programming algorithms are available for a broad class of protein and DNA sequence comparison problems. These algorithms require computer time proportional to the product of the lengths of the two sequences being compared [O(N^2)] but require memory space proportional only to the sum of these lengths [O(N)]. Although the requirement for O(N^2) time limits use of the algorithms to the largest computers when searching protein and DNA sequence databases, many other applications of these algorithms, such as calculation of distances for evolutionary trees and comparison of a new sequence to a library of sequence profiles, are well within the capabilities of desktop computers. In particular, the results of library searches with rapid searching programs, such as FASTA or BLAST, should be confirmed by performing a rigorous optimal alignment. Whereas rapid methods do not overlook significant sequence similarities, FASTA limits the number of gaps that can be inserted into an alignment, so that a rigorous alignment may extend the alignment substantially in some cases. BLAST does not allow gaps in the local regions that it reports; a calculation that allows gaps is very likely to extend the alignment substantially. Although a Monte Carlo evaluation of the statistical significance of a similarity score with a rigorous algorithm is much slower than the heuristic approach used by the RDF2 program, the dynamic programming approach should take less than 1 hr on a 386-based PC or desktop Unix workstation. For descriptive purposes, we have limited our discussion to methods for calculating similarity scores and distances that use gap penalties of the form g = rk. Nevertheless, programs for the more general case (g = q+rk) are readily available. Versions of these programs that run either on Unix workstations, IBM-PC class computers, or the Macintosh can be obtained from either of the authors.
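
    The time and memory behaviour described above can be illustrated with a minimal Python sketch of a global similarity score under the linear gap penalty g = rk. It is not the authors' program: the function name and the match, mismatch and r values are assumptions chosen for the example. The score is computed in time proportional to the product of the sequence lengths while keeping only two rows of the dynamic programming matrix, so memory grows linearly with the shorter sequence.

```python
def global_score(a, b, match=1, mismatch=-1, r=2):
    """Optimal global alignment score of sequences `a` and `b` with a gap
    of length k penalised by g = r * k. Uses O(len(a) * len(b)) time and
    memory linear in the shorter sequence."""
    if len(b) > len(a):
        a, b = b, a  # keep the shorter sequence along the row
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [-r * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [-r * i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] - r,        # gap in b
                curr[j - 1] - r,    # gap in a
            )
        prev = curr  # only the previous row is needed for the next one
    return prev[len(b)]


identical = global_score("ACGT", "ACGT")
one_gap = global_score("ACGT", "AGT")
```

    Recovering the alignment itself, rather than just its score, in linear space requires a divide-and-conquer strategy; the two-row version here returns only the score.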

  1. Sequence-Based Identification of Aerobic Actinomycetes

    Science.gov (United States)

    Patel, Jean Baldus; Wallace, Richard J.; Brown-Elliott, Barbara A.; Taylor, Tony; Imperatrice, Carol; Leonard, Deborah G. B.; Wilson, Rebecca W.; Mann, Linda; Jost, Kenneth C.; Nachamkin, Irving

    2004-01-01

    We investigated the utility of 500-bp 16S rRNA gene sequencing for identifying clinically significant species of aerobic actinomycetes. A total of 28 reference strains and 71 clinical isolates that included members of the genera Streptomyces, Gordonia, and Tsukamurella and 10 taxa of Nocardia were studied. Methods of nonsequencing analyses included growth and biochemical analysis, PCR-restriction enzyme analysis of the 439-bp Telenti fragment of the 65 hsp gene, susceptibility testing, and, for selected isolates, high-performance liquid chromatography. Many of the isolates were included in prior taxonomic studies. Sequencing of Nocardia species revealed that members of the group were generally most closely related to the American Type Culture Collection (ATCC) type strains. However, the sequences of Nocardia transvalensis, N. otitidiscaviarum, and N. nova isolates were highly variable; and it is likely that each of these species contains multiple species. We propose that these three species be designated complexes until they are more taxonomically defined. The sequences of several taxa did not match any recognized species. Among other aerobic actinomycetes, each group most closely resembled the associated reference strain, but with some divergence. The study demonstrates the ability of partial 16S rRNA gene sequencing to identify members of the aerobic actinomycetes, but the study also shows that a high degree of sequence divergence exists within many species and that many taxa within the Nocardia spp. are unnamed at present. A major unresolved issue is the type strain of N. asteroides, as the present one (ATCC 19247), chosen before the availability of molecular analysis, does not represent any of the common taxa associated with clinical nocardiosis. PMID:15184431

  2. Rates of Convergence of Recursively Defined Sequences

    DEFF Research Database (Denmark)

    Lambov, Branimir Zdravkov

    2005-01-01

    This paper gives a generalization of a result by Matiyasevich which gives explicit rates of convergence for monotone recursively defined sequences. The generalization is motivated by recent developments in fixed point theory and the search for applications of proof mining to the field. It relaxes...... the requirement for monotonicity to the form x_{n+1} ≤ (1+a_n)x_n + b_n where the parameter sequences have to be bounded in sum, and also provides means to treat computational errors. The paper also gives an example result, an application of proof mining to fixed point theory, that can be achieved by the means discussed...

  3. Sequence variability of Campylobacter temperate bacteriophages

    Directory of Open Access Journals (Sweden)

    Ng Lai-King

    2008-03-01

    Full Text Available Abstract Background: Prophages integrated within the chromosomes of Campylobacter jejuni isolates have been demonstrated very recently. Prior work with Campylobacter temperate bacteriophages, as well as evidence from prophages in other enteric bacteria, suggests these prophages might have a role in the biology and virulence of the organism. However, very little is known about the genetic variability of Campylobacter prophages which, if present, could lead to differential phenotypes in isolates carrying the phages versus those that do not. As a first step in the characterization of C. jejuni prophages, we investigated the distribution of prophage DNA within a C. jejuni population and assessed the DNA and protein sequence variability within a subset of the putative prophages found. Results: Southern blotting of C. jejuni DNA using probes from genes within the three putative prophages of the C. jejuni sequenced strain RM 1221 demonstrated the presence of at least one prophage gene in a large proportion (27/35) of isolates tested. Of these, 15 were positive for 5 or more of the 7 Campylobacter Mu-like phage 1 (CMLP 1, also designated Campylobacter jejuni integrated element 1, or CJIE 1) genes tested. Twelve of these putative prophages were chosen for further analysis. DNA sequencing of a 9,000 to 11,000 nucleotide region of each prophage demonstrated a close homology with CMLP 1 in both gene order and nucleotide sequence. Structural and sequence variability, including short insertions, deletions, and allele replacements, were found within the prophage genomes, some of which would alter the protein products of the ORFs involved. No insertions of novel genes were detected within the sequenced regions. The 12 prophages and RM 1221 had a % G+C very similar to C. jejuni sequenced strains, as well as promoter regions characteristic of C. jejuni. 
None of the putative prophages were successfully induced and propagated, so it is not known if they were functional or

  4. Digital image sequence processing, compression, and analysis

    CERN Document Server

    Reed, Todd R

    2004-01-01

    Introduction: Todd R. Reed
    CONTENT-BASED IMAGE SEQUENCE REPRESENTATION: Pedro M. Q. Aguiar, Radu S. Jasinschi, José M. F. Moura, and Charnchai Pluempitiwiriyawej
    THE COMPUTATION OF MOTION: Christoph Stiller, Sören Kammel, Jan Horn, and Thao Dang
    MOTION ANALYSIS AND DISPLACEMENT ESTIMATION IN THE FREQUENCY DOMAIN: Luca Lucchese and Guido Maria Cortelazzo
    QUALITY OF SERVICE ASSESSMENT IN NEW GENERATION WIRELESS VIDEO COMMUNICATIONS: Gaetano Giunta
    ERROR CONCEALMENT IN DIGITAL VIDEO: Francesco G.B. De Natale
    IMAGE SEQUENCE RESTORATION: A WIDER PERSPECTIVE: Anil Kokaram
    VIDEO SUMMARIZATION: Cuneyt M. Taskiran and Edward

  5. Complete genome sequence of arracacha mottle virus.

    Science.gov (United States)

    Orílio, Anelise F; Lucinda, Natalia; Dusi, André N; Nagata, Tatsuya; Inoue-Nagata, Alice K

    2013-01-01

    Arracacha mottle virus (AMoV) is the only potyvirus reported to infect arracacha (Arracacia xanthorrhiza) in Brazil. Here, the complete genome sequence of an isolate of AMoV was determined to be 9,630 nucleotides in length, excluding the 3' poly-A tail, and encoding a polyprotein of 3,135 amino acids and a putative P3N-PIPO protein. Its genomic organization is typical of a member of the genus Potyvirus, containing all conserved motifs. Its full genome sequence shared 56.2% nucleotide identity with sunflower chlorotic mottle virus and verbena virus Y, the most closely related viruses.

  6. MPS Editor - An Integrated Sequencing Environment

    Science.gov (United States)

    Streiffert, Barbara A.; O'Reilly, Taifun; Schrock, Mitchell; Catchen, Jaime

    2010-01-01

    In today's operations environment, the teams are smaller and need to be more efficient while still ensuring the safety and success of the mission. In addition, teams often begin working on a mission in its early development phases and continue on the team through actual operations. For these reasons the operations teams want to be presented with a software environment that integrates multiple needed software applications as well as providing them with context sensitive editing support for entering commands and sequences of commands. At Jet Propulsion Laboratory, the Multi-Mission Planning and Sequencing (MPS) Editor provided by the Multi-Mission Ground Systems and Services (MGSS) supports those operational needs.

  7. Characterizing C6+P2-graphic Sequences

    Institute of Scientific and Technical Information of China (English)

    HU Li-li

    2014-01-01

    For a given graph H, a graphic sequence π = (d1, d2, ..., dn) is said to be potentially H-graphic if π has a realization containing H as a subgraph. In this paper, we characterize the potentially C6+P2-graphic sequences, where C6+P2 denotes the graph obtained from C6 by adding two adjacent edges to the three pairwise nonadjacent vertices of C6. Moreover, we use the characterization to determine the value of σ(C6+P2, n).
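
    The notion of a graphic sequence used above (a sequence of non-negative integers that is the degree sequence of some simple graph) can be checked with the classical Erdős–Gallai inequalities. The Python sketch below illustrates only that standard test; it is not the paper's characterization of potentially C6+P2-graphic sequences, and the function name is hypothetical.

```python
def is_graphic(seq):
    """Erdős–Gallai test: True if `seq` is the degree sequence of some
    simple graph (no loops, no multiple edges)."""
    d = sorted(seq, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False  # degrees must be non-negative with an even sum
    n = len(d)
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True


complete_graph_k4 = is_graphic([3, 3, 3, 3])  # realised by K4
not_realisable = is_graphic([3, 3, 1, 1])     # fails the inequality at k = 2
```

    Deciding whether a graphic sequence is potentially H-graphic additionally requires that some realization contains H as a subgraph, which this test does not address.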

  8. Limits of zeros of polynomial sequences

    OpenAIRE

    Zhu, Xinyun; Grossman, George

    2007-01-01

    In the present paper we consider $F_k(x)=x^{k}-\sum_{t=0}^{k-1}x^t,$ the characteristic polynomial of the $k$-th order Fibonacci sequence, the latter denoted $G(k,l).$ We determine the limits of the real roots of certain odd and even degree polynomials related to the derivatives and integrals of $F_k(x),$ that form infinite sequences of polynomials, of increasing degree. In particular, as $k \to \infty,$ the limiting values of the zeros are determined, for both odd and even cases. It is also ...

  9. Mathematical methods of analysis of biopolymer sequences

    CERN Document Server

    Gindikin, S G

    1992-01-01

    This collection contains papers by participants in the seminar on mathematical methods in molecular biology who worked for several years at the Laboratory of Molecular Biology and Bioorganic Chemistry (now the Institute of Physical and Chemical Problems in Biology) at Moscow State University. The seminar united mathematicians and biologists around the problems of biological sequences. The collection includes original results as well as expository material and spans a range of perspectives, from purely mathematical problems to algorithms and their computer realizations. For this reason, the book is of interest to mathematicians, statisticians, biologists, and computational scientists who work with biopolymer sequences.

  10. The computational linguistics of biological sequences

    Energy Technology Data Exchange (ETDEWEB)

    Searls, D. [Univ. of Pennsylvania, Philadelphia, PA (United States)

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Protein sequences are analogous in many respects, particularly their folding behavior. Proteins have a much richer variety of interactions, but in theory the same linguistic principles could come to bear in describing dependencies between distant residues that arise by virtue of three-dimensional structure. This tutorial will concentrate on nucleic acid sequences.

  11. Proteomics-grade de novo sequencing approach

    DEFF Research Database (Denmark)

    Savitski, Mikhail M; Nielsen, Michael L; Kjeldsen, Frank

    2005-01-01

    known proteins, complete de novo sequencing of their peptides is desired. The main problems of conventional sequencing based on tandem mass spectrometry are incomplete backbone fragmentation and the frequent overlap of fragment masses. In this work, the first proteomics-grade de novo approach...... is presented, where the above problems are alleviated by the use of complementary fragmentation techniques CAD and ECD. Implementation of a high-current, large-area dispenser cathode as a source of low-energy electrons provided efficient ECD of doubly charged peptides, the most abundant species (65...

  12. Bilateral maculopathy associated with Pierre Robin sequence.

    Science.gov (United States)

    Witmer, Matthew T; Vasan, Ryan; Levy, Richard; Davis, Jessica; Chan, R V Paul

    2012-08-01

    Pierre Robin sequence has been associated with a number of ocular complications, including myopia, strabismus, Möbius syndrome, nasolacrimal duct obstruction, glaucoma, cataract, microphthalmos, coloboma of choroid, and retinal detachment. We report a 10-day-old boy who presented with micrognathia, glossoptosis, and cleft palate as well as multiple congenital anomalies. Ophthalmic examination was notable for bilateral maculopathy, with focal areas of retinal and retinal pigment epithelial atrophy. The association of Pierre Robin sequence and maculopathy has been reported only twice previously.

  13. Queen Charlotte 2001 Earthquake Aftershock Sequence

    Science.gov (United States)

    Mulder, T.; Rogers, G. C.

    2012-12-01

    On Oct 12, 2001, an Mw=6.3 earthquake occurred off the Queen Charlotte Islands, BC. It was felt throughout Haida Gwaii (Queen Charlotte Islands) and the adjoining mainland. It generated a small tsunami recorded on Vancouver Island tide gauges. Moment tensor solutions show almost pure thrust faulting. There was a significant aftershock sequence associated with this event. Relocation of the catalogue aftershock sequence with respect to a key calibration event, using various subsets of common stations, shows significant movement in the event locations. The aftershocks define an ~30 degree dipping fault plane.

  14. Variations on strongly lacunary quasi Cauchy sequences

    Science.gov (United States)

    Kaplan, Huseyin; Cakalli, Huseyin

    2016-08-01

    We introduce a new function space, namely the space of $N_\theta(p)$-ward continuous functions, which turns out to be a closed subspace of the space of continuous functions for each positive integer $p$. $N_\theta^\alpha(p)$-ward continuity is also introduced and investigated for any fixed 0 [...] $k_{r-1}, k_r]$, and $\theta = (k_r)$ is a lacunary sequence, i.e. an increasing sequence of positive integers such that $k_0 \neq 0$, and $h_r : k_r - k_{r-1} \to \infty$.

  15. Self-Matching Properties of Beatty Sequences

    Directory of Open Access Journals (Sweden)

    Z. Masáková

    2007-01-01

    We study the self-matching properties of Beatty sequences, in particular of the graph of the function $\lfloor j\beta \rfloor$ against $j$ for every quadratic unit $\beta \in (0,1)$. We show that translation in the argument by an element $G_i$ of a generalized Fibonacci sequence almost always causes the translation of the value of the function by $G_{i-1}$. More precisely, for fixed $i \in \mathbb{N}$, we have $\lfloor \beta(j+G_i) \rfloor = \lfloor \beta j \rfloor + G_{i-1}$, where $j \notin U_i$. We determine the set $U_i$ of mismatches and show that it has a low frequency, namely $\beta^i$.
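
    The self-matching identity can be checked numerically in one special case. The sketch below is an illustration under stated assumptions and is not the authors' construction: it takes $\beta = (\sqrt{5}-1)/2$ and assumes that the generalized Fibonacci sequence $G_i$ reduces to the ordinary Fibonacci numbers ($F_0 = 0$, $F_1 = 1$) for this choice of $\beta$. The function names are hypothetical.

```python
from math import isqrt


def beatty(j: int) -> int:
    """Return floor(j * beta) for beta = (sqrt(5) - 1) / 2, using exact integer arithmetic."""
    # floor((j*sqrt(5) - j) / 2) equals (floor(j*sqrt(5)) - j) // 2 for integer j >= 0.
    return (isqrt(5 * j * j) - j) // 2


def fib(i: int) -> int:
    """Ordinary Fibonacci numbers, assumed here to play the role of G_i."""
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a


def mismatches(i: int, n: int) -> list[int]:
    """Indices j in [0, n) where floor(beta*(j + G_i)) != floor(beta*j) + G_{i-1}."""
    g_i, g_prev = fib(i), fib(i - 1)
    return [j for j in range(n) if beatty(j + g_i) != beatty(j) + g_prev]


def mismatch_frequency(i: int, n: int) -> float:
    """Fraction of indices in [0, n) that belong to the mismatch set."""
    return len(mismatches(i, n)) / n
```

    Under these assumptions, `mismatch_frequency(i, n)` for large `n` should approach $\beta^i$, the frequency stated in the abstract; for example, `mismatch_frequency(5, 100_000)` can be compared with $\beta^5 \approx 0.09$.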

  16. A formula on linear complexity of highest coordinate sequences from maximal periodic sequences over Galois rings

    Institute of Scientific and Technical Information of China (English)

    HU Lei; SUN Nigang

    2006-01-01

    Using a polynomial expression of the highest coordinate map, we deduce an exact formula on the linear complexity of the highest coordinate sequence derived from a maximal periodic sequence over an arbitrary Galois ring of characteristic $p^2$, where $p$ is a prime. This generalizes the known result of Udaya and Siddiqi for the case that the Galois ring is $\mathbb{Z}_4$.

  17. Sequence embedding for fast construction of guide trees for multiple sequence alignment

    LENUS (Irish Health Repository)

    Blackshields, Gordon

    2010-05-14

    Background: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to $N^2$ for $N$ sequences. When $N$ grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results: In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions: We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
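
    The general idea of embedding can be illustrated with a small sketch. It is not the published mBed code: the seed selection (evenly spaced picks), the k-mer based distance, and all function names are assumptions chosen for illustration. Each sequence is represented by its distances to a few seed sequences, so only N x (number of seeds) distances are computed instead of all N^2 pairs.

```python
from collections import Counter
from math import dist


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Hypothetical stand-in for an expensive pairwise distance: 1 - shared k-mer fraction."""
    ka = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    kb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    shared = sum((ka & kb).values())  # multiset intersection of k-mers
    denom = min(sum(ka.values()), sum(kb.values()))
    return 1.0 - shared / denom if denom else 1.0


def embed(seqs: list[str], n_seeds: int) -> list[list[float]]:
    """Embed each sequence as its vector of distances to n_seeds seed sequences."""
    step = max(1, len(seqs) // n_seeds)
    seeds = seqs[::step][:n_seeds]  # assumption: evenly spaced seeds
    return [[kmer_distance(s, seed) for seed in seeds] for s in seqs]


def embedded_distance(u: list[float], v: list[float]) -> float:
    """Cheap Euclidean distance between two embedded vectors."""
    return dist(u, v)
```

    With the vectors from `embed`, any standard clustering method can then work on cheap Euclidean distances, and similar sequences are expected to land close together in the embedded space.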

  18. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    Directory of Open Access Journals (Sweden)

    Miri Michaeli

    2012-12-01

    High-throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion – Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.

  19. From Sequence to Morphology - Long-Range Correlations in Complete Sequenced Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2004-01-01

    The largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis thali

  20. Microbiome profiling by Illumina sequencing of combinatorial sequence-tagged PCR products

    NARCIS (Netherlands)

    G.B. Gloor (Gregory); R.B.S. Hummelen (Ruben); J.M. Macklaim (Jean); R.J. Dickson (Russell); A.D. Fernandes (Andrew); R.A. MacPhee (Roderick); G. Reid (Gregor)

    2010-01-01

    We developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. Amplified PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads.
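
    The benefit of combinatorial tagging can be sketched as follows. The abstract does not describe the tag layout, so this is only an illustration under assumptions: tags are taken to sit at the start of each read of a pair, and the tag sequences and function names are hypothetical. With F forward tags and R reverse tags, F x R samples can be distinguished using only F + R tagged primers.

```python
from itertools import product


def build_tag_map(fwd_tags: list[str], rev_tags: list[str]) -> dict[tuple[str, str], int]:
    """Assign one sample index to every (forward tag, reverse tag) combination."""
    return {pair: n for n, pair in enumerate(product(fwd_tags, rev_tags))}


def assign_sample(read_fwd: str, read_rev: str,
                  tag_map: dict[tuple[str, str], int], tag_len: int):
    """Return the sample index for a read pair, or None if the tag pair is unknown."""
    # Assumption: each read begins with its tag, followed by primer and amplicon.
    return tag_map.get((read_fwd[:tag_len], read_rev[:tag_len]))
```

    For example, two forward and two reverse tags give four distinguishable samples, and read pairs whose leading bases match no known tag combination are left unassigned.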