WorldWideScience

Sample records for short-sequence dna repeats

  1. Repeated DNA sequences in fungi

    Energy Technology Data Exchange (ETDEWEB)

    Dutta, S K

    1974-11-01

    Several fungal species, representing all broad groups such as basidiomycetes, ascomycetes and phycomycetes, were examined for the nature of repeated DNA sequences by DNA:DNA reassociation studies using hydroxyapatite chromatography. All of the fungal species tested contained 10 to 20 percent repeated DNA sequences. There are approximately 100 to 110 copies of repeated DNA sequences, each piece approximately 4 × 10^7 daltons in size. Repeated DNA sequence homoduplexes showed on average a 5 °C difference in T_e50 (the temperature at which 50 percent of duplexes dissociate) from the corresponding homoduplexes of unfractionated whole DNA. It is suggested that part of the repetitive sequences in fungi constitutes mitochondrial DNA and part constitutes nuclear DNA. (auth)

  2. C-terminal low-complexity sequence repeats of Mycobacterium smegmatis Ku modulate DNA binding.

    Science.gov (United States)

    Kushwaha, Ambuj K; Grove, Anne

    2013-01-24

    Ku protein is an integral component of the NHEJ (non-homologous end-joining) pathway of DSB (double-strand break) repair. Both eukaryotic and prokaryotic Ku homologues have been characterized and shown to bind DNA ends. A unique feature of Mycobacterium smegmatis Ku is its basic C-terminal tail that contains several lysine-rich low-complexity PAKKA repeats that are absent from homologues encoded by obligate parasitic mycobacteria. Such PAKKA repeats are also characteristic of mycobacterial Hlp (histone-like protein) for which they have been shown to confer the ability to appose DNA ends. Unexpectedly, removal of the lysine-rich extension enhances DNA-binding affinity, but an interaction between DNA and the PAKKA repeats is indicated by the observation that only full-length Ku forms multiple complexes with a short stem-loop-containing DNA previously designed to accommodate only one Ku dimer. The C-terminal extension promotes DNA end-joining by T4 DNA ligase, suggesting that the PAKKA repeats also contribute to efficient end-joining. We suggest that low-complexity lysine-rich sequences have evolved repeatedly to modulate the function of unrelated DNA-binding proteins.

  3. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.

    Science.gov (United States)

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-11-16

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Genus-specific protein binding to the large clusters of DNA repeats (short regularly spaced repeats) present in Sulfolobus genomes

    DEFF Research Database (Denmark)

    Peng, Xu; Brügger, Kim; Shen, Biao

    2003-01-01

    terminally modified and corresponds to SSO454, an open reading frame of previously unassigned function. It binds specifically to DNA fragments carrying double and single repeat sequences, binding on one side of the repeat structure, and producing an opening of the opposite side of the DNA structure. It also...... recognizes both main families of repeat sequences in S. solfataricus. The recombinant protein, expressed in Escherichia coli, showed the same binding properties to the SRSR repeat as the native one. The SSO454 protein exhibits a tripartite internal repeat structure which yields a good sequence match...... with a helix-turn-helix DNA-binding motif. Although this putative motif is shared by other archaeal proteins, orthologs of SSO454 were only detected in species within the Sulfolobus genus and in the closely related Acidianus genus. We infer that the genus-specific protein induces an opening of the structure...

  5. In situ detection of tandem DNA repeat length

    Energy Technology Data Exchange (ETDEWEB)

    Yaar, R.; Szafranski, P.; Cantor, C.R.; Smith, C.L. [Boston Univ., MA (United States)]

    1996-11-01

    A simple method for scoring short tandem DNA repeats is presented. An oligonucleotide target, containing tandem repeats embedded in a unique sequence, was hybridized to a set of complementary probes, containing tandem repeats of known lengths. Single-stranded loop structures formed on duplexes containing a mismatched (different) number of tandem repeats. No loop structure formed on duplexes containing a matched (identical) number of tandem repeats. The matched and mismatched loop structures were enzymatically distinguished and differentially labeled by treatment with S1 nuclease and the Klenow fragment of DNA polymerase. 7 refs., 4 figs.

  6. Roles of genes and Alu repeats in nonlinear correlations of HUMHBB DNA sequence

    International Nuclear Information System (INIS)

    Xiao Yi; Huang Yanzhao

    2004-01-01

    DNA sequences of different species and different portions of the DNA of the same species may have completely different correlation properties, but the origin of these correlations is still unclear and is being investigated, especially in particular cases. We report here a study of the DNA sequence of the human beta globin region (HUMHBB), which has strong linear and nonlinear correlations. We studied the roles of two typical elements of DNA sequence, genes and Alu repeats, in the nonlinear correlations of HUMHBB. We find that there exist strong nonlinear correlations between the exons or introns in different genes and between the Alu repeats. They may be one of the major sources of the nonlinear correlations in HUMHBB.

  7. SeqEntropy: genome-wide assessment of repeats for short read sequencing.

    Directory of Open Access Journals (Sweden)

    Hsueh-Ting Chu

    Full Text Available BACKGROUND: Recent studies on genome assembly from short-read sequencing data reported the limitation of this technology to reconstruct the entire genome even at very high depth coverage. We investigated the limitation from the perspective of information theory to evaluate the effect of repeats on short-read genome assembly using idealized (error-free) reads at different lengths. METHODOLOGY/PRINCIPAL FINDINGS: We define a metric H(k) to be the entropy of sequencing reads at a read length k and use the relative loss of entropy ΔH(k) to measure the impact of repeats on the reconstruction of a whole genome from sequences of length k. In our experiments, we found that entropy loss correlates well with de novo assembly coverage of a genome, and a score of ΔH(k) > 1% indicates a severe loss in genome reconstruction fidelity. The minimal read lengths to achieve ΔH(k) < 1% differ among organisms and are independent of genome size. For example, to meet the threshold of ΔH(k) < 1%, a read length of 60 bp is needed for sequencing the human genome (3.2 × 10^9 bp) and 320 bp for sequencing the fruit fly genome (1.8 × 10^8 bp). We also calculated the ΔH(k) scores for 2725 prokaryotic chromosomes and plasmids at several read lengths. Our results indicate that the levels of repeats in different genomes are diverse and that the entropy of sequencing reads provides a measurement of the repeat structures. CONCLUSIONS/SIGNIFICANCE: The proposed entropy-based measurement, which can be calculated in seconds to minutes in most cases, provides a rapid quantitative evaluation of the limitation of idealized short-read genome sequencing. Moreover, the calculation can be parallelized to scale up to large eukaryotic genomes. This approach may be useful for tuning the sequencing parameters to achieve better genome assemblies when a closely related genome is already available.
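
    The abstract does not give the exact formula for H(k) or ΔH(k), so the following is only a minimal sketch under stated assumptions: H(k) is taken to be the Shannon entropy of the distribution of all error-free length-k reads (one per start position), and ΔH(k) is taken to be the fractional shortfall from the entropy of a repeat-free sequence with the same number of read positions. The function names are hypothetical and are not part of SeqEntropy.

```python
import math
from collections import Counter


def read_entropy(genome: str, k: int) -> float:
    """Shannon entropy (in bits) of all length-k substrings of the genome.

    Assumption: one idealized, error-free read per start position.
    """
    n = len(genome) - k + 1
    if n <= 0:
        raise ValueError("k must not exceed the genome length")
    counts = Counter(genome[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def relative_entropy_loss(genome: str, k: int) -> float:
    """Assumed form of the relative entropy loss: 1 - H(k) / log2(n).

    log2(n) is the entropy reached when every read position yields a
    distinct read, i.e. when no repeat of length >= k exists.
    """
    h = read_entropy(genome, k)  # also validates k
    h_max = math.log2(len(genome) - k + 1)
    if h_max == 0:
        return 0.0  # a single read position carries no repeat information
    return 1.0 - h / h_max
```

    Under these assumptions a sequence with no repeated k-mer gives a loss of 0, and the loss grows as more read positions produce identical reads; scanning k upward until the loss falls below a chosen threshold mirrors the read-length search described in the abstract.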

  8. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats.

    Science.gov (United States)

    Fungtammasan, Arkarachai; Tomaszkiewicz, Marta; Campos-Sánchez, Rebeca; Eckert, Kristin A; DeGiorgio, Michael; Makova, Kateryna D

    2016-10-01

    Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA-DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  9. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.

    Science.gov (United States)

    Fungtammasan, Arkarachai; Ananda, Guruprasad; Hile, Suzanne E; Su, Marcia Shu-Wei; Sun, Chen; Harris, Robert; Medvedev, Paul; Eckert, Kristin; Makova, Kateryna D

    2015-05-01

    Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution. © 2015 Fungtammasan et al.; Published by Cold Spring Harbor Laboratory Press.
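
    The idea of working from the sequence flanking a repeat can be illustrated with a small sketch. This is not the STR-FM pipeline: the function name, the parameters and the rule that both flanks must reach a minimum length are assumptions made for illustration. It locates the longest run of a known motif in a read and returns the two flanks together with the copy number, rejecting reads that do not span the whole repeat.

```python
import re
from typing import Optional, Tuple


def split_read_at_str(read: str, motif: str, min_copies: int = 3,
                      min_flank: int = 5) -> Optional[Tuple[str, int, str]]:
    """Return (left_flank, copy_number, right_flank) for the longest motif run.

    Returns None when no run of at least min_copies exists, or when either
    flank is shorter than min_flank (the read may not span the full repeat).
    """
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    runs = list(pattern.finditer(read))
    if not runs:
        return None
    best = max(runs, key=lambda m: m.end() - m.start())
    left, right = read[:best.start()], read[best.end():]
    if len(left) < min_flank or len(right) < min_flank:
        return None
    copies = (best.end() - best.start()) // len(motif)
    return left, copies, right
```

    In a flank-based approach the two flanks, rather than the repeat itself, would then be placed on the reference, so that the observed copy number does not bias where the read maps.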

  10. Tandemly repeated sequence in 5'end of mtDNA control region of ...

    African Journals Online (AJOL)

    Extensive length variability was observed in 5' end sequence of the mitochondrial DNA control region of the Japanese Spanish mackerel (Scomberomorus niphonius). This length variability was due to the presence of varying numbers of a 56-bp tandemly repeated sequence and a 46-bp insertion/deletion (indel).

  11. Detection of short repeated genomic sequences on metaphase chromosomes using padlock probes and target primed rolling circle DNA synthesis

    Directory of Open Access Journals (Sweden)

    Stougaard Magnus

    2007-11-01

    Full Text Available Abstract Background In situ detection of short sequence elements in genomic DNA requires short probes with high molecular resolution and powerful specific signal amplification. Padlock probes can differentiate single base variations. Ligated padlock probes can be amplified in situ by rolling circle DNA synthesis and detected by fluorescence microscopy, thus enhancing PRINS type reactions, where localized DNA synthesis reports on the position of hybridization targets, to potentially reveal the binding of single oligonucleotide-size probe molecules. Such a system has been presented for the detection of mitochondrial DNA in fixed cells, whereas attempts to apply rolling circle detection to metaphase chromosomes have previously failed, according to the literature. Methods Synchronized cultured cells were fixed with methanol/acetic acid to prepare chromosome spreads in teflon-coated diagnostic well-slides. Apart from the slide format and the chromosome spreading everything was done essentially according to standard protocols. Hybridization targets were detected in situ with padlock probes, which were ligated and amplified using target primed rolling circle DNA synthesis, and detected by fluorescence labeling. Results An optimized protocol for the spreading of condensed metaphase chromosomes in teflon-coated diagnostic well-slides was developed. Applying this protocol we generated specimens for target primed rolling circle DNA synthesis of padlock probes recognizing a 40 nucleotide sequence in the male specific repetitive satellite I sequence (DYZ1) on the Y-chromosome and a 32 nucleotide sequence in the repetitive kringle IV domain in the apolipoprotein(a) gene positioned on the long arm of chromosome 6. These targets were detected with good efficiency, but the efficiency on other target sites was unsatisfactory. Conclusion Our aim was to test the applicability of the method used on mitochondrial DNA to the analysis of nuclear genomes, in particular as

  12. Clustered regularly interspaced short palindromic repeats (CRISPRs): the hallmark of an ingenious antiviral defense mechanism in prokaryotes

    NARCIS (Netherlands)

    Al-Attar, S.; Westra, E.R.; Oost, van der J.; Brouns, S.J.J.

    2011-01-01

    Many prokaryotes contain the recently discovered defense system against mobile genetic elements. This defense system contains a unique type of repetitive DNA stretches, termed Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs). CRISPRs consist of identical repeated DNA sequences

  13. Filipino DNA variation at 12 X-chromosome short tandem repeat markers.

    Science.gov (United States)

    Salvador, Jazelyn M; Apaga, Dame Loveliness T; Delfin, Frederick C; Calacal, Gayvelline C; Dennis, Sheila Estacio; De Ungria, Maria Corazon A

    2018-06-08

    Demands for solving complex kinship scenarios where only distant relatives are available for testing have risen in the past years. In these instances, other genetic markers such as X-chromosome short tandem repeat (X-STR) markers are employed to supplement autosomal and Y-chromosomal STR DNA typing. However, prior to use, the degree of STR polymorphism in the population requires evaluation through generation of an allele or haplotype frequency population database. This population database is also used for statistical evaluation of DNA typing results. Here, we report X-STR data from 143 unrelated Filipino male individuals who were genotyped via conventional polymerase chain reaction-capillary electrophoresis (PCR-CE) using the 12 X-STR loci included in the Investigator® Argus X-12 kit (Qiagen) and via massively parallel sequencing (MPS) of seven X-STR loci included in the ForenSeq™ DNA Signature Prep kit of the MiSeq® FGx™ Forensic Genomics System (Illumina). Allele calls between PCR-CE and MPS systems were consistent (100% concordance) across seven overlapping X-STRs. Allele and haplotype frequencies and other parameters of forensic interest were calculated based on length (PCR-CE, 12 X-STRs) and sequence (MPS, seven X-STRs) variations observed in the population. Results of our study indicate that the 12 X-STRs in the PCR-CE system are highly informative for the Filipino population. MPS of seven X-STR loci identified 73 X-STR alleles compared with 55 X-STR alleles that were identified solely by length via PCR-CE. Of the 73 sequence-based alleles observed, six alleles have not been reported in the literature. The population data presented here may serve as a reference Philippine frequency database of X-STRs for forensic casework applications. Copyright © 2018 Elsevier B.V. All rights reserved.

  14. Genome dynamics of short oligonucleotides: the example of bacterial DNA uptake enhancing sequences.

    Directory of Open Access Journals (Sweden)

    Mohammed Bakkali

    Full Text Available Among the many bacteria naturally competent for transformation by DNA uptake (a phenomenon with significant clinical and financial implications), Pasteurellaceae and Neisseriaceae species preferentially take up DNA containing specific short sequences. The genomic overrepresentation of these DNA uptake enhancing sequences (DUES) causes preferential uptake of conspecific DNA, but the function(s) behind this overrepresentation and its evolution are still a matter for discovery. Here I analyze DUES genome dynamics and evolution and test the validity of the results for other selectively constrained oligonucleotides. I use statistical methods and computer simulations to examine DUESs accumulation in Haemophilus influenzae and Neisseria gonorrhoeae genomes. I analyze DUESs sequence and nucleotide frequencies, as well as those of all their mismatched forms, and prove the dependence of DUESs genomic overrepresentation on their preferential uptake by quantifying and correlating both characteristics. I then argue that mutation, uptake bias, and weak selection against DUESs in less constrained parts of the genome combined are sufficient to cause DUESs accumulation in susceptible parts of the genome with no need for other DUES function. The distribution of overrepresentation values across sequences with different mismatch loads compared to the DUES suggests a gradual yet not linear molecular drive of DNA sequences depending on their similarity to the DUES. Other genomically overrepresented sequences, both pro- and eukaryotic, show a similar distribution of frequencies, suggesting that the molecular drive reported above applies to other frequent oligonucleotides. Rare oligonucleotides, however, seem to be gradually drawn to genomic underrepresentation, thus suggesting a molecular drag. To my knowledge this work provides the first clear evidence of the gradual evolution of selectively constrained oligonucleotides, including repeated, palindromic and protein

  15. Genomic organization and developmental fate of adjacent repeated sequences in a foldback DNA clone of Tetrahymena thermophila

    International Nuclear Information System (INIS)

    Tschunko, A.H.; Loechel, R.H.; McLaren, N.C.; Allen, S.L.

    1987-01-01

    DNA sequence elimination and rearrangement occurs during the development of somatic cell lineages of eukaryotes and was first discovered over a century ago. However, the significance and mechanism of chromatin elimination are not understood. DNA elimination also occurs during the development of the somatic macronucleus from the germinal micronucleus in unicellular ciliated protozoa such as Tetrahymena thermophila. In this study foldback DNA from the micronucleus was used as a probe to isolate ten clones. All of those tested (4/4) contained sequences that were repetitive in the micronucleus and rearranged in the macronucleus. Inverted repeated sequences were present in one clone. This clone, pTtFBl, was subjected to a detailed analysis of its developmental fate. Subregions were subcloned and used as probes against Southern blots of micronuclear and macronuclear DNA. DNA was labeled with [33P]-labeled dATP. The authors found that all subregions defined repeated sequence families in the micronuclear genome. A minimum of four different families was defined, two of which are retained in the macronucleus and two of which are completely eliminated. The inverted repeat family is retained with little rearrangement. Two of the families, defined by subregions that do not contain parts of the inverted repeat, are totally eliminated during macronuclear development and contain open reading frames. The significance of retained inverted repeats to the process of elimination is discussed.

  16. Cytogenetic Analysis of Populus trichocarpa - Ribosomal DNA, Telomere Repeat Sequence, and Marker-selected BACs

    Science.gov (United States)

    M.N. Islam-Faridi; C.D. Nelson; S.P. DiFazio; L.E. Gunter; G.A. Tuskan

    2009-01-01

    The 18S-28S rDNA and 5S rDNA loci in Populus trichocarpa were localized using fluorescent in situ hybridization (FISH). Two 18S-28S rDNA sites and one 5S rDNA site were identified and located at the ends of 3 different chromosomes. FISH signals from the Arabidopsis-type telomere repeat sequence were observed at the distal ends of each chromosome. Six BAC clones...

  17. Characterization of Erwinia amylovora strains from different host plants using repetitive-sequences PCR analysis, and restriction fragment length polymorphism and short-sequence DNA repeats of plasmid pEA29.

    Science.gov (United States)

    Barionovi, D; Giorgi, S; Stoeger, A R; Ruppitsch, W; Scortichini, M

    2006-05-01

    The three main aims of the study were the assessment of the genetic relationship between a deviating Erwinia amylovora strain isolated from Amelanchier sp. (Maloideae) grown in Canada and other strains from Maloideae and Rosoideae, the investigation of the variability of the PstI fragment of the pEA29 plasmid using restriction fragment length polymorphism (RFLP) analysis and the determination of the number of short-sequence DNA repeats (SSR) by DNA sequence analysis in representative strains. Ninety-three strains obtained from 12 plant genera and different geographical locations were examined by repetitive-sequences PCR using Enterobacterial Repetitive Intergenic Consensus, BOX and Repetitive Extragenic Palindromic primer sets. Upon the unweighted pair group method with arithmetic mean analysis, a deviating strain from Amelanchier sp. was analysed using amplified ribosomal DNA restriction analysis (ARDRA) and the sequencing of the 16S rDNA gene. This strain showed 99% similarity to other E. amylovora strains in the 16S gene and the same banding pattern with ARDRA. The RFLP analysis of pEA29 plasmid using MspI and Sau3A restriction enzymes showed a higher variability than that previously observed and no clear-cut grouping of the strains was possible. The number of SSR units reiterated two to 12 times. The strains obtained from pear orchards showing for the first time symptoms of fire blight had a low number of SSR units. The strains from Maloideae exhibit a wider genetic variability than previously thought. The RFLP analysis of a fragment of the pEA29 plasmid would not seem a reliable method for typing E. amylovora strains. A low number of SSR units was observed with first epidemics of fire blight. The current detection techniques are mainly based on the genetic similarities observed within the strains from the cultivated tree-fruit crops. For a more reliable detection of the fire blight pathogen also in wild and ornamental rosaceous plants the genetic

  18. Tools for analyzing genetic variants from sequencing data Case study: short tandem repeats

    OpenAIRE

    Gymrek, Melissa

    2016-01-01

    This was presented as a BitesizeBio Webinar entitled "Tools for analyzing genetic variants from sequencing data Case study: short tandem repeats". Accompanying scripts can be accessed on GitHub: https://github.com/mgymrek/mgymrek-bitesizebio-webinar

  19. Optimization of sequence alignment for simple sequence repeat regions

    Directory of Open Access Journals (Sweden)

    Ogbonnaya Francis C

    2011-07-01

    Full Text Available Abstract Background Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSRs have been used as molecular markers because they are easy to detect and are used in a range of applications, including genetic diversity, genome mapping, and marker assisted selection. They are also very mutable because of slipping of the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDEL) mutation frequency to a high ratio, more than other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units which result in false evolutionary relationships. Findings To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using a Perl script with a Tk graphical interface. This program is based on aligning sequences after first determining the repeated units and the first and last SSR nucleotide positions. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenetic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different lineages was observed after applying the new algorithm. Conclusions The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic
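
    The first step described above, determining the repeated unit and the first and last nucleotide positions of each SSR, can be sketched as follows. The original program is a Perl script; this Python version is an independent illustration, and the function name, the minimum copy number of 3 and the regular-expression approach are all assumptions rather than details taken from that program.

```python
import re
from typing import List, Tuple


def find_ssrs(seq: str, min_copies: int = 3,
              max_unit: int = 6) -> List[Tuple[str, int, int, int]]:
    """Find tandem repeats with units of 1..max_unit bases.

    Returns a list of (unit, first_pos, last_pos, copies) with 0-based,
    inclusive positions. The lazy quantifier reports the shortest unit,
    so a run of "CACA" copies is reported with unit "CA".
    """
    # A unit of 1..max_unit bases followed by at least min_copies - 1 more copies.
    pattern = re.compile(rf"([ACGT]{{1,{max_unit}}}?)\1{{{min_copies - 1},}}")
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        copies = (m.end() - m.start()) // len(unit)
        hits.append((unit, m.start(), m.end() - 1, copies))
    return hits
```

    With the unit and boundaries known for each sequence, an aligner could shift whole repeat units rather than single bases when the copy number differs between sequences, which is the behaviour the abstract attributes to the new algorithm.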

  20. TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads.

    Science.gov (United States)

    Novák, Petr; Ávila Robledillo, Laura; Koblížková, Andrea; Vrbová, Iva; Neumann, Pavel; Macas, Jirí

    2017-07-07

    Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
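
    The k-mer decomposition step mentioned above can be illustrated with a very small sketch. It only counts k-mers across the reads of one cluster and returns the most frequent ones; the graph-based clustering and the consensus reconstruction performed by TAREAN are not reproduced here, and the function name and parameters are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_frequent_kmers(reads: Iterable[str], k: int,
                        top_n: int) -> List[Tuple[str, int]]:
    """Count every length-k substring across all reads and return the top_n."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts.most_common(top_n)
```

    For reads drawn from a tandem array, the k-mers that tile the monomer dominate this ranking, which is why the most frequent k-mers are a reasonable starting point for rebuilding a monomer consensus.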

  1. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence

    NARCIS (Netherlands)

    Semenova, E.V.; Jore, M.M.; Westra, E.R.; Oost, van der J.; Brouns, S.J.J.

    2011-01-01

    Prokaryotic clustered regularly interspaced short palindromic repeat (CRISPR)/Cas (CRISPR-associated sequences) systems provide adaptive immunity against viruses when a spacer sequence of small CRISPR RNA (crRNA) matches a protospacer sequence in the viral genome. Viruses that escape CRISPR/Cas

  2. Structural basis for sequence-specific recognition of DNA by TAL effectors

    KAUST Repository

    Deng, Dong

    2012-01-05

    TAL (transcription activator-like) effectors, secreted by phytopathogenic bacteria, recognize host DNA sequences through a central domain of tandem repeats. Each repeat comprises 33 to 35 conserved amino acids and targets a specific base pair by using two hypervariable residues [known as repeat variable diresidues (RVDs)] at positions 12 and 13. Here, we report the crystal structures of an 11.5-repeat TAL effector in both DNA-free and DNA-bound states. Each TAL repeat comprises two helices connected by a short RVD-containing loop. The 11.5 repeats form a right-handed, superhelical structure that tracks along the sense strand of DNA duplex, with RVDs contacting the major groove. The 12th residue stabilizes the RVD loop, whereas the 13th residue makes a base-specific contact. Understanding DNA recognition by TAL effectors may facilitate rational design of DNA-binding proteins with biotechnological applications.
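
    The one-repeat-to-one-base relationship described above can be sketched as a lookup on residues 12 and 13 of each repeat. The four RVD-to-base pairings used here (NI to A, HD to C, NG to T, NN to G) are the commonly cited ones and are not stated in this abstract; NN is also reported to accept A, which this simplified table ignores. The function names are hypothetical.

```python
from typing import Iterable

# Commonly cited RVD specificities; a simplification, see the note above.
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}


def rvd_of(repeat: str) -> str:
    """Return the repeat variable diresidue: residues 12 and 13 (1-based)."""
    return repeat[11:13]


def predicted_target(repeats: Iterable[str]) -> str:
    """Predict the DNA target, one base per repeat; unknown RVDs give 'N'."""
    return "".join(RVD_TO_BASE.get(rvd_of(r), "N") for r in repeats)
```

    Given an array of repeat sequences in order, `predicted_target` returns the base string the array would be expected to recognize under this simplified table.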

  3. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Science.gov (United States)

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.
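
    The abstract does not define the recurrence time precisely, so the sketch below assumes the simplest reading: for every word of length k, the recurrence time is the distance between one occurrence and the next occurrence of the same word. The function name is hypothetical and this is not the authors' implementation.

```python
from collections import Counter


def recurrence_time_histogram(seq: str, k: int) -> Counter:
    """Histogram of distances between successive occurrences of each k-mer."""
    last_seen = {}
    hist: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            hist[i - last_seen[word]] += 1  # distance to the previous occurrence
        last_seen[word] = i
    return hist
```

    Under this reading, a tandem repeat with period p produces a strong peak at distance p in the histogram, which is how periodicities and repeat-related features would show up in a single linear pass over the sequence.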

  4. Noninvasive prenatal paternity testing (NIPAT) through maternal plasma DNA sequencing

    DEFF Research Database (Denmark)

    Jiang, Haojun; Xie, Yifan; Li, Xuchao

    2016-01-01

    Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) have already been used to perform noninvasive prenatal paternity testing from maternal plasma DNA. The frequently used technologies were PCR followed by capillary electrophoresis and SNP typing array, respectively. Here, we developed a noninvasive prenatal paternity testing (NIPAT) based on SNP typing with maternal plasma DNA sequencing. We evaluated the influence factors (minor allele frequency (MAF), the number of total SNP, fetal fraction and effective sequencing depth) and designed three different selective SNP panels... paternity test using STR multiplex system. Our study here proved that the maternal plasma DNA sequencing-based technology is feasible and accurate in determining paternity, which may provide an alternative in forensic application in the future.

  5. Use of short tandem repeat sequences to study Mycobacterium leprae in leprosy patients in Malawi and India.

    Directory of Open Access Journals (Sweden)

    Saroj K Young

    2008-04-01

    Full Text Available Inadequate understanding of the transmission of Mycobacterium leprae makes it difficult to predict the impact of leprosy control interventions. Genotypic tests that allow tracking of individual bacterial strains would strengthen epidemiological studies and contribute to our understanding of the disease. Genotyping assays based on variation in the copy number of short tandem repeat sequences were applied to biopsies collected in population-based epidemiological studies of leprosy in northern Malawi, and from members of multi-case households in Hyderabad, India. In the Malawi series, considerable genotypic variability was observed between patients, and also within patients, when isolates were collected at different times or from different tissues. Less within-patient variability was observed when isolates were collected from similar tissues at the same time. Less genotypic variability was noted amongst the closely related Indian patients than in the Malawi series. Lineages of M. leprae undergo changes in their pattern of short tandem repeat sequences over time. Genetic divergence is particularly likely between bacilli inhabiting different (e.g., skin and nerve) tissues. Such variability makes short tandem repeat sequences unsuitable as a general tool for population-based strain typing of M. leprae, or for distinguishing relapse from reinfection. Careful use of these markers may provide insights into the development of disease within individuals and for tracking of short transmission chains.

  6. Rapid Multiplex Small DNA Sequencing on the MinION Nanopore Sequencing Platform

    Directory of Open Access Journals (Sweden)

    Shan Wei

    2018-05-01

    Full Text Available Real-time sequencing of short DNA reads has a wide variety of clinical and research applications including screening for mutations, target sequences and aneuploidy. We recently demonstrated that MinION, a nanopore-based DNA sequencing device the size of a USB drive, could be used for short-read DNA sequencing. In this study, an ultra-rapid multiplex library preparation and sequencing method for the MinION is presented and applied to accurately test normal diploid and aneuploidy samples’ genomic DNA in under three hours, including library preparation and sequencing. This novel method shows great promise as a clinical diagnostic test for applications requiring rapid short-read DNA sequencing.

  7. DNA Fingerprint Analysis of Three Short Tandem Repeat (STR) Loci for Biochemistry and Forensic Science Laboratory Courses

    Science.gov (United States)

    McNamara-Schroeder, Kathleen; Olonan, Cheryl; Chu, Simon; Montoya, Maria C.; Alviri, Mahta; Ginty, Shannon; Love, John J.

    2006-01-01

    We have devised and implemented a DNA fingerprinting module for an upper division undergraduate laboratory based on the amplification and analysis of three of the 13 short tandem repeat loci that are required by the Federal Bureau of Investigation Combined DNA Index System (FBI CODIS) data base. Students first collect human epithelial (cheek)…

  8. Clustered regularly interspaced short palindromic repeats (CRISPRs): the hallmark of an ingenious antiviral defense mechanism in prokaryotes.

    Science.gov (United States)

    Al-Attar, Sinan; Westra, Edze R; van der Oost, John; Brouns, Stan J J

    2011-04-01

    Many prokaryotes contain the recently discovered defense system against mobile genetic elements. This defense system contains a unique type of repetitive DNA stretches, termed Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs). CRISPRs consist of identical repeated DNA sequences (repeats), interspaced by highly variable sequences referred to as spacers. The spacers originate from either phages or plasmids and comprise the prokaryotes' 'immunological memory'. CRISPR-associated (cas) genes encode conserved proteins that together with CRISPRs make up the CRISPR/Cas system, responsible for defending the prokaryotic cell against invaders. CRISPR-mediated resistance has been proposed to involve three stages: (i) CRISPR-Adaptation, the invader DNA is encountered by the CRISPR/Cas machinery and an invader-derived short DNA fragment is incorporated in the CRISPR array. (ii) CRISPR-Expression, the CRISPR array is transcribed and the transcript is processed by Cas proteins. (iii) CRISPR-Interference, the invaders' nucleic acid is recognized by complementarity to the crRNA and neutralized. An application of the CRISPR/Cas system is the immunization of industry-relevant prokaryotes (or eukaryotes) against mobile-genetic invasion. In addition, the high variability of the CRISPR spacer content can be exploited for phylogenetic and evolutionary studies. Despite impressive progress during the last couple of years, the elucidation of several fundamental details will be a major challenge in future research.

  9. Comparison of the degree of homology of DNA and quantity of repeated sequences in an intact plant and cell structure

    International Nuclear Information System (INIS)

    Solov'yan, V.T.; Kunaleh, V.A.; Shumnyl, V.K.; Vershinin, A.V.

    1986-01-01

    This paper attempts to assess the quantity of repeated sequences and degree of homology of DNA in the intact plant and two lines of callus tissue of Rauwolfia serpentina Benth maintained for 20 years, which differ among themselves in the level of biosynthesis of the pharmacologically valuable alkaloid ajmaline. The tritium-labeled repeats of plants and calli were used in direct and reverse hybridization on nitrocellulose filters. Hybridization of 3H-labeled repeats with phage 17 DNA was used as control. The radioactivity of filters after washing was measured in a liquid scintillation counter.

  10. A novel rat genomic simple repeat DNA with RNA-homology shows triplex (H-DNA)-like structure and tissue-specific RNA expression

    International Nuclear Information System (INIS)

    Dey, Indranil; Rath, Pramod C.

    2005-01-01

    Mammalian genome contains a wide variety of repetitive DNA sequences of relatively unknown function. We report a novel 227 bp simple repeat DNA (3.3 DNA) with a d{(GA)7A(AG)7} dinucleotide mirror repeat from the rat (Rattus norvegicus) genome. 3.3 DNA showed 75-85% homology with several eukaryotic mRNAs due to (GA/CU)n dinucleotide repeats by nBlast search and a dispersed distribution in the rat genome by Southern blot hybridization with [32P]3.3 DNA. The d{(GA)7A(AG)7} mirror repeat formed a triplex (H-DNA)-like structure in vitro. Two large RNAs of 9.1 and 7.5 kb were detected by [32P]3.3 DNA in rat brain by Northern blot hybridization indicating expression of such simple sequence repeats at RNA level in vivo. Further, several cDNAs were isolated from a rat cDNA library by [32P]3.3 DNA probe. Three such cDNAs showed tissue-specific RNA expression in rat. pRT 4.1 cDNA showed strong expression of a 2.39 kb RNA in brain and spleen, pRT 5.5 cDNA showed strong expression of a 2.8 kb RNA in brain and a 3.9 kb RNA in lungs, and pRT 11.4 cDNA showed weak expression of a 2.4 kb RNA in lungs. Thus, genomic simple sequence repeats containing d(GA/CT)n dinucleotides are transcriptionally expressed and regulated in rat tissues. Such d(GA/CT)n dinucleotide repeats may form structural elements (e.g., triplex) which may be sites for functional regulation of genomic coding sequences as well as RNAs. This may be a general function of such transcriptionally active simple sequence repeats widely dispersed in mammalian genome.

  11. Repeat-aware modeling and correction of short read errors.

    Science.gov (United States)

    Yang, Xiao; Aluru, Srinivas; Dorman, Karin S

    2011-02-15

    High-throughput short read sequencing is revolutionizing genomics and systems biology research by enabling cost-effective deep coverage sequencing of genomes and transcriptomes. Error detection and correction are crucial to many short read sequencing applications including de novo genome sequencing, genome resequencing, and digital gene expression analysis. Short read error detection is typically carried out by counting the observed frequencies of kmers in reads and validating those with frequencies exceeding a threshold. In case of genomes with high repeat content, an erroneous kmer may be frequently observed if it has few nucleotide differences with valid kmers with multiple occurrences in the genome. Error detection and correction were mostly applied to genomes with low repeat content and this remains a challenging problem for genomes with high repeat content. We develop a statistical model and a computational method for error detection and correction in the presence of genomic repeats. We propose a method to infer genomic frequencies of kmers from their observed frequencies by analyzing the misread relationships among observed kmers. We also propose a method to estimate the threshold useful for validating kmers whose estimated genomic frequency exceeds the threshold. We demonstrate that superior error detection is achieved using these methods. Furthermore, we break away from the common assumption of uniformly distributed errors within a read, and provide a framework to model position-dependent error occurrence frequencies common to many short read platforms. Lastly, we achieve better error correction in genomes with high repeat content. The software is implemented in C++ and is freely available under GNU GPL3 license and Boost Software V1.0 license at "http://aluru-sun.ece.iastate.edu/doku.php?id = redeem". 
We introduce a statistical framework to model sequencing errors in next-generation reads, which led to promising results in detecting and correcting errors
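    The conventional approach the abstract describes, counting k-mer frequencies across reads and trusting those above a threshold, can be sketched as follows. This shows only that baseline, not the repeat-aware statistical model of the paper; the function names `kmer_counts` and `split_by_threshold` and the toy reads are assumptions for illustration.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer observed across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split_by_threshold(counts, threshold):
    """Partition k-mers into (trusted, suspect) sets.

    A k-mer is trusted when its observed frequency exceeds the
    threshold; everything else is flagged as a possible error.
    """
    trusted = {kmer for kmer, c in counts.items() if c > threshold}
    suspect = set(counts) - trusted
    return trusted, suspect


if __name__ == "__main__":
    # Three identical toy reads and one carrying a single-base change.
    reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
    trusted, suspect = split_by_threshold(kmer_counts(reads, 4), threshold=1)
    print(sorted(trusted), sorted(suspect))
```

    As the abstract notes, a fixed threshold of this kind can misclassify erroneous k-mers in repeat-rich genomes, where an error that lies a few nucleotides away from a high-copy k-mer may itself be observed often.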

  12. DNA triplet repeats mediate heterochromatin-protein-1-sensitive variegated gene silencing.

    Science.gov (United States)

    Saveliev, Alexander; Everett, Christopher; Sharpe, Tammy; Webster, Zoë; Festenstein, Richard

    2003-04-24

    Gene repression is crucial to the maintenance of differentiated cell types in multicellular organisms, whereas aberrant silencing can lead to disease. The organization of DNA into chromatin and heterochromatin is implicated in gene silencing. In chromatin, DNA wraps around histones, creating nucleosomes. Further condensation of chromatin, associated with large blocks of repetitive DNA sequences, is known as heterochromatin. Position effect variegation (PEV) occurs when a gene is located abnormally close to heterochromatin, silencing the affected gene in a proportion of cells. Here we show that the relatively short triplet-repeat expansions found in myotonic dystrophy and Friedreich's ataxia confer variegation of expression on a linked transgene in mice. Silencing was correlated with a decrease in promoter accessibility and was enhanced by the classical PEV modifier heterochromatin protein 1 (HP1). Notably, triplet-repeat-associated variegation was not restricted to classical heterochromatic regions but occurred irrespective of chromosomal location. Because the phenomenon described here shares important features with PEV, the mechanisms underlying heterochromatin-mediated silencing might have a role in gene regulation at many sites throughout the mammalian genome and modulate the extent of gene silencing and hence severity in several triplet-repeat diseases.

  13. Long span DNA paired-end-tag (DNA-PET) sequencing strategy for the interrogation of genomic structural mutations and fusion-point-guided reconstruction of amplicons.

    Directory of Open Access Journals (Sweden)

    Fei Yao

    Full Text Available Structural variations (SVs) contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. While genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs, the current application of PET sequencing with short insert size DNA can be insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We employed a recently developed procedure to generate PET sequencing data using large DNA inserts of 10-20 kb and compared their characteristics with short insert (1 kb) libraries for their ability to identify SVs. Our results suggest that although short insert libraries bear an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. In contrast, large inserts are superior to short inserts in providing higher physical genome coverage for the same sequencing cost and achieve greater sensitivity, in practice, for the identification of several classes of SVs, such as copy number neutral and complex events. Furthermore, our results confirm that large insert libraries allow for the identification of SVs within repetitive sequences, which cannot be spanned by short inserts. This provides a key advantage in studying rearrangements in cancer, and we show how it can be used in a fusion-point-guided-concatenation algorithm to study focally amplified regions in cancer.

  14. Comparative effectiveness of inter-simple sequence repeat and ...

    African Journals Online (AJOL)

    A study to compare the effectiveness of inter-simple sequence repeats (ISSR) and randomly amplified polymorphic DNA (RAPD) profiling was carried out with a total of 65 DNA samples using 12 species of Indian Garcinia. ISSR and RAPD profiling were performed with 19 and 12 primers, respectively. ISSR markers ...

  15. APE1 incision activity at abasic sites in tandem repeat sequences.

    Science.gov (United States)

    Li, Mengxia; Völker, Jens; Breslauer, Kenneth J; Wilson, David M

    2014-05-29

    Repetitive DNA sequences, such as those present in microsatellites and minisatellites, telomeres, and trinucleotide repeats (linked to fragile X syndrome, Huntington disease, etc.), account for nearly 30% of the human genome. These domains exhibit enhanced susceptibility to oxidative attack to yield base modifications, strand breaks, and abasic sites; have a propensity to adopt non-canonical DNA forms modulated by the positions of the lesions; and, when not properly processed, can contribute to genome instability that underlies aging and disease development. Knowledge on the repair efficiencies of DNA damage within such repetitive sequences is therefore crucial for understanding the impact of such domains on genomic integrity. In the present study, using strategically designed oligonucleotide substrates, we determined the ability of human apurinic/apyrimidinic endonuclease 1 (APE1) to cleave at apurinic/apyrimidinic (AP) sites in a collection of tandem DNA repeat landscapes involving telomeric and CAG/CTG repeat sequences. Our studies reveal the differential influence of domain sequence, conformation, and AP site location/relative positioning on the efficiency of APE1 binding and strand incision. Intriguingly, our data demonstrate that APE1 endonuclease efficiency correlates with the thermodynamic stability of the DNA substrate. We discuss how these results have both predictive and mechanistic consequences for understanding the success and failure of repair protein activity associated with such oxidatively sensitive, conformationally plastic/dynamic repetitive DNA domains. Published by Elsevier Ltd.

  16. Utilization of a cloned alphoid repeating sequence of human DNA in the study of polymorphism of chromosomal heterochromatin regions

    International Nuclear Information System (INIS)

    Kruminya, A.R.; Kroshkina, V.G.; Yurov, Yu.B.; Aleksandrov, I.A.; Mitkevich, S.P.; Gindilis, V.M.

    1988-01-01

    The chromosomal distribution of the cloned PHS05 fragment of human alphoid DNA was studied by in situ hybridization in 38 individuals. It was shown that this DNA fraction is primarily localized in the pericentric regions of practically all chromosomes of the set. Significant interchromosomal differences and a weakly expressed interindividual polymorphism were discovered in the copying ability of this class of repeating DNA sequences; associations were not found between the results of hybridization and the pattern of Q-polymorphism

  17. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats

    Directory of Open Access Journals (Sweden)

    Graner Andreas

    2008-10-01

    Full Text Available Background: Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. Based on the corresponding sequence reads a Mathematically Defined Repeat (MDR) index can be generated to map repetitive regions in genomic sequences. Results: We have generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index which was then used to identify and mark repetitive regions in the barley genome. Comparison of the MDR plots with expert repeat annotation drawing on the information already available for known repetitive elements revealed a significant correspondence between the two methods. MDR-based annotation allowed for the identification of dozens of novel repeat sequences, though, which were not recognised by hand-annotation. The MDR data was also used to identify gene-containing regions by masking of repetitive sequences in eight de-novo sequenced bacterial artificial chromosome (BAC) clones. For half of the identified candidate gene islands indeed gene sequences could be identified. MDR data were only of limited use, when mapped on genomic sequences from the closely related species Triticum monococcum as only a fraction of the repetitive sequences was recognised. Conclusion: An MDR index for barley, which was obtained by whole-genome Illumina/Solexa sequencing, proved as efficient in repeat identification as manual expert annotation. Circumventing the labour-intensive step of producing a specific repeat library for expert annotation, an MDR index provides an elegant and efficient resource for the identification of repetitive and low-copy (i.e. potentially gene-containing sequences) regions in uncharacterised genomic sequences.
The restriction that a particular

  18. Alu repeats as markers for forensic DNA analyses

    Energy Technology Data Exchange (ETDEWEB)

    Batzer, M.A.; Alegria-Hartman, M. [Lawrence Livermore National Lab., CA (United States); Kass, D.H. [Louisiana State Univ., New Orleans, LA (United States)] [and others

    1994-01-01

    The Human-Specific (HS) subfamily of Alu sequences comprises a group of 500 nearly identical members which are almost exclusively restricted to the human genome. Individual subfamily members share an average of 98.9% nucleotide identity with the HS subfamily consensus sequence, and have an average age of 2.8 million years. We have developed a Polymerase Chain Reaction (PCR) based assay using primers complementary to the 5′ and 3′ unique flanking DNA sequences from each HS Alu that allow the locus to be assayed for the presence or absence of the Alu repeat. The dimorphic HS Alu sequences probably inserted in the human genome after the radiation of modern humans (within the last 200,000-one million years) and represent a unique source of information for human population genetics and forensic DNA analyses. These sites can be developed into Dimorphic Alu Sequence Tagged Sites (DASTS) for the Human Genome Project. HS Alu family member insertions differ from other types of polymorphism (e.g. Variable Number of Tandem Repeat [VNTR] or Restriction Fragment Length Polymorphism [RFLP]) in that polymorphisms due to Alu insertions arise as a result of a unique event which has occurred only one time in the human population and spread through the population from that point. Therefore, individuals that share HS Alu repeats inherited these elements from a common ancestor. Most VNTR and RFLP polymorphisms may arise multiple times in parallel within a population.

  19. The mitochondrial genome of the legume Vigna radiata and the analysis of recombination across short mitochondrial repeats.

    Directory of Open Access Journals (Sweden)

    Andrew J Alverson

    2011-01-01

    Full Text Available The mitochondrial genomes of seed plants are exceptionally fluid in size, structure, and sequence content, with the accumulation and activity of repetitive sequences underlying much of this variation. We report the first fully sequenced mitochondrial genome of a legume, Vigna radiata (mung bean), and show that despite its unexceptional size (401,262 nt), the genome is unusually depauperate in repetitive DNA and "promiscuous" sequences from the chloroplast and nuclear genomes. Although Vigna lacks the large, recombinationally active repeats typical of most other seed plants, a PCR survey of its modest repertoire of short (38-297 nt) repeats nevertheless revealed evidence for recombination across all of them. A set of novel control assays showed, however, that these results could instead reflect, in part or entirely, artifacts of PCR-mediated recombination. Consequently, we recommend that other methods, especially high-depth genome sequencing, be used instead of PCR to infer patterns of plant mitochondrial recombination. The average-sized but repeat- and feature-poor mitochondrial genome of Vigna makes it ever more difficult to generalize about the factors shaping the size and sequence content of plant mitochondrial genomes.

  20. DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.

    Science.gov (United States)

    Sucher, Nikolaus J; Hennell, James R; Carles, Maria C

    2012-01-01

    DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.

  1. Chlamydomonas chloroplasts can use short dispersed repeats and multiple pathways to repair a double-strand break in the genome.

    Science.gov (United States)

    Odom, Obed W; Baek, Kwang-Hyun; Dani, Radhika N; Herrin, David L

    2008-03-01

    Certain group I introns insert into intronless DNA via an endonuclease that creates a double-strand break (DSB). There are two models for intron homing in phage: synthesis-dependent strand annealing (SDSA) and double-strand break repair (DSBR). The Cr.psbA4 intron homes efficiently from a plasmid into the chloroplast psbA gene in Chlamydomonas, but little is known about the mechanism. Analysis of co-transformants selected using a spectinomycin-resistant 16S gene (16S(spec)) provided evidence for both pathways. We also examined the consequences of the donor DNA having only one-sided or no homology with the psbA gene. When there was no homology with the donor DNA, deletions of up to 5 kb involving direct repeats that flank the psbA gene were obtained. Remarkably, repeats as short as 15 bp were used for this repair, which is consistent with the single-strand annealing (SSA) pathway. When the donor had one-sided homology, the DSB in most co-transformants was repaired using two DNAs, the donor and the 16S(spec) plasmid, which, coincidentally, contained a region that is repeated upstream of psbA. DSB repair using two separate DNAs provides further evidence for the SDSA pathway. These data show that the chloroplast can repair a DSB using short dispersed repeats located proximally, distally, or even on separate molecules relative to the DSB. They also provide a rationale for the extensive repertoire of repeated sequences in this genome.

  2. Application of synthetic DNA probes to the analysis of DNA sequence variants in man

    International Nuclear Information System (INIS)

    Wallace, R.B.; Petz, L.D.; Yam, P.Y.

    1986-01-01

    Oligonucleotide probes provide a tool to discriminate between any two alleles on the basis of hybridization. Random sampling of the genome with different oligonucleotide probes should reveal polymorphism in a certain percentage of the cases. In the hope of identifying polymorphic regions more efficiently, we chose to take advantage of the proposed hypermutability of repeated DNA sequences and the specificity of oligonucleotide hybridization. Since, under appropriate conditions, oligonucleotide probes require complete base pairing for hybridization to occur, they will only hybridize to a subset of the members of a repeat family when all members of the family are not identical. The results presented here suggest that oligonucleotide hybridization can be used to extend the genomic sequences that can be tested for the presence of RFLPs. This expands the tools available to human genetics. In addition, the results suggest that repeated DNA sequences are indeed more polymorphic than single-copy sequences. 28 references, 2 figures

  3. Always look on both sides: phylogenetic information conveyed by simple sequence repeat allele sequences.

    Directory of Open Access Journals (Sweden)

    Stéphanie Barthe

    Full Text Available Simple sequence repeat (SSR) markers are widely used tools for inferences about genetic diversity, phylogeography and spatial genetic structure. Their applications assume that variation among alleles is essentially caused by an expansion or contraction of the number of repeats and that, accessorily, mutations in the target sequences follow the stepwise mutation model (SMM). Generally speaking, PCR amplicon sizes are used as direct indicators of the number of SSR repeats composing an allele with the data analysis either ignoring the extent of allele size differences or assuming that there is a direct correlation between differences in amplicon size and evolutionary distance. However, without precisely knowing the kind and distribution of polymorphism within an allele (SSR and the associated flanking region (FR) sequences), it is hard to say what kind of evolutionary message is conveyed by such a synthetic descriptor of polymorphism as DNA amplicon size. In this study, we sequenced several SSR alleles in multiple populations of three divergent tree genera and disentangled the types of polymorphisms contained in each portion of the DNA amplicon containing an SSR. The patterns of diversity provided by amplicon size variation, SSR variation itself, insertions/deletions (indels), and single nucleotide polymorphisms (SNPs) observed in the FRs were compared. Amplicon size variation largely reflected SSR repeat number. The amount of variation was as large in FRs as in the SSR itself. The former contributed significantly to the phylogenetic information and sometimes was the main source of differentiation among individuals and populations contained by FR and SSR regions of SSR markers.
The presence of mutations occurring at different rates within a marker's sequence offers the opportunity to analyse evolutionary events occurring on various timescales, but at the same time calls for caution in the interpretation of SSR marker data when the distribution of within

  4. Novel expressed sequence tag- simple sequence repeats (EST ...

    African Journals Online (AJOL)

    Using different bioinformatic criteria, the SUCEST database was used to mine for simple sequence repeat (SSR) markers. Among 42,189 clusters, 1,425 expressed sequence tag- simple sequence repeats (EST-SSRs) were identified in silico. Trinucleotide repeats were the most abundant SSRs detected. Of 212 primer pairs ...
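    In silico SSR mining of the kind mentioned above is often done by scanning sequences for a short motif repeated in tandem. The sketch below is a generic illustration and does not reproduce the criteria used in the study, which the abstract does not state; the function name `find_ssrs`, the default motif length of 3 and the minimum of 5 repeats are assumptions.

```python
import re


def find_ssrs(seq, motif_len=3, min_repeats=5):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Hypothetical helper: a motif of `motif_len` bases must occur at
    least `min_repeats` times in a row. Single-base runs such as
    AAAAAA are skipped so they are not reported as longer motifs.
    """
    pattern = re.compile(
        r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if motif_len > 1 and len(set(motif)) == 1:
            continue  # homopolymer run, not a true multi-base motif
        hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


if __name__ == "__main__":
    # Toy sequence: five CAG repeats flanked by TT on both sides.
    print(find_ssrs("TT" + "CAG" * 5 + "TT"))
```

    The backreference `\1` makes the regular expression match only perfect repeats; imperfect or interrupted SSRs would need a different approach.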

  5. Entropic fluctuations in DNA sequences

    Science.gov (United States)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that despite this reduction of information, LSE allows the extraction of meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
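    A block-wise Shannon entropy can be sketched in a few lines of Python. The abstract does not give the exact definition used by the authors, so this version makes assumptions: entropy is computed over the frequencies of overlapping words of length `word_len` inside each non-overlapping block, and the names `block_entropy` and `local_shannon_entropy` are hypothetical.

```python
import math
from collections import Counter


def block_entropy(block, word_len=1):
    """Shannon entropy (bits) of overlapping word frequencies in a block."""
    words = [block[i:i + word_len] for i in range(len(block) - word_len + 1)]
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def local_shannon_entropy(seq, block_size, word_len=1):
    """Entropy of each consecutive, non-overlapping block of the sequence."""
    return [
        block_entropy(seq[i:i + block_size], word_len)
        for i in range(0, len(seq) - block_size + 1, block_size)
    ]


if __name__ == "__main__":
    # First block is a single-base run (entropy 0), second uses all four bases.
    print(local_shannon_entropy("AAAAACGT", 4))
```

    Under these assumptions a long tandem-repeat region yields a run of nearly identical, low block values, which is one way to read the abstract's remark about low LSE fluctuations at centromeres.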

  6. Origin-Dependent Inverted-Repeat Amplification: Tests of a Model for Inverted DNA Amplification.

    Directory of Open Access Journals (Sweden)

    Bonita J Brewer

    2015-12-01

    Full Text Available DNA replication errors are a major driver of evolution--from single nucleotide polymorphisms to large-scale copy number variations (CNVs). Here we test a specific replication-based model to explain the generation of interstitial, inverted triplications. While no genetic information is lost, the novel inversion junctions and increased copy number of the included sequences create the potential for adaptive phenotypes. The model--Origin-Dependent Inverted-Repeat Amplification (ODIRA)-proposes that a replication error at pre-existing short, interrupted, inverted repeats in genomic sequences generates an extrachromosomal, inverted dimeric, autonomously replicating intermediate; subsequent genomic integration of the dimer yields this class of CNV without loss of distal chromosomal sequences. We used a combination of in vitro and in vivo approaches to test the feasibility of the proposed replication error and its downstream consequences on chromosome structure in the yeast Saccharomyces cerevisiae. We show that the proposed replication error-the ligation of leading and lagging nascent strands to create "closed" forks-can occur in vitro at short, interrupted inverted repeats. The removal of molecules with two closed forks results in a hairpin-capped linear duplex that we show replicates in vivo to create an inverted, dimeric plasmid that subsequently integrates into the genome by homologous recombination, creating an inverted triplication. While other models have been proposed to explain inverted triplications and their derivatives, our model can also explain the generation of human, de novo, inverted amplicons that have a 2:1 mixture of sequences from both homologues of a single parent--a feature readily explained by a plasmid intermediate that arises from one homologue and integrates into the other homologue prior to meiosis.
Our tests of key features of ODIRA lend support to this mechanism and suggest further avenues of enquiry to unravel the origins

  7. Origin-Dependent Inverted-Repeat Amplification: Tests of a Model for Inverted DNA Amplification.

    Science.gov (United States)

    Brewer, Bonita J; Payen, Celia; Di Rienzi, Sara C; Higgins, Megan M; Ong, Giang; Dunham, Maitreya J; Raghuraman, M K

    2015-12-01

    DNA replication errors are a major driver of evolution--from single nucleotide polymorphisms to large-scale copy number variations (CNVs). Here we test a specific replication-based model to explain the generation of interstitial, inverted triplications. While no genetic information is lost, the novel inversion junctions and increased copy number of the included sequences create the potential for adaptive phenotypes. The model--Origin-Dependent Inverted-Repeat Amplification (ODIRA)--proposes that a replication error at pre-existing short, interrupted, inverted repeats in genomic sequences generates an extrachromosomal, inverted dimeric, autonomously replicating intermediate; subsequent genomic integration of the dimer yields this class of CNV without loss of distal chromosomal sequences. We used a combination of in vitro and in vivo approaches to test the feasibility of the proposed replication error and its downstream consequences on chromosome structure in the yeast Saccharomyces cerevisiae. We show that the proposed replication error--the ligation of leading and lagging nascent strands to create "closed" forks--can occur in vitro at short, interrupted inverted repeats. The removal of molecules with two closed forks results in a hairpin-capped linear duplex that we show replicates in vivo to create an inverted, dimeric plasmid that subsequently integrates into the genome by homologous recombination, creating an inverted triplication. While other models have been proposed to explain inverted triplications and their derivatives, our model can also explain the generation of human, de novo, inverted amplicons that have a 2:1 mixture of sequences from both homologues of a single parent--a feature readily explained by a plasmid intermediate that arises from one homologue and integrates into the other homologue prior to meiosis. Our tests of key features of ODIRA lend support to this mechanism and suggest further avenues of enquiry to unravel the origins of interstitial

  8. Instability of (CTG)n•(CAG)n trinucleotide repeats and DNA synthesis

    Directory of Open Access Journals (Sweden)

    Liu Guoqi

    2012-02-01

    Expansion of (CTG)n•(CAG)n trinucleotide repeat (TNR) microsatellite sequences is the cause of more than a dozen human neurodegenerative diseases. (CTG)n and (CAG)n repeats form imperfectly base paired hairpins that tend to expand in vivo in a length-dependent manner. Yeast, mouse and human models confirm that (CTG)n•(CAG)n instability increases with repeat number, and implicate both DNA replication and DNA damage response mechanisms in (CTG)n•(CAG)n TNR expansion and contraction. Mutation and knockdown models that abrogate the expression of individual genes might also mask more subtle, cumulative effects of multiple additional pathways on (CTG)n•(CAG)n instability in whole animals. The identification of second site genetic modifiers may help to explain the variability of (CTG)n•(CAG)n TNR instability patterns between tissues and individuals, and offer opportunities for prognosis and treatment.

  9. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  10. Programmable DNA-binding proteins from Burkholderia provide a fresh perspective on the TALE-like repeat domain.

    Science.gov (United States)

    de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas

    2014-06-01

    The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Solution properties of the archaeal CRISPR DNA repeat-binding homeodomain protein Cbp2

    DEFF Research Database (Denmark)

    Kenchappa, Chandra; Heiðarsson, Pétur Orri; Kragelund, Birthe

    2013-01-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) form the basis of diverse adaptive immune systems directed primarily against invading genetic elements of archaea and bacteria. Cbp1 of the crenarchaeal thermoacidophilic order Sulfolobales, carrying three imperfect repeats, binds...... specifically to CRISPR DNA repeats and has been implicated in facilitating production of long transcripts from CRISPR loci. Here, a second related class of CRISPR DNA repeat-binding protein, denoted Cbp2, is characterized that contains two imperfect repeats and is found amongst members of the crenarchaeal...... in facilitating high affinity DNA binding of Cbp2 by tethering the two domains. Structural studies on mutant proteins provide support for Cys(7) and Cys(28) enhancing high thermal stability of Cbp2(Hb) through disulphide bridge formation. Consistent with their proposed CRISPR transcriptional regulatory role, Cbp2...

  12. Genetic Analysis of Eight X-Chromosomal Short Tandem Repeat ...

    African Journals Online (AJOL)

    X-Chromosome short tandem repeat (STR) typing can complement existing DNA profiling protocols and can also offer useful information in cases of complex kinship analysis. This is the first population study of 8 X-linked STRs in Iraq. The purpose of this work was to provide a basic data of allele and haplotype frequency for ...

  13. Potentials and limitations of histone repeat sequences for phylogenetic reconstruction of Sophophora.

    Science.gov (United States)

    Baldo, A M; Les, D H; Strausbaugh, L D

    1999-11-01

    Simplified DNA sequence acquisition has provided many new data sets that are useful for phylogenetic reconstruction, including single- and multiple-copy nuclear and organellar genes. Although transcribed regions receive much attention, nontranscribed regions have recently been added to the repertoire of sequences suitable for phylogenetic studies, especially for closely related taxa. We evaluated the efficacy of a small portion of the histone repeat for phylogenetic reconstruction among Drosophila species. Histone repeats in invertebrates offer distinct advantages similar to those of widely used ribosomal repeats. First, the units are tandemly repeated and undergo concerted evolution. Second, histone repeats include both highly conserved coding and variable intergenic regions. This composition facilitates application of "universal" primers spanning potentially informative sites. We examined a small region of the histone repeat, including the intergenic spacer segments of coding regions from the divergently transcribed H2A and H2B histone genes. The spacer (about 230 bp) exists as a mosaic with highly conserved functional motifs interspersed with rapidly diverging regions; the former aid in alignment of the spacer. There are no ambiguities in alignment of coding regions. Coding and noncoding regions were analyzed together and separately for phylogenetic information. Parsimony, distance, and maximum-likelihood methods successfully retrieve the corroborated phylogeny for the taxa examined. This study demonstrates the resolving power of a small histone region which may now be added to the growing collection of phylogenetically useful DNA sequences.

  14. [Progress of genome engineering technology via clustered regularly interspaced short palindromic repeats--a review].

    Science.gov (United States)

    Li, Hao; Qiu, Shaofu; Song, Hongbin

    2013-10-04

    In their competition for survival with phages, bacteria and archaea gradually evolved an acquired immune system, clustered regularly interspaced short palindromic repeats (CRISPR), which transcribes crRNA and expresses CRISPR-associated (Cas) proteins to specifically silence or cleave foreign double-stranded DNA. In recent years, strong interest has arisen in this primitive immune system of prokaryotes, and many in-depth studies are under way. Recently, researchers successfully repurposed CRISPR as an RNA-guided platform for sequence-specific control of gene expression, which provides a simple approach for selectively perturbing gene expression on a genome-wide scale. It will undoubtedly bring genome engineering into a more convenient and accurate new era.

  15. Genetic variation and DNA fingerprinting of durian types in Malaysia using simple sequence repeat (SSR) markers.

    Science.gov (United States)

    Siew, Ging Yang; Ng, Wei Lun; Tan, Sheau Wei; Alitheen, Noorjahan Banu; Tan, Soon Guan; Yeap, Swee Keong

    2018-01-01

    Durian (Durio zibethinus) is one of the most popular tropical fruits in Asia. To date, 126 durian types have been registered with the Department of Agriculture in Malaysia based on phenotypic characteristics. Classification based on morphology is convenient, easy, and fast but it suffers from phenotypic plasticity as a direct result of environmental factors and age. To overcome the limitation of morphological classification, there is a need to carry out genetic characterization of the various durian types. Such data is important for the evaluation and management of durian genetic resources in producing countries. In this study, simple sequence repeat (SSR) markers were used to study the genetic variation in 27 durian types from the germplasm collection of Universiti Putra Malaysia. Based on DNA sequences deposited in GenBank, seven pairs of primers were successfully designed to amplify SSR regions in the durian DNA samples. High levels of variation among the 27 durian types were observed (expected heterozygosity, H_E = 0.35). The DNA fingerprinting power of SSR markers revealed by the combined probability of identity (PI) of all loci was 2.3 × 10^-3. Unique DNA fingerprints were generated for 21 out of 27 durian types using five polymorphic SSR markers (the other two SSR markers were monomorphic). We further tested the utility of these markers by evaluating the clonal status of shared durian types from different germplasm collection sites, and found that some were not clones. The findings in this preliminary study not only show the feasibility of using SSR markers for DNA fingerprinting of durian types, but also challenge the current classification of durian types, e.g., on whether the different types should be called "clones", "varieties", or "cultivars". Such matters have a direct impact on the regulation and management of durian genetic resources in the region.
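
    The two summary statistics quoted in the record above, expected heterozygosity and the combined probability of identity, can be illustrated with a minimal sketch. The abstract does not say which estimators the authors used; the code assumes the common textbook forms (H_E = 1 - sum of squared allele frequencies, and the unrelated-individuals PI per locus multiplied across loci). The function names and the allele frequencies in the usage note are hypothetical and are not data from the study.

```python
from math import prod


def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus: H_E = 1 - sum(p_i^2).

    Assumption: uncorrected form (no small-sample adjustment).
    """
    return 1.0 - sum(p * p for p in freqs)


def probability_of_identity(freqs):
    """Chance that two unrelated individuals share a genotype at one locus.

    Assumption: PI = sum(p_i^4) + sum over i<j of (2 * p_i * p_j)^2.
    """
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum(
        (2 * freqs[i] * freqs[j]) ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return homozygous + heterozygous


def combined_pi(loci):
    """Combined PI across loci, assuming the loci are independent."""
    return prod(probability_of_identity(freqs) for freqs in loci)
```

    As a usage note, a locus with two equally frequent alleles, `[0.5, 0.5]`, gives H_E = 0.5 and PI = 0.375, while a monomorphic locus, `[1.0]`, gives PI = 1 and so adds no discriminating power to the combined value.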

  16. Short tandem repeat analysis in Japanese population.

    Science.gov (United States)

    Hashiyada, M

    2000-01-01

    Short tandem repeats (STRs), also known as microsatellites, are among the most informative genetic markers for characterizing biological materials. Because of the relatively small size of STR alleles (generally 100-350 nucleotides), amplification by polymerase chain reaction (PCR) is relatively easy, affording a high sensitivity of detection. In addition, STR loci can be amplified simultaneously in a multiplex PCR. Thus, substantial information can be obtained in a single analysis with the benefits of using less template DNA, reducing labor, and reducing contamination. We investigated 14 STR loci in a Japanese population living in Sendai using three multiplex PCR kits: the GenePrint PowerPlex 1.1 and 2.2 Fluorescent STR Systems (Promega, Madison, WI, USA) and AmpFlSTR Profiler (Perkin-Elmer, Norwalk, CT, USA). Genomic DNA was extracted using sodium dodecyl sulfate (SDS) proteinase K or Chelex 100 treatment followed by phenol/chloroform extraction. PCR was performed according to the manufacturers' protocols. Electrophoresis was carried out on an ABI 377 sequencer and the alleles were determined by GeneScan 2.0.2 software (Perkin-Elmer). For the 14 STR loci, the statistical parameters were relatively high, and no significant deviation from Hardy-Weinberg equilibrium was detected. We apply this STR system to paternity testing and forensic casework, e.g., personal identification in rape cases. This system is an effective tool in the forensic sciences to obtain information on individual identification.

  17. DNA dynamics is likely to be a factor in the genomic nucleotide repeats expansions related to diseases.

    Directory of Open Access Journals (Sweden)

    Boian S Alexandrov

    Trinucleotide repeat sequences (TRS) represent a common type of genomic DNA motif whose expansion is associated with a large number of human diseases. The molecular mechanisms driving the ongoing dynamic expansion of TRS across generations and within tissues, and its influence on genomic DNA functions, are not well understood. Here we report results for a novel and notable collective breathing behavior of genomic DNA of tandem TRS, leading to a propensity for large local DNA transient openings at physiological temperature. Our Langevin molecular dynamics (LMD) and Markov Chain Monte Carlo (MCMC) simulations demonstrate that the patterns of openings of various TRSs depend specifically on their length. The collective propensity for DNA strand separation of repeated sequences serves as a precursor for outsized intermediate bubble states independently of the G/C-content. We report that repeats have the potential to interfere with the binding of transcription factors to their consensus sequence by altered DNA breathing dynamics in proximity of the binding sites. These observations might influence ongoing attempts to use LMD and MCMC simulations for TRS-related modeling of genomic DNA functionality in elucidating the common denominators of the dynamic TRS expansion mutation with potential therapeutic applications.

  18. Analysis of an "off-ladder" allele at the Penta D short tandem repeat locus.

    Science.gov (United States)

    Yang, Y L; Wang, J G; Wang, D X; Zhang, W Y; Liu, X J; Cao, J; Yang, S L

    2015-11-25

    Kinship testing of a father and his son from Guangxi, China, the location of the Zhuang minority people, was performed using the PowerPlex® 18D System with a short tandem repeat typing kit. The results indicated that both the father and his son had an off-ladder allele at the Penta D locus, with a genetic size larger than that of the maximal standard allelic ladder. To further identify this locus, monogenic amplification, gene cloning, and genetic sequencing were performed. Sequencing analysis demonstrated that the fragment size of the Penta D-OL locus was 469 bp and the core sequence was [AAAGA]21, also called Penta D-21. The rare Penta D-21 allele was found to be distributed among the Zhuang population from the Guangxi Zhuang Autonomous Region of China; therefore, this study improved the range of DNA data available for this locus and enhanced our ability for individual identification of gene loci.
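
    An allele designation such as [AAAGA]21 in the record above is simply the number of consecutive copies of the core unit. A minimal sketch of that count is given below; the function name and the flanking bases in the usage note are hypothetical, and a real typing workflow would work from sized fragments or sequencing reads instead of a clean string.

```python
import re


def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`.

    Only perfect, uninterrupted copies are counted; interrupted or
    partial repeats are not handled in this sketch.
    """
    unit = unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [
        len(match.group(0)) // len(unit)
        for match in pattern.finditer(sequence.upper())
    ]
    return max(runs, default=0)
```

    For example, a made-up amplicon built as `"GGC" + "AAAGA" * 21 + "TTC"` returns 21 for the unit `"AAAGA"`, matching the Penta D-21 naming convention described in the abstract.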

  19. Distribution and sequence homogeneity of an abundant satellite DNA in the beetle, Tenebrio molitor.

    Science.gov (United States)

    Davis, C A; Wyatt, G R

    1989-01-01

    The mealworm beetle, Tenebrio molitor, contains an unusually abundant and homogeneous satellite DNA which constitutes up to 60% of its genome. The satellite DNA is shown to be present in all of the chromosomes by in situ hybridization. 18 dimers of the repeat unit were cloned and sequenced. The consensus sequence is 142 nt long and lacks any internal repeat structure. Monomers of the sequence are very similar, showing on average a 2% divergence from the calculated consensus. Variant nucleotides are scattered randomly throughout the sequence although some variants are more common than others. Neighboring repeat units are no more alike than randomly chosen ones. The results suggest that some mechanism, perhaps gene conversion, is acting to maintain the homogeneity of the satellite DNA despite its abundance and distribution on all of the chromosomes. PMID:2762148

  20. Local chromatin structure of heterochromatin regulates repeated DNA stability, nucleolus structure, and genome integrity

    Energy Technology Data Exchange (ETDEWEB)

    Peng, Jamy C. [Univ. of California, Berkeley, CA (United States)

    2007-01-01

    Heterochromatin constitutes a significant portion of the genome in higher eukaryotes; approximately 30% in Drosophila and human. Heterochromatin contains a high repeat DNA content and a low density of protein-encoding genes. In contrast, euchromatin is composed mostly of unique sequences and contains the majority of single-copy genes. Genetic and cytological studies demonstrated that heterochromatin exhibits regulatory roles in chromosome organization, centromere function and telomere protection. As an epigenetically regulated structure, heterochromatin formation is not defined by any DNA sequence consensus. Heterochromatin is characterized by its association with nucleosomes containing methylated-lysine 9 of histone H3 (H3K9me), heterochromatin protein 1 (HP1) that binds H3K9me, and Su(var)3-9, which methylates H3K9 and binds HP1. Heterochromatin formation and functions are influenced by HP1, Su(var)3-9, and the RNA interference (RNAi) pathway. My thesis project investigates how heterochromatin formation and function impact nuclear architecture, repeated DNA organization, and genome stability in Drosophila melanogaster. H3K9me-based chromatin reduces extrachromosomal DNA formation; most likely by restricting the access of repair machineries to repeated DNAs. Reducing extrachromosomal ribosomal DNA stabilizes rDNA repeats and the nucleolus structure. H3K9me-based chromatin also inhibits DNA damage in heterochromatin. Cells with compromised heterochromatin structure, due to Su(var)3-9 or dcr-2 (a component of the RNAi pathway) mutations, display severe DNA damage in heterochromatin compared to wild type. In these mutant cells, accumulated DNA damage leads to chromosomal defects such as translocations, defective DNA repair response, and activation of the G2-M DNA repair and mitotic checkpoints that ensure cellular and animal viability. My thesis research suggests that DNA replication, repair, and recombination mechanisms in heterochromatin differ from those in

  1. Assembling the Streptococcus thermophilus clustered regularly interspaced short palindromic repeats (CRISPR) array for multiplex DNA targeting.

    Science.gov (United States)

    Guo, Lijun; Xu, Kun; Liu, Zhiyuan; Zhang, Cunfang; Xin, Ying; Zhang, Zhiying

    2015-06-01

    In addition to being scalable, affordable, and easy to engineer, the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) technology is superior for multiplex targeting, which is laborious and inconvenient when achieved by cloning multiple gRNA-expressing cassettes. Here, we report a simple CRISPR array assembly method that will facilitate multiplex targeting. First, the Streptococcus thermophilus CRISPR3/Cas locus was cloned. Second, different CRISPR arrays were assembled with different crRNA spacers. Transformation assays using different Escherichia coli strains demonstrated efficient plasmid DNA targeting, and we achieved targeting efficiency up to 95% with an assembled CRISPR array with three crRNA spacers. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Identification of apple cultivars on the basis of simple sequence repeat markers.

    Science.gov (United States)

    Liu, G S; Zhang, Y G; Tao, R; Fang, J G; Dai, H Y

    2014-09-12

    DNA markers are useful tools that play an important role in plant cultivar identification. They are usually based on polymerase chain reaction (PCR) and include simple sequence repeats (SSRs), inter-simple sequence repeats, and random amplified polymorphic DNA. However, DNA markers have not been used effectively for the complete identification of plant cultivars because of the lack of known DNA fingerprints. Recently, a novel approach called the cultivar identification diagram (CID) strategy was developed to facilitate the use of DNA markers to separate individual plants. The CID was designed so that a polymorphic marker generated from each PCR directly allows cultivar samples to be separated at each step. Therefore, it can be used to identify cultivars and varieties easily with fewer primers. In this study, 60 apple cultivars, including a few main cultivars in fields and varieties from descendants (Fuji x Telamon), were examined. Of the 20 pairs of SSR primers screened, 8 pairs gave reproducible, polymorphic DNA amplification patterns. The banding patterns obtained from these 8 primers were used to construct a CID map. Each cultivar or variety in this study was distinguished from the others completely, indicating that this method can be used for efficient cultivar identification. The result contributes to studies on germplasm resources and the seedling industry in fruit trees.

  3. Structural basis for sequence-specific recognition of DNA by TAL effectors

    KAUST Repository

    Deng, Dong; Yan, Chuangye; Pan, Xiaojing; Mahfouz, Magdy M.; Wang, Jiawei; Zhu, Jiankang; Shi, Yi Gong; Yan, Nieng

    2012-01-01

    TAL (transcription activator-like) effectors, secreted by phytopathogenic bacteria, recognize host DNA sequences through a central domain of tandem repeats. Each repeat comprises 33 to 35 conserved amino acids and targets a specific base pair

  4. Determination of allele frequencies in nine short tandem repeat loci ...

    African Journals Online (AJOL)

    2008-04-17

    Apr 17, 2008 ... out the human genome. These loci are a rich source of highly polymorphic markers that may be detected using the polymerase chain reaction (PCR). PCR is a mimic of the normal cellular process of replication of DNA molecules. Each STR is distinguished by the number of times a sequence is repeated, ...

  5. Simple sequence repeat marker loci discovery using SSR primer.

    Science.gov (United States)

    Robinson, Andrew J; Love, Christopher G; Batley, Jacqueline; Barker, Gary; Edwards, David

    2004-06-12

    Simple sequence repeats (SSRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker assisted selection of crop plants and a range of molecular ecology and diversity studies. With the increase in the availability of DNA sequence information, an automated process to identify and design PCR primers for amplification of SSR loci would be a useful tool in plant breeding programs. We report an application that integrates SPUTNIK, an SSR repeat finder, with Primer3, a PCR primer design program, into one pipeline tool, SSR Primer. On submission of multiple FASTA formatted sequences, the script screens each sequence for SSRs using SPUTNIK. The results are parsed to Primer3 for locus-specific primer design. The script makes use of a Web-based interface, enabling remote use. This program has been written in PERL and is freely available for non-commercial users by request from the authors. The Web-based version may be accessed at http://hornbill.cspp.latrobe.edu.au/
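
    The first stage of the pipeline described above, screening each submitted sequence for SSRs before primer design, can be sketched with a regular expression. This is a simplified stand-in and not SPUTNIK's actual algorithm, which the abstract does not describe; the function name and the thresholds `min_unit`, `max_unit`, and `min_repeats` are assumptions chosen for illustration.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=5, min_repeats=4):
    """Return (start, unit, copies) for each perfect tandem repeat found.

    Assumption: only exact repeats are reported. The shortest unit that
    satisfies the pattern is preferred, so a long mononucleotide run
    would be reported as a dinucleotide repeat in this sketch.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits
```

    In a pipeline of the kind the record describes, the start position and length of each hit would then be passed to a primer design step so that primers flank the repeat. For instance, `find_ssrs("TTG" + "AC" * 6 + "GGT")` reports a single AC repeat of six copies starting at position 3.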

  6. Sequence homology at the breakpoint and clinical phenotype of mitochondrial DNA deletion syndromes.

    Science.gov (United States)

    Sadikovic, Bekim; Wang, Jing; El-Hattab, Ayman W; Landsverk, Megan; Douglas, Ganka; Brundage, Ellen K; Craigen, William J; Schmitt, Eric S; Wong, Lee-Jun C

    2010-12-20

    Mitochondrial DNA (mtDNA) deletions are a common cause of mitochondrial disorders. Large mtDNA deletions can lead to a broad spectrum of clinical features with different ages of onset, ranging from mild mitochondrial myopathies (MM), progressive external ophthalmoplegia (PEO), and Kearns-Sayre syndrome (KSS), to severe Pearson syndrome. The aim of this study is to investigate the molecular signatures surrounding the deletion breakpoints and their association with the clinical phenotype and age at onset. MtDNA deletions in 67 patients were characterized using array comparative genomic hybridization (aCGH) followed by PCR-sequencing of the deletion junctions. Sequence homology including both perfect and imperfect short repeats flanking the deletion regions was analyzed and correlated with clinical features and patients' age group. In all age groups, there was a significant increase in sequence homology flanking the deletion compared to mtDNA background. The youngest patient group (<6 years old) showed a diffuse pattern of deletion distribution in size and locations, with a significantly lower sequence homology flanking the deletion, and the highest percentage of deletion mutant heteroplasmy. The older age groups showed a rather discrete pattern of deletions, with 44% of all patients over 6 years old carrying the most common 5 kb mtDNA deletion, which was found mostly in muscle specimens (22/41). Only 15% (3/20) of the young patients (<6 years old) carry the 5 kb common deletion, which is usually present in blood rather than muscle. This group of patients predominantly (16 out of 17) exhibit multisystem disorder and/or Pearson syndrome, while older patients had predominantly neuromuscular manifestations including KSS, PEO, and MM. In conclusion, sequence homology at the deletion flanking regions is a consistent feature of mtDNA deletions. Decreased levels of sequence homology and increased levels of deletion mutant heteroplasmy appear to correlate with earlier onset and more severe disease with multisystem involvement.

  7. Human β satellite DNA: Genomic organization and sequence definition of a class of highly repetitive tandem DNA

    International Nuclear Information System (INIS)

    Waye, J.S.; Willard, H.F.

    1989-01-01

    The authors describe a class of human repetitive DNA, called β satellite, that, at a most fundamental level, exists as tandem arrays of diverged ∼68-base-pair monomer repeat units. The monomer units are organized as distinct subsets, each characterized by a multimeric higher-order repeat unit that is tandemly reiterated and represents a recent unit of amplification. They have cloned, characterized, and determined the sequence of two β satellite higher-order repeat units: one located on chromosome 9, the other on the acrocentric chromosomes (13, 14, 15, 21, and 22) and perhaps other sites in the genome. Analysis by pulsed-field gel electrophoresis reveals that these tandem arrays are localized in large domains that are marked by restriction fragment length polymorphisms. In total, β-satellite sequences comprise several million base pairs of DNA in the human genome. Analysis of this DNA family should permit insights into the nature of chromosome-specific and nonspecific modes of satellite DNA evolution and provide useful tools for probing the molecular organization and concerted evolution of the acrocentric chromosomes

  8. Genome-wide tracking of unmethylated DNA Alu repeats in normal and cancer cells

    DEFF Research Database (Denmark)

    Rodriguez, Jairo; Vives, Laura; Jordà, Mireia

    2008-01-01

    Methylation of cytosine is the most frequent epigenetic modification of DNA in mammalian cells. In humans, most of the methylated cytosines are found in CpG-rich sequences within tandem and interspersed repeats that make up to 45% of the human genome, with Alu repeats being the most common family....

  9. DNA Nucleotide Sequence Restricted by the RI Endonuclease

    Science.gov (United States)

    Hedgpeth, Joe; Goodman, Howard M.; Boyer, Herbert W.

    1972-01-01

    The sequence of DNA base pairs adjacent to the phosphodiester bonds cleaved by the RI restriction endonuclease in unmodified DNA from coliphage λ has been determined. The 5′-terminal nucleotide labeled with 32P and oligonucleotides up to the heptamer were analyzed from a pancreatic DNase digest. The following sequence of nucleotides adjacent to the RI break made in λ DNA was deduced from these data and from the 3′-dinucleotide sequence and nearest-neighbor analysis obtained from repair synthesis with the DNA polymerase of Rous sarcoma virus [Formula: see text] The RI endonuclease cleavage of the phosphodiester bonds (indicated by arrows) generates 5′-phosphoryls and short cohesive termini of four nucleotides, pApApTpT. The most striking feature of the sequence is its symmetry. PMID:4343974

  10. Tandemly repeated sequence in 5'end of mtDNA control region of ...

    African Journals Online (AJOL)

    2008-12-17

    Dec 17, 2008 ... chain reaction (PCR). Japanese Spanish ... mainly covered general ecology and fishery biology. No study concerning the ... Conserved sequence blocks and the repeat units are indicated by boxes. performed using the exact ...

  11. [Bioinformatics Analysis of Clustered Regularly Interspaced Short Palindromic Repeats in the Genomes of Shigella].

    Science.gov (United States)

    Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin

    2015-04-01

    This study aimed to explore the features of clustered regularly interspaced short palindromic repeats (CRISPR) structures in Shigella by using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that CRISPRs existed in all four groups of Shigella, and that the flanking sequences upstream of the CRISPRs could be classified into the same groups as those downstream. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences fell into the same groups as their corresponding flanking sequences, and could be classified into two different types by their RNA secondary structures, which contain a "stem" and a "ring". Some spacers were found to be homologous to partial sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.

  12. Characteristics of palindromic sequences in DNA of the sea urchin Stronglyocentrotus intermedius

    International Nuclear Information System (INIS)

    Brykov, V.A.; Kukhlevskii, A.D.

    1986-01-01

    The fraction of palindromic sequences in the nuclear DNA of the sea urchin S. intermedius was characterized. Using chromatography on hydroxyapatite and treatment with S1 nuclease, it was shown that the fraction of palindromic sequences more than doubles when the sodium concentration in solution is increased or the temperature of reassociation is lowered. The increase is due to the involvement of inverted repeats in reassociation, which are characterized by a substantial nonhomologous character and/or the presence of an extended intervening DNA sequence. It was found by the method of reassociation of a nicked palindrome fraction with an excess of total homologous DNA that most of the inverted repeats in the sea urchin genome are unique sequences. The complexity of the palindrome fraction was estimated at 8.2 x 10^7 nucleotide pairs, and the number of palindromes per haploid genome at ∼500,000.
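
    An inverted repeat with an intervening sequence, as described above, can be located computationally in a sequenced genome. The Python sketch below is a brute-force illustration and is unrelated to the hydroxyapatite and S1 nuclease methods of the study; the function names and the `arm` and `max_spacer` parameters are hypothetical, and only exact matches are considered.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_inverted_repeats(seq, arm=4, max_spacer=3):
    """Return (start, spacer_length) for each left arm whose exact reverse
    complement occurs within max_spacer bases downstream of it."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        target = reverse_complement(seq[i:i + arm])
        for spacer in range(max_spacer + 1):
            j = i + arm + spacer          # start of the candidate right arm
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, spacer))
    return hits

# GACC ... GGTC are reverse complements separated by a 3-base spacer.
print(find_inverted_repeats("TTGACCAAAGGTCTT"))
```

    Allowing mismatches in the arms or a longer spacer would correspond to the less homologous repeats and extended intervening sequences mentioned in the abstract, at the cost of more spurious hits.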

  13. Complete DNA sequence of the linear mitochondrial genome of the pathogenic yeast Candida parapsilosis

    DEFF Research Database (Denmark)

    Nosek, J.; Novotna, M.; Hlavatovicova, Z.

    2004-01-01

    The complete sequence of the mitochondrial DNA of the opportunistic yeast pathogen Candida parapsilosis was determined. The mitochondrial genome is represented by linear DNA molecules terminating with tandem repeats of a 738-bp unit. The number of repeats varies, thus generating a population...

  14. Plasmid P1 replication: negative control by repeated DNA sequences.

    OpenAIRE

    Chattoraj, D; Cordes, K; Abeles, A

    1984-01-01

    The incompatibility locus, incA, of the unit-copy plasmid P1 is contained within a fragment that is essentially a set of nine 19-base-pair repeats. One or more copies of the fragment destabilizes the plasmid when present in trans. Here we show that extra copies of incA interfere with plasmid DNA replication and that a deletion of most of incA increases plasmid copy number. Thus, incA is not essential for replication but is required for its control. When cloned in a high-copy-number vector, pi...

  15. Massively parallel sequencing of forensic STRs

    DEFF Research Database (Denmark)

    Parson, Walther; Ballard, David; Budowle, Bruce

    2016-01-01

    The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data that...

  16. A Sequence-Specific Interaction between the Saccharomyces cerevisiae rRNA Gene Repeats and a Locus Encoding an RNA Polymerase I Subunit Affects Ribosomal DNA Stability

    Science.gov (United States)

    Cahyani, Inswasti; Cridge, Andrew G.; Engelke, David R.; Ganley, Austen R. D.

    2014-01-01

    The spatial organization of eukaryotic genomes is linked to their functions. However, how individual features of the global spatial structure contribute to nuclear function remains largely unknown. We previously identified a high-frequency interchromosomal interaction within the Saccharomyces cerevisiae genome that occurs between the intergenic spacer of the ribosomal DNA (rDNA) repeats and the intergenic sequence between the locus encoding the second largest RNA polymerase I subunit and a lysine tRNA gene [i.e., RPA135-tK(CUU)P]. Here, we used quantitative chromosome conformation capture in combination with replacement mapping to identify a 75-bp sequence within the RPA135-tK(CUU)P intergenic region that is involved in the interaction. We demonstrate that the RPA135-IGS1 interaction is dependent on the rDNA copy number and the Msn2 protein. Surprisingly, we found that the interaction does not govern RPA135 transcription. Instead, replacement of a 605-bp region within the RPA135-tK(CUU)P intergenic region results in a reduction in the RPA135-IGS1 interaction level and fluctuations in rDNA copy number. We conclude that the chromosomal interaction that occurs between the RPA135-tK(CUU)P and rDNA IGS1 loci stabilizes rDNA repeat number and contributes to the maintenance of nucleolar stability. Our results provide evidence that the DNA loci involved in chromosomal interactions are composite elements, sections of which function in stabilizing the interaction or mediating a functional outcome. PMID:25421713

  17. Toward a Better Compression for DNA Sequences Using Huffman Encoding.

    Science.gov (United States)

    Al-Okaily, Anas; Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

    2017-04-01

    Due to the significant amount of DNA data being generated by next-generation sequencing machines for genomes ranging in length from megabases to gigabases, there is an increasing need to compress such data so that they occupy less space and can be transmitted faster. Different implementations of Huffman encoding that incorporate the characteristics of DNA sequences prove to compress DNA data better. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements in the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all such DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (Al-Okaily, 2016).
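
    The idea of treating a frequent repeat as a single symbol to skew the Huffman tree can be sketched in Python. This is a textbook Huffman coder plus a hypothetical greedy tokenizer, not the authors' software; it counts payload bits only and ignores the cost of storing the code table.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(tokens):
    """Build a prefix-free code (token -> bit string) from token frequencies."""
    freq = Counter(tokens)
    if len(freq) == 1:                      # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    tie = count()                           # tie-breaker so dicts are never compared
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]

def tokenize(seq, repeat=""):
    """Greedy left-to-right split: emit `repeat` as one token where it occurs,
    otherwise emit single bases."""
    tokens, i = [], 0
    while i < len(seq):
        if repeat and seq.startswith(repeat, i):
            tokens.append(repeat)
            i += len(repeat)
        else:
            tokens.append(seq[i])
            i += 1
    return tokens

def encoded_bits(tokens):
    """Total payload size in bits under the Huffman code for these tokens."""
    codes = huffman_codes(tokens)
    return sum(len(codes[t]) for t in tokens)

seq = "GATTACA" * 8 + "CCGT"                # toy input, not real genome data
print(encoded_bits(tokenize(seq)), encoded_bits(tokenize(seq, "GATTACA")))
```

    On this toy input the repeat token dominates the frequency table, so it receives a one-bit code and the payload shrinks sharply; real genomes would need the repeat dictionary to be stored as well.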

  18. In silico analysis of Simple Sequence Repeats from chloroplast genomes of Solanaceae species

    Directory of Open Access Journals (Sweden)

    Evandro Vagner Tambarussi

    2009-01-01

    The availability of chloroplast genome (cpDNA) sequences of Atropa belladonna, Nicotiana sylvestris, N. tabacum, N. tomentosiformis, Solanum bulbocastanum, S. lycopersicum and S. tuberosum, which are Solanaceae species, allowed us to analyze the organization of cpSSRs in their genic and intergenic regions. In general, the number of cpSSRs in cpDNA ranged from 161 in S. tuberosum to 226 in N. tabacum, and the number of intergenic cpSSRs was higher than that of genic cpSSRs. Mononucleotide repeats were the most frequent in the studied species, but we also identified di-, tri-, tetra-, penta- and hexanucleotide repeats. Multiple alignments of all cpSSR sequences from Solanaceae species made the identification of nucleotide variability possible, and the phylogeny was estimated by maximum parsimony. Our study showed that the plastome database can be exploited for phylogenetic analysis and biotechnological approaches.

  19. Phylogeny of the Serrasalmidae (Characiformes) based on mitochondrial DNA sequences

    Directory of Open Access Journals (Sweden)

    Guillermo Ortí

    2008-01-01

    Previous studies based on DNA sequences of mitochondrial (mt) rRNA genes showed three main groups within the subfamily Serrasalminae: (1) a "pacu" clade of herbivores (Colossoma, Mylossoma, Piaractus); (2) the "Myleus" clade (Myleus, Mylesinus, Tometes, Ossubtus); and (3) the "piranha" clade (Serrasalmus, Pygocentrus, Pygopristis, Pristobrycon, Catoprion, Metynnis). The genus Acnodon was placed as the sister taxon of clade (2+3). However, poor resolution within each clade was obtained due to low levels of variation among rRNA gene sequences. Complete sequences of the hypervariable mtDNA control region for a total of 45 taxa, and additional sequences of 12S and 16S rRNA from a total of 74 taxa representing all genera in the family, are now presented to address intragroup relationships. Control region sequences of several serrasalmid species exhibit tandem repeats of short motifs (12 to 33 bp) in the 3' end of this region, accounting for substantial length variation. Bayesian inference and maximum parsimony analyses of these sequences identify the same groupings as before and provide further evidence to support the following observations: (a) Serrasalmus gouldingi and species of Pristobrycon (non-striolatus) form a monophyletic group that is the sister group to other species of Serrasalmus and Pygocentrus; (b) Catoprion, Pygopristis, and Pristobrycon striolatus form a well supported clade, sister to the group described above; (c) some taxa assigned to the genus Myloplus (M. asterias, M. tiete, M. ternetzi, and M. rubripinnis) form a well supported group whereas other Myloplus species remain with uncertain affinities; (d) Mylesinus, Tometes and Myleus setiger form a monophyletic group.

  20. SSRscanner: a program for reporting distribution and exact location of simple sequence repeats.

    Science.gov (United States)

    Anwar, Tamanna; Khan, Asad U

    2006-02-20

    Simple sequence repeats (SSRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker assisted selection of crop plants and a range of molecular ecology and diversity studies. These repeated DNA sequences are found in both prokaryotes and eukaryotes. They are distributed almost at random throughout the genome, ranging from mononucleotide to trinucleotide repeats. They are also found as longer tracts (> 6 repeating units). Most of the computer programs that find SSRs do not report their exact positions. A computer program, SSRscanner, was written to find the distribution, frequency and exact location of each SSR in the genome. SSRscanner is user friendly. It can search for repeats of any length and produces output with their exact position on the chromosome and their frequency of occurrence in the sequence. The program is written in Perl and is freely available to non-commercial users on request from the authors. Please contact the authors by e-mail: huzzi99@hotmail.com.
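
    Reporting the exact location of each SSR, as described above, can be sketched with a regular expression. The Python below is an independent illustration and not SSRscanner itself (which is written in Perl); the function name, the `max_unit` and `min_repeats` defaults, and the output tuple layout are assumptions.

```python
import re

def find_ssrs(seq, max_unit=3, min_repeats=4):
    """Return (start, end, unit, copies) for each perfect tandem repeat.

    Positions are 1-based and inclusive. The lazy quantifier tries the
    shortest unit first, so AAAAAA is reported as A x 6 rather than AA x 3.
    """
    seq = seq.upper()
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start() + 1, m.end(), unit, copies))
    return hits

# Toy sequence containing an (AC)4 tract and an (A)5 tract.
for hit in find_ssrs("GGTACACACACTTGAAAAAGC"):
    print(hit)
```

    The regex reports non-overlapping, leftmost matches only; imperfect repeats with interruptions would need a different approach.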

  1. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.

    2010-07-12

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.
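
    The intuition that reads shorter than a repeat cannot resolve it can be illustrated with a crude Python proxy. This is not the authors' model of repeat assembly: it simply counts runs of positions whose read-length substring occurs more than once, and the function names are hypothetical.

```python
from collections import Counter

def repeated_positions(genome, read_len):
    """Start positions of read_len-mers that occur more than once."""
    kmers = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(kmers)
    return [i for i, kmer in enumerate(kmers) if counts[kmer] > 1]

def count_repeat_blocks(genome, read_len):
    """Number of runs of consecutive repeated positions (a rough stand-in
    for regions that reads of this length could not place uniquely)."""
    blocks, prev = 0, None
    for pos in repeated_positions(genome, read_len):
        if prev is None or pos != prev + 1:
            blocks += 1
        prev = pos
    return blocks

# Toy genome: the 8-base repeat ACGTTGCA occurs twice with unique flanks.
genome = "GG" + "ACGTTGCA" + "TCTAGAGC" + "ACGTTGCA" + "CC"
for read_len in (6, 9):
    print(read_len, count_repeat_blocks(genome, read_len))
```

    In the toy genome, 6-base reads leave both copies ambiguous, whereas 9-base reads span the 8-base repeat plus a flanking base and so place every read uniquely.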

  2. Read length and repeat resolution: exploring prokaryote genomes using next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Matt J Cahill

    BACKGROUND: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. METHODOLOGY/PRINCIPAL FINDINGS: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. CONCLUSIONS: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.

  3. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.; Köser, Claudio U.; Ross, Nicholas E.; Archer, John A.C.

    2010-01-01

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.

  4. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within...... in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions....
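
    A search for the most frequent 9-10mers, as mentioned above, amounts to k-mer counting. The Python sketch below shows the basic idea on a toy string; it is not the authors' procedure, the function name is hypothetical, and it counts one strand only.

```python
from collections import Counter

def top_kmers(seq, k, n=3):
    """Return the n most frequent k-mers in seq as (kmer, count) pairs."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts.most_common(n)

# Toy input in which one 10-mer occurs twice; not real genome data.
toy = "TTGCCGTCTGAAACACGCCGTCTGAAGG"
print(top_kmers(toy, 10, 1))
```

    For a real genome, counts would normally be pooled with the reverse-complement strand and compared against the frequency expected from base composition before calling a k-mer over-represented.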

  5. The role of DNA repair in herpesvirus pathogenesis.

    Science.gov (United States)

    Brown, Jay C

    2014-10-01

    In cells latently infected with a herpesvirus, the viral DNA is present in the cell nucleus, but it is not extensively replicated or transcribed. In this suppressed state the virus DNA is vulnerable to mutagenic events that affect the host cell and have the potential to destroy the virus' genetic integrity. Despite the potential for genetic damage, however, herpesvirus sequences are well conserved after reactivation from latency. To account for this apparent paradox, I have tested the idea that host cell-encoded mechanisms of DNA repair are able to control genetic damage to latent herpesviruses. Studies were focused on homologous recombination-dependent DNA repair (HR). Methods of DNA sequence analysis were employed to scan herpesvirus genomes for DNA features able to activate HR. Analyses were carried out with a total of 39 herpesvirus DNA sequences, a group that included viruses from the alpha-, beta- and gamma-subfamilies. The results showed that all 39 genome sequences were enriched in two or more of the eight recombination-initiating features examined. The results were interpreted to indicate that HR can stabilize latent herpesvirus genomes. The results also showed, unexpectedly, that repair-initiating DNA features differed in alpha- compared to gamma-herpesviruses. Whereas inverted and tandem repeats predominated in alpha-herpesviruses, gamma-herpesviruses were enriched in short, GC-rich initiation sequences such as CCCAG and depleted in repeats. In alpha-herpesviruses, repair-initiating repeat sequences were found to be concentrated in a specific region (the S segment) of the genome while repair-initiating short sequences were distributed more uniformly in gamma-herpesviruses. The results suggest that repair pathways are activated differently in alpha- compared to gamma-herpesviruses. Copyright © 2014. Published by Elsevier Inc.

  6. Microsatellite DNA in genomic survey sequences and UniGenes of loblolly pine

    Science.gov (United States)

    Craig S Echt; Surya Saha; Dennis L Deemer; C Dana Nelson

    2011-01-01

    Genomic DNA sequence databases are a potential and growing resource for simple sequence repeat (SSR) marker development in loblolly pine (Pinus taeda L.). Loblolly pine also has many expressed sequence tags (ESTs) available for microsatellite (SSR) marker development. We compared loblolly pine SSR densities in genome survey sequences (GSSs) to those in non-redundant...

  7. Next-generation sequencing offers new insights into DNA degradation

    DEFF Research Database (Denmark)

    Overballe-Petersen, Søren; Orlando, Ludovic Antoine Alexandre; Willerslev, Eske

    2012-01-01

    The processes underlying DNA degradation are central to various disciplines, including cancer research, forensics and archaeology. The sequencing of ancient DNA molecules on next-generation sequencing platforms provides direct measurements of cytosine deamination, depurination and fragmentation...... rates that previously were obtained only from extrapolations of results from in vitro kinetic experiments performed over short timescales. For example, recent next-generation sequencing of ancient DNA reveals purine bases as one of the main targets of postmortem hydrolytic damage, through base...... elimination and strand breakage. It also shows substantially increased rates of DNA base-loss at guanosine. In this review, we argue that the latter results from an electron resonance structure unique to guanosine rather than adenosine having an extra resonance structure over guanosine as previously suggested....

  8. DNA interactions with a Methylene Blue redox indicator depend on the DNA length and are sequence specific.

    Science.gov (United States)

    Farjami, Elaheh; Clima, Lilia; Gothelf, Kurt V; Ferapontova, Elena E

    2010-06-01

    A DNA molecular beacon approach was used for the analysis of interactions between DNA and Methylene Blue (MB) as a redox indicator of a hybridization event. DNA hairpin structures of different length and guanine (G) content were immobilized onto gold electrodes in their folded states through the alkanethiol linker at the 5'-end. Binding of MB to the folded hairpin DNA was electrochemically studied and compared with binding to the duplex structure formed by hybridization of the hairpin DNA to a complementary DNA strand. Variation of the electrochemical signal from the DNA-MB complex was shown to depend primarily on the DNA length and sequence used: the G-C base pairs were the preferential sites of MB binding in the duplex. For short 20 nts long DNA sequences, the increased electrochemical response from MB bound to the duplex structure was consistent with the increased amount of bound and electrochemically readable MB molecules (i.e. MB molecules that are available for the electron transfer (ET) reaction with the electrode). With longer DNA sequences, the balance between the amounts of the electrochemically readable MB molecules bound to the hairpin DNA and to the hybrid was opposite: a part of the MB molecules bound to the long-sequence DNA duplex seem to be electrochemically mute due to long ET distance. The increasing electrochemical response from MB bound to the short-length DNA hybrid contrasts with the decreasing signal from MB bound to the long-length DNA hybrid and allows an "off"-"on" genosensor development.

  9. Direct and inverted repeats elicit genetic instability by both exploiting and eluding DNA double-strand break repair systems in mycobacteria.

    Directory of Open Access Journals (Sweden)

    Ewelina A Wojcik

    Repetitive DNA sequences with the potential to form alternative DNA conformations, such as slipped structures and cruciforms, can induce genetic instability by promoting replication errors and by serving as a substrate for DNA repair proteins, which may lead to DNA double-strand breaks (DSBs). However, the contribution of each of the DSB repair pathways, homologous recombination (HR), non-homologous end-joining (NHEJ) and single-strand annealing (SSA), to this sort of genetic instability is not fully understood. Herein, we assessed the genome-wide distribution of repetitive DNA sequences in the Mycobacterium smegmatis, Mycobacterium tuberculosis and Escherichia coli genomes, and determined the types and frequencies of genetic instability induced by direct and inverted repeats, both in the presence and in the absence of HR, NHEJ, and SSA. All three genomes are strongly enriched in direct repeats and modestly enriched in inverted repeats. When using chromosomally integrated constructs in M. smegmatis, direct repeats induced the perfect deletion of their intervening sequences ~1,000-fold above background. Absence of HR further enhanced these perfect deletions, whereas absence of NHEJ or SSA had no influence, suggesting compromised replication fidelity. In contrast, inverted repeats induced perfect deletions only in the absence of SSA. Both direct and inverted repeats stimulated excision of the constructs from the attB integration sites independently of HR, NHEJ, or SSA. With episomal constructs, direct and inverted repeats triggered DNA instability by activating nucleolytic activity, and absence of the DSB repair pathways (in the order NHEJ>HR>SSA) exacerbated this instability. Thus, direct and inverted repeats may elicit genetic instability in mycobacteria by 1) directly interfering with replication fidelity, 2) stimulating the three main DSB repair pathways, and 3) enticing L5 site-specific recombination.

  10. Direct and inverted repeats elicit genetic instability by both exploiting and eluding DNA double-strand break repair systems in mycobacteria.

    Science.gov (United States)

    Wojcik, Ewelina A; Brzostek, Anna; Bacolla, Albino; Mackiewicz, Pawel; Vasquez, Karen M; Korycka-Machala, Malgorzata; Jaworski, Adam; Dziadek, Jaroslaw

    2012-01-01

    Repetitive DNA sequences with the potential to form alternative DNA conformations, such as slipped structures and cruciforms, can induce genetic instability by promoting replication errors and by serving as a substrate for DNA repair proteins, which may lead to DNA double-strand breaks (DSBs). However, the contribution of each of the DSB repair pathways, homologous recombination (HR), non-homologous end-joining (NHEJ) and single-strand annealing (SSA), to this sort of genetic instability is not fully understood. Herein, we assessed the genome-wide distribution of repetitive DNA sequences in the Mycobacterium smegmatis, Mycobacterium tuberculosis and Escherichia coli genomes, and determined the types and frequencies of genetic instability induced by direct and inverted repeats, both in the presence and in the absence of HR, NHEJ, and SSA. All three genomes are strongly enriched in direct repeats and modestly enriched in inverted repeats. When using chromosomally integrated constructs in M. smegmatis, direct repeats induced the perfect deletion of their intervening sequences ~1,000-fold above background. Absence of HR further enhanced these perfect deletions, whereas absence of NHEJ or SSA had no influence, suggesting compromised replication fidelity. In contrast, inverted repeats induced perfect deletions only in the absence of SSA. Both direct and inverted repeats stimulated excision of the constructs from the attB integration sites independently of HR, NHEJ, or SSA. With episomal constructs, direct and inverted repeats triggered DNA instability by activating nucleolytic activity, and absence of the DSB repair pathways (in the order NHEJ>HR>SSA) exacerbated this instability. Thus, direct and inverted repeats may elicit genetic instability in mycobacteria by 1) directly interfering with replication fidelity, 2) stimulating the three main DSB repair pathways, and 3) enticing L5 site-specific recombination.

  11. Directed PCR-free engineering of highly repetitive DNA sequences

    Directory of Open Access Journals (Sweden)

    Preissler Steffen

    2011-09-01

    Background: Highly repetitive nucleotide sequences are commonly found in nature, e.g. in telomeres, microsatellite DNA, polyadenine (poly(A)) tails of eukaryotic messenger RNA as well as in several inherited human disorders linked to trinucleotide repeat expansions in the genome. Therefore, studying repetitive sequences is of biological, biotechnological and medical relevance. However, cloning of such repetitive DNA sequences is challenging because specific PCR-based amplification is hampered by the lack of unique primer binding sites, resulting in unspecific products. Results: For the PCR-free generation of repetitive DNA sequences we used antiparallel oligonucleotides flanked by restriction sites of Type IIS endonucleases. The arrangement of recognition sites allowed for stepwise and seamless elongation of repetitive sequences. This facilitated the assembly of repetitive DNA segments and open reading frames encoding polypeptides with periodic amino acid sequences of any desired length. By this strategy we cloned a series of polyglutamine encoding sequences as well as highly repetitive polyadenine tracts. Such repetitive sequences can be used for diverse biotechnological applications. As an example, the polyglutamine sequences were expressed as His6-SUMO fusion proteins in Escherichia coli cells to study their aggregation behavior in vitro. The His6-SUMO moiety enabled affinity purification of the polyglutamine proteins, increased their solubility, and allowed controlled induction of the aggregation process. We successfully purified the fusion proteins and provide an example for their applicability in filter retardation assays. Conclusion: Our seamless cloning strategy is PCR-free and allows the directed and efficient generation of highly repetitive DNA sequences of defined lengths by simple standard cloning procedures.

  12. The mitochondrial and plastid genomes of Volvox carteri: bloated molecules rich in repetitive DNA

    Directory of Open Access Journals (Sweden)

    Lee Robert W

    2009-03-01

    Background: The magnitude of noncoding DNA in organelle genomes can vary significantly; it is argued that much of this variation is attributable to the dissemination of selfish DNA. The results of a previous study indicate that the mitochondrial DNA (mtDNA) of the green alga Volvox carteri abounds with palindromic repeats, which appear to be selfish elements. We became interested in the evolution and distribution of these repeats when, during a cursory exploration of the V. carteri nuclear DNA (nucDNA) and plastid DNA (ptDNA) sequences, we found palindromic repeats with similar structural features to those of the mtDNA. Upon this discovery, we decided to investigate the diversity and evolutionary implications of these palindromic elements by sequencing and characterizing large portions of mtDNA and ptDNA and then comparing these data to the V. carteri draft nuclear genome sequence. Results: We sequenced 30 and 420 kilobases (kb) of the mitochondrial and plastid genomes of V. carteri, respectively, resulting in partial assemblies of these genomes. The mitochondrial genome is the most bloated green-algal mtDNA observed to date: ~61% of the sequence is noncoding, most of which is comprised of short palindromic repeats spread throughout the intergenic and intronic regions. The plastid genome is the largest (>420 kb) and most expanded (>80% noncoding) ptDNA sequence yet discovered, with a myriad of palindromic repeats in the noncoding regions, which have a similar size and secondary structure to those of the mtDNA. We found that 15 kb (~0.01%) of the nuclear genome are homologous to the palindromic elements of the mtDNA, and 50 kb (~0.05%) are homologous to those of the ptDNA. Conclusion: Selfish elements in the form of short palindromic repeats have propagated in the V. carteri mtDNA and ptDNA, resulting in the distension of these genomes. Copies of these same repeats are also found in a small fraction of the nucDNA, but appear to be inert in this

  13. RECG maintains plastid and mitochondrial genome stability by suppressing extensive recombination between short dispersed repeats.

    Directory of Open Access Journals (Sweden)

    Masaki Odahara

    2015-03-01

    Maintenance of plastid and mitochondrial genome stability is crucial for photosynthesis and respiration, respectively. Recently, we have reported that RECA1 maintains mitochondrial genome stability by suppressing gross rearrangements induced by aberrant recombination between short dispersed repeats in the moss Physcomitrella patens. In this study, we examined a newly identified P. patens homolog of bacterial RecG helicase, RECG, some of which is localized in both plastid and mitochondrial nucleoids. RECG partially complements recG deficiency in Escherichia coli cells. A knockout (KO) mutation of RECG caused characteristic phenotypes including growth delay and developmental and mitochondrial defects, which are similar to those of the RECA1 KO mutant. The RECG KO cells showed heterogeneity in these phenotypes. Analyses of RECG KO plants showed that the mitochondrial genome was destabilized due to recombination between 8-79 bp repeats and that the pattern of the recombination partly differed from that observed in the RECA1 KO mutants. The mitochondrial DNA (mtDNA) instability was greater in severe phenotypic RECG KO cells than in mild phenotypic ones. This result suggests that mitochondrial genomic instability is responsible for the defective phenotypes of RECG KO plants. Some of the induced recombination caused efficient genomic rearrangements in RECG KO mitochondria. Such loci were sometimes associated with a decrease in the levels of normal mtDNA and a significant decrease in the number of transcripts derived from the loci. In addition, the RECG KO mutation caused remarkable plastid abnormalities and induced recombination between short repeats (12-63 bp) in the plastid DNA. These results suggest that RECG plays a role in the maintenance of both plastid and mitochondrial genome stability by suppressing aberrant recombination between dispersed short repeats; this role is crucial for plastid and mitochondrial functions.

  14. Sequence context effects on 8-methoxypsoralen photobinding to defined DNA fragments

    International Nuclear Information System (INIS)

    Sage, E.; Moustacchi, E.

    1987-01-01

    The photoreaction of 8-methoxypsoralen (8-MOP) with DNA fragments of defined sequence was studied. The authors took advantage of the blockage by bulky adducts of the 3'-5'-exonuclease activity associated with the T4 DNA polymerase. The action of the exonuclease is stopped by biadducts as well as by monoadducts. The termination products were analyzed on sequencing gels. A strong sequence specificity was observed in the DNA photobinding of 8-MOP. The exonuclease terminates its digestion near thymine residues, mainly at potentially cross-linkable sites. There is an increasing reactivity of thymine residues in the order T < TT << TTT in a GC environment. For thymine residues in cross-linkable sites, the reactivity follows the order AT << TA ∼ TAT << ATA < ATAT < ATATAA. Repeated A-T sequences are hot spots for the photochemical reaction of 8-MOP with DNA. Both monoadducts and interstrand cross-links are formed preferentially in 5'-TpA sites. The results highlight the role of the sequence and consequently of the conformation around a potential site in the photobinding of 8-MOP to DNA

  15. Simple sequence repeat (SSR) markers are effective for identifying ...

    African Journals Online (AJOL)

    DNA was extracted from newly formed leaves and amplified using 21 simple sequence repeat (SSR) markers (NH001c, NH002b, NH005b, NH007b, NH008b, NH009b, NH011b, NH013b, NH012a, NH014a, NH015a, NH017a, KA4b, KA5, KA14, KA16, KB16, KU10, BGA35, BGT23b and HGA8b). The data was analyzed by ...

  16. Transcription arrest by a G quadruplex forming-trinucleotide repeat sequence from the human c-myb gene.

    Science.gov (United States)

    Broxson, Christopher; Beckett, Joshua; Tornaletti, Silvia

    2011-05-17

    Non canonical DNA structures correspond to genomic regions particularly susceptible to genetic instability. The transcription process facilitates formation of these structures and plays a major role in generating the instability associated with these genomic sites. However, little is known about how non canonical structures are processed when encountered by an elongating RNA polymerase. Here we have studied the behavior of T7 RNA polymerase (T7RNAP) when encountering a G quadruplex forming-(GGA)(4) repeat located in the human c-myb proto-oncogene. To make direct correlations between formation of the structure and effects on transcription, we have taken advantage of the ability of the T7 polymerase to transcribe single-stranded substrates and of G4 DNA to form in single-stranded G-rich sequences in the presence of potassium ions. Under physiological KCl concentrations, we found that T7 RNAP transcription was arrested at two sites that mapped to the c-myb (GGA)(4) repeat sequence. The extent of arrest did not change with time, indicating that the c-myb repeat represented an absolute block and not a transient pause to T7 RNAP. Consistent with G4 DNA formation, arrest was not observed in the absence of KCl or in the presence of LiCl. Furthermore, mutations in the c-myb (GGA)(4) repeat, expected to prevent transition to G4, also eliminated the transcription block. We show T7 RNAP arrest at the c-myb repeat in double-stranded DNA under conditions mimicking the cellular concentration of biomolecules and potassium ions, suggesting that the G4 structure formed in the c-myb repeat may represent a transcription roadblock in vivo. Our results support a mechanism of transcription-coupled DNA repair initiated by arrest of transcription at G4 structures.

  17. An ultra-high discrimination Y chromosome short tandem repeat multiplex DNA typing system.

    Directory of Open Access Journals (Sweden)

    Erin K Hanson

    In forensic casework, Y chromosome short tandem repeat markers (Y-STRs) are often used to identify a male donor DNA profile in the presence of excess quantities of female DNA, such as is found in many sexual assault investigations. Commercially available Y-STR multiplexes incorporating 12-17 loci are currently used in forensic casework (Promega's PowerPlex Y and Applied Biosystems' AmpFlSTR Yfiler). Despite the robustness of these commercial multiplex Y-STR systems and the ability to discriminate two male individuals in most cases, the coincidence match probabilities between unrelated males are modest compared with the standard set of autosomal STR markers. Hence there is still a need to develop new multiplex systems to supplement these for those cases where additional discriminatory power is desired or where there is a coincidental Y-STR match between potential male participants. Over 400 Y-STR loci have been identified on the Y chromosome. While these have the potential to increase the discrimination potential afforded by the commercially available kits, many have not been well characterized. In the present work, 91 loci were tested for their relative ability to increase the discrimination potential of the commonly used 'core' Y-STR loci. The result of this extensive evaluation was the development of an ultra high discrimination (UHD) multiplex DNA typing system that allows for the robust co-amplification of 14 non-core Y-STR loci. Population studies with a mixed African American and American Caucasian sample set (n = 572) indicated that the overall discriminatory potential of the UHD multiplex was superior to all commercial kits tested. The combined use of the UHD multiplex and the Applied Biosystems' AmpFlSTR Yfiler kit resulted in 100% discrimination of all individuals within the sample set, which presages its potential to maximally augment currently available forensic casework markers. It could also find applications in human evolutionary

  18. Duplication in DNA Sequences

    Science.gov (United States)

    Ito, Masami; Kari, Lila; Kincaid, Zachary; Seki, Shinnosuke

    The duplication and repeat-deletion operations are the basis of a formal language theoretic model of errors that can occur during DNA replication. During DNA replication, subsequences of a strand of DNA may be copied several times (resulting in duplications) or skipped (resulting in repeat-deletions). As formal language operations, iterated duplication and repeat-deletion of words and languages have been well studied in the literature. However, little is known about single-step duplications and repeat-deletions. In this paper, we investigate several properties of these operations, including closure properties of language families in the Chomsky hierarchy and equations involving these operations. We also make progress toward a characterization of regular languages that are generated by duplicating a regular language.
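
    The two single-step operations described above can be illustrated with a short Python sketch. It assumes the usual formulation, in which a duplication rewrites a word xyz as xyyz and a repeat-deletion rewrites xyyz as xyz, with y non-empty; the function names are hypothetical and are not taken from the paper.

```python
def duplications(word):
    """All words obtained from `word` by copying one non-empty factor in place.

    Assumed definition: w = x y z  ->  x y y z, with y non-empty.
    """
    results = set()
    n = len(word)
    for i in range(n):                 # start of the copied factor y
        for j in range(i + 1, n + 1):  # end (exclusive) of y
            results.add(word[:j] + word[i:j] + word[j:])
    return results


def repeat_deletions(word):
    """All words obtained from `word` by removing one copy of a tandem repeat.

    Assumed definition: w = x y y z  ->  x y z, with y non-empty.
    """
    results = set()
    n = len(word)
    for i in range(n):
        for length in range(1, (n - i) // 2 + 1):
            first = word[i:i + length]
            second = word[i + length:i + 2 * length]
            if first == second:
                results.add(word[:i + length] + word[i + 2 * length:])
    return results


if __name__ == "__main__":
    print(sorted(duplications("ab")))       # single-step duplications of "ab"
    print(sorted(repeat_deletions("aab")))  # single-step repeat-deletions of "aab"
```

    Under these definitions the two operations undo each other one step at a time: every word produced by duplications(w) has w among its repeat-deletions.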

  19. Spreadsheet-based program for alignment of overlapping DNA sequences.

    Science.gov (United States)

    Anbazhagan, R; Gabrielson, E

    1999-06-01

    Molecular biology laboratories frequently face the challenge of aligning small overlapping DNA sequences derived from a long DNA segment. Here, we present a short program that can be used to adapt Excel spreadsheets as a tool for aligning DNA sequences, regardless of their orientation. The program runs on any Windows or Macintosh operating system computer with Excel 97 or Excel 98. The program is available for use as an Excel file, which can be downloaded from the BioTechniques Web site. Upon execution, the program opens a specially designed customized workbook and is capable of identifying overlapping regions between two sequence fragments and displaying the sequence alignment. It also performs a number of specialized functions such as recognition of restriction enzyme cutting sites and CpG island mapping without costly specialized software.
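
    As a rough illustration of orientation-independent overlap detection, the following Python sketch looks for the longest exact suffix/prefix overlap between two fragments, trying the second fragment both as given and as its reverse complement. It is not the spreadsheet program described above, whose method the abstract does not specify; exact matching, the min_len threshold, and all function names are assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(a, b, min_len=4):
    """Length of the longest exact match between a suffix of a and a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def best_overlap(a, b, min_len=4):
    """Try both orders and both orientations of b; return (label, overlap length)."""
    rc_b = reverse_complement(b)
    candidates = [
        ("a->b", suffix_prefix_overlap(a, b, min_len)),
        ("b->a", suffix_prefix_overlap(b, a, min_len)),
        ("a->rc(b)", suffix_prefix_overlap(a, rc_b, min_len)),
        ("rc(b)->a", suffix_prefix_overlap(rc_b, a, min_len)),
    ]
    # max() keeps the first candidate on ties, so the order above sets priority.
    return max(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    left = "CTGACTGGGT"
    right = reverse_complement("GGGTTTAC")  # fragment read from the opposite strand
    print(best_overlap(left, right))
```

    Real sequencing reads contain errors, so a practical tool would score approximate matches rather than require exact ones.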

  20. Bisulfite sequencing reveals that Aspergillus flavus holds a hollow in DNA methylation.

    Directory of Open Access Journals (Sweden)

    Si-Yang Liu

    Aspergillus flavus first gained scientific attention for its production of aflatoxin. The underlying regulation of aflatoxin biosynthesis has served as a theoretical model for the biosynthesis of other microbial secondary metabolites. Nevertheless, for several decades the DNA methylation status of A. flavus, one of the important epigenomic modifications involved in gene regulation, has remained controversial. Here, we applied bisulfite sequencing in conjunction with a biological replicate strategy to investigate the DNA methylation profile of the A. flavus genome. Both the bisulfite sequencing data and the methylome comparisons with other fungi confirm that the DNA methylation level of this fungus is negligible. Further investigation into the DNA methyltransferase of Aspergillus uncovers its close relationship with RID-like enzymes as well as its divergence from the methyltransferases of species with validated DNA methylation. The lack of repeat content in the A. flavus genome and the high RIP-index of the small amount of remnant repeats potentially support our speculation that DNA methylation may be absent in A. flavus or that it may possess de novo DNA methylation which occurs very transiently during the obscure sexual stage of this fungal species. This work contributes to our understanding of the DNA methylation status of A. flavus, and reinforces our views on DNA methylation in fungal species. In addition, our strategy of applying bisulfite sequencing to DNA methylation detection in species with low DNA methylation may serve as a reference for later scientific investigations in other hypomethylated species.

  1. DNA breaks and repair in interstitial telomere sequences: Influence of chromatin structure

    International Nuclear Information System (INIS)

    Revaud, D.

    2009-06-01

    Interstitial Telomeric Sequences (ITS) are over-involved in spontaneous and radiation-induced chromosome aberrations in Chinese hamster cells. We performed a study to investigate the origin of their instability, arising spontaneously or after low-dose irradiation. Our results demonstrate that ITS have a particular chromatin structure: short nucleotide repeat length, less compaction of the 30 nm chromatin fiber, presence of G-quadruplex structures. These features would modulate break production and would favour the recruitment of alternative DNA repair mechanisms, which are prone to produce chromosome aberrations. These pathways could be at the origin of chromosome aberrations in ITS, whereas the NHEJ and HR double-strand break repair pathways are rather required for correct repair in these regions. (author)

  2. Detection and quantitative characterization of artificial extra peaks following polymerase chain reaction amplification of 14 short tandem repeat systems used in forensic investigations

    DEFF Research Database (Denmark)

    Meldgaard, Michael; Morling, N

    1997-01-01

    Detection on automated DNA sequencers of polymerase chain reaction (PCR) products of tetra- and penta-nucleotide short tandem repeat (STR) loci frequently reveals one or more extra peaks along with the true, major allele peak. The most frequent extra peak pattern is a single smaller peak which...... is one repeat unit shorter than the true allele peak. The existence of such artificial peaks is of special importance when the methods are used for forensic investigations because the artificial extra peaks may simulate true alleles when samples containing mixtures of DNA from different individuals...... are analyzed. We have investigated the relative levels of formation of extra peaks in 14 STR marker systems. We found that not only the parameters of the PCR but also factors determining the stringency during the post-PCR and pre-electrophoresis handling of samples were of importance for the formation of extra...

  3. DNA sequence analysis of X-ray induced Adh null mutations in Drosophila melanogaster

    International Nuclear Information System (INIS)

    Mahmoud, J.; Fossett, N.G.; Arbour-Reily, P.; McDaniel, M.; Tucker, A.; Chang, S.H.; Lee, W.R.

    1991-01-01

    The mutational spectrum for 28 X-ray induced mutations and 2 spontaneous mutations, previously determined by genetic and cytogenetic methods, consisted of 20 multilocus deficiencies (19 induced and 1 spontaneous) and 10 intragenic mutations (9 induced and 1 spontaneous). One of the X-ray induced intragenic mutations was lost, and another was determined to be a recombinant with the allele used in the recovery scheme. The DNA sequence of two X-ray induced intragenic mutations has been published. This paper reports the results of DNA sequence analysis of the remaining intragenic mutations and a summary of the X-ray induced mutational spectrum. The combination of DNA sequence analysis with genetic complementation analysis shows a continuous distribution in size of deletions rather than two different types of mutations consisting of deletions and 'point mutations'. Sequencing is shown to be essential for detecting intragenic deletions. Of particular importance for future studies is the observation that all of the intragenic deletions consist of a direct repeat adjacent to the breakpoint with one of the repeats deleted

  4. PERF: an exhaustive algorithm for ultra-fast and efficient identification of microsatellites from large DNA sequences.

    Science.gov (United States)

    Avvaru, Akshay Kumar; Sowpati, Divya Tej; Mishra, Rakesh Kumar

    2018-03-15

    Microsatellites or Simple Sequence Repeats (SSRs) are short tandem repeats of DNA motifs present in all genomes. They have long been used for a variety of purposes in the areas of population genetics, genotyping, marker-assisted selection and forensics. Numerous studies have highlighted their functional roles in genome organization and gene regulation. Though several tools are currently available to identify SSRs from genomic sequences, they have significant limitations. We present a novel algorithm called PERF for extremely fast and comprehensive identification of microsatellites from DNA sequences of any size. PERF is several fold faster than existing algorithms and uses up to 5-fold less memory. It provides a clean and flexible command-line interface to change the default settings, and produces output in an easily-parseable tab-separated format. In addition, PERF generates an interactive and stand-alone HTML report with charts and tables for easy downstream analysis. PERF is implemented in the Python programming language. It is freely available on PyPI under the package name perf_ssr, and can be installed directly using pip or easy_install. The documentation of PERF is available at https://github.com/rkmlab/perf. The source code of PERF is deposited in GitHub at https://github.com/rkmlab/perf under an MIT license. Contact: tej@ccmb.res.in. Supplementary data are available at Bioinformatics online.
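
    To make the notion of a perfect microsatellite concrete, here is a minimal regular-expression sketch in Python. It is emphatically not the PERF algorithm, whose internals the abstract does not describe, and it will be far slower on large genomes; the function name, the motif-size range of 1-6 bp and the 12 bp minimum length are assumptions chosen for illustration.

```python
import re


def find_ssrs(seq, min_motif=1, max_motif=6, min_length=12):
    """Return sorted (start, end, motif, copies) tuples for perfect tandem repeats.

    Coordinates are 0-based and end-exclusive. Thresholds are illustrative
    assumptions, not defaults of any published tool.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # Lookahead so that candidate repeats are tested at every start position.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2+))" % k)
        last_end = 0
        for m in pattern.finditer(seq):
            start, end = m.start(1), m.end(1)
            if start < last_end:
                continue  # still inside a repeat already reported for this k
            motif = m.group(2)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "ATAT" is reported as (AT)n when k == 2 instead.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            if end - start >= min_length:
                hits.append((start, end, motif, (end - start) // k))
                last_end = end
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 6 + "CC"
    for start, end, motif, copies in find_ssrs(example):
        print(start, end, motif, copies, sep="\t")
```

    Only whole motif copies are counted, so a trailing partial unit is not included in the reported interval.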

  5. Generation of sequence signatures from DNA amplification fingerprints with mini-hairpin and microsatellite primers.

    Science.gov (United States)

    Caetano-Anollés, G; Gresshoff, P M

    1996-06-01

    DNA amplification fingerprinting (DAF) with mini-hairpins harboring arbitrary "core" sequences at their 3' termini was used to fingerprint a variety of templates, including PCR products and whole genomes, to establish genetic relationships between plant taxa at the interspecific and intraspecific level, and to identify closely related fungal isolates and plant accessions. No correlation was observed between the sequence of the arbitrary core, the stability of the mini-hairpin structure and DAF efficiency. Mini-hairpin primers with short arbitrary cores and primers complementary to simple sequence repeats present in microsatellites were also used to generate arbitrary signatures from amplification profiles (ASAP). The ASAP strategy is a dual-step amplification procedure that uses at least one primer in each fingerprinting stage. ASAP was able to reproducibly amplify DAF products (representing about 10-15 kb of sequence) following careful optimization of amplification parameters such as primer and template concentration. Avoidance of primer sequences partially complementary to DAF product termini was necessary in order to produce distinct fingerprints. This allowed the combinatorial use of oligomers in nucleic acid screening, with numerous ASAP fingerprinting reactions based on a limited number of primer sequences. Mini-hairpin primers and ASAP analysis significantly increased detection of polymorphic DNA, separating closely related bermudagrass (Cynodon) cultivars and detecting putatively linked markers in bulked segregant analysis of the soybean (Glycine max) supernodulation (nitrate-tolerant symbiosis) locus.

  6. Short tandem repeat (STR) DNA markers are hypervariable and informative in Cannabis sativa: implications for forensic investigations.

    Science.gov (United States)

    Gilmore, Simon; Peakall, Rod; Robertson, James

    2003-01-09

    Short tandem repeat (STR) markers are the DNA marker of choice in forensic analysis of human DNA. Here we extend the application of STR markers to Cannabis sativa and demonstrate their potential for forensic investigations. Ninety-three individual cannabis plants, representing drug and fibre accessions of widespread origin, were profiled with five STR markers. A total of 79 alleles were detected across the five loci. All but four individuals from a single drug-type accession had a unique multilocus genotype. An analysis of molecular variance (AMOVA) revealed significant genetic variation among accessions, with an average of 25% genetic differentiation. By contrast, only 6% genetic difference was detected between drug and fibre crop accessions and it was not possible to unequivocally assign plants as either drug or fibre type. However, our results suggest that drug strains may typically possess lower genetic diversity than fibre strains, which may ultimately provide a means of genetic delineation. Our findings demonstrate the promise of cannabis STR markers to provide information on: (1) agronomic type, (2) the geographical origin of drug seizures, and (3) evidence of conspiracy in production of clonally propagated drug crops.

  7. Evolutional dynamics of 45S and 5S ribosomal DNA in ancient allohexaploid Atropa belladonna.

    Science.gov (United States)

    Volkov, Roman A; Panchuk, Irina I; Borisjuk, Nikolai V; Hosiawa-Baranska, Marta; Maluszynska, Jolanta; Hemleben, Vera

    2017-01-23

    Polyploid hybrids represent a rich natural resource to study molecular evolution of plant genes and genomes. Here, we applied a combination of karyological and molecular methods to investigate chromosomal structure, molecular organization and evolution of ribosomal DNA (rDNA) in nightshade, Atropa belladonna (fam. Solanaceae), one of the oldest known allohexaploids among flowering plants. Because of their abundance and specific molecular organization (evolutionarily conserved coding regions linked to variable intergenic spacers, IGS), 45S and 5S rDNA are widely used in plant taxonomic and evolutionary studies. Molecular cloning and nucleotide sequencing of A. belladonna 45S rDNA repeats revealed a general structure characteristic of other Solanaceae species, and a very high sequence similarity of two length variants, with the only difference in the number of short IGS subrepeats. These results, combined with the detection of three pairs of 45S rDNA loci on separate chromosomes, presumably inherited from both tetraploid and diploid ancestor species, exemplify intensive sequence homogenization that led to substitution/elimination of rDNA repeats of one parent. Chromosome silver-staining revealed that only four out of six 45S rDNA sites are frequently transcriptionally active, demonstrating nucleolar dominance. For 5S rDNA, three size variants of repeats were detected, with the major class represented by repeats containing all functional IGS elements required for transcription, the intermediate size repeats containing partially deleted IGS sequences, and the short 5S repeats containing severe defects both in the IGS and coding sequences. While shorter variants demonstrate an increased rate of base substitution, probably in their transition into pseudogenes, the functional 5S rDNA variants are nearly identical at the sequence level, pointing to their origin from a single parental species. Localization of the 5S rDNA genes on two chromosome pairs further supports uniparental

  8. Integration of hepatitis B virus DNA in chromosome-specific satellite sequences

    International Nuclear Information System (INIS)

    Shaul, Y.; Garcia, P.D.; Schonberg, S.; Rutter, W.J.

    1986-01-01

    The authors previously reported the cloning and detailed analysis of the integrated hepatitis B virus sequences in a human hepatoma cell line. They report here the integration of at least one copy of hepatitis B virus at human satellite DNA sequences. The majority of the cellular sequences identified by this satellite were organized as a multimeric composition of a 0.6-kilobase EcoRI fragment. This clone hybridized in situ almost exclusively to the centromeric heterochromatin of chromosomes 1 and 16 and to a lower extent to chromosome 2 and to the heterochromatic region of the Y chromosome. The immediate flanking host sequence appeared as a hierarchy of repeating units which were almost identical to a previously reported human satellite III DNA sequence.

  9. Sequence composition and gene content of the short arm of rye (Secale cereale) chromosome 1.

    Directory of Open Access Journals (Sweden)

    Silvia Fluch

    BACKGROUND: The purpose of the study is to elucidate the sequence composition of the short arm of rye chromosome 1 (Secale cereale) with special focus on its gene content, because this portion of the rye genome is an integrated part of several hundreds of bread wheat varieties worldwide. METHODOLOGY/PRINCIPAL FINDINGS: Multiple Displacement Amplification of 1RS DNA, obtained from flow sorted 1RS chromosomes, using a 1RS ditelosomic wheat-rye addition line, and subsequent Roche 454FLX sequencing of this DNA yielded 195,313,589 bp of sequence information. This quantity of sequence information resulted in 0.43× sequence coverage of the 1RS chromosome arm, permitting the identification of genes with an estimated probability of 95%. A detailed analysis revealed that more than 5% of the 1RS sequence consisted of gene space, identifying at least 3,121 gene loci representing 1,882 different gene functions. Repetitive elements comprised about 72% of the 1RS sequence, Gypsy/Sabrina (13.3%) being the most abundant. More than four thousand simple sequence repeat (SSR) sites mostly located in gene related sequence reads were identified for possible marker development. The existence of chloroplast insertions in 1RS has been verified by identifying chimeric chloroplast-genomic sequence reads. Synteny analysis of 1RS to the full genomes of Oryza sativa and Brachypodium distachyon revealed that about half of the genes of 1RS correspond to the distal end of the short arm of rice chromosome 5 and the proximal region of the long arm of Brachypodium distachyon chromosome 2. Comparison of the gene content of 1RS to 1HS barley chromosome arm revealed high conservation of genes related to chromosome 5 of rice. CONCLUSIONS: The present study revealed the gene content and potential gene functions on this chromosome arm and demonstrated numerous sequence elements like SSRs and gene-related sequences, which can be utilised for future research as well as in breeding of wheat and rye.
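
    The two figures quoted above (195,313,589 bp at 0.43× coverage) imply a size for the 1RS arm, which the following Python sketch works out. The second function applies the Lander-Waterman expectation for the fraction of bases hit by at least one read; that model assumes uniformly random sampling, which is an assumption of this example rather than a claim made in the abstract, and it does not reproduce the authors' 95% gene-detection estimate, whose derivation is not given here.

```python
import math

SEQUENCED_BP = 195_313_589  # total sequence reported in the abstract
COVERAGE = 0.43             # sequence coverage reported in the abstract


def implied_target_size(sequenced_bp, coverage):
    """Target size in bp implied by coverage = sequenced_bp / target_size."""
    return sequenced_bp / coverage


def expected_fraction_covered(coverage):
    """Lander-Waterman expectation: P(a base is covered at least once) = 1 - e^(-c).

    Assumes reads sample the target uniformly at random (illustrative assumption).
    """
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    size = implied_target_size(SEQUENCED_BP, COVERAGE)
    print(f"implied 1RS size: {size / 1e6:.1f} Mbp")
    print(f"expected fraction of bases covered: {expected_fraction_covered(COVERAGE):.3f}")
```

    The implied arm size is roughly 454 Mbp, and under the uniform-sampling assumption about 35% of individual bases would be covered at least once.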

  10. Simulating efficiently the evolution of DNA sequences.

    Science.gov (United States)

    Schöniger, M; von Haeseler, A

    1995-02-01

    Two menu-driven FORTRAN programs are described that simulate the evolution of DNA sequences in accordance with a user-specified model. This general stochastic model allows for an arbitrary stationary nucleotide composition and any transition-transversion bias during the process of base substitution. In addition, the user may define any hypothetical model tree according to which a family of sequences evolves. The programs suggest the computationally most inexpensive approach to generate nucleotide substitutions. Either reproducible or non-repeatable simulations, depending on the method of initializing the pseudo-random number generator, can be performed. The corresponding options are offered by the interface menu.
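
    The idea of substitution with a transition-transversion bias, and of reproducible versus non-repeatable runs, can be sketched in a few lines of Python. This is a deliberately reduced illustration and not the FORTRAN programs described above: it evolves one sequence along a single branch, assumes equal base frequencies, and takes a per-site substitution probability directly; the names evolve, simulate, p_sub and kappa are hypothetical.

```python
import random

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def evolve(seq, p_sub, kappa, rng):
    """Apply independent substitutions to each site of seq.

    p_sub : probability that a site is substituted along the branch
    kappa : rate of the transition relative to each of the two transversions,
            so P(transition | substitution) = kappa / (kappa + 2)
    """
    p_transition = kappa / (kappa + 2.0)
    out = []
    for base in seq:
        if rng.random() < p_sub:
            if rng.random() < p_transition:
                base = TRANSITION[base]
            else:
                base = rng.choice(TRANSVERSIONS[base])
        out.append(base)
    return "".join(out)


def simulate(seq, p_sub, kappa, seed=None):
    """A fixed seed gives a reproducible run; seed=None gives a non-repeatable one."""
    rng = random.Random(seed)
    return evolve(seq, p_sub, kappa, rng)


if __name__ == "__main__":
    ancestor = "ACGT" * 10
    print(simulate(ancestor, p_sub=0.2, kappa=4.0, seed=42))
    print(simulate(ancestor, p_sub=0.2, kappa=4.0, seed=42))  # identical to the line above
    print(simulate(ancestor, p_sub=0.2, kappa=4.0))           # differs from run to run
```

    Extending the sketch to a user-defined tree would mean calling evolve once per branch, passing each node's sequence to its descendants.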

  11. The chloroplast genome sequence of the green alga Leptosira terrestris: multiple losses of the inverted repeat and extensive genome rearrangements within the Trebouxiophyceae

    Directory of Open Access Journals (Sweden)

    Turmel Monique

    2007-07-01

    Background: In the Chlorophyta – the green algal phylum comprising the classes Prasinophyceae, Ulvophyceae, Trebouxiophyceae and Chlorophyceae – the chloroplast genome displays a highly variable architecture. While chlorophycean chloroplast DNAs (cpDNAs) deviate considerably from the ancestral pattern described for the prasinophyte Nephroselmis olivacea, the degree of remodelling sustained by the two ulvophyte cpDNAs completely sequenced to date is intermediate relative to those observed for chlorophycean and trebouxiophyte cpDNAs. Chlorella vulgaris (Chlorellales) is currently the only photosynthetic trebouxiophyte whose complete cpDNA sequence has been reported. To gain insights into the evolutionary trends of the chloroplast genome in the Trebouxiophyceae, we sequenced cpDNA from the filamentous alga Leptosira terrestris (Ctenocladales). Results: The 195,081-bp Leptosira chloroplast genome resembles the 150,613-bp Chlorella genome in lacking a large inverted repeat (IR) but differs greatly in gene order. Six of the conserved genes present in Chlorella cpDNA are missing from the Leptosira gene repertoire. The 106 conserved genes, four introns and 11 free standing open reading frames (ORFs) account for 48.3% of the genome sequence. This is the lowest gene density yet observed among chlorophyte cpDNAs. Contrary to the situation in Chlorella but similar to that in the chlorophycean Scenedesmus obliquus, the gene distribution is highly biased over the two DNA strands in Leptosira. Nine genes, compared to only three in Chlorella, have significantly expanded coding regions relative to their homologues in ancestral-type green algal cpDNAs. As observed in chlorophycean genomes, the rpoB gene is fragmented into two ORFs. Short repeats account for 5.1% of the Leptosira genome sequence and are present mainly in intergenic regions. Conclusion: Our results highlight the great plasticity of the chloroplast genome in the Trebouxiophyceae and indicate

  12. [Comparative analysis of clustered regularly interspaced short palindromic repeats (CRISPRs) loci in the genomes of halophilic archaea].

    Science.gov (United States)

    Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian

    2009-11-01

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to perform a genome-wide analysis of CRISPR in extremely halophilic archaea whose whole genome sequences are currently available. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were highly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as a recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.

  13. A MapReduce Framework for DNA Sequencing Data Processing

    Directory of Open Access Journals (Sweden)

    Samy Ghoneimy

    2016-12-01

    Next Generation Sequencers (NGS) such as the Illumina HiSeq produce data on the order of 200 billion base pairs in a single one-week run for 60x human genome coverage, a volume that can only be tackled with high performance computing (HPC) and specialized software algorithms called “short read aligners”. This paper focuses on the implementation of DNA sequencing data processing as a set of MapReduce programs that accept a DNA data set as a FASTQ file and finally generate a VCF (variant call format) file, which lists the variants for the given DNA data set. In this paper MapReduce/Hadoop, along with the Burrows-Wheeler Aligner (BWA) and Sequence Alignment/Map (SAM) tools, are fully utilized to provide various utilities for manipulating alignments, including sorting, merging, indexing, and generating alignments. The Map-Sort-Reduce process is designed to suit a Hadoop framework in which each cluster is a traditional N-node Hadoop cluster, so as to utilize all of the Hadoop features like HDFS, program management and fault tolerance. The Map step performs multiple instances of the short read alignment algorithm (Bowtie) that run in parallel in Hadoop. The ordered list of the sequence reads is used as input tuples and the output tuples are the alignments of the short reads. In the Reduce step many parallel instances of the Short Oligonucleotide Analysis Package for SNP (SOAPsnp) algorithm run in the cluster. Input tuples are sorted alignments for a partition and the output tuples are SNP calls. Results are stored via HDFS, and then archived in SOAPsnp format. The proposed framework enables extremely fast discovery of somatic mutations, inference of population genetic parameters, and association tests performed directly on sequencing data without explicit genotyping or linkage-based imputation. It also demonstrates that this method achieves comparable

  14. ASAP: Amplification, sequencing & annotation of plastomes

    Directory of Open Access Journals (Sweden)

    Folta Kevin M

    2005-12-01

    Background: Availability of DNA sequence information is vital for pursuing structural, functional and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome requires sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures like chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA) is being used currently to amplify the chloroplast genome from purified chloroplast DNA and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal high-throughput, rapid PCR-based technique to amplify, sequence and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1-1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera. Results: 100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine and Equisetum. Sequence from the inverted repeat region of strawberry and peach plastome was obtained, annotated and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to tobacco plastome thus exhibiting the utility of this method for structural and

  15. D20S16 is a complex interspersed repeated sequence: Genetic and physical analysis of the locus

    Energy Technology Data Exchange (ETDEWEB)

    Bowden, D.W.; Krawchuk, M.D.; Howard, T.D. [Wake Forest Univ., Winston-Salem, NC (United States)] [and others]

    1995-01-20

    The genomic structure of the D20S16 locus has been evaluated using genetic and physical methods. D20S16, originally detected with the probe CRI-L1214, is a highly informative, complex restriction fragment length polymorphism consisting of two separate allelic systems. The allelic systems have the characteristics of conventional VNTR polymorphisms and are separated by recombination (θ = 0.02, Z_max = 74.82), as demonstrated in family studies. Most of these recombination events are meiotic crossovers and are maternal in origin, but two, including deletion of the locus in a cell line from a CEPH family member, occur without evidence for exchange of flanking markers. DNA sequence analysis suggests that the basis of the polymorphism is variable numbers of a 98-bp sequence tandemly repeated with 87 to 90% sequence similarity between repeats. The 98-bp repeat is a dimer of a 49-bp sequence with 45 to 98% identity between the elements. In addition, genomic sequences adjacent to the polymorphic 98-bp repeat tracts are also repeated but are not polymorphic, i.e., they show no individual-to-individual variation. Restriction enzyme mapping of cosmids containing the CRI-L1214 sequence suggests that there are multiple interspersed repeats of the CRI-L1214 sequence on chromosome 20. The results of dual-color fluorescence in situ hybridization experiments with interphase nuclei are also consistent with multiple repeats of an interspersed sequence on chromosome 20. 23 refs., 6 figs.

  16. Sequence specific electronic conduction through polyion-stabilized double-stranded DNA in nanoscale break junctions

    International Nuclear Information System (INIS)

    Mahapatro, Ajit K; Jeong, Kyung J; Lee, Gil U; Janes, David B

    2007-01-01

    This paper presents a study of sequence specific electronic conduction through short (15-base-pair) double-stranded (ds) DNA molecules, measured by immobilizing 3′-thiol-derivatized DNAs in nanometre scale gaps between gold electrodes. The polycation spermidine was used to stabilize the ds-DNA structure, allowing electrical measurements to be performed in a dry state. For specific sequences, the conductivity was observed to scale with the surface density of immobilized DNA, which can be controlled by the buffer concentration. A series of 15-base DNA oligonucleotide pairs, in which the centre sequence of five base pairs was changed from G:C to A:T pairs, has been studied. The conductivity per molecule is observed to decrease exponentially with the number of adjacent A:T pairs replacing G:C pairs, consistent with a barrier at the A:T sites. Conductance-based devices for short DNA sequences could provide sensing approaches with direct electrical readout, as well as label-free detection.
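
    The exponential decrease described in this abstract can be illustrated with a simple model, G(n) = G0 * exp(-beta * n), where n is the number of adjacent A:T pairs. The sketch below is only an illustration under that assumed functional form: the function names, G0 and beta are hypothetical and are not values reported in the record.

```python
import math


def conductance_per_molecule(n_at_pairs, g0=1.0, beta=1.0):
    """Assumed model: G(n) = g0 * exp(-beta * n) for n adjacent A:T pairs."""
    if n_at_pairs < 0:
        raise ValueError("number of A:T pairs must be non-negative")
    return g0 * math.exp(-beta * n_at_pairs)


def fit_beta(counts, conductances):
    """Fit ln(G) = ln(g0) - beta * n by ordinary least squares.

    Returns (g0, beta). All conductances must be positive.
    """
    logs = [math.log(g) for g in conductances]
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

    Fitting on a logarithmic scale keeps the example dependency-free; with real measurements a weighted or nonlinear fit may be more appropriate.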

  17. Dispersed repetitive sequences in eukaryotic genomes and their possible biological significance

    International Nuclear Information System (INIS)

    Georgiev, G.P.; Kramerov, D.A.; Ryskov, A.P.; Skryabin, K.G.; Lukanidin, E.M.

    1983-01-01

    This paper describes the properties of a novel mouse mdg-like element, the A2 sequence, which is the most abundant repetitive sequence. We also characterized a ubiquitous B2 sequence that represents, after B1, the dominant family among the short interspersed repeats of the mouse genome. The existence of some putative transposition intermediates was shown for repeats of both A and B types of the mouse genome. These are closed circular DNA of the A type and small polyadenylated B+ RNAs. The fundamental question that arises is whether these sequences are simply selfish DNA capable of transposition or whether they fulfill some useful biological function within the genome. 66 references, 11 figures, 1 table

  18. Order and correlations in genomic DNA sequences. The spectral approach

    International Nuclear Information System (INIS)

    Lobzin, Vasilii V; Chechetkin, Vladimir R

    2000-01-01

    The structural analysis of genomic DNA sequences is discussed in the framework of the spectral approach, which is sufficiently universal due to the reciprocal correspondence and mutual complementarity of Fourier transform length scales. The spectral characteristics of random sequences of the same nucleotide composition possess the property of self-averaging for relatively short sequences of length M≥100-300. Comparison with the characteristics of random sequences determines the statistical significance of the structural features observed. Apart from traditional applications to the search for hidden periodicities, spectral methods are also efficient in studying mutual correlations in DNA sequences. By combining spectra for structure factors and correlation functions, not only integral correlations can be estimated but also their origin identified. Using the structural spectral entropy approach, the regularity of a sequence can be quantitatively assessed. A brief introduction to the problem is also presented and other major methods of DNA sequence analysis described. (reviews of topical problems)
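
    As an illustration of the search for hidden periodicities mentioned in this abstract, the sketch below computes a Fourier structure factor for the positions of one nucleotide. It assumes one common normalization, F(q_n) = |M^(-1/2) * sum_m rho(m) * exp(-i * q_n * m)|^2 with q_n = 2*pi*n/M, where rho(m) is 1 if the chosen base occupies position m and 0 otherwise. The function names are hypothetical and the record itself does not specify this exact form.

```python
import cmath


def structure_factor(seq, base, n):
    """Structure factor of `base` at harmonic n for a sequence of length M."""
    M = len(seq)
    q = 2 * cmath.pi * n / M
    amplitude = sum(
        cmath.exp(-1j * q * m) for m, b in enumerate(seq) if b == base
    ) / M ** 0.5
    return abs(amplitude) ** 2


def dominant_period(seq, base):
    """Period (in bases) of the strongest harmonic between n = 1 and M // 2."""
    M = len(seq)
    best_n = max(range(1, M // 2 + 1), key=lambda n: structure_factor(seq, base, n))
    return M / best_n
```

    For a perfectly periodic toy input such as "ACG" repeated 20 times, the strongest harmonic for "A" corresponds to a period of 3 bases. Judging significance in real data would require comparison with random sequences of the same composition, as the abstract notes.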

  19. Using long ssDNA polynucleotides to amplify STRs loci in degraded DNA samples

    Science.gov (United States)

    Pérez Santángelo, Agustín; Corti Bielsa, Rodrigo M.; Sala, Andrea; Ginart, Santiago; Corach, Daniel

    2017-01-01

    Obtaining informative short tandem repeat (STR) profiles from degraded DNA samples is a challenging task usually undermined by locus or allele dropouts and peak-height imbalances observed in capillary electrophoresis (CE) electropherograms, especially for those markers with large amplicon sizes. We hereby show that the current STR assays may be greatly improved for the detection of genetic markers in degraded DNA samples by using long single stranded DNA polynucleotides (ssDNA polynucleotides) as surrogates for PCR primers. These long primers allow a closer annealing to the repeat sequences, thereby reducing the length of the template required for the amplification in fragmented DNA samples, while at the same time rendering amplicons of larger sizes suitable for multiplex assays. We also demonstrate that the annealing of long ssDNA polynucleotides does not need to be fully complementary in the 5’ region of the primers, thus allowing for the design of practically any long primer sequence for developing new multiplex assays. Furthermore, genotyping of intact DNA samples could also benefit from utilizing long primers since their close annealing to the target STR sequences may overcome wrong profiling generated by insertions/deletions present between the STR region and the annealing site of the primers. Additionally, long ssDNA polynucleotides might be utilized in multiplex PCR assays for other types of degraded or fragmented DNA, e.g. circulating, cell-free DNA (ccfDNA). PMID:29099837

  20. Chromosome-specific DNA Repeat Probes

    Energy Technology Data Exchange (ETDEWEB)

    Baumgartner, Adolf; Weier, Jingly Fung; Weier, Heinz-Ulrich G.

    2006-03-16

    In research as well as in clinical applications, fluorescence in situ hybridization (FISH) has gained increasing popularity as a highly sensitive technique to study cytogenetic changes. Today, hundreds of commercially available DNA probes serve the basic needs of the biomedical research community. Widespread applications, however, are often limited by the lack of appropriately labeled, specific nucleic acid probes. We describe two approaches for an expeditious preparation of chromosome-specific DNAs and the subsequent probe labeling with reporter molecules of choice. The described techniques allow the preparation of highly specific DNA repeat probes suitable for enumeration of chromosomes in interphase cell nuclei or tissue sections. In addition, there is no need for chromosome enrichment by flow cytometry and sorting or molecular cloning. Our PCR-based method uses either bacterial artificial chromosomes or human genomic DNA as templates with α-satellite-specific primers. Here we demonstrate the production of fluorochrome-labeled DNA repeat probes specific for human chromosomes 17 and 18 in just a few days without the need for highly specialized equipment and without the limitation to only a few fluorochrome labels.

  1. Exploiting BAC-end sequences for the mining, characterization and utility of new short sequences repeat (SSR) markers in Citrus.

    Science.gov (United States)

    Biswas, Manosh Kumar; Chai, Lijun; Mayer, Christoph; Xu, Qiang; Guo, Wenwu; Deng, Xiuxin

    2012-05-01

    The aim of this study was to develop a large set of microsatellite markers based on publicly available BAC-end sequences (BESs), and to evaluate their transferability, capacity to discriminate genotypes and mapping ability in Citrus. A set of 1,281 simple sequence repeat (SSR) markers was developed from the 46,339 Citrus clementina BAC-end sequences (BESs), of which 20.67% contained SSRs longer than 20 bp, corresponding to roughly one perfect SSR per 2.04 kb. The most abundant motifs were di-nucleotide (16.82%) repeats. Among all repeat motifs, (TA/AT)n is the most abundant (8.38%), followed by (AG/CT)n (4.51%). Most of the BES-SSRs are located in non-coding regions, but 1.3% of BES-SSRs were found to be associated with transposable elements (TEs). A total of 400 novel SSR primer pairs were synthesized and their transferability and polymorphism tested on a set of 16 Citrus and Citrus-relative species. Among these, 333 (83.25%) were successfully amplified and 260 (65.00%) showed cross-species transferability with Poncirus trifoliata and Fortunella sp. These cross-species transferable markers could be useful for cultivar identification and for genomic study of Citrus, Poncirus and Fortunella sp. The utility of the developed SSR markers was demonstrated by identifying a set of 118 markers each for construction of linkage maps of Citrus reticulata and Poncirus trifoliata. Genetic diversity and phylogenetic relationships among 40 Citrus and related species were assessed with the aid of 25 randomly selected SSR primer pairs, and the results revealed that citrus genomic SSRs are superior to genic SSRs for genetic diversity and germplasm characterization of Citrus spp.
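
    Mining perfect SSRs from sequences such as BAC ends can be sketched with a regular-expression scan. The example below is a minimal illustration, not the pipeline used in the study: the function name, the motif-size range (2 to 6 nt) and the exact use of the 20 bp length threshold are assumptions.

```python
import re


def find_perfect_ssrs(seq, min_motif=2, max_motif=6, min_length=20):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    A hit is a motif of min_motif..max_motif nt repeated in tandem over at
    least min_length bp. Motifs that are themselves repeats of a shorter
    unit (e.g. "TATA") are skipped so each tract is reported once.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            length = m.end() - m.start()
            if length >= min_length:
                hits.append((m.start(), motif, length // k))
    return sorted(hits)
```

    A density figure comparable to the "one SSR per 2.04 kb" in the abstract could then be obtained by dividing the total number of bases scanned by the number of hits.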

  2. Targeting and tracing of specific DNA sequences with dTALEs in living cells

    Science.gov (United States)

    Thanisch, Katharina; Schneider, Katrin; Morbitzer, Robert; Solovei, Irina; Lahaye, Thomas; Bultmann, Sebastian; Leonhardt, Heinrich

    2014-01-01

    Epigenetic regulation of gene expression involves, besides DNA and histone modifications, the relative positioning of DNA sequences within the nucleus. To trace specific DNA sequences in living cells, we used programmable sequence-specific DNA binding of designer transcription activator-like effectors (dTALEs). We designed a recombinant dTALE (msTALE) with variable repeat domains to specifically bind a 19-bp target sequence of major satellite DNA. The msTALE was fused with green fluorescent protein (GFP) and stably expressed in mouse embryonic stem cells. Hybridization with a major satellite probe (3D-fluorescent in situ hybridization) and co-staining for known cellular structures confirmed in vivo binding of the GFP-msTALE to major satellite DNA present at nuclear chromocenters. Dual tracing of major satellite DNA and the replication machinery throughout S-phase showed co-localization during mid to late S-phase, directly demonstrating the late replication timing of major satellite DNA. Fluorescence bleaching experiments indicated a relatively stable but still dynamic binding, with mean residence times in the range of minutes. Fluorescently labeled dTALEs open new perspectives to target and trace DNA sequences and to monitor dynamic changes in subnuclear positioning as well as interactions with functional nuclear structures during cell cycle progression and cellular differentiation. PMID:24371265

  3. Targeting and tracing of specific DNA sequences with dTALEs in living cells.

    Science.gov (United States)

    Thanisch, Katharina; Schneider, Katrin; Morbitzer, Robert; Solovei, Irina; Lahaye, Thomas; Bultmann, Sebastian; Leonhardt, Heinrich

    2014-04-01

    Epigenetic regulation of gene expression involves, besides DNA and histone modifications, the relative positioning of DNA sequences within the nucleus. To trace specific DNA sequences in living cells, we used programmable sequence-specific DNA binding of designer transcription activator-like effectors (dTALEs). We designed a recombinant dTALE (msTALE) with variable repeat domains to specifically bind a 19-bp target sequence of major satellite DNA. The msTALE was fused with green fluorescent protein (GFP) and stably expressed in mouse embryonic stem cells. Hybridization with a major satellite probe (3D-fluorescent in situ hybridization) and co-staining for known cellular structures confirmed in vivo binding of the GFP-msTALE to major satellite DNA present at nuclear chromocenters. Dual tracing of major satellite DNA and the replication machinery throughout S-phase showed co-localization during mid to late S-phase, directly demonstrating the late replication timing of major satellite DNA. Fluorescence bleaching experiments indicated a relatively stable but still dynamic binding, with mean residence times in the range of minutes. Fluorescently labeled dTALEs open new perspectives to target and trace DNA sequences and to monitor dynamic changes in subnuclear positioning as well as interactions with functional nuclear structures during cell cycle progression and cellular differentiation.

  4. Structure, organization, and sequence of alpha satellite DNA from human chromosome 17: evidence for evolution by unequal crossing-over and an ancestral pentamer repeat shared with the human X chromosome.

    Science.gov (United States)

    Waye, J S; Willard, H F

    1986-09-01

    The centromeric regions of all human chromosomes are characterized by distinct subsets of a diverse tandemly repeated DNA family, alpha satellite. On human chromosome 17, the predominant form of alpha satellite is a 2.7-kilobase-pair higher-order repeat unit consisting of 16 alphoid monomers. We present the complete nucleotide sequence of the 16-monomer repeat, which is present in 500 to 1,000 copies per chromosome 17, as well as that of a less abundant 15-monomer repeat, also from chromosome 17. These repeat units were approximately 98% identical in sequence, differing by the exclusion of precisely 1 monomer from the 15-monomer repeat. Homologous unequal crossing-over is suggested as a probable mechanism by which the different repeat lengths on chromosome 17 were generated, and the putative site of such a recombination event is identified. The monomer organization of the chromosome 17 higher-order repeat unit is based, in part, on tandemly repeated pentamers. A similar pentameric suborganization has been previously demonstrated for alpha satellite of the human X chromosome. Despite the organizational similarities, substantial sequence divergence distinguishes these subsets. Hybridization experiments indicate that the chromosome 17 and X subsets are more similar to each other than to the subsets found on several other human chromosomes. We suggest that the chromosome 17 and X alpha satellite subsets may be related components of a larger alphoid subfamily which have evolved from a common ancestral repeat into the contemporary chromosome-specific subsets.

  5. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Background: Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results: We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis). Conclusion: This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.
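
    One simple way to turn a low-coverage survey into a copy-number estimate is to compare the read depth observed over a region with the depth expected for a single-copy sequence. The sketch below assumes that approach; it is not necessarily the calculation used by the authors, and the function names and the genome size passed in by the caller are hypothetical inputs.

```python
def expected_depth(total_survey_bases, genome_size_bases):
    """Average depth expected over a single-copy sequence."""
    return total_survey_bases / genome_size_bases


def estimate_copy_number(aligned_bases, region_length,
                         total_survey_bases, genome_size_bases):
    """Copy number ~ observed depth over the region / expected single-copy depth."""
    observed_depth = aligned_bases / region_length
    return observed_depth / expected_depth(total_survey_bases, genome_size_bases)
```

    For example, with 78 million survey bases and an assumed genome size of 1 Gb, the expected single-copy depth is 0.078, so a 1 kb region covered by 7,800 aligned bases would be estimated at about 100 copies.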

  6. Analysis of a new strain of Euphorbia mosaic virus with distinct replication specificity unveils a lineage of begomoviruses with short Rep sequences in the DNA-B intergenic region

    Directory of Open Access Journals (Sweden)

    Argüello-Astorga Gerardo R

    2010-10-01

    Background: Euphorbia mosaic virus (EuMV) is a member of the SLCV clade, a lineage of New World begomoviruses that display distinctive features in their replication-associated protein (Rep) and virion-strand replication origin. The first entirely characterized EuMV isolate is native to the Yucatan Peninsula, Mexico; subsequently, EuMV was detected in weeds and pepper plants from another region of Mexico, and partial DNA-A sequences revealed significant differences in their putative replication specificity determinants with respect to EuMV-YP. This study aimed to investigate the replication compatibility between two EuMV isolates from the same country. Results: A new isolate of EuMV was obtained from pepper plants collected at Jalisco, Mexico. Full-length clones of both genomic components of EuMV-Jal were biolistically inoculated into plants of three different species, which developed symptoms indistinguishable from those induced by EuMV-YP. Pseudorecombination experiments with EuMV-Jal and EuMV-YP genomic components demonstrated that these viruses do not form infectious reassortants in Nicotiana benthamiana, presumably because of Rep-iteron incompatibility. Sequence analysis of the EuMV-Jal DNA-B intergenic region (IR) led to the unexpected discovery of a 35-nt-long sequence that is identical to a segment of the rep gene in the cognate viral DNA-A. Similar short rep sequences ranging from 35 to 51 nt in length were identified in all EuMV isolates and in three distinct viruses from South America related to EuMV. These short rep sequences in the DNA-B IR are positioned downstream of a ~160-nt non-coding domain highly similar to the CP promoter of begomoviruses belonging to the SLCV clade. Conclusions: EuMV strains are not compatible in replication, indicating that this begomovirus species probably is not a replicating lineage in nature. The genomic analysis of EuMV-Jal led to the discovery of a subgroup of SLCV clade viruses that contain in
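
    Detecting a short segment of one sequence that is identical to part of another, as with the 35-nt rep segment found in the DNA-B intergenic region, can be sketched as a seed-and-extend search for exact matches. This is a hypothetical illustration rather than the authors' procedure; the function name is an assumption, and only same-strand matches are considered (reverse complements are not checked).

```python
def shared_segments(seq_a, seq_b, min_len=35):
    """Return (start_a, start_b, length) for maximal exact matches >= min_len."""
    # Index every window of min_len bases in seq_b by its content.
    index = {}
    for j in range(len(seq_b) - min_len + 1):
        index.setdefault(seq_b[j:j + min_len], []).append(j)

    hits = []
    for i in range(len(seq_a) - min_len + 1):
        for j in index.get(seq_a[i:i + min_len], []):
            # Skip seeds that only continue a match starting one base earlier.
            if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]:
                continue
            length = min_len
            # Extend the match to the right while the bases keep agreeing.
            while (i + length < len(seq_a) and j + length < len(seq_b)
                   and seq_a[i + length] == seq_b[j + length]):
                length += 1
            hits.append((i, j, length))
    return hits
```

    In practice a local alignment tool would be used for this kind of search; the sketch only shows the underlying idea of exact shared segments.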

  7. Discovery of Escherichia coli CRISPR sequences in an undergraduate laboratory.

    Science.gov (United States)

    Militello, Kevin T; Lazatin, Justine C

    2017-05-01

    Clustered regularly interspaced short palindromic repeats (CRISPRs) represent a novel type of adaptive immune system found in eubacteria and archaebacteria. CRISPRs have recently generated a lot of attention due to their unique ability to catalog foreign nucleic acids, their ability to destroy foreign nucleic acids in a mechanism that shares some similarity to RNA interference, and the ability to utilize reconstituted CRISPR systems for genome editing in numerous organisms. In order to introduce CRISPR biology into an undergraduate upper-level laboratory, a five-week set of exercises was designed to allow students to examine the CRISPR status of uncharacterized Escherichia coli strains and to allow the discovery of new repeats and spacers. Students started the project by isolating genomic DNA from E. coli and amplifying the iap CRISPR locus using the polymerase chain reaction (PCR). The PCR products were analyzed by Sanger DNA sequencing, and the sequences were examined for the presence of CRISPR repeat sequences. The regions between the repeats, the spacers, were extracted and analyzed with BLASTN searches. Overall, CRISPR loci were sequenced from several previously uncharacterized E. coli strains and one E. coli K-12 strain. Sanger DNA sequencing resulted in the discovery of 36 spacer sequences and their corresponding surrounding repeat sequences. Five of the spacers were homologous to foreign (non-E. coli) DNA. Assessment of the laboratory indicates that improvements were made in the ability of students to answer questions relating to the structure and function of CRISPRs. Future directions of the laboratory are presented and discussed. © 2016 by The International Union of Biochemistry and Molecular Biology, 45(3):262-269, 2017.
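
    The step of extracting the spacers that lie between repeats can be sketched as follows. This minimal example assumes the repeat sequence is already known and occurs as exact copies in the read; the function name is hypothetical and the repeat used in the usage note is a toy string, not the E. coli repeat.

```python
def extract_spacers(sequence, repeat):
    """Return the segments lying between consecutive exact copies of `repeat`."""
    positions = []
    start = sequence.find(repeat)
    while start != -1:
        positions.append(start)
        # Continue searching after the end of the copy just found.
        start = sequence.find(repeat, start + len(repeat))
    return [
        sequence[a + len(repeat):b]
        for a, b in zip(positions, positions[1:])
    ]
```

    With the toy repeat "GAGTTCC", the input "AA" + "GAGTTCC" + "ACGTACGT" + "GAGTTCC" + "TTGGCCAA" + "GAGTTCC" + "CC" yields the two spacers "ACGTACGT" and "TTGGCCAA". Real repeats often carry point differences, so an approximate match would be needed for sequencing data.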

  8. Sequence Dependent Interactions Between DNA and Single-Walled Carbon Nanotubes

    Science.gov (United States)

    Roxbury, Daniel

    It is known that single-stranded DNA adopts a helical wrap around a single-walled carbon nanotube (SWCNT), forming a water-dispersible hybrid molecule. The ability to sort mixtures of SWCNTs based on chirality (electronic species) has recently been demonstrated using special short DNA sequences that recognize certain matching SWCNTs of specific chirality. This thesis investigates the intricacies of DNA-SWCNT sequence-specific interactions through both experimental and molecular simulation studies. The DNA-SWCNT binding strengths were experimentally quantified by studying the kinetics of DNA replacement by a surfactant on the surface of particular SWCNTs. Recognition ability was found to correlate strongly with measured binding strength, e.g. DNA sequence (TAT)4 was found to bind 20 times stronger to the (6,5)-SWCNT than sequence (TAT)4T. Next, using replica exchange molecular dynamics (REMD) simulations, equilibrium structures formed by (a) single-strands and (b) multiple-strands of 12-mer oligonucleotides adsorbed on various SWCNTs were explored. A number of structural motifs were discovered in which the DNA strand wraps around the SWCNT and 'stitches' to itself via hydrogen bonding. Great variability among equilibrium structures was observed and shown to be directly influenced by DNA sequence and SWCNT type. For example, the (6,5)-SWCNT DNA recognition sequence, (TAT)4, was found to wrap in a tight single-stranded right-handed helical conformation. In contrast, DNA sequence T12 forms a beta-barrel left-handed structure on the same SWCNT. These are the first theoretical indications that DNA-based SWCNT selectivity can arise on a molecular level. In a biomedical collaboration with the Mayo Clinic, pathways for DNA-SWCNT internalization into healthy human endothelial cells were explored. Through absorbance spectroscopy, TEM imaging, and confocal fluorescence microscopy, we showed that intracellular concentrations of SWCNTs far exceeded those of the incubation

  9. Repeated extraction of DNA from FTA cards

    DEFF Research Database (Denmark)

    Stangegaard, Michael; Ferrero, Laura; Børsting, Claus

    2011-01-01

    Extraction of DNA using magnetic bead based techniques on automated DNA extraction instruments provides a fast, reliable and reproducible method for DNA extraction from various matrices. However, the yield of extracted DNA from FTA-cards is typically low. Here, we demonstrate that it is possible...... to repeatedly extract DNA from the processed FTA-disk. The method increases the yield from the nanogram range to the microgram range....

  10. Pericentric satellite DNA sequences in Pipistrellus pipistrellus (Vespertilionidae; Chiroptera).

    Science.gov (United States)

    Barragán, M J L; Martínez, S; Marchal, J A; Fernández, R; Bullejos, M; Díaz de la Guardia, R; Sánchez, A

    2003-09-01

    This paper reports the molecular and cytogenetic characterization of a HindIII family of satellite DNA in the bat species Pipistrellus pipistrellus. This satellite is organized in tandem repeats of 418 bp monomer units, and represents approximately 3% of the whole genome. The consensus sequence from five cloned monomer units has an A-T content of 62.20%. We have found differences in the ladder pattern of bands between two populations of the same species. These differences are probably because of the absence of the target sites for the HindIII enzyme in most monomer units of one population, but not in the other. Fluorescent in situ hybridization (FISH) localized the satellite DNA in the pericentromeric regions of all autosomes and the X chromosome, but it was absent from the Y chromosome. Digestion of genomic DNAs with HpaII and its isoschizomer MspI demonstrated that these repetitive DNA sequences are not methylated. Other bat species were tested for the presence of this repetitive DNA. It was absent in five Vespertilionidae and one Rhinolophidae species, indicating that it could be a species/genus specific, repetitive DNA family.

  11. The sequence specificity of UV-induced DNA damage in a systematically altered DNA sequence.

    Science.gov (United States)

    Khoe, Clairine V; Chung, Long H; Murray, Vincent

    2018-06-01

    The sequence specificity of UV-induced DNA damage was investigated in a specifically designed DNA plasmid using two procedures: end-labelling and linear amplification. Absorption of UV photons by DNA leads to dimerisation of pyrimidine bases and produces two major photoproducts, cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). A previous study had determined that two hexanucleotide sequences, 5'-GCTC*AC and 5'-TATT*AA, were high intensity UV-induced DNA damage sites. The UV clone plasmid was constructed by systematically altering each nucleotide of these two hexanucleotide sequences. One of the main goals of this study was to determine the influence of single nucleotide alterations on the intensity of UV-induced DNA damage. The sequence 5'-GCTC*AC was designed to examine the sequence specificity of 6-4PPs and the highest intensity 6-4PP damage sites were found at 5'-GTTC*CC nucleotides. The sequence 5'-TATT*AA was devised to investigate the sequence specificity of CPDs and the highest intensity CPD damage sites were found at 5'-TTTT*CG nucleotides. It was proposed that the tetranucleotide DNA sequence, 5'-YTC*Y (where Y is T or C), was the consensus sequence for the highest intensity UV-induced 6-4PP adduct sites; while it was 5'-YTT*C for the highest intensity UV-induced CPD damage sites. These consensus tetranucleotides are composed entirely of consecutive pyrimidines and must have a DNA conformation that is highly productive for the absorption of UV photons. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
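
    Locating every occurrence of the consensus tetranucleotides named in this abstract (5'-YTCY and 5'-YTTC, with Y = T or C) can be sketched with a small IUPAC-to-regex translation. The asterisk in the abstract's notation is omitted from the pattern. The function name is hypothetical, only the strand given as input is scanned, and a match says nothing about actual damage intensity.

```python
import re

# Only the IUPAC codes needed for the consensus sequences in the abstract.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]"}


def find_consensus_sites(seq, consensus):
    """Return 0-based start positions of all (overlapping) consensus matches."""
    body = "".join(IUPAC[code] for code in consensus)
    # A lookahead lets overlapping sites be reported individually.
    pattern = re.compile("(?=(" + body + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]
```

    For the toy sequence "GGTTCCAGCTTCA", the YTCY pattern matches at position 2 (TTCC) and the YTTC pattern at position 8 (CTTC).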

  12. An Active Immune Defense with a Minimal CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) RNA and without the Cas6 Protein

    Science.gov (United States)

    Maier, Lisa-Katharina; Stachler, Aris-Edda; Saunders, Sita J.; Backofen, Rolf; Marchfelder, Anita

    2015-01-01

    The prokaryotic immune system CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR-associated) is a defense system that protects prokaryotes against foreign DNA. The short CRISPR RNAs (crRNAs) are central components of this immune system. In CRISPR-Cas systems type I and III, crRNAs are generated by the endonuclease Cas6. We developed a Cas6b-independent crRNA maturation pathway for the Haloferax type I-B system in vivo that expresses a functional crRNA, which we termed independently generated crRNA (icrRNA). The icrRNA is effective in triggering degradation of an invader plasmid carrying the matching protospacer sequence. The Cas6b-independent maturation of the icrRNA allowed mutation of the repeat sequence without interfering with signals important for Cas6b processing. We generated 23 variants of the icrRNA and analyzed them for activity in the interference reaction. icrRNAs with deletions or mutations of the 3′ handle are still active in triggering an interference reaction. The complete 3′ handle could be removed without loss of activity. However, manipulations of the 5′ handle mostly led to loss of interference activity. Furthermore, we could show that in the presence of an icrRNA a strain without Cas6b (Δcas6b) is still active in interference. PMID:25512373

  13. An active immune defense with a minimal CRISPR (clustered regularly interspaced short palindromic repeats) RNA and without the Cas6 protein.

    Science.gov (United States)

    Maier, Lisa-Katharina; Stachler, Aris-Edda; Saunders, Sita J; Backofen, Rolf; Marchfelder, Anita

    2015-02-13

    The prokaryotic immune system CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR-associated) is a defense system that protects prokaryotes against foreign DNA. The short CRISPR RNAs (crRNAs) are central components of this immune system. In CRISPR-Cas systems type I and III, crRNAs are generated by the endonuclease Cas6. We developed a Cas6b-independent crRNA maturation pathway for the Haloferax type I-B system in vivo that expresses a functional crRNA, which we termed independently generated crRNA (icrRNA). The icrRNA is effective in triggering degradation of an invader plasmid carrying the matching protospacer sequence. The Cas6b-independent maturation of the icrRNA allowed mutation of the repeat sequence without interfering with signals important for Cas6b processing. We generated 23 variants of the icrRNA and analyzed them for activity in the interference reaction. icrRNAs with deletions or mutations of the 3' handle are still active in triggering an interference reaction. The complete 3' handle could be removed without loss of activity. However, manipulations of the 5' handle mostly led to loss of interference activity. Furthermore, we could show that in the presence of an icrRNA a strain without Cas6b (Δcas6b) is still active in interference. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  14. General method of preparation of uniformly 13C, 15N-labeled DNA fragments for NMR analysis of DNA structures

    International Nuclear Information System (INIS)

    Rene, Brigitte; Masliah, Gregoire; Zargarian, Loussine; Mauffret, Olivier; Fermandjian, Serge

    2006-01-01

    Summary: 13C, 15N labeling of biomolecules allows easier assignments of NMR resonances and provides a larger number of NMR parameters, which greatly improves the quality of DNA structures. However, there is no general DNA-labeling procedure, like those employed for proteins and RNAs. Here, we describe a general and widely applicable approach designed for preparation of isotopically labeled DNA fragments that can be used for NMR studies. The procedure is based on the PCR amplification of oligonucleotides in the presence of labeled deoxynucleotide triphosphates. It allows great flexibility thanks to insertion of a short DNA sequence (linker) between two repeats of the DNA sequence to be studied. The size and sequence of the linker are designed so as to create restriction sites at the junctions with the DNA of interest. A DNA duplex with the desired sequence and size is released upon enzymatic digestion of the PCR product. The suitability of the procedure is validated through the preparation of two biologically relevant DNA fragments.

  15. simple sequence repeat (SSR)

    African Journals Online (AJOL)

    In the present study, 78 mapped simple sequence repeat (SSR) markers representing 11 linkage groups of adzuki bean were evaluated for transferability to mungbean and related Vigna spp. 41 markers amplified characteristic bands in at least one Vigna species. The transferability percentage across the genotypes ranged ...

  16. Management of High-Throughput DNA Sequencing Projects: Alpheus.

    Science.gov (United States)

    Miller, Neil A; Kingsmore, Stephen F; Farmer, Andrew; Langley, Raymond J; Mudge, Joann; Crow, John A; Gonzalez, Alvaro J; Schilkey, Faye D; Kim, Ryan J; van Velkinburgh, Jennifer; May, Gregory D; Black, C Forrest; Myers, M Kathy; Utsey, John P; Frost, Nicholas S; Sugarbaker, David J; Bueno, Raphael; Gullans, Stephen R; Baxter, Susan M; Day, Steve W; Retzel, Ernest F

    2008-12-26

    High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystem's SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis.

  17. Cloning Should Be Simple: Escherichia coli DH5α-Mediated Assembly of Multiple DNA Fragments with Short End Homologies

    Science.gov (United States)

    Richardson, Ruth E.; Suzuki, Yo

    2015-01-01

    Numerous DNA assembly technologies exist for generating plasmids for biological studies. Many procedures require complex in vitro or in vivo assembly reactions followed by plasmid propagation in recombination-impaired Escherichia coli strains such as DH5α, which are optimal for stable amplification of the DNA materials. Here we show that despite its utility as a cloning strain, DH5α retains sufficient recombinase activity to assemble up to six double-stranded DNA fragments ranging in size from 150 bp to at least 7 kb into plasmids in vivo. This process also requires surprisingly small amounts of DNA, potentially obviating the need for upstream assembly processes associated with most common applications of DNA assembly. We demonstrate the application of this process in cloning of various DNA fragments including synthetic genes, preparation of knockout constructs, and incorporation of guide RNA sequences in constructs for clustered regularly interspaced short palindromic repeats (CRISPR) genome editing. This consolidated process for assembly and amplification in a widely available strain of E. coli may enable productivity gain across disciplines involving recombinant DNA work. PMID:26348330

  18. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

    Directory of Open Access Journals (Sweden)

    Md. Rezaul Karim

    2012-03-01

    Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
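
    To make the terms in this abstract concrete, here is a minimal Python sketch of what a "maximal contiguous frequent pattern" is. It is a naive level-wise enumeration written for illustration, not the algorithm proposed in the paper, and it assumes that "support" means the number of input sequences containing the pattern. The function names and the `min_support` parameter are hypothetical.

```python
from collections import defaultdict


def frequent_contiguous_patterns(sequences, min_support):
    """Return {pattern: support} for every substring that occurs in at
    least `min_support` of the input sequences (naive, level by level)."""
    frequent = {}
    k = 1
    while True:
        support = defaultdict(int)
        for seq in sequences:
            # Count each k-mer at most once per sequence.
            kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
            for kmer in kmers:
                support[kmer] += 1
        level = {p: s for p, s in support.items() if s >= min_support}
        if not level:
            # No frequent k-mer means no longer pattern can be frequent.
            break
        frequent.update(level)
        k += 1
    return frequent


def maximal_patterns(frequent):
    """Keep only patterns that are not contained in a longer frequent one.

    Every substring of a frequent pattern is itself frequent, so it is
    enough to check extensions by a single base on either side."""
    covered = set()
    for p in frequent:
        if len(p) > 1:
            covered.add(p[:-1])
            covered.add(p[1:])
    return {p: s for p, s in frequent.items() if p not in covered}


if __name__ == "__main__":
    reads = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]  # toy input, not real data
    print(maximal_patterns(frequent_contiguous_patterns(reads, 3)))
```

    This version re-scans every sequence for each pattern length, so it only illustrates the definition; avoiding that cost on large databases is the problem the abstract says the paper addresses.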

  19. Resolution of a serum sample mix-up through the use of short tandem repeat DNA typing.

    Science.gov (United States)

    Allen, Robert W; Pritchard, Jane K

    2004-12-01

    A sample mix-up occurred in a tissue procurement laboratory in which aliquots of serum from two tissue donors were accidentally mislabeled. The clues to the apparent mix-up involved discrepant Hepatitis C test results. In an attempt to resolve the apparent mix-up, DNA typing was performed using serum samples as a possible source of genomic DNA. Two hundred microliter aliquots of two reference sera and aliquots prepared from them were subjected to DNA extraction. PCR amplification of 9 STR loci was performed on the extracts and amplicons were analyzed by capillary electrophoresis. About 1 microg/ml of DNA was recovered from all serum samples and was of sufficient quality to direct the amplification of most, if not all, STR loci, allowing the mislabeled specimens to be traced to the proper tissue donor. Serum is a useful source of genomic DNA for STR analysis in situations in which such samples are the only source of DNA for testing. Interestingly, one of the tissue donors, who was on life support and repeatedly receiving blood products, exhibited a mixed DNA profile indicative of the presence of DNA from multiple individuals in the bloodstream.

  20. Cytogenetic Diversity of Simple Sequences Repeats in Morphotypes of Brassica rapa ssp. chinensis.

    Science.gov (United States)

    Zheng, Jin-Shuang; Sun, Cheng-Zhen; Zhang, Shu-Ning; Hou, Xi-Lin; Bonnema, Guusje

    2016-01-01

    A significant fraction of the nuclear DNA of all eukaryotes is comprised of simple sequence repeats (SSRs). Although these sequences are widely used for studying genetic variation, linkage mapping and evolution, little attention has been paid to their chromosomal distribution and cytogenetic diversity. In this paper, we characterize the distribution of mono-, di-, and tri-nucleotide SSRs in Brassica rapa ssp. chinensis. Fluorescence in situ hybridization was used to characterize the cytogenetic diversity of SSRs among morphotypes of B. rapa ssp. chinensis. The proportion of different SSR motifs varied among morphotypes of B. rapa ssp. chinensis, with tri-nucleotide SSRs being more prevalent in the genome of B. rapa ssp. chinensis. We determined the chromosomal locations of mono-, di-, and tri-nucleotide repeat loci. The results showed that the chromosomal distribution of SSRs in the different morphotypes is non-random and motif-dependent, and allowed us to characterize the relative variability in terms of SSR numbers and similar chromosomal distributions in centromeric/peri-centromeric heterochromatin. The differences between SSRs with respect to abundance and distribution indicate that SSRs are a driving force in the genomic evolution of B. rapa species. Our results provide a comprehensive view of SSR sequence distribution and evolution for comparison among morphotypes of B. rapa ssp. chinensis.

  1. Sequence-specific activation of the DNA sensor cGAS by Y-form DNA structures as found in primary HIV-1 cDNA.

    Science.gov (United States)

    Herzner, Anna-Maria; Hagmann, Cristina Amparo; Goldeck, Marion; Wolter, Steven; Kübler, Kirsten; Wittmann, Sabine; Gramberg, Thomas; Andreeva, Liudmila; Hopfner, Karl-Peter; Mertens, Christina; Zillinger, Thomas; Jin, Tengchuan; Xiao, Tsan Sam; Bartok, Eva; Coch, Christoph; Ackermann, Damian; Hornung, Veit; Ludwig, Janos; Barchet, Winfried; Hartmann, Gunther; Schlee, Martin

    2015-10-01

    Cytosolic DNA that emerges during infection with a retrovirus or DNA virus triggers antiviral type I interferon responses. So far, only double-stranded DNA (dsDNA) over 40 base pairs (bp) in length has been considered immunostimulatory. Here we found that unpaired DNA nucleotides flanking short base-paired DNA stretches, as in stem-loop structures of single-stranded DNA (ssDNA) derived from human immunodeficiency virus type 1 (HIV-1), activated the type I interferon-inducing DNA sensor cGAS in a sequence-dependent manner. DNA structures containing unpaired guanosines flanking short (12- to 20-bp) dsDNA (Y-form DNA) were highly stimulatory and specifically enhanced the enzymatic activity of cGAS. Furthermore, we found that primary HIV-1 reverse transcripts represented the predominant viral cytosolic DNA species during early infection of macrophages and that these ssDNAs were highly immunostimulatory. Collectively, our study identifies unpaired guanosines in Y-form DNA as a highly active, minimal cGAS recognition motif that enables detection of HIV-1 ssDNA.

  2. High-temperature protein G is essential for activity of the Escherichia coli clustered regularly interspaced short palindromic repeats (CRISPR)/Cas system.

    Science.gov (United States)

    Yosef, Ido; Goren, Moran G; Kiro, Ruth; Edgar, Rotem; Qimron, Udi

    2011-12-13

    Prokaryotic DNA arrays arranged as clustered regularly interspaced short palindromic repeats (CRISPR), along with their associated proteins, provide prokaryotes with adaptive immunity by RNA-mediated targeting of alien DNA or RNA matching the sequences between the repeats. Here, we present a thorough screening system for the identification of bacterial proteins participating in immunity conferred by the Escherichia coli CRISPR system. We describe the identification of one such protein, high-temperature protein G (HtpG), a homolog of the eukaryotic chaperone heat-shock protein 90. We demonstrate that in the absence of htpG, the E. coli CRISPR system loses its suicidal activity against λ prophage and its ability to provide immunity from lysogenization. Transcomplementation of htpG restores CRISPR activity. We further show that inactivity of the CRISPR system attributable to htpG deficiency can be suppressed by expression of Cas3, a protein that is essential for its activity. Accordingly, we also find that the steady-state level of overexpressed Cas3 is significantly enhanced following HtpG expression. We conclude that HtpG is a newly identified positive modulator of the CRISPR system that is essential for maintaining functional levels of Cas3.

  3. GREAM: A Web Server to Short-List Potentially Important Genomic Repeat Elements Based on Over-/Under-Representation in Specific Chromosomal Locations, Such as the Gene Neighborhoods, within or across 17 Mammalian Species.

    Directory of Open Access Journals (Sweden)

    Darshan Shimoga Chandrashekar

    Genome-wide repeat sequences, such as LINEs, SINEs and LTRs, make up a considerable part of mammalian nuclear genomes. These repeat elements seem to be important for multiple functions including the regulation of transcription initiation, alternative splicing and DNA methylation. But it is not possible to study all repeats and, hence, it would help to short-list them before exploring their potential functional significance via experimental studies and/or detailed in silico analyses. We developed the 'Genomic Repeat Element Analyzer for Mammals' (GREAM) for analysis, screening and selection of potentially important mammalian genomic repeats. This web server offers many novel utilities. For example, this is the only tool that can reveal a categorized list of specific types of transposons, retro-transposons and other genome-wide repetitive elements that are statistically over-/under-represented in regions around a set of genes, such as those expressed differentially in a disease condition. The output displays the position and frequency of identified elements within the specified regions. In addition, GREAM offers two other types of analyses of genomic repeat sequences: (a) enrichment within chromosomal region(s) of interest, and (b) comparative distribution across the neighborhood of orthologous genes. GREAM successfully short-listed a repeat element (MER20) known to contain functional motifs. In other case studies, we could use GREAM to short-list repetitive elements in the azoospermia factor a (AZFa) region of the human Y chromosome and those around the genes associated with rat liver injury. GREAM could also identify five over-represented repeats around some of the human and mouse transcription factor coding genes that had conserved expression patterns across the two species. GREAM has been developed to provide an impetus to research on the role of repetitive sequences in mammalian genomes by offering easy selection of more interesting repeats in various

  4. Local repeat sequence organization of an intergenic spacer

    Indian Academy of Sciences (India)

    The amplification yielded the same uniquely "sequence-scrambled" product, whether the template used for PCR was total cellular DNA, chloroplast DNA or a plasmid clone DNA corresponding to that region. The PCR product, a "unique" new sequence, had lost the repetitive organization of the template genome where it ...

  5. Base excision repair of chemotherapeutically-induced alkylated DNA damage predominantly causes contractions of expanded GAA repeats associated with Friedreich's ataxia.

    Directory of Open Access Journals (Sweden)

    Yanhao Lai

    Expansion of GAA·TTC repeats within the first intron of the frataxin gene is the cause of Friedreich's ataxia (FRDA), an autosomal recessive neurodegenerative disorder. However, no effective treatment for the disease has been developed as yet. In this study, we explored a possibility of shortening expanded GAA repeats associated with FRDA through chemotherapeutically-induced DNA base lesions and subsequent base excision repair (BER). We provide the first evidence that alkylated DNA damage induced by temozolomide, a chemotherapeutic DNA-damaging agent, can induce massive GAA repeat contractions/deletions, but only limited expansions in FRDA patient lymphoblasts. We showed that temozolomide-induced GAA repeat instability was mediated by BER. Further characterization of BER of an abasic site in the context of (GAA)20 repeats indicates that the lesion mainly resulted in a large deletion of 8 repeats along with small expansions. This was because temozolomide-induced single-stranded breaks initially led to DNA slippage and the formation of a small GAA repeat loop in the upstream region of the damaged strand and a small TTC loop on the template strand. This allowed limited pol β DNA synthesis and the formation of a short 5'-GAA repeat flap that was cleaved by FEN1, thereby leading to small repeat expansions. At a later stage of BER, the small template loop expanded into a large template loop that resulted in the formation of a long 5'-GAA repeat flap. Pol β then performed limited DNA synthesis to bypass the loop, and FEN1 removed the long repeat flap, ultimately causing a large repeat deletion. Our study indicates that chemotherapeutically-induced alkylated DNA damage can induce large contractions/deletions of expanded GAA repeats through BER in FRDA patient cells. This further suggests the potential of developing chemotherapeutic alkylating agents to shorten expanded GAA repeats for treatment of FRDA.

  6. Genotyping and Molecular Identification of Date Palm Cultivars Using Inter-Simple Sequence Repeat (ISSR) Markers.

    Science.gov (United States)

    Ayesh, Basim M

    2017-01-01

    Molecular markers are a credible means of discriminating genotypes and estimating the extent of genetic diversity and relatedness in a set of genotypes. Inter-simple sequence repeat (ISSR) markers rapidly reveal highly polymorphic fingerprints and have been used frequently to determine the genetic diversity among date palm cultivars. This chapter describes the application of ISSR markers for genotyping of date palm cultivars. The application involves extraction of genomic DNA of reliable quality and quantity from the target cultivars. Subsequently the extracted DNA serves as a template for amplification of genomic regions flanked by inverted simple sequence repeats using a single primer. The similarity of each pair of samples is measured by calculating the number of mono- and polymorphic bands revealed by gel electrophoresis. Matrices constructed for similarity and genetic distance are used to build a phylogenetic tree and cluster analysis, to determine the molecular relatedness of cultivars. In the protocol described, 3 out of 9 tested primers consistently amplified 31 loci in 6 date palm cultivars, 28 of which were polymorphic.
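
    The band-scoring step described above can be sketched in a few lines of Python. The abstract does not say which similarity coefficient is used, so the Jaccard coefficient here is an assumption, as are the function names; the band profiles in the usage block are made-up placeholders, not data from the chapter.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two 0/1 band profiles scored at the same loci:
    bands shared by both samples / bands present in at least one."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return shared / either if either else 0.0


def similarity_matrix(bands):
    """Pairwise similarity for {sample_name: band_profile}, stored as a
    symmetric dict keyed by (name, name)."""
    names = sorted(bands)
    sim = {(n, n): 1.0 for n in names}
    for p, q in combinations(names, 2):
        sim[(p, q)] = sim[(q, p)] = jaccard(bands[p], bands[q])
    return sim


def distance_matrix(sim):
    """Genetic distance taken as 1 - similarity (an assumed convention)."""
    return {pair: 1.0 - s for pair, s in sim.items()}


if __name__ == "__main__":
    # Hypothetical presence/absence scores for four gel bands.
    bands = {"A": [1, 1, 0, 1], "B": [1, 0, 0, 1], "C": [0, 1, 1, 0]}
    sim = similarity_matrix(bands)
    for pair in sorted(sim):
        print(pair, round(sim[pair], 3))
```

    The resulting distance matrix is the kind of input a clustering routine would take to build the tree mentioned in the abstract.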

  7. Mutations in Cytosine-5 tRNA Methyltransferases Impact Mobile Element Expression and Genome Stability at Specific DNA Repeats

    Directory of Open Access Journals (Sweden)

    Bianca Genenncher

    2018-02-01

    The maintenance of eukaryotic genome stability is ensured by the interplay of transcriptional as well as post-transcriptional mechanisms that control recombination of repeat regions and the expression and mobility of transposable elements. We report here that mutations in two (cytosine-5) RNA methyltransferases (RCMTs), Dnmt2 and NSun2, impact the accumulation of mobile element-derived sequences and DNA repeat integrity in Drosophila. Loss of Dnmt2 function caused moderate effects under standard conditions, while heat shock exacerbated these effects. In contrast, NSun2 function affected mobile element expression and genome integrity in a heat shock-independent fashion. Reduced tRNA stability in both RCMT mutants indicated that tRNA-dependent processes affected mobile element expression and DNA repeat stability. Importantly, further experiments indicated that complex formation with RNA could also contribute to the impact of RCMT function on gene expression control. These results thus uncover a link between tRNA modification enzymes, the expression of repeat DNA, and genomic integrity.

  8. Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR) Marker Resources for Diversity Analysis of Mango (Mangifera indica L.)

    Directory of Open Access Journals (Sweden)

    Natalie L. Dillon

    2014-01-01

    In this study, a collection of 24,840 expressed sequence tags (ESTs) generated from five mango (Mangifera indica L.) cDNA libraries was mined for EST-based simple sequence repeat (SSR) markers. Over 1,000 ESTs with SSR motifs were detected from more than 24,000 EST sequences, with di- and tri-nucleotide repeat motifs the most abundant. Of these, 25 EST-SSRs in genes involved in plant development, stress response, and fruit color and flavor development pathways were selected, developed into PCR markers and characterized in a population of 32 mango selections including M. indica varieties and related Mangifera species. Twenty-four of the 25 EST-SSR markers exhibited polymorphisms, identifying a total of 86 alleles with an average of 5.38 alleles per locus, and distinguished between all Mangifera selections. Private alleles were identified for Mangifera species. These newly developed EST-SSR markers enhance the current 11 SSR mango genetic identity panel utilized by the Australian Mango Breeding Program. The current panel has been used to identify progeny and parents for selection and the application of this extended panel will further improve and help to design mango hybridization strategies for increased breeding efficiency.
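
    As a rough illustration of what mining ESTs for SSR motifs involves, the following Python sketch scans a sequence for perfect di- and tri-nucleotide tandem repeats with a regular expression. The abstract does not name the tool or thresholds the authors used; the minimum repeat counts (6 for di-, 5 for tri-nucleotide units) and the function name `find_ssrs` are assumptions chosen for the example.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    `min_repeats` maps unit length -> minimum number of copies; the
    defaults below are illustrative thresholds, not taken from the study."""
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # One unit captured as group 2, then at least (min_rep - 1) more copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(2)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((m.start(), unit, len(m.group(1)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence: (AG)7 and (CAT)5 embedded in filler bases.
    est = "GGC" + "AG" * 7 + "TTC" + "CAT" * 5 + "GA"
    for start, unit, copies in find_ssrs(est):
        print(start, unit, copies)
```

    Only perfect repeats are reported here; dedicated SSR finders typically also handle compound or interrupted motifs, which this sketch does not attempt.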

  9. Differential effects of simple repeating DNA sequences on gene expression from the SV40 early promoter.

    Science.gov (United States)

    Amirhaeri, S; Wohlrab, F; Wells, R D

    1995-02-17

    The influence of simple repeat sequences, cloned into different positions relative to the SV40 early promoter/enhancer, on the transient expression of the chloramphenicol acetyltransferase (CAT) gene was investigated. Insertion of (G)29.(C)29 in either orientation into the 5'-untranslated region of the CAT gene reduced expression in CV-1 cells 50-100 fold when compared with controls with random sequence inserts. Analysis of CAT-specific mRNA levels demonstrated that the effect was due to a reduction of CAT mRNA production rather than to posttranscriptional events. In contrast, insertion of the same insert in either orientation upstream of the promoter-enhancer or downstream of the gene stimulated gene expression 2-3-fold. These effects could be reversed by cotransfection of a competitor plasmid carrying (G)25.(C)25 sequences. The results suggest that a G.C-binding transcription factor modulates gene expression in this system and that promoter strength can be regulated by providing protein-binding sites in trans. Although constructs containing longer tracts of alternating (C-G), (T-G), or (A-T) sequences inhibited CAT expression when inserted in the 5'-untranslated region of the CAT gene, the amount of CAT mRNA was unaffected. Hence, these inhibitions must be due to posttranscriptional events, presumably at the level of translation. These effects of microsatellite sequences on gene expression are discussed with respect to recent data on related simple repeat sequences which cause several human genetic diseases.

  10. Inverted repeats in the promoter as an autoregulatory sequence for TcrX in Mycobacterium tuberculosis

    International Nuclear Information System (INIS)

    Bhattacharya, Monolekha; Das, Amit Kumar

    2011-01-01

    Highlights: ► The regulatory sequences recognized by TcrX have been identified. ► The regulatory region comprises inverted repeats separated by a 30 bp region. ► The mode of binding of TcrX to the regulatory sequence is unique. ► The in silico TcrX–DNA docked model binds one of the inverted repeats. ► Both phosphorylated and unphosphorylated TcrX bind the regulatory sequence in vitro. -- Abstract: TcrY, a histidine kinase, and TcrX, a response regulator, constitute a two-component system in Mycobacterium tuberculosis. tcrX, which is expressed during iron scarcity, is instrumental in the survival of iron-dependent M. tuberculosis. However, the regulator of tcrX/Y has not been fully characterized. Crosslinking studies of TcrX reveal that it can form oligomers in vitro. Electrophoretic mobility shift assays (EMSAs) show that TcrX recognizes two regions in the promoter that are comprised of inverted repeats separated by ∼30 bp. The dimeric in silico model of TcrX predicts binding to one of these inverted repeat regions. Site-directed mutagenesis and radioactive phosphorylation indicate that D54 of TcrX is phosphorylated by H256 of TcrY. However, phosphorylated and unphosphorylated TcrX bind the regulatory sequence with equal efficiency, which was shown with an EMSA using the D54A TcrX mutant.

  11. A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

    Directory of Open Access Journals (Sweden)

    Glass John I

    2010-07-01

    Background: Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results: Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the

  12. Lactobacillus buchneri genotyping on the basis of clustered regularly interspaced short palindromic repeat (CRISPR) locus diversity.

    Science.gov (United States)

    Briner, Alexandra E; Barrangou, Rodolphe

    2014-02-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) in combination with associated sequences (cas) constitute the CRISPR-Cas immune system, which takes up DNA from invasive genetic elements as novel "spacers" that provide a genetic record of immunization events. We investigated the potential of CRISPR-based genotyping of Lactobacillus buchneri, a species relevant for commercial silage, bioethanol, and vegetable fermentations. Upon investigating the occurrence and diversity of CRISPR-Cas systems in Lactobacillus buchneri genomes, we observed a ubiquitous occurrence of CRISPR arrays containing a 36-nucleotide (nt) type II-A CRISPR locus adjacent to four cas genes, including the universal cas1 and cas2 genes and the type II signature gene cas9. Comparative analysis of CRISPR spacer content in 26 L. buchneri pickle fermentation isolates associated with spoilage revealed 10 unique locus genotypes that contained between 9 and 29 variable spacers. We observed a set of conserved spacers at the ancestral end, reflecting a common origin, as well as leader-end polymorphisms, reflecting recent divergence. Some of these spacers showed perfect identity with phage sequences, and many spacers showed homology to Lactobacillus plasmid sequences. Following a comparative analysis of sequences immediately flanking protospacers that matched CRISPR spacers, we identified a novel putative protospacer-adjacent motif (PAM), 5'-AAAA-3'. Overall, these findings suggest that type II-A CRISPR-Cas systems are valuable for genotyping of L. buchneri.
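
    The spacer comparison behind this kind of genotyping can be illustrated with a short Python sketch. It assumes a perfectly conserved repeat and an array written leader-first, so that the oldest (ancestral) spacers sit at the end of the list; the 8-nt repeat and all sequences in the usage block are hypothetical placeholders, not L. buchneri data, and the function names are made up for the example.

```python
def extract_spacers(array_seq, repeat):
    """Split a CRISPR array on its repeat and return the spacers in order.

    The first and last pieces are flanking sequence (leader side and
    trailer side), so only the pieces between repeats are kept."""
    parts = array_seq.split(repeat)
    return parts[1:-1]


def shared_ancestral_spacers(spacers_a, spacers_b):
    """Return the run of identical spacers at the ancestral (trailing)
    end of two arrays, in array order."""
    shared = []
    for a, b in zip(reversed(spacers_a), reversed(spacers_b)):
        if a != b:
            break
        shared.append(a)
    return shared[::-1]


if __name__ == "__main__":
    repeat = "GTTTTAGA"  # hypothetical short repeat for illustration
    leader, trailer = "ATATAT", "TTTT"
    isolate_1 = leader + repeat + "AAACCC" + repeat + "GGGTTT" + repeat + "CACACA" + repeat + trailer
    isolate_2 = leader + repeat + "TGTGTG" + repeat + "GGGTTT" + repeat + "CACACA" + repeat + trailer
    s1 = extract_spacers(isolate_1, repeat)
    s2 = extract_spacers(isolate_2, repeat)
    print(s1, s2, shared_ancestral_spacers(s1, s2))
```

    In this toy case the two isolates differ only in their leader-proximal spacer, which mirrors the pattern the abstract describes: conserved ancestral spacers with polymorphism at the leader end.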

  13. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  14. Two tandemly repeated telomere-associated sequences in Nicotiana plumbaginifolia.

    Science.gov (United States)

    Chen, C M; Wang, C T; Wang, C J; Ho, C H; Kao, Y Y; Chen, C C

    1997-12-01

    Two tandemly repeated telomere-associated sequences, NP3R and NP4R, have been isolated from Nicotiana plumbaginifolia. The length of a repeating unit for NP3R and NP4R is 165 and 180 nucleotides respectively. The abundance of NP3R, NP4R and telomeric repeats is, respectively, 8.4 × 10^4, 6 × 10^3 and 1.5 × 10^6 copies per haploid genome of N. plumbaginifolia. Fluorescence in situ hybridization revealed that NP3R is located at the ends and/or in interstitial regions of all 10 chromosomes and NP4R on the terminal regions of three chromosomes in the haploid genome of N. plumbaginifolia. Sequence homology search revealed that not only are NP3R and NP4R homologous to HRS60 and GRS, respectively, two tandem repeats isolated from N. tabacum, but that NP3R and NP4R are also related to each other, suggesting that they originated from a common ancestral sequence. The role of these repeated sequences in chromosome healing is discussed based on the observation that two to three copies of a telomere-similar sequence were present in each repeating unit of NP3R and NP4R.

  15. "First generation" automated DNA sequencing technology.

    Science.gov (United States)

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  16. [Clustered regularly interspaced short palindromic repeats (CRISPR) site in Bacillus anthracis].

    Science.gov (United States)

    Gao, Zhiqi; Wang, Dongshu; Feng, Erling; Wang, Bingxiang; Hui, Yiming; Han, Shaobo; Jiao, Lei; Liu, Xiankai; Wang, Hengliang

    2014-11-04

    To investigate the polymorphism of clustered regularly interspaced short palindromic repeats (CRISPR) in Bacillus anthracis and its application to molecular typing of B. anthracis, we downloaded the whole genome sequences of 6 B. anthracis strains and extracted the CRISPR sites. We designed primers for the CRISPR sites, amplified the CRISPR fragments in 193 B. anthracis strains by PCR, and sequenced these fragments. In order to reveal the polymorphism of CRISPR in B. anthracis, we aligned all the extracted sequences and sequencing results by local BLAST. At the same time, we also analyzed the CRISPR sites in B. cereus and B. thuringiensis. We did not find any polymorphism of CRISPR in B. anthracis. The molecular typing approach based on CRISPR polymorphism is not suitable for B. anthracis, but it can be used to distinguish B. anthracis from B. cereus and B. thuringiensis.

  17. Identification of genes in anonymous DNA sequences. Annual performance report, February 1, 1991--January 31, 1992

    Energy Technology Data Exchange (ETDEWEB)

    Fields, C.A.

    1996-06-01

    The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler), has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0, has now been completed and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryotic gene sequences are also underway. This progress report also contains papers coming out of the project, including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distributed highly variable repeat family; Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.

  18. Intermittency as a universal characteristic of the complete chromosome DNA sequences of eukaryotes: From protozoa to human genomes

    Science.gov (United States)

    Rybalko, S.; Larionov, S.; Poptsova, M.; Loskutov, A.

    2011-10-01

    Large-scale dynamical properties of complete chromosome DNA sequences of eukaryotes are considered. Using the proposed deterministic models with intermittency and symbolic dynamics we describe a wide spectrum of large-scale patterns inherent in these sequences, such as segmental duplications, tandem repeats, and other complex sequence structures. It is shown that the recently discovered gene number balance on the strands is not of a random nature, and certain subsystems of a complete chromosome DNA sequence exhibit the properties of deterministic chaos.

  19. Sequence-specific RNA Photocleavage by Single-stranded DNA in Presence of Riboflavin

    Science.gov (United States)

    Zhao, Yongyun; Chen, Gangyi; Yuan, Yi; Li, Na; Dong, Juan; Huang, Xin; Cui, Xin; Tang, Zhuo

    2015-10-01

    Constant efforts have been made to develop new methods for sequence-specific RNA degradation, which could inhibit the expression of a targeted gene. Herein, by using an unmodified short DNA oligonucleotide for sequence recognition and an endogenous small molecule, vitamin B2 (riboflavin), as photosensitizer, we report a simple strategy to realize the sequence-specific photocleavage of targeted RNA. The DNA strand is complementary to the target sequence and forms a DNA/RNA duplex containing a G•U wobble in the middle. The cleavage reaction proceeds through an oxidative elimination mechanism at the nucleoside downstream of the U of the G•U wobble in the duplex to yield an unnatural RNA terminus, and the whole process is under tight control by using light as a switch, which means the cleavage could be carried out according to specific spatial and temporal requirements. The biocompatibility of this method makes the DNA strand in combination with riboflavin a promising molecular tool for RNA manipulation.

  20. Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions

    Science.gov (United States)

    Gardner, Shea N; Mariella, Jr., Raymond P; Christian, Allen T; Young, Jennifer A; Clague, David S

    2013-06-25

    A method of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths.

  1. A novel family of sequence-specific endoribonucleases associated with the clustered regularly interspaced short palindromic repeats.

    Science.gov (United States)

    Beloglazova, Natalia; Brown, Greg; Zimmerman, Matthew D; Proudfoot, Michael; Makarova, Kira S; Kudritska, Marina; Kochinyan, Samvel; Wang, Shuren; Chruszcz, Maksymilian; Minor, Wladek; Koonin, Eugene V; Edwards, Aled M; Savchenko, Alexei; Yakunin, Alexander F

    2008-07-18

    Clustered regularly interspaced short palindromic repeats (CRISPRs) together with the associated CAS proteins protect microbial cells from invasion by foreign genetic elements using presently unknown molecular mechanisms. All CRISPR systems contain proteins of the CAS2 family, suggesting that these uncharacterized proteins play a central role in this process. Here we show that the CAS2 proteins represent a novel family of endoribonucleases. Six purified CAS2 proteins from diverse organisms cleaved single-stranded RNAs preferentially within U-rich regions. A representative CAS2 enzyme, SSO1404 from Sulfolobus solfataricus, cleaved the phosphodiester linkage on the 3'-side and generated 5'-phosphate- and 3'-hydroxyl-terminated oligonucleotides. The crystal structure of SSO1404 was solved at 1.6 Å resolution, revealing the first ribonuclease with a ferredoxin-like fold. Mutagenesis of SSO1404 identified six residues (Tyr-9, Asp-10, Arg-17, Arg-19, Arg-31, and Phe-37) that are important for enzymatic activity and suggested that Asp-10 might be the principal catalytic residue. Thus, CAS2 proteins are sequence-specific endoribonucleases, and we propose that their role in the CRISPR-mediated anti-phage defense might involve degradation of phage or cellular mRNAs.

  2. cDNA sequence quality data - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Budding yeast cDNA sequencing project. Data name: cDNA sequence quality data. DOI: 10.18908/lsdba.nbdc00838-003. Description of data contents: Phred's quality score. P...
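    The Phred quality score named in this record is conventionally defined from the base-call error probability P as Q = -10 log10(P). The record itself does not spell this out, so the following is only a minimal sketch of the standard conversion; the function names are chosen here for illustration.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a Phred score Q (standard definition)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score for a base-call error probability p, with 0 < p <= 1."""
    return -10 * math.log10(p)

# Illustrative values: Q20 corresponds to a 1-in-100 error rate.
P_AT_Q20 = phred_to_error_prob(20)
Q_AT_1_IN_1000 = error_prob_to_phred(0.001)
```

    Under this definition each 10-point increase in Q corresponds to a tenfold lower error probability.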

  3. Using TESS to predict transcription factor binding sites in DNA sequence.

    Science.gov (United States)

    Schug, Jonathan

    2008-03-01

    This unit describes how to use the Transcription Element Search System (TESS). This Web site predicts transcription factor binding sites (TFBS) in DNA sequence using two different kinds of models of sites, strings and positional weight matrices. The binding of transcription factors to DNA is a major part of the control of gene expression. Transcription factors exhibit sequence-specific binding; they form stronger bonds to some DNA sequences than to others. Identification of a good binding site in the promoter for a gene suggests the possibility that the corresponding factor may play a role in the regulation of that gene. However, the sequences transcription factors recognize are typically short and allow for some amount of mismatch. Because of this, binding sites for a factor can typically be found at random every few hundred to a thousand base pairs. TESS has features to help sort through and evaluate the significance of predicted sites.
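    The positional weight matrix model mentioned above can be sketched as follows. This is not TESS code: the count matrix, pseudocount, uniform background, and threshold are hypothetical values chosen for illustration, and the functions `build_pwm` and `scan` are names introduced here.

```python
import math

# Hypothetical base counts for a 4-bp site (illustrative, not taken from TESS).
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}

def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((counts[b][i] + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, site, score) for each window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

PWM = build_pwm(COUNTS)
HITS = scan("TTACGTGGACGAAC", PWM, threshold=4.0)
```

    With these toy values only the exact consensus ACGT clears the threshold. An exact 4-bp string is expected about once per 4^4 = 256 positions in random sequence with equal base frequencies, which illustrates why short, mismatch-tolerant sites turn up frequently by chance.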

  4. DNA Polymerases Drive DNA Sequencing-by-Synthesis Technologies: Both Past and Present

    Directory of Open Access Journals (Sweden)

    Cheng-Yao Chen

    2014-06-01

    Next-generation sequencing (NGS) technologies have revolutionized modern biological and biomedical research. The engines responsible for this innovation are DNA polymerases; they catalyze the biochemical reaction for deriving template sequence information. In fact, DNA polymerase has been a cornerstone of DNA sequencing from the very beginning. E. coli DNA polymerase I proteolytic (Klenow) fragment was originally utilized in Sanger's dideoxy chain terminating DNA sequencing chemistry. From these humble beginnings followed an explosion of organism-specific genome sequence information accessible via public databases. Family A/B DNA polymerases from mesophilic/thermophilic bacteria/archaea were modified and tested in today's standard capillary electrophoresis (CE) and NGS sequencing platforms. These enzymes were selected for their efficient incorporation of bulky dye-terminator and reversible dye-terminator nucleotides, respectively. Third-generation, real-time single-molecule sequencing platforms require slightly different enzyme properties. Enterobacterial phage φ29 DNA polymerase copies long stretches of DNA and possesses a unique capability to efficiently incorporate terminal phosphate-labeled nucleoside polyphosphates. Furthermore, the φ29 enzyme has also been utilized in emerging DNA sequencing technologies, including nanopore- and protein-transistor-based sequencing. DNA polymerase is, and will continue to be, a crucial component of sequencing technologies.

  5. Complete chloroplast genome and 45S nrDNA sequences of the medicinal plant species Glycyrrhiza glabra and Glycyrrhiza uralensis.

    Science.gov (United States)

    Kang, Sang-Ho; Lee, Jeong-Hoon; Lee, Hyun Oh; Ahn, Byoung Ohg; Won, So Youn; Sohn, Seong-Han; Kim, Jung Sun

    2017-10-06

    Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species that are native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because of their much greater sweetness than sucrose. In this study, the three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences of these two licorice species and an interspecific hybrid are presented. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively. The three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp in length. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We identified simple sequence repeat and tandem repeat sequences. We also developed four reliable markers for analysis of Glycyrrhiza diversity authentication.

  6. Genetic variation among the Mapuche Indians from the Patagonian region of Argentina: mitochondrial DNA sequence variation and allele frequencies of several nuclear genes.

    Science.gov (United States)

    Ginther, C; Corach, D; Penacino, G A; Rey, J A; Carnese, F R; Hutz, M H; Anderson, A; Just, J; Salzano, F M; King, M C

    1993-01-01

    DNA samples from 60 Mapuche Indians, representing 39 maternal lineages, were genetically characterized for (1) nucleotide sequences of the mtDNA control region; (2) presence or absence of a nine base duplication in mtDNA region V; (3) HLA loci DRB1 and DQA1; (4) variation at three nuclear genes with short tandem repeats; and (5) variation at the polymorphic marker D2S44. The genetic profile of the Mapuche population was compared to other Amerinds and to worldwide populations. Two highly polymorphic portions of the mtDNA control region, comprising 650 nucleotides, were amplified by the polymerase chain reaction (PCR) and directly sequenced. The 39 maternal lineages were defined by two or three generation families identified by the Mapuches. These 39 lineages included 19 different mtDNA sequences that could be grouped into four classes. The same classes of sequences appear in other Amerinds from North, Central, and South American populations separated by thousands of miles, suggesting that the origin of the mtDNA patterns predates the migration to the Americas. The mtDNA sequence similarity between Amerind populations suggests that the migration throughout the Americas occurred rapidly relative to the mtDNA mutation rate. HLA DRB1 alleles 1602 and 1402 were frequent among the Mapuches. These alleles also occur at high frequency among other Amerinds in North and South America, but not among Spanish, Chinese or African-American populations. The high frequency of these alleles throughout the Americas, and their specificity to the Americas, supports the hypothesis that Mapuches and other Amerind groups are closely related.(ABSTRACT TRUNCATED AT 250 WORDS)

  7. Simple sequence repeat markers useful for sorghum downy mildew (Peronosclerospora sorghi) and related species

    Directory of Open Access Journals (Sweden)

    Odvody Gary N

    2008-11-01

    Background: A recent outbreak of sorghum downy mildew in Texas has led to the discovery of both metalaxyl resistance and a new pathotype in the causal organism, Peronosclerospora sorghi. These observations and the difficulty in resolving among phylogenetically related downy mildew pathogens dramatically point out the need for simply scored markers in order to differentiate among isolates and species, and to study the population structure within these obligate oomycetes. Here we present the initial results from the use of a biotin capture method to discover, clone and develop PCR primers that permit the use of simple sequence repeats (microsatellites) to detect differences at the DNA level. Results: Among the 55 primer pairs designed from clones from pathotype 3 of P. sorghi, 36 flanked microsatellite loci containing simple repeats, including 28 (55%) with dinucleotide repeats and 6 (11%) with trinucleotide repeats. A total of 22 microsatellites with CA/AC or GT/TG repeats were the most abundant (40%) and GA/AG or CT/TC types contribute 15% in our collection. When used to amplify DNA from 19 isolates from P. sorghi, as well as from 5 related species that cause downy mildew on other hosts, the number of different bands detected for each SSR primer pair using a LI-COR DNA Analyzer ranged from two to eight. Successful cross-amplification for 12 primer pairs studied in detail using DNA from downy mildews that attack maize (P. maydis & P. philippinensis), sugar cane (P. sacchari), pearl millet (Sclerospora graminicola) and rose (Peronospora sparsa) indicate that the flanking regions are conserved in all these species. A total of 15 SSR amplicons unique to P. philippinensis (one of the potential threats to US maize production) were detected, and these have potential for development of diagnostic tests. A total of 260 alleles were obtained using 54 microsatellite primer combinations, with an average of 4.8 polymorphic markers per SSR across 34

  8. Simple sequence repeat markers useful for sorghum downy mildew (Peronosclerospora sorghi) and related species.

    Science.gov (United States)

    Perumal, Ramasamy; Nimmakayala, Padmavathi; Erattaimuthu, Saradha R; No, Eun-Gyu; Reddy, Umesh K; Prom, Louis K; Odvody, Gary N; Luster, Douglas G; Magill, Clint W

    2008-11-29

    A recent outbreak of sorghum downy mildew in Texas has led to the discovery of both metalaxyl resistance and a new pathotype in the causal organism, Peronosclerospora sorghi. These observations and the difficulty in resolving among phylogenetically related downy mildew pathogens dramatically point out the need for simply scored markers in order to differentiate among isolates and species, and to study the population structure within these obligate oomycetes. Here we present the initial results from the use of a biotin capture method to discover, clone and develop PCR primers that permit the use of simple sequence repeats (microsatellites) to detect differences at the DNA level. Among the 55 primer pairs designed from clones from pathotype 3 of P. sorghi, 36 flanked microsatellite loci containing simple repeats, including 28 (55%) with dinucleotide repeats and 6 (11%) with trinucleotide repeats. A total of 22 microsatellites with CA/AC or GT/TG repeats were the most abundant (40%) and GA/AG or CT/TC types contribute 15% in our collection. When used to amplify DNA from 19 isolates from P. sorghi, as well as from 5 related species that cause downy mildew on other hosts, the number of different bands detected for each SSR primer pair using a LI-COR DNA Analyzer ranged from two to eight. Successful cross-amplification for 12 primer pairs studied in detail using DNA from downy mildews that attack maize (P. maydis & P. philippinensis), sugar cane (P. sacchari), pearl millet (Sclerospora graminicola) and rose (Peronospora sparsa) indicate that the flanking regions are conserved in all these species. A total of 15 SSR amplicons unique to P. philippinensis (one of the potential threats to US maize production) were detected, and these have potential for development of diagnostic tests. A total of 260 alleles were obtained using 54 microsatellite primer combinations, with an average of 4.8 polymorphic markers per SSR across 34 Peronosclerospora, Peronospora and Sclerospora

  9. MSDB: A Comprehensive Database of Simple Sequence Repeats.

    Science.gov (United States)

    Avvaru, Akshay Kumar; Saxena, Saketh; Sowpati, Divya Tej; Mishra, Rakesh Kumar

    2017-06-01

    Microsatellites, also known as Simple Sequence Repeats (SSRs), are short tandem repeats of 1-6 nt motifs present in all genomes, particularly eukaryotes. Besides their usefulness as genome markers, SSRs have been shown to perform important regulatory functions, and variations in their length at coding regions are linked to several disorders in humans. Microsatellites show a taxon-specific enrichment in eukaryotic genomes, and some may be functional. MSDB (Microsatellite Database) is a collection of >650 million SSRs from 6,893 species including Bacteria, Archaea, Fungi, Plants, and Animals. This database is by far the most exhaustive resource to access and analyze SSR data of multiple species. In addition to exploring data in a customizable tabular format, users can view and compare the data of multiple species simultaneously using our interactive plotting system. MSDB is developed using the Django framework and MySQL. It is freely available at http://tdb.ccmb.res.in/msdb. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
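    The definition above (tandem repeats of 1-6 nt motifs) can be illustrated with a short regular-expression scan. This is a minimal sketch and not the method used to build MSDB: it finds only perfect repeats, and the function names, the minimum copy number, and the test sequence are assumptions made here.

```python
import re

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'CACA')."""
    return motif not in (motif + motif)[1:-1]

def find_ssrs(sequence, min_repeats=3, max_motif=6):
    """Return sorted (start, motif, copies) for perfect tandem repeats of 1..max_motif nt."""
    hits = []
    for k in range(1, max_motif + 1):
        # A k-nt motif followed by at least (min_repeats - 1) further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)

# Toy sequence containing a (CA)4 tract and an (ATT)3 tract.
SSRS = find_ssrs("TTGCACACACACGGATTATTATTCG")
```

    The primitive-motif check keeps a dinucleotide tract such as CACACACA from being reported a second time as a repeat of the 4-nt unit CACA. Real SSR surveys typically also use motif-length-specific minimum lengths, which this sketch omits.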

  10. The polymorphic integumentary mucin B.1 from Xenopus laevis contains the short consensus repeat.

    Science.gov (United States)

    Probst, J C; Hauser, F; Joba, W; Hoffmann, W

    1992-03-25

    The frog integumentary mucin B.1 (FIM-B.1), discovered by molecular cloning, contains a cysteine-rich C-terminal domain which is homologous with von Willebrand factor. With the help of the polymerase chain reaction, we now characterize a contiguous region 5' to the von Willebrand factor domain containing the short consensus repeat typical of many proteins from the complement system. Multiple transcripts have been cloned, which originate from a single animal and differ by a variable number of tandem repeats (rep-33 sequences). These different transcripts probably originate solely from two genes and are presumably generated by alternative splicing of a huge array of functional cassettes. This model is supported by analysis of genomic FIM-B.1 sequences from Xenopus laevis. Here, rep-33 sequences are arranged in an interrupted array of individual units. Additionally, results of Southern analysis revealed genetic polymorphism between different animals which is predicted to be within the tandem repeats. A first investigation of the predicted mucins with the help of a specific antibody against a synthetic peptide determined the molecular mass of FIM-B.1 to be greater than 200 kDa. Here again, genetic polymorphism between different animals is detected.

  11. R-loops: targets for nuclease cleavage and repeat instability.

    Science.gov (United States)

    Freudenreich, Catherine H

    2018-01-11

    R-loops form when transcribed RNA remains bound to its DNA template to form a stable RNA:DNA hybrid. Stable R-loops form when the RNA is purine-rich, and are further stabilized by DNA secondary structures on the non-template strand. Interestingly, many expandable and disease-causing repeat sequences form stable R-loops, and R-loops can contribute to repeat instability. Repeat expansions are responsible for multiple neurodegenerative diseases, including Huntington's disease, myotonic dystrophy, and several types of ataxias. Recently, it was found that R-loops at an expanded CAG/CTG repeat tract cause DNA breaks as well as repeat instability (Su and Freudenreich, Proc Natl Acad Sci USA 114, E8392-E8401, 2017). Two factors were identified as causing R-loop-dependent breaks at CAG/CTG tracts: deamination of cytosines and the MutLγ (Mlh1-Mlh3) endonuclease, defining two new mechanisms for how R-loops can generate DNA breaks (Su and Freudenreich, Proc Natl Acad Sci USA 114, E8392-E8401, 2017). Following R-loop-dependent nicking, base excision repair resulted in repeat instability. These results have implications for human repeat expansion diseases and provide a paradigm for how RNA:DNA hybrids can cause genome instability at structure-forming DNA sequences. This perspective summarizes mechanisms of R-loop-induced fragility at G-rich repeats and new links between DNA breaks and repeat instability.

  12. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

    Directory of Open Access Journals (Sweden)

    Navrátilová Alice

    2007-11-01

    Background: Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results: Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion: We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. These data

  13. Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data.

    Science.gov (United States)

    Al-Nakeeb, Kosai; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas

    2017-11-21

    Whole-genome sequencing (WGS) projects provide short read nucleotide sequences from nuclear and possibly organelle DNA depending on the source of origin. Mitochondrial DNA is present in animals and fungi, while plants contain DNA from both mitochondria and chloroplasts. Current techniques for separating organelle reads from nuclear reads in WGS data require full reference or partial seed sequences for assembling. Norgal (de Novo ORGAneLle extractor) avoids this requirement by identifying a high frequency subset of k-mers that are predominantly of mitochondrial origin and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences in the range from 98.5 to 99.5%. We also assembled the chloroplasts of grape vines and cucumbers using Norgal together with seed-based de novo assemblers. Norgal is a pipeline that can extract and assemble full or partial mitochondrial and chloroplast genomes from WGS short reads without prior knowledge. The program is available at: https://bitbucket.org/kosaidtu/norgal .
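    The idea described above, selecting reads that carry high-frequency k-mers before assembly, can be sketched on toy data. This is not Norgal's implementation: the fixed `min_count` cutoff, the function names, and the reads are assumptions for illustration, and the abstract does not state how the frequency threshold is actually chosen.

```python
from collections import Counter

def kmers(read, k):
    """Yield every k-mer in a read, left to right."""
    return (read[i:i + k] for i in range(len(read) - k + 1))

def high_frequency_kmers(reads, k, min_count):
    """Return the set of k-mers seen at least min_count times across all reads."""
    counts = Counter(kmer for read in reads for kmer in kmers(read, k))
    return {kmer for kmer, count in counts.items() if count >= min_count}

def select_reads(reads, kmer_set, k):
    """Keep reads that contain at least one k-mer from kmer_set."""
    return [read for read in reads if any(kmer in kmer_set for kmer in kmers(read, k))]

# Toy reads: the first three overlap heavily (mimicking a high-copy organelle),
# the last two are unrelated low-depth reads.
READS = ["ACGTACGT", "CGTACGTA", "GTACGTAC", "TTTTGGGG", "CCCCAAAA"]
FREQUENT = high_frequency_kmers(READS, k=4, min_count=3)
SELECTED = select_reads(READS, FREQUENT, k=4)
```

    Because organelle genomes are present in many copies per cell, their k-mers tend to occur at much higher depth than nuclear k-mers, which is what makes a frequency cutoff a plausible filter. The selected subset would then be passed to a de novo assembler.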

  14. [Clustered regularly interspaced short palindromic repeats: structure, function and application--a review].

    Science.gov (United States)

    Cui, Yujun; Li, Yanjun; Yan, Yanfeng; Yang, Ruifu

    2008-11-01

    CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats), the basis of spoligotyping technology, can provide prokaryotes with heritable adaptive immunity against phage invasion. Studies on CRISPR loci and their associated elements, including various CAS (CRISPR-associated) proteins and leader sequences, are still in their infancy. We introduce the brief history, structure, function, bioinformatics research and application of this remarkable immune system in prokaryotic organisms, in the hope of inspiring more scientists to take an interest in this developing topic.

  15. Evaluation of 13 short tandem repeated loci for use in personal identification applications

    Energy Technology Data Exchange (ETDEWEB)

    Hammond, H.A.; Caskey, C.T. (Baylor College of Medicine, Houston, TX (United States)); Jin, L.; Zhong, Y.; Chakraborty, R. (Univ. of Texas Graduate School of Biomedical Sciences, Houston, TX (United States))

    1994-07-01

    Personal identification by using DNA typing methodologies has been an issue in the popular and scientific press for several years. The authors present a PCR-based DNA-typing method using 13 unlinked short tandem repeat (STR) loci. Validation of the loci and methodology has been performed to meet standards set by the forensic community and the accrediting organization for parentage testing. Extensive statistical analysis has addressed the issues surrounding the presentation of "match" statistics. The authors have found STR loci to provide a rapid, sensitive, and reliable method of DNA typing for parentage testing, forensic identification, and medical diagnostics. Valid statistical analysis is generally simpler than similar analysis of RFLP-VNTR results and provides powerful statistical evidence of the low frequency of random multilocus genotype matching. 54 refs., 4 figs., 6 tabs.
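    The low frequency of random multilocus matches follows from multiplying per-locus genotype frequencies across unlinked loci. The abstract does not give the authors' statistical procedure, so the sketch below assumes the textbook product rule with Hardy-Weinberg genotype frequencies; the allele frequencies are hypothetical, not data from the study.

```python
import math

def genotype_frequency(p, q):
    """Hardy-Weinberg genotype frequency: p*p for a homozygote, 2*p*q for a heterozygote."""
    return p * p if math.isclose(p, q) else 2 * p * q

def random_match_probability(profile):
    """Product of per-locus genotype frequencies, assuming independent (unlinked) loci."""
    return math.prod(genotype_frequency(p, q) for p, q in profile)

# Hypothetical allele-frequency pairs for a three-locus profile (illustrative only).
# Equal frequencies within a pair are treated as a homozygous genotype in this toy model.
PROFILE = [(0.10, 0.20), (0.30, 0.30), (0.05, 0.25)]
RMP = random_match_probability(PROFILE)
```

    Here the three loci contribute 0.04, 0.09 and 0.025, so the combined frequency is 9 x 10^-5. Each additional independent locus multiplies in another small factor, which is why a 13-locus profile can be extremely rare.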

  16. Repeated extragenic sequences in prokaryotic genomes: a proposal for the origin and dynamics of the RUP element in Streptococcus pneumoniae.

    Science.gov (United States)

    Oggioni, M R; Claverys, J P

    1999-10-01

    A survey of all Streptococcus pneumoniae GenBank/EMBL DNA sequence entries and of the public domain sequence (representing more than 90% of the genome) of an S. pneumoniae type 4 strain allowed identification of 108 copies of a 107-bp-long highly repeated intergenic element called RUP (for repeat unit of pneumococcus). Several features of the element, revealed in this study, led to the proposal that RUP is an insertion sequence (IS)-derivative that could still be mobile. Among these features are: (1) a highly significant homology between the terminal inverted repeats (IRs) of RUPs and of IS630-Spn1, a new putative IS of S. pneumoniae; and (2) insertion at a TA dinucleotide, a characteristic target of several members of the IS630 family. Trans-mobilization of RUP is therefore proposed to be mediated by the transposase of IS630-Spn1. To account for the observation that RUPs are distributed among four subtypes which exhibit different degrees of sequence homogeneity, a scenario is invoked based on successive stages of RUP mobility and non-mobility, depending on whether an active transposase is present or absent. In the latter situation, an active transposase could be reintroduced into the species through natural transformation. Examination of sequences flanking RUP revealed a preferential association with ISs. It also provided evidence that RUPs promote sequence rearrangements, thereby contributing to genome flexibility. The possibility that RUP preferentially targets transforming DNA of foreign origin and subsequently favours disruption/rearrangement of exogenous sequences is discussed.

  17. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments.

    Science.gov (United States)

    Dabney, Jesse; Knapp, Michael; Glocke, Isabelle; Gansauge, Marie-Theres; Weihmann, Antje; Nickel, Birgit; Valdiosera, Cristina; García, Nuria; Pääbo, Svante; Arsuaga, Juan-Luis; Meyer, Matthias

    2013-09-24

    Although an inverse relationship is expected in ancient DNA samples between the number of surviving DNA fragments and their length, ancient DNA sequencing libraries are strikingly deficient in molecules shorter than 40 bp. We find that a loss of short molecules can occur during DNA extraction and present an improved silica-based extraction protocol that enables their efficient retrieval. In combination with single-stranded DNA library preparation, this method enabled us to reconstruct the mitochondrial genome sequence from a Middle Pleistocene cave bear (Ursus deningeri) bone excavated at Sima de los Huesos in the Sierra de Atapuerca, Spain. Phylogenetic reconstructions indicate that the U. deningeri sequence forms an early diverging sister lineage to all Western European Late Pleistocene cave bears. Our results prove that authentic ancient DNA can be preserved for hundreds of thousands of years outside of permafrost. Moreover, the techniques presented enable the retrieval of phylogenetically informative sequences from samples in which virtually all DNA is diminished to fragments shorter than 50 bp.

  18. Complete Chloroplast Genome of Pinus massoniana (Pinaceae): Gene Rearrangements, Loss of ndh Genes, and Short Inverted Repeats Contraction, Expansion.

    Science.gov (United States)

    Ni, ZhouXian; Ye, YouJu; Bai, Tiandao; Xu, Meng; Xu, Li-An

    2017-09-11

    The chloroplast genome (CPG) of Pinus massoniana, a member of the genus Pinus (Pinaceae) and a primary source of turpentine, was sequenced and analyzed in terms of gene rearrangements, ndh gene loss, and the contraction and expansion of short inverted repeats (IRs). The P. massoniana CPG has a typical quadripartite structure that includes a large single copy (LSC) region (65,563 bp), a small single copy (SSC) region (53,230 bp) and two IRs (IRa and IRb, 485 bp). A total of 108 unique genes were identified, including 73 protein-coding genes, 31 tRNAs, and 4 rRNAs. Most of the 81 simple sequence repeats (SSRs) identified in the CPG were mononucleotide motifs of A/T types and located in non-coding regions. Comparisons with related species revealed an inversion (21,556 bp) in the LSC region; the P. massoniana CPG lacks all 11 intact ndh genes (four ndh genes lost completely; five remaining as truncated pseudogenes; and the other two ndh genes remaining as pseudogenes because of short insertions or deletions). A pair of short IRs was found instead of large IRs, and size variations among pine species were observed, which resulted from short insertions or deletions and non-synchronized variations between "IRa" and "IRb". The results of phylogenetic analyses based on whole CPG sequences of 16 conifers indicated that the whole CPG sequences could be used as a powerful tool in phylogenetic analyses.

  19. Molecular cloning and sequence analysis of hamster CENP-A cDNA

    Directory of Open Access Journals (Sweden)

    Valdivia Manuel M

    2002-05-01

    Background: The centromere is a specialized locus that mediates chromosome movement during mitosis and meiosis. This chromosomal domain comprises a uniquely packaged form of heterochromatin that acts as a nucleus for the assembly of the kinetochore, a trilaminar proteinaceous structure on the surface of each chromatid at the primary constriction. Kinetochores mediate interactions with the spindle fibers of the mitotic apparatus. Centromere protein A (CENP-A) is a histone H3-like protein specifically located at the inner plate of the kinetochore at active centromeres. CENP-A works as a component of specialized nucleosomes at centromeres bound to arrays of repeat satellite DNA. Results: We have cloned the hamster homologue of human and mouse CENP-A. The cDNA isolated was found to contain an open reading frame encoding a polypeptide consisting of 129 amino acid residues with a C-terminal histone fold domain highly homologous to those of CENP-A and H3 sequences previously released. However, significant sequence divergence was found at the N-terminal region of hamster CENP-A, which is five and eleven residues shorter than those of mouse and human, respectively. Further, a human serine 7 residue, a target site for Aurora B kinase phosphorylation involved in the mechanism of cytokinesis, was not found in the hamster protein. A human autoepitope at the N-terminal region of CENP-A described in autoimmune diseases is not conserved in the hamster protein. Conclusions: We have cloned the hamster cDNA for the centromeric protein CENP-A. Significant differences in protein sequence were found at the N-terminal tail of hamster CENP-A in comparison with that of human and mouse. Our results show a high degree of evolutionary divergence of kinetochore CENP-A proteins in mammals. This is related to the highly diverse nucleotide repeat sequences found at the centromere DNA among species and supports a current centromere model for kinetochore function and structural

  20. Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads

    DEFF Research Database (Denmark)

    Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse

    2015-01-01

    From Illumina sequencing of DNA from brain and liver tissue from the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs...... to the epsilonretroviruses, and the lion retrovirus to the gammaretroviruses. To determine if these novel retroviral sequences originate from an endogenous retrovirus or from a recently integrated exogenous retrovirus, we assessed the genetic diversity of the parental sequences from which the short Illumina reads...

  1. CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs.

    Science.gov (United States)

    Gilbert, N; Labuda, D

    1999-03-16

    A 65-bp "core" sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3' ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome.

  2. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9.

    Science.gov (United States)

    Sternberg, Samuel H; Redding, Sy; Jinek, Martin; Greene, Eric C; Doudna, Jennifer A

    2014-03-06

    The clustered regularly interspaced short palindromic repeats (CRISPR)-associated enzyme Cas9 is an RNA-guided endonuclease that uses RNA-DNA base-pairing to target foreign DNA in bacteria. Cas9-guide RNA complexes are also effective genome engineering agents in animals and plants. Here we use single-molecule and bulk biochemical experiments to determine how Cas9-RNA interrogates DNA to find specific cleavage sites. We show that both binding and cleavage of DNA by Cas9-RNA require recognition of a short trinucleotide protospacer adjacent motif (PAM). Non-target DNA binding affinity scales with PAM density, and sequences fully complementary to the guide RNA but lacking a nearby PAM are ignored by Cas9-RNA. Competition assays provide evidence that DNA strand separation and RNA-DNA heteroduplex formation initiate at the PAM and proceed directionally towards the distal end of the target sequence. Furthermore, PAM interactions trigger Cas9 catalytic activity. These results reveal how Cas9 uses PAM recognition to quickly identify potential target sites while scanning large DNA molecules, and to regulate scission of double-stranded DNA.
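
    The PAM dependence described in this abstract can be illustrated with a minimal sketch. It assumes the 5'-NGG-3' PAM of Streptococcus pyogenes Cas9 (the abstract says only "short trinucleotide"), searches the forward strand only, and requires an exact match to the protospacer; the function name `candidate_sites` and the example sequences are hypothetical and are not taken from the paper.

```python
import re

# Assumption: S. pyogenes Cas9 PAM, 5'-NGG-3', immediately 3' of the protospacer.
PAM_PATTERN = r"[ACGT]GG"


def candidate_sites(dna, protospacer):
    """Return (start, pam, has_pam) for every exact forward-strand match.

    A match whose next three bases do not form a PAM is reported with
    has_pam=False, mirroring the observation that fully complementary
    sequences lacking a nearby PAM are ignored.
    """
    dna = dna.upper()
    protospacer = protospacer.upper()
    sites = []
    start = dna.find(protospacer)
    while start != -1:
        end = start + len(protospacer)
        pam = dna[end:end + 3]
        sites.append((start, pam, re.fullmatch(PAM_PATTERN, pam) is not None))
        start = dna.find(protospacer, start + 1)  # allow overlapping matches
    return sites


if __name__ == "__main__":
    target = "ACGTTGCATGCCATGGTACA"  # hypothetical 20-nt protospacer
    dna = "TT" + target + "TGG" + "AAAA" + target + "TCA"
    for start, pam, has_pam in candidate_sites(dna, target):
        print(start, pam, "cleavable candidate" if has_pam else "ignored (no PAM)")
```

    A fuller scan would also check the reverse complement and tolerate mismatches; both are left out here to keep the PAM check itself visible.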

  3. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9

    Science.gov (United States)

    Sternberg, Samuel H.; Redding, Sy; Jinek, Martin; Greene, Eric C.; Doudna, Jennifer A.

    2014-03-01

    The clustered regularly interspaced short palindromic repeats (CRISPR)-associated enzyme Cas9 is an RNA-guided endonuclease that uses RNA-DNA base-pairing to target foreign DNA in bacteria. Cas9-guide RNA complexes are also effective genome engineering agents in animals and plants. Here we use single-molecule and bulk biochemical experiments to determine how Cas9-RNA interrogates DNA to find specific cleavage sites. We show that both binding and cleavage of DNA by Cas9-RNA require recognition of a short trinucleotide protospacer adjacent motif (PAM). Non-target DNA binding affinity scales with PAM density, and sequences fully complementary to the guide RNA but lacking a nearby PAM are ignored by Cas9-RNA. Competition assays provide evidence that DNA strand separation and RNA-DNA heteroduplex formation initiate at the PAM and proceed directionally towards the distal end of the target sequence. Furthermore, PAM interactions trigger Cas9 catalytic activity. These results reveal how Cas9 uses PAM recognition to quickly identify potential target sites while scanning large DNA molecules, and to regulate scission of double-stranded DNA.

  4. A hybrid swarm population of Pinus densiflora x P. sylvestris hybrids inferred from sequence analysis of chloroplast DNA and morphological characters

    Science.gov (United States)

    To confirm a hybrid swarm population of Pinus densiflora × P. sylvestris in Jilin, China and to study whether shoot apex morphology of 4-year old seedlings can be correlated with the sequence of a chloroplast DNA simple sequence repeat marker (cpDNA SSR), needles and seeds from P. densiflora, P. syl...

  5. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

    Science.gov (United States)

    Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael

    2010-04-08

    Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for
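
    The N50 and coverage figures quoted in this abstract follow standard definitions, sketched below. N50 is taken here as the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly size. The function names and the toy contig lengths are hypothetical; only the 40 Mb genome size and the 85-fold plus 10-fold coverage come from the abstract.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def expected_bases(genome_size, fold_coverage):
    """Total sequenced bases implied by a genome size and fold coverage."""
    return genome_size * fold_coverage


if __name__ == "__main__":
    # Toy assembly: total 54, half 27; 10 + 9 + 8 reaches 27, so N50 is 8.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
    # 40 Mb genome at 85x + 10x coverage: 3.8e9 bases, i.e. roughly 4 Gb.
    print(expected_bases(40_000_000, 85 + 10))
```

    The second calculation is consistent with the "approximately 4 Gb" of sequence reported for the combined Solexa and 454 data.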

  6. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

    Directory of Open Access Journals (Sweden)

    Minou Nowrousian

    2010-04-01

    Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data

  7. High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA.

    Science.gov (United States)

    Chandrananda, Dineika; Thorne, Natalie P; Bahlo, Melanie

    2015-06-17

    High-throughput sequencing of cell-free DNA fragments found in human plasma has been used to non-invasively detect fetal aneuploidy, monitor organ transplants and investigate tumor DNA. However, many biological properties of this extracellular genetic material remain unknown. Research that further characterizes circulating DNA could substantially increase its diagnostic value by allowing the application of more sophisticated bioinformatics tools that lead to an improved signal-to-noise ratio in the sequencing data. In this study, we investigate various features of cell-free DNA in plasma using deep-sequencing data from two pregnant women (>70X, >50X) and compare them with matched cellular DNA. We utilize a descriptive approach to examine how the biological cleavage of cell-free DNA affects different sequence signatures such as fragment lengths, sequence motifs at fragment ends and the distribution of cleavage sites along the genome. We show that the size distributions of these cell-free DNA molecules are dependent on their autosomal and mitochondrial origin as well as the genomic location within chromosomes. DNA mapping to particular microsatellites and alpha repeat elements displays unique size signatures. By correlating the mapping locations of these fragments with ENCODE annotation of chromatin organization, we show that cell-free fragments occur in clusters along the genome, localize to nucleosomal arrays and are preferentially cleaved at linker regions. Our work further demonstrates that cell-free autosomal DNA cleavage is sequence dependent. The region spanning up to 10 positions on either side of the DNA cleavage site shows a consistent pattern of preference for specific nucleotides. This sequence motif is present in cleavage sites localized to nucleosomal cores and linker regions but is absent in nucleosome-free mitochondrial DNA. These background signals in cell-free DNA sequencing data stem from the non-random biological cleavage of these fragments. This

  8. Analysis of genetic polymorphism of nine short tandem repeat loci in ...

    African Journals Online (AJOL)

    Yomi

    2012-03-15

    Mar 15, 2012 ... Key words: short tandem repeat, repeat motif, genetic polymorphism, Han population, forensic genetics. INTRODUCTION. Short tandem repeat (STR) is widely .... Data analysis. The exact test of Hardy-Weinberg equilibrium was conducted with. Arlequin version 3.5 software (Computational and Molecular.

  9. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
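
    The third step above, finding microsatellites that conform to user-specified parameters, can be sketched with a regular-expression search. This is not the SSR_pipeline algorithm, which the abstract does not describe; it is a simple illustration for perfect (uninterrupted) repeats only, and the function names, default thresholds and example sequence are all assumptions.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'CACA')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_microsatellites(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skip 'CACA' runs already reported as 'CA'
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    read = "GG" + "CA" * 5 + "TT" + "AGC" * 4 + "TT"
    print(find_microsatellites(read))
```

    Raising `max_motif` to 25 would cover the motif range mentioned in the abstract; compound and imperfect repeats would need additional logic.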

  10. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

    Science.gov (United States)

    Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  11. TRDistiller: a rapid filter for enrichment of sequence datasets with proteins containing tandem repeats.

    Science.gov (United States)

    Richard, François D; Kajava, Andrey V

    2014-06-01

    The dramatic growth of sequencing data evokes an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists were devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so-called tandem repeats or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required since the conventional approaches developed for globular domains have had limited success when applied to the TR regions. The protein TRs are frequently not perfect, containing a number of mutations, and some of them cannot be easily identified. To detect such "hidden" repeats, several algorithms have been developed. However, the most sensitive among them are time-consuming and, therefore, inappropriate for large-scale proteome analysis. To speed up TR detection, we developed a rapid filter that is based on the comparison of composition and order of short strings in the adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins which are known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset, enriching it with TR-containing proteins, which allows faster subsequent TR detection by other methods. The program is available upon request. Copyright © 2014 Elsevier Inc. All rights reserved.

  12. Assembly of Repeat Content Using Next Generation Sequencing Data

    Energy Technology Data Exchange (ETDEWEB)

    LaButti, Kurt; Kuo, Alan; Grigoriev, Igor; Copeland, Alex

    2014-03-17

    Organisms with repetitive genomes pose a challenge for short-read assembly, and typically only unique regions and repeat regions shorter than the read length can be accurately assembled. Recently, we have been investigating the use of Pacific Biosciences reads for de novo fungal assembly. We will present an assessment of the quality and degree of repeat reconstruction possible in a fungal genome using long-read technology. We will also compare differences in assembly of repeat content using short-read and long-read technology.

  13. The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats

    Directory of Open Access Journals (Sweden)

    Vergnaud Gilles

    2007-05-01

    Background: In Archaea and Bacteria, the repeated elements called CRISPRs for "clustered regularly interspaced short palindromic repeats" are believed to participate in the defence against viruses. Short sequences called spacers are stored in-between repeated elements. In the current model, motifs comprising spacers and repeats may target an invading DNA and lead to its degradation through a proposed mechanism similar to RNA interference. Analysis of intra-species polymorphism shows that new motifs (one spacer and one repeated element) are added in a polarised fashion. Although their principal characteristics have been described, a lot remains to be discovered about how CRISPRs are created and evolve. As new genome sequences become available, it appears necessary to develop automated scanning tools to make CRISPR-related information available and to facilitate additional investigations. Description: We have produced a program, CRISPRFinder, which identifies CRISPRs and extracts the repeated and unique sequences. Using this software, a database is constructed which is automatically updated monthly from newly released genome sequences. Additional tools were created to allow the alignment of flanking sequences in search for similarities between different loci and to build dictionaries of unique sequences. To date, almost six hundred CRISPRs have been identified in 475 published genomes. Two Archaea out of thirty-seven and about half of Bacteria do not possess a CRISPR. Fine analysis of repeated sequences strongly supports the current view that new motifs are added at one end of the CRISPR adjacent to the putative promoter. Conclusion: It is hoped that availability of a public database, regularly updated and which can be queried on the web, will help in further dissecting and understanding CRISPR structure and the evolution of flanking sequences. Subsequent analyses of the intra-species CRISPR polymorphism will be facilitated by CRISPRFinder and the

  14. Crystal structure of clustered regularly interspaced short palindromic repeats (CRISPR)-associated Csn2 protein revealed Ca2+-dependent double-stranded DNA binding activity.

    Science.gov (United States)

    Nam, Ki Hyun; Kurinov, Igor; Ke, Ailong

    2011-09-02

    Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated protein genes (cas genes) are widespread in bacteria and archaea. They form a line of RNA-based immunity to eradicate invading bacteriophages and malicious plasmids. A key molecular event during this process is the acquisition of new spacers into the CRISPR loci to guide the selective degradation of the matching foreign genetic elements. Csn2 is a Nmeni subtype-specific cas gene required for new spacer acquisition. Here we characterize the Enterococcus faecalis Csn2 protein as a double-stranded (ds-) DNA-binding protein and report its 2.7 Å tetrameric ring structure. The inner circle of the Csn2 tetrameric ring is ∼26 Å wide and populated with conserved lysine residues poised for nonspecific interactions with ds-DNA. Each Csn2 protomer contains an α/β domain and an α-helical domain; significant hinge motion was observed between these two domains. Ca(2+) was located at strategic positions in the oligomerization interface. We further showed that removal of Ca(2+) ions altered the oligomerization state of Csn2, which in turn severely decreased its affinity for ds-DNA. In summary, our results provided the first insight into the function of the Csn2 protein in CRISPR adaptation by revealing that it is a ds-DNA-binding protein functioning at the quaternary structure level and regulated by Ca(2+) ions.

  15. Low-Energy Electron-Induced Strand Breaks in Telomere-Derived DNA Sequences-Influence of DNA Sequence and Topology.

    Science.gov (United States)

    Rackwitz, Jenny; Bald, Ilko

    2018-03-26

    During cancer radiation therapy high-energy radiation is used to reduce tumour tissue. The irradiation produces a shower of secondary low-energy (<20 eV) electrons, which are able to damage DNA very efficiently by dissociative electron attachment. Recently, it was suggested that low-energy electron-induced DNA strand breaks strongly depend on the specific DNA sequence with a high sensitivity of G-rich sequences. Here, we use DNA origami platforms to expose G-rich telomere sequences to low-energy (8.8 eV) electrons to determine absolute cross sections for strand breakage and to study the influence of sequence modifications and topology of telomeric DNA on the strand breakage. We find that the telomeric DNA 5'-(TTA GGG)₂ is more sensitive to low-energy electrons than an intermixed sequence 5'-(TGT GTG A)₂, confirming the unique electronic properties resulting from G-stacking. With increasing length of the oligonucleotide (i.e., going from 5'-(GGG ATT)₂ to 5'-(GGG ATT)₄), both the variety of topology and the electron-induced strand break cross sections increase. Addition of K⁺ ions decreases the strand break cross section for all sequences that are able to fold G-quadruplexes or G-intermediates, whereas the strand break cross section for the intermixed sequence remains unchanged. These results indicate that telomeric DNA is rather sensitive towards low-energy electron-induced strand breakage, suggesting significant telomere shortening that can also occur during cancer radiation therapy. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. A theory that may explain the Hayflick limit--a means to delete one copy of a repeating sequence during each cell cycle in certain human cells such as fibroblasts.

    Science.gov (United States)

    Naveilhan, P; Baudet, C; Jabbour, W; Wion, D

    1994-09-01

    A model that may explain the limited division potential of certain cells such as human fibroblasts in culture is presented. The central postulate of this theory is that there exists, prior to certain key exons that code for materials needed for cell division, a unique sequence of specific repeating segments of DNA. One copy of such repeating segments is deleted during each cell cycle in cells that are not protected from such deletion through methylation of their cytosine residues. According to this theory, such repeated sequences are removed, one per cycle, through the sequential action of enzymes that act much as bacterial restriction enzymes do, namely to produce scissions in both strands of DNA in areas that correspond to the DNA base sequence recognition specificities of such enzymes. After the first scission early in a replicative cycle, that enzyme becomes inhibited, but the cleavage of the first site exposes the closest site in the repetitive element to the action of a second restriction enzyme, after which that enzyme also becomes inhibited. Then repair occurs, regenerating the original first site. Through this sequential activation and inhibition of two different restriction enzymes, only one copy of the repeating sequence is deleted during each cell cycle. In effect, the repeating sequence operates as a precise counter of the number of cell doublings that have occurred since the cells involved differentiated during development.

  17. Analysis of DNA restriction fragments greater than 5.7 Mb in size from the centromeric region of human chromosomes.

    Science.gov (United States)

    Arn, P H; Li, X; Smith, C; Hsu, M; Schwartz, D C; Jabs, E W

    1991-01-01

    Pulsed electrophoresis was used to study the organization of the human centromeric region. Genomic DNA was digested with rare-cutting enzymes. DNA fragments from 0.2 to greater than 5.7 Mb were separated by electrophoresis and hybridized with alphoid and simple DNA repeats. Rare-cutting enzymes (Mlu I, Nar I, Not I, Nru I, Sal I, Sfi I, Sst II) demonstrated fewer restriction sites at centromeric regions than elsewhere in the genome. The enzyme Not I had the fewest restriction sites at centromeric regions. As much as 70% of these sequences from the centromeric region are present in Not I DNA fragments greater than 5.7 and estimated to be as large as 10 Mb in size. Other repetitive sequences such as short interspersed repeated segments (SINEs), long interspersed repeated segments (LINEs), ribosomal DNA, and mini-satellite DNA that are not enriched at the centromeric region, are not enriched in Not I fragments of greater than 5.7 Mb in size.

  18. Inhibition of hepatitis B virus replication with linear DNA sequences expressing antiviral micro-RNA shuttles

    Energy Technology Data Exchange (ETDEWEB)

    Chattopadhyay, Saket; Ely, Abdullah; Bloom, Kristie; Weinberg, Marc S. [Antiviral Gene Therapy Research Unit, University of the Witwatersrand (South Africa); Arbuthnot, Patrick, E-mail: Patrick.Arbuthnot@wits.ac.za [Antiviral Gene Therapy Research Unit, University of the Witwatersrand (South Africa)

    2009-11-20

    RNA interference (RNAi) may be harnessed to inhibit viral gene expression and this approach is being developed to counter chronic infection with hepatitis B virus (HBV). Compared to synthetic RNAi activators, DNA expression cassettes that generate silencing sequences have advantages of sustained efficacy and ease of propagation in plasmid DNA (pDNA). However, the large size of pDNAs and inclusion of sequences conferring antibiotic resistance and immunostimulation limit delivery efficiency and safety. To develop use of alternative DNA templates that may be applied for therapeutic gene silencing, we assessed the usefulness of PCR-generated linear expression cassettes that produce anti-HBV micro-RNA (miR) shuttles. We found that silencing of HBV markers of replication was efficient (>75%) in cell culture and in vivo. miR shuttles were processed to form anti-HBV guide strands and there was no evidence of induction of the interferon response. Modification of terminal sequences to include flanking human adenoviral type-5 inverted terminal repeats was easily achieved and did not compromise silencing efficacy. These linear DNA sequences should have utility in the development of gene silencing applications where modifications of terminal elements with elimination of potentially harmful and non-essential sequences are required.

  19. Inhibition of hepatitis B virus replication with linear DNA sequences expressing antiviral micro-RNA shuttles

    International Nuclear Information System (INIS)

    Chattopadhyay, Saket; Ely, Abdullah; Bloom, Kristie; Weinberg, Marc S.; Arbuthnot, Patrick

    2009-01-01

    RNA interference (RNAi) may be harnessed to inhibit viral gene expression and this approach is being developed to counter chronic infection with hepatitis B virus (HBV). Compared to synthetic RNAi activators, DNA expression cassettes that generate silencing sequences have advantages of sustained efficacy and ease of propagation in plasmid DNA (pDNA). However, the large size of pDNAs and inclusion of sequences conferring antibiotic resistance and immunostimulation limit delivery efficiency and safety. To develop use of alternative DNA templates that may be applied for therapeutic gene silencing, we assessed the usefulness of PCR-generated linear expression cassettes that produce anti-HBV micro-RNA (miR) shuttles. We found that silencing of HBV markers of replication was efficient (>75%) in cell culture and in vivo. miR shuttles were processed to form anti-HBV guide strands and there was no evidence of induction of the interferon response. Modification of terminal sequences to include flanking human adenoviral type-5 inverted terminal repeats was easily achieved and did not compromise silencing efficacy. These linear DNA sequences should have utility in the development of gene silencing applications where modifications of terminal elements with elimination of potentially harmful and non-essential sequences are required.

  20. Functional interrogation of non-coding DNA through CRISPR genome editing.

    Science.gov (United States)

    Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H

    2017-05-15

    Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. The DNA sequence of the human X chromosome

    Science.gov (United States)

    Ross, Mark T.; Grafham, Darren V.; Coffey, Alison J.; Scherer, Steven; McLay, Kirsten; Muzny, Donna; Platzer, Matthias; Howell, Gareth R.; Burrows, Christine; Bird, Christine P.; Frankish, Adam; Lovell, Frances L.; Howe, Kevin L.; Ashurst, Jennifer L.; Fulton, Robert S.; Sudbrak, Ralf; Wen, Gaiping; Jones, Matthew C.; Hurles, Matthew E.; Andrews, T. Daniel; Scott, Carol E.; Searle, Stephen; Ramser, Juliane; Whittaker, Adam; Deadman, Rebecca; Carter, Nigel P.; Hunt, Sarah E.; Chen, Rui; Cree, Andrew; Gunaratne, Preethi; Havlak, Paul; Hodgson, Anne; Metzker, Michael L.; Richards, Stephen; Scott, Graham; Steffen, David; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Ainscough, Rachael; Ambrose, Kerrie D.; Ansari-Lari, M. Ali; Aradhya, Swaroop; Ashwell, Robert I. S.; Babbage, Anne K.; Bagguley, Claire L.; Ballabio, Andrea; Banerjee, Ruby; Barker, Gary E.; Barlow, Karen F.; Barrett, Ian P.; Bates, Karen N.; Beare, David M.; Beasley, Helen; Beasley, Oliver; Beck, Alfred; Bethel, Graeme; Blechschmidt, Karin; Brady, Nicola; Bray-Allen, Sarah; Bridgeman, Anne M.; Brown, Andrew J.; Brown, Mary J.; Bonnin, David; Bruford, Elspeth A.; Buhay, Christian; Burch, Paula; Burford, Deborah; Burgess, Joanne; Burrill, Wayne; Burton, John; Bye, Jackie M.; Carder, Carol; Carrel, Laura; Chako, Joseph; Chapman, Joanne C.; Chavez, Dean; Chen, Ellson; Chen, Guan; Chen, Yuan; Chen, Zhijian; Chinault, Craig; Ciccodicola, Alfredo; Clark, Sue Y.; Clarke, Graham; Clee, Chris M.; Clegg, Sheila; Clerc-Blankenburg, Kerstin; Clifford, Karen; Cobley, Vicky; Cole, Charlotte G.; Conquer, Jen S.; Corby, Nicole; Connor, Richard E.; David, Robert; Davies, Joy; Davis, Clay; Davis, John; Delgado, Oliver; DeShazo, Denise; Dhami, Pawandeep; Ding, Yan; Dinh, Huyen; Dodsworth, Steve; Draper, Heather; Dugan-Rocha, Shannon; Dunham, Andrew; Dunn, Matthew; Durbin, K. 
James; Dutta, Ireena; Eades, Tamsin; Ellwood, Matthew; Emery-Cohen, Alexandra; Errington, Helen; Evans, Kathryn L.; Faulkner, Louisa; Francis, Fiona; Frankland, John; Fraser, Audrey E.; Galgoczy, Petra; Gilbert, James; Gill, Rachel; Glöckner, Gernot; Gregory, Simon G.; Gribble, Susan; Griffiths, Coline; Grocock, Russell; Gu, Yanghong; Gwilliam, Rhian; Hamilton, Cerissa; Hart, Elizabeth A.; Hawes, Alicia; Heath, Paul D.; Heitmann, Katja; Hennig, Steffen; Hernandez, Judith; Hinzmann, Bernd; Ho, Sarah; Hoffs, Michael; Howden, Phillip J.; Huckle, Elizabeth J.; Hume, Jennifer; Hunt, Paul J.; Hunt, Adrienne R.; Isherwood, Judith; Jacob, Leni; Johnson, David; Jones, Sally; de Jong, Pieter J.; Joseph, Shirin S.; Keenan, Stephen; Kelly, Susan; Kershaw, Joanne K.; Khan, Ziad; Kioschis, Petra; Klages, Sven; Knights, Andrew J.; Kosiura, Anna; Kovar-Smith, Christie; Laird, Gavin K.; Langford, Cordelia; Lawlor, Stephanie; Leversha, Margaret; Lewis, Lora; Liu, Wen; Lloyd, Christine; Lloyd, David M.; Loulseged, Hermela; Loveland, Jane E.; Lovell, Jamieson D.; Lozado, Ryan; Lu, Jing; Lyne, Rachael; Ma, Jie; Maheshwari, Manjula; Matthews, Lucy H.; McDowall, Jennifer; McLaren, Stuart; McMurray, Amanda; Meidl, Patrick; Meitinger, Thomas; Milne, Sarah; Miner, George; Mistry, Shailesh L.; Morgan, Margaret; Morris, Sidney; Müller, Ines; Mullikin, James C.; Nguyen, Ngoc; Nordsiek, Gabriele; Nyakatura, Gerald; O’Dell, Christopher N.; Okwuonu, Geoffery; Palmer, Sophie; Pandian, Richard; Parker, David; Parrish, Julia; Pasternak, Shiran; Patel, Dina; Pearce, Alex V.; Pearson, Danita M.; Pelan, Sarah E.; Perez, Lesette; Porter, Keith M.; Ramsey, Yvonne; Reichwald, Kathrin; Rhodes, Susan; Ridler, Kerry A.; Schlessinger, David; Schueler, Mary G.; Sehra, Harminder K.; Shaw-Smith, Charles; Shen, Hua; Sheridan, Elizabeth M.; Shownkeen, Ratna; Skuce, Carl D.; Smith, Michelle L.; Sotheran, Elizabeth C.; Steingruber, Helen E.; Steward, Charles A.; Storey, Roy; Swann, R. 
Mark; Swarbreck, David; Tabor, Paul E.; Taudien, Stefan; Taylor, Tineace; Teague, Brian; Thomas, Karen; Thorpe, Andrea; Timms, Kirsten; Tracey, Alan; Trevanion, Steve; Tromans, Anthony C.; d’Urso, Michele; Verduzco, Daniel; Villasana, Donna; Waldron, Lenee; Wall, Melanie; Wang, Qiaoyan; Warren, James; Warry, Georgina L.; Wei, Xuehong; West, Anthony; Whitehead, Siobhan L.; Whiteley, Mathew N.; Wilkinson, Jane E.; Willey, David L.; Williams, Gabrielle; Williams, Leanne; Williamson, Angela; Williamson, Helen; Wilming, Laurens; Woodmansey, Rebecca L.; Wray, Paul W.; Yen, Jennifer; Zhang, Jingkun; Zhou, Jianling; Zoghbi, Huda; Zorilla, Sara; Buck, David; Reinhardt, Richard; Poustka, Annemarie; Rosenthal, André; Lehrach, Hans; Meindl, Alfons; Minx, Patrick J.; Hillier, LaDeana W.; Willard, Huntington F.; Wilson, Richard K.; Waterston, Robert H.; Rice, Catherine M.; Vaudin, Mark; Coulson, Alan; Nelson, David L.; Weinstock, George; Sulston, John E.; Durbin, Richard; Hubbard, Tim; Gibbs, Richard A.; Beck, Stephan; Rogers, Jane; Bentley, David R.

    2009-01-01

    The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence. PMID:15772651

  2. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    Energy Technology Data Exchange (ETDEWEB)

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by

  3. Simple sequence repeat marker development and genetic mapping ...

    Indian Academy of Sciences (India)

    polymorphic SSR (simple sequence repeats) markers from libraries enriched for GA, CAA and AAT repeats, as well as 6 ... ers for quinoa was the development of a genetic linkage map ...... Weber J. L. 1990 Informativeness of human (dC-dA)n.

  4. Transcription of repetitive DNA in Neurospora crassa

    Energy Technology Data Exchange (ETDEWEB)

    Dutta, S K; Chaudhuri, R K

    1975-01-01

    Repeated DNA sequences of Neurospora crassa were isolated and characterized. Approximately 10 to 12 percent of N. crassa DNA sequence were repeated, of which 7.3 percent were found to be transcribed in mid-log phase of mycelial growth as measured by DNA:RNA hybridization. It is suggested that part of repetitive DNA transcripts in N. crassa were mitochondrial and part were nuclear DNA. Most of the nuclear repeated DNAs, however, code for rRNA and tRNA in N. crassa. (auth)

  5. Breaks in the 45S rDNA Lead to Recombination-Mediated Loss of Repeats

    Directory of Open Access Journals (Sweden)

    Daniël O. Warmerdam

    2016-03-01

    rDNA repeats constitute the most heavily transcribed region in the human genome. Tumors frequently display elevated levels of recombination in rDNA, indicating that the repeats are a liability to the genomic integrity of a cell. However, little is known about how cells deal with DNA double-stranded breaks in rDNA. Using selective endonucleases, we show that human cells are highly sensitive to breaks in 45S but not the 5S rDNA repeats. We find that homologous recombination inhibits repair of breaks in 45S rDNA, and this results in repeat loss. We identify the structural maintenance of chromosomes protein 5 (SMC5) as contributing to recombination-mediated repair of rDNA breaks. Together, our data demonstrate that SMC5-mediated recombination can lead to error-prone repair of 45S rDNA repeats, resulting in their loss and thereby reducing cellular viability.

  6. The influence of DNA sequence on epigenome-induced pathologies

    Directory of Open Access Journals (Sweden)

    Meagher Richard B

    2012-07-01

    Clear cause-and-effect relationships are commonly established between genotype and the inherited risk of acquiring human and plant diseases and aberrant phenotypes. By contrast, few such cause-and-effect relationships are established linking a chromatin structure (that is, the epitype) with the transgenerational risk of acquiring a disease or abnormal phenotype. It is not entirely clear how epitypes are inherited from parent to offspring as populations evolve, even though epigenetics is proposed to be fundamental to evolution and the likelihood of acquiring many diseases. This article explores the hypothesis that, for transgenerationally inherited chromatin structures, “genotype predisposes epitype”, and that epitype functions as a modifier of gene expression within the classical central dogma of molecular biology. Evidence for the causal contribution of genotype to inherited epitypes and epigenetic risk comes primarily from two different kinds of studies discussed herein. The first and direct method of research proceeds by the examination of the transgenerational inheritance of epitype and the penetrance of phenotype among genetically related individuals. The second approach identifies epitypes that are duplicated (as DNA sequences are duplicated) and evolutionarily conserved among repeated patterns in the DNA sequence. The body of this article summarizes particularly robust examples of these studies from humans, mice, Arabidopsis, and other organisms. The bulk of the data from both areas of research support the hypothesis that genotypes predispose the likelihood of displaying various epitypes, but for only a few classes of epitype. This analysis suggests that renewed efforts are needed in identifying polymorphic DNA sequences that determine variable nucleosome positioning and DNA methylation as the primary cause of inherited epigenome-induced pathologies. By contrast, there is very little evidence that DNA sequence directly

  7. Alu polymerase chain reaction: A method for rapid isolation of human-specific sequences from complex DNA sources

    International Nuclear Information System (INIS)

    Nelson, D.L.; Ledbetter, S.A.; Corbo, L.; Victoria, M.F.; Ramirez-Solis, R.; Webster, T.D.; Ledbetter, D.H.; Caskey, C.T.

    1989-01-01

    Current efforts to map the human genome are focused on individual chromosomes or smaller regions and frequently rely on the use of somatic cell hybrids. The authors report the application of the polymerase chain reaction to direct amplification of human DNA from hybrid cells containing regions of the human genome in rodent cell backgrounds using primers directed to the human Alu repeat element. They demonstrate Alu-directed amplification of a fragment of the human HPRT gene from both hybrid cell and cloned DNA and identify through sequence analysis the Alu repeats involved in this amplification. They also demonstrate the application of this technique to identify the chromosomal locations of large fragments of the human X chromosome cloned in a yeast artificial chromosome and the general applicability of the method to the preparation of DNA probes from cloned human sequences. The technique allows rapid gene mapping and provides a simple method for the isolation and analysis of specific chromosomal regions

  8. A novel constraint for thermodynamically designing DNA sequences.

    Directory of Open Access Journals (Sweden)

    Qiang Zhang

    Biotechnological and biomolecular advances have introduced novel uses for DNA such as DNA computing, storage, and encryption. For these applications, DNA sequence design requires maximal desired (and minimal undesired) hybridizations, which are the product of a single new DNA strand from 2 single DNA strands. Here, we propose a novel constraint to design DNA sequences based on thermodynamic properties. Existing constraints for DNA design are based on the Hamming distance, a constraint that does not address the thermodynamic properties of the DNA sequence. Using a unique, improved genetic algorithm, we designed DNA sequence sets which satisfy different distance constraints and employ a free energy gap based on a minimum free energy (MFE) to gauge DNA sequences based on set thermodynamic properties. When compared to the best constraints of the Hamming distance, our method yielded better thermodynamic qualities. We then used our improved genetic algorithm to obtain lower-bound DNA sequence sets. Here, we discuss the effects of novel constraint parameters on the free energy gap.
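
    The Hamming-distance constraint that this abstract names as the existing baseline can be illustrated with a short sketch. This is not the authors' genetic algorithm or their free-energy-gap constraint; the function names and the threshold parameter `d_min` are hypothetical and chosen only for illustration. It checks whether every pair of equal-length sequences in a candidate set differs in at least `d_min` positions.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def satisfies_hamming_constraint(seqs, d_min: int) -> bool:
    """Return True if every pair in seqs differs in at least d_min positions.

    d_min is a hypothetical design threshold, not a value from the abstract.
    """
    return all(hamming_distance(a, b) >= d_min for a, b in combinations(seqs, 2))


if __name__ == "__main__":
    candidate_set = ["AAAA", "TTTT", "AATT"]
    print(hamming_distance("ACGT", "ACGA"))
    print(satisfies_hamming_constraint(candidate_set, 2))
```

    A purely positional check like this ignores base-pairing energetics, which is the limitation the abstract attributes to Hamming-distance constraints.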

  9. Long Terminal Repeat Retrotransposon Content in Eight Diploid Sunflower Species Inferred from Next-Generation Sequence Data

    Science.gov (United States)

    Tetreault, Hannah M.; Ungerer, Mark C.

    2016-01-01

    The most abundant transposable elements (TEs) in plant genomes are Class I long terminal repeat (LTR) retrotransposons represented by superfamilies gypsy and copia. Amplification of these superfamilies directly impacts genome structure and contributes to differential patterns of genome size evolution among plant lineages. Utilizing short-read Illumina data and sequence information from a panel of Helianthus annuus (sunflower) full-length gypsy and copia elements, we explore the contribution of these sequences to genome size variation among eight diploid Helianthus species and an outgroup taxon, Phoebanthus tenuifolius. We also explore transcriptional dynamics of these elements in both leaf and bud tissue via RT-PCR. We demonstrate that most LTR retrotransposon sublineages (i.e., families) display patterns of similar genomic abundance across species. A small number of LTR retrotransposon sublineages exhibit lineage-specific amplification, particularly in the genomes of species with larger estimated nuclear DNA content. RT-PCR assays reveal that some LTR retrotransposon sublineages are transcriptionally active across all species and tissue types, whereas others display species-specific and tissue-specific expression. The species with the largest estimated genome size, H. agrestis, has experienced amplification of LTR retrotransposon sublineages, some of which have proliferated independently in other lineages in the Helianthus phylogeny. PMID:27233667

  10. Twisting short dsDNA with applied tension

    Science.gov (United States)

    Zoli, Marco

    2018-02-01

    The twisting deformation of mechanically stretched DNA molecules is studied by a coarse grained Hamiltonian model incorporating the fundamental interactions that stabilize the double helix and accounting for the radial and angular base pair fluctuations. The latter are all the more important at short length scales in which DNA fragments maintain an intrinsic flexibility. The presented computational method simulates a broad ensemble of possible molecule conformations characterized by a specific average twist and determines the energetically most convenient helical twist by free energy minimization. As this is done for any external load, the method yields the characteristic twist-stretch profile of the molecule and also computes the changes in the macroscopic helix parameters i.e. average diameter and rise distance. It is predicted that short molecules under stretching should first over-twist and then untwist by increasing the external load. Moreover, applying a constant load and simulating a torsional strain which over-twists the helix, it is found that the average helix diameter shrinks while the molecule elongates, in agreement with the experimental trend observed in kilo-base long sequences. The quantitative relation between percent relative elongation and superhelical density at fixed load is derived. The proposed theoretical model and computational method offer a general approach to characterize specific DNA fragments and predict their macroscopic elastic response as a function of the effective potential parameters of the mesoscopic Hamiltonian.

  11. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance, and even in biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how can we distinguish coding and noncoding sequences, and the problems of classification and evolution relationship of organisms are key problems in bioinformatics. Although much research has been carried out by taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other people, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view) to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  12. DNA polymerase ι: The long and the short of it!

    Science.gov (United States)

    Frank, Ekaterina G; McLenigan, Mary P; McDonald, John P; Huston, Donald; Mead, Samantha; Woodgate, Roger

    2017-10-01

    The cDNA encoding human DNA polymerase ι (POLI) was cloned in 1999. At that time, it was believed that the POLI gene encoded a protein of 715 amino acids. Advances in DNA sequencing technologies led to the realization that there is an upstream, in-frame initiation codon that would encode a DNA polymerase ι (polι) protein of 740 amino acids. The extra 25 amino acid region is rich in acidic residues (11/25) and is reasonably conserved in eukaryotes ranging from fish to humans. As a consequence, the curated Reference Sequence (RefSeq) database identified polι as a 740 amino acid protein. However, the existence of the 740 amino acid polι has never been shown experimentally. Using highly specific antibodies to the 25 N-terminal amino acids of polι, we were unable to detect the longer 740 amino acid (ι-long) isoform in western blots. However, trace amounts of the ι-long isoform were detected after enrichment by immunoprecipitation. One might argue that the longer isoform may have a distinct biological function, if it exhibits significant differences in its enzymatic properties from the shorter, well-characterized 715 amino acid polι. We therefore purified and characterized recombinant full-length (740 amino acid) polι-long and compared it to full-length (715 amino acid) polι-short in vitro. The metal ion requirements for optimal catalytic activity differ slightly between ι-long and ι-short, but under optimal conditions, both isoforms exhibit indistinguishable enzymatic properties in vitro. We also report that like ι-short, the ι-long isoform can be monoubiquitinated and polyubiquitinated in vivo, as well as form damage-induced foci in vivo. We conclude that the predominant isoform of DNA polι in human cells is the shorter 715 amino acid protein and that if, or when, expressed, the longer 740 amino acid isoform has identical properties to the considerably more abundant shorter isoform. Published by Elsevier B.V.

  13. Unusual structures are present in DNA fragments containing super-long Huntingtin CAG repeats.

    Directory of Open Access Journals (Sweden)

    Daniel Duzdevich

    2011-02-01

    In the R6/2 mouse model of Huntington's disease (HD), expansion of the CAG trinucleotide repeat length beyond about 300 repeats induces a novel phenotype associated with a reduction in transcription of the transgene. We analysed the structure of polymerase chain reaction (PCR)-generated DNA containing up to 585 CAG repeats using atomic force microscopy (AFM). As the number of CAG repeats increased, an increasing proportion of the DNA molecules exhibited unusual structural features, including convolutions and multiple protrusions. At least some of these features are hairpin loops, as judged by cross-sectional analysis and sensitivity to cleavage by mung bean nuclease. Single-molecule force measurements showed that the convoluted DNA was very resistant to untangling. In vitro replication by PCR was markedly reduced, and TseI restriction enzyme digestion was also hindered by the abnormal DNA structures. However, significantly, the DNA gained sensitivity to cleavage by the Type III restriction-modification enzyme, EcoP15I. "Super-long" CAG repeats are found in a number of neurological diseases and may also appear through CAG repeat instability. We suggest that unusual DNA structures associated with super-long CAG repeats decrease transcriptional efficiency in vitro. We also raise the possibility that if these structures occur in vivo, they may play a role in the aetiology of CAG repeat diseases such as HD.

  14. Breaks in the 45S rDNA Lead to Recombination-Mediated Loss of Repeats.

    Science.gov (United States)

    Warmerdam, Daniël O; van den Berg, Jeroen; Medema, René H

    2016-03-22

    rDNA repeats constitute the most heavily transcribed region in the human genome. Tumors frequently display elevated levels of recombination in rDNA, indicating that the repeats are a liability to the genomic integrity of a cell. However, little is known about how cells deal with DNA double-stranded breaks in rDNA. Using selective endonucleases, we show that human cells are highly sensitive to breaks in 45S but not the 5S rDNA repeats. We find that homologous recombination inhibits repair of breaks in 45S rDNA, and this results in repeat loss. We identify the structural maintenance of chromosomes protein 5 (SMC5) as contributing to recombination-mediated repair of rDNA breaks. Together, our data demonstrate that SMC5-mediated recombination can lead to error-prone repair of 45S rDNA repeats, resulting in their loss and thereby reducing cellular viability. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  15. A sequence-dependent rigid-base model of DNA

    Science.gov (United States)

    Gonzalez, O.; Petkevičiutė, D.; Maddocks, J. H.

    2013-02-01

    A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. 
For example, due to the presence of frustration, the model can

  16. A sequence-dependent rigid-base model of DNA.

    Science.gov (United States)

    Gonzalez, O; Petkevičiūtė, D; Maddocks, J H

    2013-02-07

    A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. 
For example, due to the presence of frustration, the model can

  17. The RTR Complex Partner RMI2 and the DNA Helicase RTEL1 Are Both Independently Involved in Preserving the Stability of 45S rDNA Repeats in Arabidopsis thaliana.

    Directory of Open Access Journals (Sweden)

    Sarah Röhrig

    2016-10-01

    The stability of repetitive sequences in complex eukaryotic genomes is safeguarded by factors suppressing homologous recombination. Prominent in this is the role of the RTR complex. In plants, it consists of the RecQ helicase RECQ4A, the topoisomerase TOP3α and RMI1. Like mammals, but not yeast, plants harbor an additional complex partner, RMI2. Here, we demonstrate that, in Arabidopsis thaliana, RMI2 is involved in the repair of aberrant replication intermediates in root meristems as well as in intrastrand crosslink repair. In both instances, RMI2 is involved independently of the DNA helicase RTEL1. Surprisingly, simultaneous loss of RMI2 and RTEL1 leads to loss of male fertility. As both the RTR complex and RTEL1 are involved in suppression of homologous recombination (HR), we tested the efficiency of HR in the double mutant rmi2-2 rtel1-1 and found a synergistic enhancement (80-fold). Searching for natural target sequences we found that RTEL1 is required for stabilizing 45S rDNA repeats. In the double mutant with rmi2-2 the number of 45S rDNA repeats is further decreased sustaining independent roles of both factors in this process. Thus, loss of suppression of HR does not only lead to a destabilization of rDNA repeats but might be especially deleterious for tissues undergoing multiple cell divisions such as the male germline.

  18. The RTR Complex Partner RMI2 and the DNA Helicase RTEL1 Are Both Independently Involved in Preserving the Stability of 45S rDNA Repeats in Arabidopsis thaliana.

    Science.gov (United States)

    Röhrig, Sarah; Schröpfer, Susan; Knoll, Alexander; Puchta, Holger

    2016-10-01

    The stability of repetitive sequences in complex eukaryotic genomes is safeguarded by factors suppressing homologous recombination. Prominent in this is the role of the RTR complex. In plants, it consists of the RecQ helicase RECQ4A, the topoisomerase TOP3α and RMI1. Like mammals, but not yeast, plants harbor an additional complex partner, RMI2. Here, we demonstrate that, in Arabidopsis thaliana, RMI2 is involved in the repair of aberrant replication intermediates in root meristems as well as in intrastrand crosslink repair. In both instances, RMI2 is involved independently of the DNA helicase RTEL1. Surprisingly, simultaneous loss of RMI2 and RTEL1 leads to loss of male fertility. As both the RTR complex and RTEL1 are involved in suppression of homologous recombination (HR), we tested the efficiency of HR in the double mutant rmi2-2 rtel1-1 and found a synergistic enhancement (80-fold). Searching for natural target sequences, we found that RTEL1 is required for stabilizing 45S rDNA repeats. In the double mutant with rmi2-2 the number of 45S rDNA repeats is further decreased, sustaining independent roles of both factors in this process. Thus, loss of suppression of HR does not only lead to a destabilization of rDNA repeats but might be especially deleterious for tissues undergoing multiple cell divisions such as the male germline.

  19. Fast and secure retrieval of DNA sequences

    NARCIS (Netherlands)

    2014-01-01

    Sequence models are retrieved from a sequences index. The sequence models model DNA or RNA sequences stored in a database, and each comprises a finite memory tree source model and parameters for the finite memory tree source model. One or more DNA or RNA sequences stored in the database are

  20. Revisiting the TALE repeat.

    Science.gov (United States)

    Deng, Dong; Yan, Chuangye; Wu, Jianping; Pan, Xiaojing; Yan, Nieng

    2014-04-01

    Transcription activator-like (TAL) effectors specifically bind to double-stranded (ds) DNA through a central domain of tandem repeats. Each TAL effector (TALE) repeat comprises 33-35 amino acids and recognizes one specific DNA base through a highly variable residue at a fixed position in the repeat. Structural studies have revealed the molecular basis of DNA recognition by TALE repeats. Examination of the overall structure reveals that the basic building block of the TALE protein, namely a helical hairpin, is shifted by one helix from the previously defined TALE motif. Here we wish to suggest a structure-based re-demarcation of the TALE repeat which starts with the residues that bind to the DNA backbone phosphate and concludes with the base-recognition hyper-variable residue. This new numbering system is consistent with the α-solenoid superfamily to which TALE belongs, and reflects the structural integrity of TAL effectors. In addition, it yields an integral number of TALE repeats that matches the number of bound DNA bases. We then present fifteen crystal structures of engineered dHax3 variants in complex with target DNA molecules, which elucidate the structural basis for the recognition of bases adenine (A) and guanine (G) by reported or uncharacterized TALE codes. Finally, we analyzed the sequence-structure correlation of the amino acid residues within a TALE repeat. The structural analyses reported here may advance the mechanistic understanding of TALE proteins and facilitate the design of TALEN with improved affinity and specificity.

  1. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States)]; Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States)]; Gilbert, W. [Harvard Univ., Cambridge, MA (United States)]; Mulligan, J. [Stanford Univ., CA (United States)]; Mansfield, B.K. [Oak Ridge National Lab., TN (United States)]

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  2. Evaluation of Patients with an Apparent False Positive Stool DNA Test: The Role of Repeat Stool DNA Testing.

    Science.gov (United States)

    Cooper, Gregory S; Markowitz, Sanford D; Chen, Zhengyi; Tuck, Missy; Willis, Joseph E; Berger, Barry M; Brenner, Dean E; Li, Li

    2018-03-07

    There is uncertainty as to the appropriate follow-up of patients who test positive on multimarker stool DNA (sDNA) testing and have a colonoscopy without neoplasia. To determine the prevalence of missed colonic or occult upper gastrointestinal neoplasia in patients with an apparent false positive sDNA. We prospectively identified 30 patients who tested positive with a commercially available sDNA followed by colonoscopy without neoplastic lesions. Patients were invited to undergo repeat sDNA at 11-29 months after the initial test followed by repeat colonoscopy and upper endoscopy. We determined the presence of neoplastic lesions on repeat evaluation stratified by results of repeat sDNA. Twelve patients were restudied. Seven patients had a negative second sDNA test and a normal second colonoscopy and upper endoscopy. In contrast, 5 of 12 subjects had a persistently positive second sDNA test, and 3 had positive findings, including a 3-cm sessile transverse colon adenoma with high-grade dysplasia, a 2-cm right colon sessile serrated adenoma with dysplasia, and a nonadvanced colon adenoma (p = 0.045). These corresponded to a positive predictive value of 0.60 (95% CI 0.17-1.00) and a negative predictive value of 1.00 (95% CI 1.00-1.00) for the second sDNA test. In addition, the medical records of all 30 subjects with apparent false positive testing were reviewed and no documented cases of malignant tumors were recorded. Repeat positive sDNA testing may identify a subset of patients with missed or occult colorectal neoplasia after negative colonoscopy for an initially positive sDNA. High-quality colonoscopy with careful attention to the right colon in patients with positive sDNA is critically important and may avoid false negative colonoscopy.
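
    The predictive values quoted in this abstract follow directly from the counts it reports. The sketch below is an illustration only, not the authors' analysis: it assumes that the 3 findings among the 5 repeat-positive patients are true positives and that the 7 repeat-negative patients with normal endoscopy are true negatives. The function name is hypothetical.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # share of positive tests with a lesion found
    npv = tn / (tn + fn)  # share of negative tests with no lesion found
    return ppv, npv


# Counts read from the abstract (assumed mapping, see the lead-in):
# 5 repeat-positive patients, 3 with findings; 7 repeat-negative, 0 with findings.
ppv, npv = predictive_values(tp=3, fp=2, tn=7, fn=0)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.60, NPV = 1.00
```

    With only 12 restudied patients the estimates are imprecise, which is consistent with the wide PPV confidence interval the abstract reports.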

  3. Deviating T-DNA transfer from Agrobacterium tumefaciens to plants

    DEFF Research Database (Denmark)

    van der Graaff, Eric; den Dulk-Ras, A; Hooykaas, P J

    1996-01-01

    We analyzed 29 T-DNA inserts in transgenic Arabidopsis thaliana plants for the junction of the right border sequences and the flanking plant DNA. DNA sequencing showed that in most lines the right border sequences transferred had been preserved during integration, corroborating literature data. Surprisingly, in four independent transgenic lines a complete right border repeat was present followed by binary vector sequences. Cloning of two of these T-DNA inserts by plasmid rescue showed that in these lines the transferred DNA consisted of the complete binary vector sequences in addition to the T-region. On the basis of the structure of the transferred DNA we propose that in these lines T-DNA transfer started at the left-border repeat, continued through the vector part, passed the right border repeat, and ended only after reaching again this left-border repeat.

  4. Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight

    Science.gov (United States)

    Shi, Jinming

    In this study, cytosine methylation at CCGG sites and genomic DNA sequence changes in adult rice leaves after seed space flight were detected by the methylation-sensitive amplification polymorphism (MSAP) and amplified fragment length polymorphism (AFLP) techniques, respectively. Rice seeds were planted in the trial field after a 4-day space flight on the Shenzhou-6 spaceship of China. Adult leaves of space-treated rice, including 8 plants chosen randomly and 2 plants with phenotypic mutation, were used for AFLP and MSAP analysis. Polymorphism of both DNA sequence and cytosine methylation was detected. For MSAP analysis, the average polymorphic frequencies of the on-ground controls, space-treated plants and mutants are 1.3%, 3.1% and 11%, respectively. For AFLP analysis, the average polymorphic frequencies are 1.4%, 2.9% and 8%, respectively. In total, 27 and 22 polymorphic fragments were cloned and sequenced from MSAP and AFLP analysis, respectively. Nine of the 27 fragments from MSAP analysis show homology to coding sequence. Of the 22 polymorphic fragments from AFLP analysis, none shows homology to mRNA sequence and eight show homology to repeat regions or retrotransposon sequences. These results suggest that although both genomic DNA sequence and cytosine methylation status can be affected by space flight, the genomic regions homologous to the fragments from the genomic DNA and cytosine methylation analyses were different.

  5. Nucleotide sequence preservation of human mitochondrial DNA

    International Nuclear Information System (INIS)

    Monnat, R.J. Jr.; Loeb, L.A.

    1985-01-01

    Recombinant DNA techniques have been used to quantitate the amount of nucleotide sequence divergence in the mitochondrial DNA population of individual normal humans. Mitochondrial DNA was isolated from the peripheral blood lymphocytes of five normal humans and cloned in M13 mp11; 49 kilobases of nucleotide sequence information was obtained from 248 independently isolated clones from the five normal donors. Both between- and within-individual differences were identified. Between-individual differences were identified in approximately 1/200 nucleotides. In contrast, only one within-individual difference was identified in 49 kilobases of nucleotide sequence information. This high degree of mitochondrial nucleotide sequence homogeneity in human somatic cells is in marked contrast to the rapid evolutionary divergence of human mitochondrial DNA and suggests the existence of mechanisms for the concerted preservation of mammalian mitochondrial DNA sequences in single organisms.

  6. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT), the bionic wavelet transform (BWT), for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural features of the DNA sequence, was introduced into the WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of latent structural features in future DNA sequence analysis.
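
    As a rough illustration of the "four channels of indicator sequences" mentioned in this abstract, the sketch below splits a symbolic sequence into one binary channel per base and then recombines the channels with per-channel weights. The function names and the weight values are hypothetical placeholders; the paper's adaptive mapping and the wavelet transform itself are not reproduced here.

```python
def indicator_sequences(seq):
    """Split a DNA string into four binary channels, one per base."""
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in "ACGT"}


def weighted_signal(channels, weights):
    """Combine the four channels into one numeric signal using per-base weights."""
    length = len(next(iter(channels.values())))
    return [sum(weights[b] * channels[b][i] for b in "ACGT") for i in range(length)]


channels = indicator_sequences("ACGTAC")
# Placeholder weights for illustration only; an adaptive scheme would tune these.
weights = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
signal = weighted_signal(channels, weights)
print(channels["A"])  # [1, 0, 0, 0, 1, 0]
print(signal)         # [1.0, 2.0, 3.0, 4.0, 1.0, 2.0]
```

    Each position is set in exactly one channel, so the weighted signal is simply the weight of the base found at that position.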

  7. Cis-acting regulatory sequences promote high-frequency gene conversion between repeated sequences in mammalian cells.

    Science.gov (United States)

    Raynard, Steven J; Baker, Mark D

    2004-01-01

    In mammalian cells, little is known about the nature of recombination-prone regions of the genome. Previously, we reported that the immunoglobulin heavy chain (IgH) mu locus behaved as a hotspot for mitotic, intrachromosomal gene conversion (GC) between repeated mu constant (Cmu) regions in mouse hybridoma cells. To investigate whether elements within the mu gene regulatory region were required for hotspot activity, gene targeting was used to delete a 9.1 kb segment encompassing the mu gene promoter (Pmu), enhancer (Emu) and switch region (Smu) from the locus. In these cell lines, GC between the Cmu repeats was significantly reduced, indicating that this 'recombination-enhancing sequence' (RES) is necessary for GC hotspot activity at the IgH locus. Importantly, the RES fragment stimulated GC when appended to the same Cmu repeats integrated at ectopic genomic sites. We also show that deletion of Emu and flanking matrix attachment regions (MARs) from the RES abolishes GC hotspot activity at the IgH locus. However, no stimulation of ectopic GC was observed with the Emu/MARs fragment alone. Finally, we provide evidence that no correlation exists between the level of transcription and GC promoted by the RES. We suggest a model whereby Emu/MARs enhances mitotic GC at the endogenous IgH mu locus by effecting chromatin modifications in adjacent DNA.

  8. Generating markers based on biotic stress of protein systemin and tandem repeats sequence for Aquilaria sp

    International Nuclear Information System (INIS)

    Azhar Mohamad; Muhammad Hanif Azhari N; Siti Norhayati Ismail

    2014-01-01

    Aquilaria sp. belongs to the Thymelaeaceae family and is widely distributed in Asia. The species has multipurpose uses from root to shoot and is an economically important crop, which generates wide interest in understanding its genetic diversity. Knowledge of DNA-based markers has become a prerequisite for more effective application of molecular marker techniques in breeding and mapping programs. In this work, both targeted genes and tandem repeat sequences were used for DNA fingerprinting in Aquilaria sp. A total of 100 ISSR (inter simple sequence repeat) primers and 50 combination pairs of specific primers derived from the conserved region of a specific protein known as systemin were optimized. Thirty-eight ISSR primers, generated from both specific and degenerate ISSR primers, were found suitable for polymorphism evaluation. One combination of systemin primers showed significant results in distinguishing Aquilaria sp. In conclusion, polymorphism derived from ISSR profiling and the targeted stress gene for the protein systemin proved a powerful approach for identification and molecular classification of Aquilaria sp., which will be useful in identifying any mutant lines derived from nature. (author)

  9. Heterogeneous Diversity of Spacers within CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)

    Science.gov (United States)

    He, Jiankui; Deem, Michael W.

    2010-09-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) in bacterial and archaeal DNA have recently been shown to be a new type of antiviral immune system in these organisms. We here study the diversity of spacers in CRISPR under selective pressure. We propose a population dynamics model that explains the biological observation that the leader-proximal end of CRISPR is more diversified and the leader-distal end of CRISPR is more conserved. This result is shown to be in agreement with recent experiments. Our results show that the CRISPR spacer structure is influenced by and provides a record of the viral challenges that bacteria face.

  10. ChloroSSRdb: a repository of perfect and imperfect chloroplastic simple sequence repeats (cpSSRs) of green plants.

    Science.gov (United States)

    Kapil, Aditi; Rai, Piyush Kant; Shanker, Asheesh

    2014-01-01

    Simple sequence repeats (SSRs) are regions in a DNA sequence that contain repeating motifs of length 1-6 nucleotides. These repeats are ubiquitous and are found in both coding and non-coding regions of the genome. A total of 534 complete chloroplast genome sequences of Viridiplantae (as of 18 September 2014) are available at the NCBI organelle genome resource. This provides an opportunity to mine these genomes for SSRs and store them in the form of a database. In an attempt to properly manage and retrieve chloroplastic SSRs, we designed ChloroSSRdb, a relational database developed using SQL Server 2008 and accessed through ASP.NET. It provides information on all three types of SSRs (perfect, imperfect and compound). At present, ChloroSSRdb contains 124 430 mined SSRs, with the majority lying in non-coding regions. Of these, PCR primers were designed for 118 249 SSRs. Tetranucleotide repeats (47 079) were found to be the most frequent repeat type, whereas hexanucleotide repeats (6414) were the least abundant. Additionally, for each species statistical analyses were performed to calculate the relative frequency, correlation coefficient and chi-square statistics of perfect and imperfect SSRs. In accordance with the growing interest in SSR studies, ChloroSSRdb will prove to be a useful resource in developing genetic markers, phylogenetic analysis, genetic mapping, etc. Moreover, it will serve as a ready reference for mined SSRs in available chloroplast genomes of green plants. Database URL: www.compubio.in/chlorossrdb/ © The Author(s) 2014. Published by Oxford University Press.
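
    The definition of a perfect SSR given in this abstract (a motif of 1-6 nucleotides repeated in tandem) can be sketched with a regular-expression scan. This is a minimal illustration, not the mining pipeline used for ChloroSSRdb: the function name and the minimum repeat counts per motif length are assumptions chosen for the example.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (0-based start, motif, copy number) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # One k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # since those runs belong to the shorter motif class.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "GGC" + "AT" * 6 + "CCG" + "A" * 10 + "T"
hits = find_perfect_ssrs(example)
print(hits)  # [(3, 'AT', 6), (18, 'A', 10)]
```

    Imperfect and compound SSRs, which the database also covers, need mismatch-tolerant matching and merging of adjacent repeats, and are outside this sketch.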

  11. Next-Generation Sequencing Reveals the Impact of Repetitive DNA Across Phylogenetically Closely Related Genomes of Orobanchaceae

    Science.gov (United States)

    Piednoël, Mathieu; Aberer, Andre J.; Schneeweiss, Gerald M.; Macas, Jiri; Novak, Petr; Gundlach, Heidrun; Temsch, Eva M.; Renner, Susanne S.

    2013-01-01

    We used next-generation sequencing to characterize the genomes of nine species of Orobanchaceae of known phylogenetic relationships, different life forms, and including a polyploid species. The study species are the autotrophic, nonparasitic Lindenbergia philippensis, the hemiparasitic Schwalbea americana, and seven nonphotosynthetic parasitic species of Orobanche (Orobanche crenata, Orobanche cumana, Orobanche gracilis (tetraploid), and Orobanche pancicii) and Phelipanche (Phelipanche lavandulacea, Phelipanche purpurea, and Phelipanche ramosa). Ty3/Gypsy elements comprise 1.93%–28.34% of the nine genomes and Ty1/Copia elements comprise 8.09%–22.83%. When compared with L. philippensis and S. americana, the nonphotosynthetic species contain higher proportions of repetitive DNA sequences, perhaps reflecting relaxed selection on genome size in parasitic organisms. Among the parasitic species, those in the genus Orobanche have smaller genomes but higher proportions of repetitive DNA than those in Phelipanche, mostly due to a diversification of repeats and an accumulation of Ty3/Gypsy elements. Genome downsizing in the tetraploid O. gracilis probably led to sequence loss across most repeat types. PMID:22723303

  12. Sequence-Dependent Mechanism of DNA Oligonucleotide Dehybridization Resolved through Infrared Spectroscopy.

    Science.gov (United States)

    Sanstead, Paul J; Stevenson, Paul; Tokmakoff, Andrei

    2016-09-14

    Despite its important role in biology and nanotechnology, many questions remain regarding the molecular mechanism and dynamics by which oligonucleotides recognize and hybridize to their complementary sequence. The thermodynamics and kinetics of DNA oligonucleotide hybridization and dehybridization are often assumed to involve an all-or-nothing two-state dissociation pathway, but deviations from this behavior can be considerable even for short sequences. We introduce a new strategy to characterize the base-pair-specific thermal dissociation mechanism of DNA oligonucleotides through steady-state and time-resolved infrared spectroscopy. Experiments are interpreted with a lattice model to provide a structure-specific interpretation. This method is applied to a model set of self-complementary 10-base-pair sequences in which the placement of GC base pairs is varied in an otherwise AT strand. Through a combination of Fourier transform infrared and two-dimensional infrared spectroscopy, experiments reveal varying degrees of deviation from simple two-state behavior. As the temperature is increased, duplexes dissociate through a path in which the terminal bases fray, without any significant contribution from loop configurations. Transient temperature jump experiments reveal time scales of 70-100 ns for fraying and 10-30 μs for complete dissociation near the melting temperature. Whether or not frayed states are metastable intermediates or short-lived configurations during the full dissociation of the duplex is dictated by the nucleobase sequence.

  13. Phylogenetic relationships in three species of canine Demodex mite based on partial sequences of mitochondrial 16S rDNA.

    Science.gov (United States)

    Sastre, Natalia; Ravera, Ivan; Villanueva, Sergio; Altet, Laura; Bardagí, Mar; Sánchez, Armand; Francino, Olga; Ferrer, Lluís

    2012-12-01

    The historical classification of Demodex mites has been based on their hosts and morphological features. Genome sequencing has proved to be a very effective taxonomic tool in phylogenetic studies and has been applied in the classification of Demodex. Mitochondrial 16S rDNA has been demonstrated to be an especially useful marker to establish phylogenetic relationships. To amplify and sequence a segment of the mitochondrial 16S rDNA from Demodex canis and Demodex injai, as well as from the short-bodied mite called, unofficially, D. cornei and to determine their genetic proximity. Demodex mites were examined microscopically and classified as Demodex folliculorum (one sample), D. canis (four samples), D. injai (two samples) or the short-bodied species D. cornei (three samples). DNA was extracted, and a 338 bp fragment of the 16S rDNA was amplified and sequenced. The sequences of the four D. canis mites were identical and shared 99.6 and 97.3% identity with two D. canis sequences available at GenBank. The sequences of the D. cornei isolates were identical and showed 97.8, 98.2 and 99.6% identity with the D. canis isolates. The sequences of the two D. injai isolates were also identical and showed 76.6% identity with the D. canis sequence. Demodex canis and D. injai are two different species, with a genetic distance of 23.3%. It would seem that the short-bodied Demodex mite D. cornei is a morphological variant of D. canis. © 2012 The Authors. Veterinary Dermatology © 2012 ESVD and ACVD.

  14. Single Strand Annealing Plays a Major Role in RecA-Independent Recombination between Repeated Sequences in the Radioresistant Deinococcus radiodurans Bacterium.

    Directory of Open Access Journals (Sweden)

    Solenne Ithurbide

    2015-10-01

    The bacterium Deinococcus radiodurans is one of the most radioresistant organisms known. It is able to reconstruct a functional genome from hundreds of radiation-induced chromosomal fragments. Our work aims to highlight the genes involved in recombination between 438 bp direct repeats separated by intervening sequences of various lengths ranging from 1,479 bp to 10,500 bp to restore a functional tetA gene in the presence or absence of radiation-induced DNA double strand breaks. The frequency of spontaneous deletion events between the chromosomal direct repeats was the same in recA+ and in ΔrecA, ΔrecF, and ΔrecO bacteria, whereas recombination between chromosomal and plasmid DNA was shown to be strictly dependent on the RecA and RecF proteins. The presence of mutations in one of the repeated sequences reduced, in a MutS-dependent manner, the frequency of the deletion events. The distance between the repeats did not influence the frequencies of deletion events in recA+ as well as in ΔrecA bacteria. The absence of the UvrD protein stimulated the recombination between the direct repeats, whereas the absence of the DdrB protein, previously shown to be involved in DNA double strand break repair through a single strand annealing (SSA) pathway, strongly reduced the frequency of RecA- (and RecO-) independent deletion events. The absence of the DdrB protein also increased the lethal sectoring of cells devoid of RecA or RecO protein. γ-irradiation of recA+ cells increased the frequencies of the deletion events about 10-fold, but to a lesser extent in cells devoid of the DdrB protein. Altogether, our results suggest a major role of single strand annealing in DNA repeat deletion events in bacteria devoid of the RecA protein, and also in recA+ bacteria exposed to ionizing radiation.

  15. Translocation and gross deletion breakpoints in human inherited disease and cancer II: Potential involvement of repetitive sequence elements in secondary structure formation between DNA ends.

    Science.gov (United States)

    Chuzhanova, Nadia; Abeysinghe, Shaun S; Krawczak, Michael; Cooper, David N

    2003-09-01

    Translocations and gross deletions are responsible for a significant proportion of both cancer and inherited disease. Although such gene rearrangements are nonuniformly distributed in the human genome, the underlying mutational mechanisms remain unclear. We have studied the potential involvement of various types of repetitive sequence elements in the formation of secondary structure intermediates between the single-stranded DNA ends that recombine during rearrangements. Complexity analysis was used to assess the potential of these ends to form secondary structures, the maximum decrease in complexity consequent to a gross rearrangement being used as an indicator of the type of repeat and the specific DNA ends involved. A total of 175 pairs of deletion/translocation breakpoint junction sequences available from the Gross Rearrangement Breakpoint Database [GRaBD; www.uwcm.ac.uk/uwcm/mg/grabd/grabd.html] were analyzed. Potential secondary structure was noted between the 5' flanking sequence of the first breakpoint and the 3' flanking sequence of the second breakpoint in 49% of rearrangements and between the 5' flanking sequence of the second breakpoint and the 3' flanking sequence of the first breakpoint in 36% of rearrangements. Inverted repeats, inversions of inverted repeats, and symmetric elements were found in association with gross rearrangements at approximately the same frequency. However, inverted repeats and inversions of inverted repeats accounted for the vast majority (83%) of deletions plus small insertions, symmetric elements for one-half of all antigen receptor-mediated translocations, while direct repeats appear only to be involved in mediating simple deletions. These findings extend our understanding of illegitimate recombination by highlighting the importance of secondary structure formation between single-stranded DNA ends at breakpoint junctions. Copyright 2003 Wiley-Liss, Inc.

  16. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Background: The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows the generation of sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results: The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions: We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm allows the generation of larger sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.
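
    Several of the per-sequence constraints listed in this abstract (guanine-cytosine content, G/C at both ends, no self-complementarity) are easy to state as checks on a single candidate. The sketch below shows only such checks, with hypothetical function names and an assumed GC window of 40-60%; it is not the EGNAS algorithm, which exhaustively generates sequence sets and also controls interstrand properties.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_gc_ends(seq):
    """True if the sequence both starts and ends with G or C."""
    return seq[0] in "GC" and seq[-1] in "GC"


def is_self_complementary(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def passes_checks(seq, gc_min=0.4, gc_max=0.6):
    """Apply the three illustrative constraints (the GC window is an assumption)."""
    return (gc_min <= gc_content(seq) <= gc_max
            and has_gc_ends(seq)
            and not is_self_complementary(seq))


print(reverse_complement("GATTACA"))  # TGTAATC
print(passes_checks("GATCAC"))        # True
print(passes_checks("GCATGC"))        # False
```

    Here GCATGC fails because it is its own reverse complement and its GC fraction (4 of 6) exceeds the assumed window.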

  17. Diversity, evolution, and functionality of clustered regularly interspaced short palindromic repeat (CRISPR) regions in the fire blight pathogen Erwinia amylovora.

    Science.gov (United States)

    Rezzonico, Fabio; Smits, Theo H M; Duffy, Brion

    2011-06-01

    The clustered regularly interspaced short palindromic repeat (CRISPR)/Cas system confers acquired heritable immunity against mobile nucleic acid elements in prokaryotes, limiting phage infection and horizontal gene transfer of plasmids. In CRISPR arrays, characteristic repeats are interspersed with similarly sized nonrepetitive spacers derived from transmissible genetic elements and acquired when the cell is challenged with foreign DNA. New spacers are added sequentially and the number and type of CRISPR units can differ among strains, providing a record of phage/plasmid exposure within a species and giving a valuable typing tool. The aim of this work was to investigate CRISPR diversity in the highly homogeneous species Erwinia amylovora, the causal agent of fire blight. A total of 18 CRISPR genotypes were defined within a collection of 37 cosmopolitan strains. Strains from Spiraeoideae plants clustered in three major groups: groups II and III were composed exclusively of bacteria originating from the United States, whereas group I generally contained strains of more recent dissemination obtained in Europe, New Zealand, and the Middle East. Strains from Rosoideae and Indian hawthorn (Rhaphiolepis indica) clustered separately and displayed a higher intrinsic diversity than that of isolates from Spiraeoideae plants. Reciprocal exclusion was generally observed between plasmid content and cognate spacer sequences, supporting the role of the CRISPR/Cas system in protecting against foreign DNA elements. However, in several group III strains, retention of plasmid pEU30 is inconsistent with a functional CRISPR/Cas system.

  18. Sequence periodicity in nucleosomal DNA and intrinsic curvature.

    Science.gov (United States)

    Nair, T Murlidharan

    2010-05-17

    Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy, the more rigid the DNA helix; thus it is natural to expect that sequences that are involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understanding their sequence-dependent structures. Higher order DNA structures and the distribution of molecular bend loci associated with 146 base nucleosome core DNA sequences from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. The higher order structures associated with nucleosomes of C. elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results point to the possibility that context-dependent curvature of varying degrees is associated with nucleosomal DNA.

  19. Haloarcula hispanica CRISPR authenticates PAM of a target sequence to prime discriminative adaptation.

    Science.gov (United States)

    Li, Ming; Wang, Rui; Xiang, Hua

    2014-06-01

    The prokaryotic immune system CRISPR/Cas (Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-associated genes) adapts to foreign invaders by acquiring their short deoxyribonucleic acid (DNA) fragments as spacers, which guide subsequent interference to foreign nucleic acids based on sequence matching. The adaptation mechanism avoiding acquiring 'self' DNA fragments is poorly understood. In Haloarcula hispanica, we previously showed that CRISPR adaptation requires being primed by a pre-existing spacer partially matching the invader DNA. Here, we further demonstrate that flanking a fully-matched target sequence, a functional PAM (protospacer adjacent motif) is still required to prime adaptation. Interestingly, interference utilizes only four PAM sequences, whereas adaptation-priming tolerates as many as 23 PAM sequences. This relaxed PAM selectivity explains how adaptation-priming maximizes its tolerance of PAM mutations (that escape interference) while avoiding mis-targeting the spacer DNA within CRISPR locus. We propose that the primed adaptation, which hitches and cooperates with the interference pathway, distinguishes target from non-target by CRISPR ribonucleic acid guidance and PAM recognition. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. DNA sequence modeling based on context trees

    NARCIS (Netherlands)

    Kusters, C.J.; Ignatenko, T.; Roland, J.; Horlin, F.

    2015-01-01

    Genomic sequences contain instructions for protein and cell production. Therefore understanding and identification of biologically and functionally meaningful patterns in DNA sequences is of paramount importance. Modeling of DNA sequences in its turn can help to better understand and identify such

  1. Breaks in the 45S rDNA Lead to Recombination-Mediated Loss of Repeats

    OpenAIRE

    Warmerdam, Daniël O.; van den Berg, Jeroen; Medema, René H.

    2016-01-01

    rDNA repeats constitute the most heavily transcribed region in the human genome. Tumors frequently display elevated levels of recombination in rDNA, indicating that the repeats are a liability to the genomic integrity of a cell. However, little is known about how cells deal with DNA double-stranded breaks in rDNA. Using selective endonucleases, we show that human cells are highly sensitive to breaks in 45S but not the 5S rDNA repeats. We find that homologous recombination inhibits repair of b...

  2. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2012-11-01

    Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb, and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.

  3. Sequence of human protamine 2 cDNA

    Energy Technology Data Exchange (ETDEWEB)

    Domenjoud, L; Fronia, C; Uhde, F; Engel, W [Universitaet Goettingen (West Germany)]

    1988-08-11

    The authors report the cloning and sequencing of a cDNA clone for human protamine 2 (hp2), isolated from a human testis cDNA library cloned in the vector λgt11. A 66-mer oligonucleotide that corresponds to an amino acid sequence which is highly conserved between hp2 and mouse protamine 2 (mp2) served as the hybridization probe. The homology between the amino acid sequence deduced from our cDNA and the published amino acid sequence for hp2 is 100%.

  4. Development of polymorphic genic-SSR markers by cDNA library sequencing in boxwood, Buxus spp. (Buxaceae)

    Science.gov (United States)

    Genic microsatellites or simple sequence repeat (genic-SSR) markers were developed in boxwood (Buxus taxa) for genetic diversity analysis, identification of taxa, and to facilitate breeding. cDNA libraries were developed from mRNA extracted from leaves of Buxus sempervirens ‘Vardar Valley’ and seque...

  5. Extrachromosomal circles of satellite repeats and 5S ribosomal DNA in human cells

    Directory of Open Access Journals (Sweden)

    Cohen Sarit

    2010-03-01

    Background: Extrachromosomal circular DNA (eccDNA) is ubiquitous in eukaryotic organisms and was detected in every organism tested, including humans. Two-dimensional gel electrophoresis facilitates the detection of eccDNA in preparations of genomic DNA. Using this technique we have previously demonstrated that most eccDNA consists of exact multiples of chromosomal tandemly repeated DNA, including both coding genes and satellite DNA. Results: Here we report the occurrence of eccDNA in every tested human cell line. It has heterogeneous mass ranging from less than 2 kb to over 20 kb. We describe eccDNA homologous to human alpha satellite and the SstI mega satellite. Moreover, we show, for the first time, circular multimers of the human 5S ribosomal DNA (rDNA), similar to previous findings in Drosophila and plants. We further demonstrate structures that correspond to intermediates of rolling circle replication, which emerge from the circular multimers of 5S rDNA and SstI satellite. Conclusions: These findings, and previous reports, support the general notion that every chromosomal tandem repeat is prone to generate eccDNA in eukaryotic organisms including humans. They suggest the possible involvement of eccDNA in the length variability observed in arrays of tandem repeats. The implications of eccDNA for genome biology may include mechanisms of centromere evolution, concerted evolution and homogenization of tandem repeats, and genomic plasticity.

  6. In silico reversal of repeat-induced point mutation (RIP) identifies the origins of repeat families and uncovers obscured duplicated genes

    Directory of Open Access Journals (Sweden)

    Hane James K

    2010-11-01

    Background: Repeat-induced point mutation (RIP) is a fungal genome defence mechanism guarding against transposon invasion. RIP mutates the sequence of repeated DNA and over time renders the affected regions unrecognisable by similarity search tools such as BLAST. Results: DeRIP is a new software tool developed to predict the original sequence of a RIP-mutated region prior to the occurrence of RIP. In this study, we apply deRIP to the genome of the wheat pathogen Stagonospora nodorum SN15 and predict the origin of several previously uncharacterised classes of repetitive DNA. Conclusions: Five new classes of transposon repeats and four classes of endogenous gene repeats were identified after deRIP. The deRIP process is a new tool for fungal genomics that facilitates the identification and understanding of the role and origin of fungal repetitive DNA. DeRIP is open-source and is available as part of the RIPCAL suite at http://www.sourceforge.net/projects/ripcal.
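
    The idea of predicting a pre-RIP sequence can be sketched as follows. This is a toy illustration, not the RIPCAL/deRIP implementation: it assumes the repeat copies are already aligned without gaps, that RIP acts as C-to-T transitions (G-to-A on the opposite strand), and that any column mixing only C/T or only G/A is restored to C or G. The function name is hypothetical.

```python
from collections import Counter

def derip_consensus(aligned_copies):
    """Build a consensus that undoes RIP-like C->T and G->A changes."""
    consensus = []
    for column in zip(*aligned_copies):
        bases = set(column)
        if bases == {"C", "T"}:
            consensus.append("C")   # assume the T copies arose from C
        elif bases == {"G", "A"}:
            consensus.append("G")   # assume the A copies arose from G
        else:
            # Otherwise fall back to the most frequent base in the column.
            consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)

copies = ["ACGTTA", "ATGTTA", "ATATTA"]  # toy aligned repeat copies
predicted = derip_consensus(copies)
print(predicted)
```

    Note that the second and third columns are restored to C and G even though the mutated base is in the majority in the second column, which is the point of treating RIP changes differently from ordinary majority voting.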

  7. Molecular identification and characterization of clustered regularly interspaced short palindromic repeats (CRISPRs) in a urease-positive thermophilic Campylobacter sp. (UPTC).

    Science.gov (United States)

    Tasaki, E; Hirayama, J; Tazumi, A; Hayashi, K; Hara, Y; Ueno, H; Moore, J E; Millar, B C; Matsuda, M

    2012-02-01

    A novel clustered regularly interspaced short palindromic repeats (CRISPRs) locus [7,500 base pairs (bp) in length] was identified in the urease-positive thermophilic Campylobacter (UPTC) Japanese isolate CF89-12. The 7,500 bp locus consisted of the 5'-methylaminomethyl-2-thiouridylate methyltransferase gene, a putative (P) CRISPR-associated gene (p-Cas), putative open reading frames, Cas1 and Cas2, a leader sequence region (146 bp), 12 CRISPRs consensus sequence repeats (each 36 bp) separated by non-repetitive unique spacer regions of similar length (26-31 bp), and the phosphatidyl glycerophosphatase A gene. When the CRISPRs loci in UPTC CF89-12 and five C. jejuni isolates were compared with one another, all six isolates contained p-Cas, Cas1 and Cas2 within the loci. Four to 12 CRISPRs consensus sequence repeats separated by non-repetitive unique spacer regions occurred in the six isolates, and the nucleotide sequences of those repeats showed approximately 92-100% similarity with each other. However, no sequence similarity occurred in the unique spacer regions among these isolates. The putative σ(70) transcriptional promoter and the hypothetical ρ-independent terminator structures for the CRISPRs and Cas were detected. No in vivo transcription of p-Cas, Cas1 and Cas2 was confirmed in the UPTC cells.
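
    The repeat/spacer layout described above (consensus repeats separated by unique spacers) can be sketched in a few lines. This is a minimal illustration assuming exact copies of a known repeat; the 8-nt repeat, the spacers, and the flanking sequences are invented toy values, not sequences from the isolates.

```python
def extract_spacers(array_seq, repeat):
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    parts = array_seq.split(repeat)
    # parts[0] is the flank before the first repeat and parts[-1] the flank
    # after the last one; every piece in between sits between two repeats.
    return parts[1:-1]

# Hypothetical toy array: leader, then repeat-spacer-repeat-spacer-repeat.
repeat = "GTTTCAGA"
spacer1, spacer2 = "ACGTACGTAA", "TTGACCA"
array_seq = "ATATAT" + repeat + spacer1 + repeat + spacer2 + repeat + "GGCC"

spacers = extract_spacers(array_seq, repeat)
print(spacers, [len(s) for s in spacers])
```

    Real arrays contain slightly divergent repeats (92-100% similarity is reported above), so an approximate matcher would be needed in practice; exact splitting is used here only to keep the sketch short.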

  8. Cloning the human lysozyme cDNA: Inverted Alu repeat in the mRNA and in situ hybridization for macrophages and Paneth cells

    International Nuclear Information System (INIS)

    Chung, L.P.; Keshav, S.; Gordon, S.

    1988-01-01

    Lysozyme is a major secretory product of human and rodent macrophages and a useful marker for myelomonocytic cells. Based on the known human lysozyme amino acid sequence, oligonucleotides were synthesized and used as probes to screen a phorbol 12-myristate 13-acetate-treated U937 cDNA library. A full-length human lysozyme cDNA clone, pHL-2, was obtained and characterized. Sequence analysis shows that human lysozyme, like chicken lysozyme, has an 18-amino-acid-long signal peptide, but unlike the chicken lysozyme cDNA, the human lysozyme cDNA has a >1-kilobase-long 3' nontranslated sequence. Interestingly, within this 3' region, an inverted repeat of the Alu family of repetitive sequences was discovered. In RNA blot analyses, DNA probes prepared from pHL-2 can be used to detect lysozyme mRNA not only from human but also from mouse and rat. Moreover, by in situ hybridization, complementary RNA transcripts have been used as probes to detect lysozyme mRNA in mouse macrophages and Paneth cells. This human lysozyme cDNA clone is therefore likely to be a useful molecular probe for studying macrophage distribution and gene expression.

  9. Bacterial identification and subtyping using DNA microarray and DNA sequencing.

    Science.gov (United States)

    Al-Khaldi, Sufian F; Mossoba, Magdi M; Allard, Marc M; Lienau, E Kurt; Brown, Eric D

    2012-01-01

    The era of fast and accurate discovery of biological sequence motifs in prokaryotic and eukaryotic cells is here. The co-evolution of direct genome sequencing and DNA microarray strategies will not only identify, isotype, and serotype pathogenic bacteria, but will also aid in the discovery of new gene functions by detecting gene expression in different diseases and environmental conditions. Microarray bacterial identification has made great advances in working with pure and mixed bacterial samples. The technological advances have moved beyond bacterial gene expression to include bacterial identification and isotyping. Application of new tools such as mid-infrared chemical imaging improves detection of hybridization in DNA microarrays. The research in this field is promising and future work will reveal the potential of infrared technology in bacterial identification. On the other hand, DNA sequencing using 454 pyrosequencing is so cost effective that the promise of $1,000 per bacterial genome sequence is becoming a reality. Pyrosequencing is a simple-to-use technique that can produce accurate and quantitative analysis of DNA sequences at great speed. The deposition of massive amounts of bacterial genomic information in databanks is creating fingerprint phylogenetic analysis that will ultimately replace several technologies such as Pulsed Field Gel Electrophoresis. In this chapter, we will review (1) the use of DNA microarray using fluorescence and infrared imaging detection for identification of pathogenic bacteria, and (2) use of pyrosequencing in DNA cluster analysis to fingerprint bacterial phylogenetic trees.

  10. Molecular dynamics simulations of DNA-free and DNA-bound TAL effectors.

    Directory of Open Access Journals (Sweden)

    Hua Wan

    TAL (transcriptional activator-like) effectors (TALEs) are DNA-binding proteins containing a modular central domain that recognizes specific DNA sequences. Recently, crystallographic studies of TALEs revealed the structure of the DNA-recognition domain. In this article, molecular dynamics (MD) simulations are employed to study two crystal structures of an 11.5-repeat TALE, in the presence and absence of DNA, respectively. The simulated results indicate that the specific binding of RVDs (repeat-variable diresidues) with DNA leads to markedly reduced fluctuations of tandem repeats, especially at the two ends. In the DNA-bound TALE system, the base-specific interaction is formed mainly by the residue at position 13 within a TAL repeat. Tandem repeats with weak RVDs are unfavorable for TALE-DNA binding. These observations are consistent with experimental studies. Principal component analysis (PCA) shows that the dominant motions are open-close movements between the two ends of the superhelical structure in both DNA-free and DNA-bound TALE systems. The open-close movements are found to be critical for the recognition and binding of TALE-DNA based on the analysis of the free energy landscape (FEL). The conformational analysis of DNA indicates that the 5' end of the DNA target sequence has more remarkable structural deformability than the other sites. Meanwhile, the conformational change of DNA is likely associated with the specific interaction of TALE-DNA. We further suggest that the arrangement of N-terminal repeats with strong RVDs may help in the design of efficient TALEs. This study provides some new insights into the understanding of the TALE-DNA recognition mechanism.

  11. High-Throughput Block Optical DNA Sequence Identification.

    Science.gov (United States)

    Sagar, Dodderi Manjunatha; Korshoj, Lee Erik; Hanson, Katrina Bethany; Chowdhury, Partha Pratim; Otoupal, Peter Britton; Chatterjee, Anushree; Nagpal, Prashant

    2018-01-01

    Optical techniques for molecular diagnostics or DNA sequencing generally rely on small molecule fluorescent labels, which utilize light with a wavelength of several hundred nanometers for detection. Developing a label-free optical DNA sequencing technique will require nanoscale focusing of light, a high-throughput and multiplexed identification method, and a data compression technique to rapidly identify sequences and analyze genomic heterogeneity for big datasets. Such a method should identify characteristic molecular vibrations using optical spectroscopy, especially in the "fingerprinting region" from ≈400-1400 cm⁻¹. Here, surface-enhanced Raman spectroscopy is used to demonstrate label-free identification of DNA nucleobases with multiplexed 3D plasmonic nanofocusing. While nanometer-scale mode volumes prevent identification of single nucleobases within a DNA sequence, the block optical technique can identify A, T, G, and C content in DNA k-mers. The content of each nucleotide in a DNA block can be a unique and high-throughput method for identifying sequences, genes, and other biomarkers as an alternative to single-letter sequencing. Additionally, coupling two complementary vibrational spectroscopy techniques (infrared and Raman) can improve block characterization. These results pave the way for developing a novel, high-throughput block optical sequencing method with lossy genomic data compression using k-mer identification from multiplexed optical data acquisition. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
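
    The notion of describing a DNA block by its A, T, G, and C content rather than its letter order can be sketched computationally. The snippet below is only an in silico analogue of the idea, not the optical method; the function name and the non-overlapping block layout are assumptions.

```python
from collections import Counter

def block_content(seq, k):
    """Return (A, T, G, C) counts for consecutive non-overlapping k-mer blocks."""
    contents = []
    for start in range(0, len(seq) - k + 1, k):
        counts = Counter(seq[start:start + k])
        contents.append((counts["A"], counts["T"], counts["G"], counts["C"]))
    return contents

blocks = block_content("AATGCCGT", 4)   # two 4-mers: AATG and CCGT
print(blocks)

# The encoding is lossy: different k-mers can share the same content.
same = block_content("AATG", 4) == block_content("GATA", 4)
print(same)
```

    The second check shows why the abstract calls the compression lossy: letter order within a block is discarded, so identification relies on the sequence of block contents rather than on single bases.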

  12. Replication slippage of the thermophilic DNA polymerases B and D from the Euryarchaeota Pyrococcus abyssi

    Directory of Open Access Journals (Sweden)

    Melissa G. Castillo-Lizardo

    2014-08-01

    Replication slippage or slipped-strand mispairing involves the misalignment of DNA strands during the replication of repeated DNA sequences, and can lead to genetic rearrangements such as microsatellite instability. Here, we show that PolB and PolD replicative DNA polymerases from the archaeal model Pyrococcus abyssi (Pab) slip in vitro during replication of a single-stranded DNA template carrying a hairpin structure and short direct repeats. We find that this occurs in both their wild-type (exo+) and exonuclease-deficient (exo-) forms. The slippage behavior of PabPolB and PabPolD, probably due to limited strand displacement activity, resembles that observed for the high fidelity Pyrococcus furiosus (Pfu) DNA polymerase. The presence of PabPCNA inhibited PabPolB and PabPolD slippage. We propose a model whereby PabPCNA stimulates strand displacement activity and polymerase progression through the hairpin, thus permitting the error-free replication of repetitive sequences.

  13. DNA fingerprinting based on simple sequence repeat (SSR ...

    African Journals Online (AJOL)

    New varieties of sugarcane are protected using morphological descriptors, which have limitations in identifying morphologically similar cultivars. Development of a reliable DNA fingerprint system for identification of new varieties would contribute greatly to the breeding of these species. Microsatellite markers are tools with ...

  14. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    Science.gov (United States)

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  15. Characterization of the env gene and long terminal repeat of molecularly cloned Friend mink cell focus-inducing virus DNA.

    OpenAIRE

    Adachi, A; Sakai, K; Kitamura, N; Nakanishi, S; Niwa, O; Matsuyama, M; Ishimoto, A

    1984-01-01

    The highly oncogenic erythroleukemia-inducing Friend mink cell focus-inducing (MCF) virus was molecularly cloned in phage lambda gtWES.lambda B, and the DNA sequences of the env gene and the long terminal repeat were determined. The nucleotide sequences of Friend MCF virus and Friend spleen focus-forming virus were quite homologous, supporting the hypothesis that Friend spleen focus-forming virus might be generated via Friend MCF virus from an ecotropic Friend virus mainly by some deletions. ...

  16. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    Directory of Open Access Journals (Sweden)

    Gao Zhihong

    2010-07-01

    Background: The Expressed Sequence Tag (EST) has been a cost-effective tool in molecular biology and represents an abundant valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results: In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 high-quality expressed sequence tag (EST) sequences. The ESTs were assembled into 4,473 unigenes composed of 1,492 contigs and 2,981 singletons, which have been deposited in NCBI (accession IDs: GW868575 - GW873047), among which 1,294 unique ESTs had known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability on peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65%) and a low level in the peach (46%), and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions: We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species.
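
    Screening unigenes for putative SSRs, as described above, is often done with a simple pattern search. The following is a minimal sketch using a regular expression with a back-reference; the thresholds (unit length 2-6, at least 4 copies) are illustrative assumptions and not the criteria used in the study.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for perfect tandem repeats in `seq`."""
    # Lazy unit match so the shortest repeating unit is tried first; the
    # back-reference \1 then requires at least min_repeats - 1 more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

# Toy sequence (hypothetical): a (GA)5 and an (ACG)4 microsatellite.
toy = "TT" + "GA" * 5 + "CC" + "ACG" * 4 + "T"
ssrs = find_ssrs(toy)
print(ssrs)
```

    A homopolymer run such as AAAAAAAA would also be reported by this pattern (as the unit AA), so a production screen would normally filter those out or handle mononucleotide repeats separately.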

  17. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
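
    The remark that general-purpose Lempel-Ziv compression rarely does well on DNA can be checked with a small experiment. The sketch below compares zlib (the DEFLATE algorithm behind gzip) against naive 2-bit packing on a random sequence; it does not implement coil's edit-tree coding, and the helper names and the random test sequence are assumptions for illustration.

```python
import random
import zlib

BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}

def pack_2bit(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))   # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def unpack_2bit(data, n_bases):
    """Invert pack_2bit; n_bases trims padding from the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:n_bases])

rng = random.Random(0)
seq = "".join(rng.choice(BASES) for _ in range(10000))
packed = pack_2bit(seq)
deflated = zlib.compress(seq.encode("ascii"), 9)
print(len(seq), len(packed), len(deflated))
```

    Real databases contain redundancy between related sequences that a random string lacks, which is the kind of structure a database-level compressor is designed to exploit; the random input here only shows the baseline that text-oriented compression has to beat.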

  18. Crystal Structure of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated Csn2 Protein Revealed Ca²⁺-dependent Double-stranded DNA Binding Activity

    Energy Technology Data Exchange (ETDEWEB)

    Nam, Ki Hyun; Kurinov, Igor; Ke, Ailong (Cornell); (NWU)

    2012-05-22

    Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated protein genes (cas genes) are widespread in bacteria and archaea. They form a line of RNA-based immunity to eradicate invading bacteriophages and malicious plasmids. A key molecular event during this process is the acquisition of new spacers into the CRISPR loci to guide the selective degradation of the matching foreign genetic elements. Csn2 is a Nmeni subtype-specific cas gene required for new spacer acquisition. Here we characterize the Enterococcus faecalis Csn2 protein as a double-stranded (ds-) DNA-binding protein and report its 2.7 Å tetrameric ring structure. The inner circle of the Csn2 tetrameric ring is ~26 Å wide and populated with conserved lysine residues poised for nonspecific interactions with ds-DNA. Each Csn2 protomer contains an α/β domain and an α-helical domain; significant hinge motion was observed between these two domains. Ca²⁺ was located at strategic positions in the oligomerization interface. We further showed that removal of Ca²⁺ ions altered the oligomerization state of Csn2, which in turn severely decreased its affinity for ds-DNA. In summary, our results provided the first insight into the function of the Csn2 protein in CRISPR adaptation by revealing that it is a ds-DNA-binding protein functioning at the quaternary structure level and regulated by Ca²⁺ ions.

  19. Identification, variation and transcription of pneumococcal repeat sequences

    Science.gov (United States)

    2011-01-01

    Background: Small interspersed repeats are commonly found in many bacterial chromosomes. Two families of repeats (BOX and RUP) have previously been identified in the genome of Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen of humans. However, little is known about the role they play in pneumococcal genetics. Results: Analysis of the genome of S. pneumoniae ATCC 700669 revealed the presence of a third repeat family, which we have named SPRITE. All three repeats are present at a reduced density in the genome of the closely related species S. mitis. However, they are almost entirely absent from all other streptococci, although a set of elements related to the pneumococcal BOX repeat was identified in the zoonotic pathogen S. suis. In conjunction with information regarding their distribution within the pneumococcal chromosome, this suggests that it is unlikely that these repeats are specialised sequences performing a particular role for the host, but rather that they constitute parasitic elements. However, comparing insertion sites between pneumococcal sequences indicates that they appear to transpose at a much lower rate than IS elements. Some large BOX elements in S. pneumoniae were found to encode open reading frames on both strands of the genome, whilst another was found to form a composite RNA structure with two T box riboswitches. In multiple cases, such BOX elements were demonstrated as being expressed using directional RNA-seq and RT-PCR. Conclusions: BOX, RUP and SPRITE repeats appear to have proliferated extensively throughout the pneumococcal chromosome during the species' past, but novel insertions are currently occurring at a relatively slow rate. Through their extensive secondary structures, they seem likely to affect the expression of genes with which they are co-transcribed. Software for annotation of these repeats is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/strep_repeats/. PMID:21333003

  20. DNA Sequencing by Capillary Electrophoresis

    Science.gov (United States)

    Karger, Barry L.; Guttman, Andras

    2009-01-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from the labor-intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization based, nanopore and single molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes was only possible with the advent of modern sequencing technologies, which resulted from step-by-step advances contributed by academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at break-neck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this prospective, we wish to overview the role of capillary electrophoresis in DNA sequencing based in part on several of our articles in this journal. PMID:19517496

  1. Estimating Genetic Conformism of Korean Mulberry Cultivars Using Random Amplified Polymorphic DNA and Inter-Simple Sequence Repeat Profiling

    Directory of Open Access Journals (Sweden)

    Sunirmal Sheet

    2018-03-01

    Apart from being fed to silkworms in sericulture, the ecologically important Mulberry plant has been used for traditional medicine in Asian countries as well as in manufacturing wine, food, and beverages. Germplasm analysis among Mulberry cultivars originating from South Korea is crucial in the plant breeding program for cultivar development. Hence, the genetic deviations and relations among 8 Morus alba plants, and one Morus lhou plant, of different cultivars collected from South Korea were investigated using 10 random amplified polymorphic DNA (RAPD) and 10 inter-simple sequence repeat (ISSR) markers in the present study. The ISSR markers exhibited a higher polymorphism (63.42%) among mulberry genotypes in comparison to RAPD markers. Furthermore, the similarity coefficient was estimated for both markers and found to vary between 0.183 and 0.814 for combined pooled data of ISSR and RAPD. The phenogram drawn using the UPGMA cluster method based on combined pooled data of RAPD and ISSR markers divided the nine mulberry genotypes into two divergent major groups and two individual independent accessions. The distant relationship between Dae-Saug (SM1) and SangchonJo Sang Saeng (SM5) offers a possibility of utilizing them in mulberry cultivar improvement of Morus species of South Korea.
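
    A similarity coefficient between two genotypes is typically computed from their band presence/absence profiles. The abstract does not say which coefficient was used, so the sketch below assumes the Jaccard coefficient purely for illustration; the band vectors are invented.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard coefficient for two equal-length 0/1 band profiles."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Two profiles with no bands at all are treated as having similarity 0.
    return shared / either if either else 0.0

# Hypothetical band scores (1 = band present) for two genotypes.
genotype_1 = [1, 1, 0, 1, 0]
genotype_2 = [1, 0, 0, 1, 1]
similarity = jaccard_similarity(genotype_1, genotype_2)
print(similarity)
```

    A matrix of such pairwise values (or the corresponding distances, 1 minus similarity) is the usual input to a UPGMA clustering step.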

  2. Agarose gel electrophoresis and polyacrylamide gel electrophoresis for visualization of simple sequence repeats.

    Science.gov (United States)

    Anderson, James; Wright, Drew; Meksem, Khalid

    2013-01-01

    In the modern age of genetic research there is a constant search for ways to improve the efficiency of plant selection. A recent approach that is both highly efficient and low in cost is plant selection directed by simple sequence repeats (SSRs or microsatellites). These molecular markers are used to select for certain desirable plant traits without relying on ambiguous phenotypic data. The best way to detect them is gel electrophoresis, a common laboratory technique used to separate deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) by size. Loading DNA and RNA onto gels allows the size of fragments to be visualized as the fragments separate, which is achieved by exploiting the charge of the molecules. As the fragments separate, they form distinct bands at set sizes. We describe the visualization of SSRs on slab gels by agarose and polyacrylamide gel electrophoresis.

  3. DNA Replication Dynamics of the GGGGCC Repeat of the C9orf72 Gene.

    Science.gov (United States)

    Thys, Ryan Griffin; Wang, Yuh-Hwa

    2015-11-27

    DNA has the ability to form a variety of secondary structures in addition to the normal B-form DNA, including hairpins and quadruplexes. These structures are implicated in a number of neurological diseases and cancer. Expansion of a GGGGCC repeat located at C9orf72 is associated with familial amyotrophic lateral sclerosis and frontotemporal dementia. This repeat expands from two to 24 copies in normal individuals to several hundreds or thousands of repeats in individuals with the disease. Biochemical studies have demonstrated that as few as four repeats have the ability to form a stable DNA secondary structure known as a G-quadruplex. Quadruplex structures have the ability to disrupt normal DNA processes such as DNA replication and transcription. Here we examine the role of GGGGCC repeat length and orientation on DNA replication using an SV40 replication system in human cells. Replication through GGGGCC repeats leads to a decrease in overall replication efficiency and an increase in instability in a length-dependent manner. Both repeat expansions and contractions are observed, and replication orientation is found to influence the propensity for expansions or contractions. The presence of replication stress, such as low-dose aphidicolin, diminishes replication efficiency but has no effect on instability. Two-dimensional gel electrophoresis analysis demonstrates a replication stall with as few as 20 GGGGCC repeats. These results suggest that replication of the GGGGCC repeat at C9orf72 is perturbed by the presence of expanded repeats, which has the potential to result in further expansion, leading to disease. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
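
    The repeat lengths quoted in this abstract (two to 24 copies, 20 copies, hundreds of copies) are counts of uninterrupted GGGGCC units. As a minimal sketch of how such a count could be read from a sequence string, the function below finds the longest back-to-back run of a motif. The function name and the flanking sequences are hypothetical and are not taken from the paper, which measured repeats experimentally.

```python
import re


def longest_tandem_run(sequence: str, motif: str = "GGGGCC") -> int:
    """Return the largest number of uninterrupted, back-to-back copies of motif."""
    # One or more consecutive copies of the motif, matched as a single run.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


if __name__ == "__main__":
    # Made-up flanking sequence around a 20-unit repeat tract (illustration only).
    allele = "ACGTTACT" + "GGGGCC" * 20 + "TTAGCATC"
    print(longest_tandem_run(allele))
```

    Only one strand is scanned here; a read from the opposite strand would need the reverse-complement motif (GGCCCC) passed as `motif`.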

  4. CRISPR-Cas systems exploit viral DNA injection to establish and maintain adaptive immunity.

    Science.gov (United States)

    Modell, Joshua W; Jiang, Wenyan; Marraffini, Luciano A

    2017-04-06

    Clustered regularly interspaced short palindromic repeats (CRISPR)-Cas systems provide protection against viral and plasmid infection by capturing short DNA sequences from these invaders and integrating them into the CRISPR locus of the prokaryotic host. These sequences, known as spacers, are transcribed into short CRISPR RNA guides that specify the cleavage site of Cas nucleases in the genome of the invader. It is not known when spacer sequences are acquired during viral infection. Here, to investigate this, we tracked spacer acquisition in Staphylococcus aureus cells harbouring a type II CRISPR-Cas9 system after infection with the staphylococcal bacteriophage ϕ12. We found that new spacers were acquired immediately after infection preferentially from the cos site, the viral free DNA end that is first injected into the cell. Analysis of spacer acquisition after infection with mutant phages demonstrated that most spacers are acquired during DNA injection, but not during other stages of the viral cycle that produce free DNA ends, such as DNA replication or packaging. Finally, we showed that spacers acquired from early-injected genomic regions, which direct Cas9 cleavage of the viral DNA immediately after infection, provide better immunity than spacers acquired from late-injected regions. Our results reveal that CRISPR-Cas systems exploit the phage life cycle to generate a pattern of spacer acquisition that ensures a successful CRISPR immune response.

  5. Cloning and cDNA sequence of the dihydrolipoamide dehydrogenase component of human α-ketoacid dehydrogenase complexes

    International Nuclear Information System (INIS)

    Pons, G.; Raefsky-Estrin, C.; Carothers, D.J.; Pepin, R.A.; Javed, A.A.; Jesse, B.W.; Ganapathi, M.K.; Samols, D.; Patel, M.S.

    1988-01-01

    cDNA clones comprising the entire coding region for human dihydrolipoamide dehydrogenase have been isolated from a human liver cDNA library. The cDNA sequence of the largest clone consisted of 2082 base pairs and contained a 1527-base open reading frame that encodes a precursor dihydrolipoamide dehydrogenase of 509 amino acid residues. The first 35-amino acid residues of the open reading frame probably correspond to a typical mitochondrial import leader sequence. The predicted amino acid sequence of the mature protein, starting at the residue number 36 of the open reading frame, is almost identical (>98% homology) with the known partial amino acid sequence of the pig heart dihydrolipoamide dehydrogenase. The cDNA clone also contains a 3' untranslated region of 505 bases with an unusual polyadenylylation signal (TATAAA) and a short poly(A) track. By blot-hybridization analysis with the cDNA as probe, two mRNAs, 2.2 and 2.4 kilobases in size, have been detected in human tissues and fibroblasts, whereas only one mRNA (2.4 kilobases) was detected in rat tissues

  6. On site DNA barcoding by nanopore sequencing.

    Directory of Open Access Journals (Sweden)

    Michele Menegon

    Biodiversity research is becoming increasingly dependent on genomics, which allows the unprecedented digitization and understanding of the planet's biological heritage. The use of genetic markers, i.e. DNA barcoding, has proved to be a powerful tool in species identification. However, full exploitation of this approach is hampered by the high sequencing costs and the absence of equipped facilities in biodiversity-rich countries. In the present work, we developed a portable sequencing laboratory based on the portable DNA sequencer from Oxford Nanopore Technologies, the MinION. Complementary laboratory equipment and reagents were selected to be used in remote and tough environmental conditions. The performance of the MinION sequencer and the portable laboratory was tested for DNA barcoding in a mimicking tropical environment, as well as in a remote rainforest of Tanzania lacking electricity. Despite the relatively high sequencing error rate of the MinION, the development of a suitable pipeline for data analysis allowed the accurate identification of different species of vertebrates including amphibians, reptiles and mammals. In situ sequencing of a wild frog allowed us to rapidly identify the species captured, thus confirming that effective DNA barcoding in the field is possible. These results open new perspectives for real-time, on-site DNA sequencing, thus potentially increasing opportunities for the understanding of biodiversity in areas lacking conventional laboratory facilities.

  7. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Directory of Open Access Journals (Sweden)

    Jason D Thompson

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  8. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Science.gov (United States)

    Thompson, Jason D; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

    2012-01-01

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  9. Highly multiplexed targeted DNA sequencing from single nuclei.

    Science.gov (United States)

    Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E

    2016-02-01

    Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previous single nucleus sequencing (SNS) Nature Protocols paper in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.

  10. Triplet repeat sequences in human DNA can be detected by hybridization to a synthetic (5'-CGG-3')17 oligodeoxyribonucleotide

    DEFF Research Database (Denmark)

    Behn-Krappa, A; Mollenhauer, J; Doerfler, W

    1993-01-01

    The seemingly autonomous amplification of naturally occurring triplet repeat sequences in the human genome has been implicated in the causation of human genetic disease, such as the fragile X (Martin-Bell) syndrome, myotonic dystrophy (Curshmann-Steinert), spinal and bulbar muscular atrophy...

  11. Analysis of unstable DNA sequence in FMR1 gene in Polish families with fragile X syndrome

    International Nuclear Information System (INIS)

    Milewski, Michal; Bal, Jerzy; Obersztyn, Ewa; Bocian, Ewa; Mazurczak, Tadeusz; Zygulska, Marta; Horst, Juergen; Deelen, Wout H.; Halley, Dicky J.J.

    1996-01-01

    The unstable DNA sequence in the FMR1 gene was analyzed in 85 individuals from Polish families with fragile X syndrome in order to characterize mutations responsible for the disease in Poland. In all affected individuals classified on the basis of clinical features and expression of the fragile site at X(q27.3) a large expansion of the unstable sequence (full mutation) was detected. About 5% (2 of 43) of individuals with full mutation did not express the fragile site. Among normal alleles, ranging in size from 20 to 41 CGG repeats, the allele with 29 repeats was the most frequent (37%). Transmission of premutated and fully mutated alleles to the offspring was always associated with size increase. No change in repeat number was found when normal alleles were transmitted. (author). 19 refs., 4 figs, 1 tab

  12. Insight into microevolution of Yersinia pestis by clustered regularly interspaced short palindromic repeats.

    Directory of Open Access Journals (Sweden)

    Yujun Cui

    BACKGROUND: Yersinia pestis, the pathogen of plague, has greatly influenced human history on a global scale. Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR), an element participating in immunity against phage invasion, is composed of short repeated sequences separated by unique spacers and provides the basis of the spoligotyping technology. In the present research, three CRISPR loci were analyzed in 125 strains of Y. pestis from 26 natural plague foci of China, the former Soviet Union and Mongolia, to validate the CRISPR-based genotyping method and to better understand the adaptive microevolution of Y. pestis. METHODOLOGY/PRINCIPAL FINDINGS: Using PCR amplification, sequencing and online data processing, a high degree of genetic diversity was revealed in all three CRISPR elements. The distribution of spacers and their arrays in Y. pestis strains is strongly region- and focus-specific, allowing the construction of a hypothetical evolutionary model of Y. pestis. This model suggests a transmission route of microtus strains that encircled the Takla Makan Desert and ZhunGer Basin. Starting from Tadjikistan, one branch passed through the Kunlun Mountains and moved to the Qinghai-Tibet Plateau. Another branch went north via the Pamirs Plateau, the Tianshan Mountains, the Altai Mountains and the Inner Mongolian Plateau. Other Y. pestis lineages might have originated from certain areas along those routes. CONCLUSIONS/SIGNIFICANCE: CRISPR can provide important information for genotyping and evolutionary research of bacteria, which will help to trace the source of outbreaks. The resulting data will make possible the development of very low cost and high-resolution assays for the systematic typing of any new isolate.

  13. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Directory of Open Access Journals (Sweden)

    Baldwin Stephen A

    2011-03-01

    Background: Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results: The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions: PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  14. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities.

    Science.gov (United States)

    Troshin, Peter V; Postis, Vincent Lg; Ashworth, Denise; Baldwin, Stephen A; McPherson, Michael J; Barton, Geoffrey J

    2011-03-07

    Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  15. Breaks in the 45S rDNA Lead to Recombination-Mediated Loss of Repeats

    NARCIS (Netherlands)

    Warmerdam, Daniel O.; van den Berg, Jeroen; Medema, Rene H.

    2016-01-01

    rDNA repeats constitute the most heavily transcribed region in the human genome. Tumors frequently display elevated levels of recombination in rDNA, indicating that the repeats are a liability to the genomic integrity of a cell. However, little is known about how cells deal with DNA double-stranded

  16. De Novo Assembly of Human Herpes Virus Type 1 (HHV-1) Genome, Mining of Non-Canonical Structures and Detection of Novel Drug-Resistance Mutations Using Short- and Long-Read Next Generation Sequencing Technologies.

    Science.gov (United States)

    Karamitros, Timokratis; Harrison, Ian; Piorkowska, Renata; Katzourakis, Aris; Magiorkinis, Gkikas; Mbisa, Jean Lutamyo

    2016-01-01

    Human herpesvirus type 1 (HHV-1) has a large double-stranded DNA genome of approximately 152 kbp that is structurally complex and GC-rich. This makes the assembly of HHV-1 whole genomes from short-read sequencing data technically challenging. To improve the assembly of HHV-1 genomes we have employed a hybrid genome assembly protocol using data from two sequencing technologies: the short-read Roche 454 and the long-read Oxford Nanopore MinION sequencers. We sequenced 18 HHV-1 cell culture-isolated clinical specimens collected from immunocompromised patients undergoing antiviral therapy. The susceptibility of the samples to several antivirals was determined by plaque reduction assay. Hybrid genome assembly resulted in a decrease in the number of contigs in 6 out of 7 samples and an increase in N(G)50 and N(G)75 of all 7 samples sequenced by both technologies. The approach also enhanced the detection of non-canonical contigs including a rearrangement between the unique (UL) and repeat (T/IRL) sequence regions of one sample that was not detectable by assembly of 454 reads alone. We detected several known and novel resistance-associated mutations in UL23 and UL30 genes. Genome-wide genetic variability ranged from genomes will be useful in determining genetic determinants of drug resistance, virulence, pathogenesis and viral evolution. The numerous, complex repeat regions of the HHV-1 genome currently remain a barrier towards this goal.

  17. Screening the sequence selectivity of DNA-binding molecules using a gold nanoparticle-based colorimetric approach.

    Science.gov (United States)

    Hurst, Sarah J; Han, Min Su; Lytton-Jean, Abigail K R; Mirkin, Chad A

    2007-09-15

    We have developed a novel competition assay that uses a gold nanoparticle (Au NP)-based, high-throughput colorimetric approach to screen the sequence selectivity of DNA-binding molecules. This assay hinges on the observation that the melting behavior of DNA-functionalized Au NP aggregates is sensitive to the concentration of the DNA-binding molecule in solution. When short, oligomeric hairpin DNA sequences are added to a reaction solution consisting of DNA-functionalized Au NP aggregates and DNA-binding molecules, these molecules may bind either to the Au NP aggregate interconnects or to the hairpin stems, based on their relative affinity for each. This relative affinity can be measured as a change in the melting temperature (Tm) of the DNA-modified Au NP aggregates in solution. As a proof of concept, we evaluated the selectivity of 4',6-diamidino-2-phenylindole (an AT-specific binder), ethidium bromide (a nonspecific binder), and chromomycin A (a GC-specific binder) for six sequences of hairpin DNA having different numbers of AT pairs in a five-base pair variable stem region. Our assay accurately and easily confirmed the known trends in selectivity for the DNA binders in question without the use of complicated instrumentation. This novel assay will be useful in assessing large libraries of potential drug candidates that work by binding DNA to form a drug/DNA complex.

  18. DNA Replication Profiling Using Deep Sequencing.

    Science.gov (United States)

    Saayman, Xanita; Ramos-Pérez, Cristina; Brown, Grant W

    2018-01-01

    Profiling of DNA replication during progression through S phase allows a quantitative snap-shot of replication origin usage and DNA replication fork progression. We present a method for using deep sequencing data to profile DNA replication in S. cerevisiae.

  19. Effects of sequence on DNA wrapping around histones

    Science.gov (United States)

    Ortiz, Vanessa

    2011-03-01

    A central question in biophysics is whether the sequence of a DNA strand affects its mechanical properties. In epigenetics, these are thought to influence nucleosome positioning and gene expression. Theoretical and experimental attempts to answer this question have been hindered by an inability to directly resolve DNA structure and dynamics at the base-pair level. In our previous studies we used a detailed model of DNA to measure the effects of sequence on the stability of naked DNA under bending. Sequence was shown to influence DNA's ability to form kinks, which arise when certain motifs slide past others to form non-native contacts. Here, we have now included histone-DNA interactions to see if the results obtained for naked DNA are transferable to the problem of nucleosome positioning. Different DNA sequences interacting with the histone protein complex are studied, and their equilibrium and mechanical properties are compared among themselves and with the naked case. NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359).

  20. Molecular design of sequence specific DNA alkylating agents.

    Science.gov (United States)

    Minoshima, Masafumi; Bando, Toshikazu; Shinohara, Ken-ichi; Sugiyama, Hiroshi

    2009-01-01

    Sequence-specific DNA alkylating agents are of great interest as a novel approach to cancer chemotherapy. We designed conjugates between pyrrole (Py)-imidazole (Im) polyamides and a DNA alkylating chlorambucil moiety attached at different positions. The sequence-specific DNA alkylation by the conjugates was investigated by using high-resolution denaturing polyacrylamide gel electrophoresis (PAGE). The results showed that polyamide chlorambucil conjugates alkylate DNA at flanking adenines in recognition sequences of Py-Im polyamides; however, the reactivities and alkylation sites were influenced by the positions of conjugation. In addition, we synthesized a conjugate between a Py-Im polyamide and another alkylating agent, 1-(chloromethyl)-5-hydroxy-1,2-dihydro-3H-benz[e]indole (seco-CBI). The DNA alkylation reactivities of both alkylating polyamides were almost comparable. In contrast, cytotoxicities against cell lines differed greatly. These comparative studies would promote development of appropriate sequence-specific DNA alkylating polyamides against specific cancer cells.

  1. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    Science.gov (United States)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
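
    The two tests named in this abstract can be illustrated with a short sketch: counting overlapping n-tuples, ranking them by frequency (the input to a Zipf plot), and computing the Shannon entropy of the n-tuple distribution. The function names are hypothetical, and the redundancy formula used here (one minus the entropy divided by its maximum of n·log2(4) bits) is one common definition assumed for illustration; the paper's exact normalization may differ.

```python
import math
from collections import Counter


def ngram_counts(seq: str, n: int) -> Counter:
    """Count overlapping n-tuples ("words" of length n) in a DNA string."""
    seq = seq.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_ranking(seq: str, n: int) -> list:
    """Counts sorted from most to least frequent: rank 1 is the first entry."""
    return [count for _, count in ngram_counts(seq, n).most_common()]


def ngram_entropy(seq: str, n: int) -> float:
    """Shannon entropy (in bits) of the observed n-tuple distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_redundancy(seq: str, n: int, alphabet_size: int = 4) -> float:
    """Assumed definition: 1 - H(n) / (n * log2(alphabet_size))."""
    max_entropy = n * math.log2(alphabet_size)
    return 1.0 - ngram_entropy(seq, n) / max_entropy


if __name__ == "__main__":
    toy = "ACGTACGTAAAAAAAACGT"  # made-up sequence, not real genomic data
    print(zipf_ranking(toy, 2))
    print(ngram_entropy(toy, 2), ngram_redundancy(toy, 2))
```

    A Zipf analysis would then plot the ranked counts against rank on log-log axes and look for a straight-line (power-law) regime; meaningful estimates need sequences far longer than the toy string above.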

  2. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a deadly disease, and leukemia, better known as blood cancer, is one form of it. Cancer cells can be detected by analyzing DNA in a laboratory test. This study focused on local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm. The Smith-Waterman algorithm was introduced by T. F. Smith and M. S. Waterman in 1981. The algorithm looks for the most similar region shared by a pair of sequences by assigning a negative score to unequal base pairs (mismatches) and a positive score to identical base pairs (matches). The maximum positive score marks the end of the alignment, and the minimum value marks where the alignment begins. This study will use sequences of leukemia and 3 sequences of non-leukemia.
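
    The scoring scheme described in this abstract can be sketched as a small dynamic-programming routine. This is a minimal illustration of Smith-Waterman with a linear gap penalty, not the authors' implementation: the scores (match +2, mismatch -1, gap -1) and the example sequences are assumptions chosen for readability.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, which is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    # Made-up sequences sharing the segment ACGT.
    print(smith_waterman("TTTACGTAAA", "GGGACGTCCC"))
```

    The traceback starts at the highest-scoring cell and stops at the first zero, matching the abstract's description of the maximum as the end of the alignment and the minimum as its beginning.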

  3. DNA breaks and repair in interstitial telomere sequences: Influence of chromatin structure; Etude des cassures de l'ADN et des mecanismes de reparation dans les sequences telomeriques interstitielles: Influence de la structure chromatinienne

    Energy Technology Data Exchange (ETDEWEB)

    Revaud, D.

    2009-06-15

    Interstitial Telomeric Sequences (ITS) are over-involved in spontaneous and radiation-induced chromosome aberrations in Chinese hamster cells. We have performed a study to investigate the origin of their instability, either spontaneous or after low-dose irradiation. Our results demonstrate that ITS have a particular chromatin structure: short nucleotide repeat length, less compaction of the 30 nm chromatin fiber, presence of G-quadruplex structures. These features would modulate break production and would favour the recruitment of alternative DNA repair mechanisms, which are prone to produce chromosome aberrations. These pathways could be at the origin of chromosome aberrations in ITS, whereas the NHEJ and HR double-strand break repair pathways are rather required for a correct repair in these regions. (author)

  4. Sequence determinants of human microsatellite variability

    Directory of Open Access Journals (Sweden)

    Jakobsson Mattias

    2009-12-01

    Background: Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database. Results: Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetra-nucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity. Conclusions: These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability.
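
    The calibration step in this abstract amounts to simple arithmetic, sketched below under the abstract's stated assumption that differences in PCR fragment length reflect whole repeat units. The function names, the numeric values, and the plain (uncorrected) heterozygosity formula, one minus the sum of squared allele frequencies, are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def infer_repeat_number(fragment_len: int, ref_fragment_len: int,
                        ref_repeats: int, unit_len: int) -> float:
    """Repeat count implied by a fragment length, calibrated on a reference allele."""
    # Assumes every length difference is due to gained or lost repeat units.
    return ref_repeats + (fragment_len - ref_fragment_len) / unit_len


def expected_heterozygosity(alleles: list) -> float:
    """1 - sum(p_i^2) over allele frequencies p_i (no small-sample correction)."""
    counts = Counter(alleles)
    total = len(alleles)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical locus: reference allele is 200 bp with 10 tetranucleotide repeats.
    fragments = [200, 204, 204, 208, 196, 200]
    repeats = [infer_repeat_number(f, 200, 10, 4) for f in fragments]
    print(repeats)
    print(sum(repeats) / len(repeats), expected_heterozygosity(repeats))
```

    A non-integer result from `infer_repeat_number` would flag a fragment whose length difference is not a multiple of the repeat unit, which is one way the stated assumption could be checked per genotype.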

  5. Survey of clustered regularly interspaced short palindromic repeats and their associated Cas proteins (CRISPR/Cas) systems in multiple sequenced strains of Klebsiella pneumoniae.

    Science.gov (United States)

    Ostria-Hernández, Martha Lorena; Sánchez-Vallejo, Carlos Javier; Ibarra, J Antonio; Castro-Escarpulli, Graciela

    2015-08-04

    In recent years the emergence of multidrug resistant Klebsiella pneumoniae strains has been an increasingly common event. This opportunistic species is one of the five main bacterial pathogens that cause hospital infections worldwide and multidrug resistance has been associated with the presence of high molecular weight plasmids. Plasmids are generally acquired through horizontal transfer and therefore it is possible that systems that prevent the entry of foreign genetic material are inactive or absent. One of these systems is CRISPR/Cas. However, little is known regarding the clustered regularly interspaced short palindromic repeats and their associated Cas proteins (CRISPR/Cas) system in K. pneumoniae. The adaptive immune system CRISPR/Cas has been shown to limit the entry of foreign genetic elements into bacterial organisms and in some bacteria it has been shown to be involved in regulation of virulence genes. Thus in this work we used bioinformatics tools to determine the presence or absence of CRISPR/Cas systems in available K. pneumoniae genomes. The complete CRISPR/Cas system was identified in two out of the eight complete K. pneumoniae genome sequences and in four out of the 44 available draft genome sequences. These strains carry eight cas genes similar to those found in Escherichia coli, suggesting they belong to the type I-E group, although their arrangement is slightly different. As for the CRISPR sequences, the average lengths of the direct repeats and spacers were 29 and 33 bp, respectively. BLAST searches demonstrated that 38 of the 116 spacer sequences (33%) are significantly similar to either plasmid, phage or genome sequences, while the remaining 78 sequences (67%) showed no significant similarity to other sequences. The region where the CRISPR/Cas systems were located is the same in all the Klebsiella genomes containing them; it has a syntenic architecture and is located among genes encoding proteins likely involved in

  6. Multiple regulatory mechanisms of hepatocyte growth factor expression in malignant cells with a short poly(dA) sequence in the HGF gene promoter.

    Science.gov (United States)

    Sakai, Kazuko; Takeda, Masayuki; Okamoto, Isamu; Nakagawa, Kazuhiko; Nishio, Kazuto

    2015-01-01

    Hepatocyte growth factor (HGF) expression is a poor prognostic factor in various types of cancer. Expression levels of HGF have been reported to be regulated by shorter poly(dA) sequences in the promoter region. In the present study, the poly(dA) mononucleotide tract in various types of human cancer cell lines was examined and compared with the HGF expression levels in those cells. Short deoxyadenosine repeat sequences were detected in five of the 55 cell lines used in the present study. The H69, IM95, CCK-81, Sui73 and H28 cells exhibited a truncated poly(dA) sequence in which the number of poly(dA) repeats was reduced by ≥5 bp. Two of the cell lines exhibited high HGF expression, determined by reverse transcription quantitative polymerase chain reaction and enzyme-linked immunosorbent assay. The CCK-81, Sui73 and H28 cells with shorter poly(dA) sequences exhibited low HGF expression. The cause of the suppression of HGF expression in the CCK-81, Sui73 and H28 cells was clarified by two approaches, suppression by methylation and single nucleotide polymorphisms in the HGF gene. Exposure to 5-Aza-dC, an inhibitor of DNA methyltransferase 1, induced an increased expression of HGF in the CCK-81 cells, but not in the other cells. Single-nucleotide polymorphism (SNP) rs72525097 in intron 1 was detected in the Sui73 and H28 cells. Taken together, it was found that the defect of poly(dA) in the HGF promoter was present in various types of cancer, including lung, stomach, colorectal, pancreas and mesothelioma. The present study proposes the negative regulation mechanisms by methylation and SNP in intron 1 of HGF for HGF expression in cancer cells with short poly(dA).

  7. DNA fingerprinting of Mycobacterium leprae strains using variable number tandem repeat (VNTR) - fragment length analysis (FLA).

    Science.gov (United States)

    Jensen, Ronald W; Rivest, Jason; Li, Wei; Vissa, Varalakshmi

    2011-07-15

    The study of the transmission of leprosy is particularly difficult since the causative agent, Mycobacterium leprae, cannot be cultured in the laboratory. The only sources of the bacteria are leprosy patients, and experimentally infected armadillos and nude mice. Thus, many of the methods used in modern epidemiology are not available for the study of leprosy. Despite an extensive global drug treatment program for leprosy implemented by the WHO, leprosy remains endemic in many countries with approximately 250,000 new cases each year. The entire M. leprae genome has been mapped and many loci have been identified that have repeated segments of 2 or more base pairs (called micro- and minisatellites). Clinical strains of M. leprae may vary in the number of tandem repeated segments (short tandem repeats, STR) at many of these loci. Variable number tandem repeat (VNTR) analysis has been used to distinguish different strains of the leprosy bacilli. Some of the loci appear to be more stable than others, showing less variation in repeat numbers, while others seem to change more rapidly, sometimes in the same patient. While the variability of certain VNTRs has brought up questions regarding their suitability for strain typing, the emerging data suggest that analyzing multiple loci, which are diverse in their stability, can be used as a valuable epidemiological tool. Multiple locus VNTR analysis (MLVA) has been used to study leprosy evolution and transmission in several countries including China, Malawi, the Philippines, and Brazil. MLVA involves multiple steps. First, bacterial DNA is extracted along with host tissue DNA from clinical biopsies or slit skin smears (SSS). The desired loci are then amplified from the extracted DNA via polymerase chain reaction (PCR). Fluorescently-labeled primers for 4-5 different loci are used per reaction, with 18 loci being amplified in a total of four reactions. The PCR products may be subjected to agarose gel electrophoresis to verify the

  8. Multineuronal Spike Sequences Repeat with Millisecond Precision

    Directory of Open Access Journals (Sweden)

    Koki Matsumoto

    2013-06-01

    Cortical microcircuits are nonrandomly wired by neurons. As a natural consequence, spikes emitted by microcircuits are also nonrandomly patterned in time and space. One of the prominent spike organizations is a repetition of fixed patterns of spike series across multiple neurons. However, several questions remain unsolved, including how precisely spike sequences repeat, how the sequences are spatially organized, how many neurons participate in sequences, and how different sequences are functionally linked. To address these questions, we monitored spontaneous spikes of hippocampal CA3 neurons ex vivo using a high-speed functional multineuron calcium imaging technique that allowed us to monitor spikes with millisecond resolution and to record the location of spiking and nonspiking neurons. Multineuronal spike sequences were overrepresented in spontaneous activity compared to the statistical chance level. Approximately 75% of neurons participated in at least one sequence during our observation period. The participants were sparsely dispersed and did not show specific spatial organization. The number of sequences relative to the chance level decreased when larger time frames were used to detect sequences. Thus, sequences were precise at the millisecond level. Sequences often shared common spikes with other sequences; parts of sequences were subsequently relayed by following sequences, generating complex chains of multiple sequences.

  9. Multiple tag labeling method for DNA sequencing

    Science.gov (United States)

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  10. Structural organization of glycophorin A and B genes: Glycophorin B gene evolved by homologous recombination at Alu repeat sequences

    International Nuclear Information System (INIS)

    Kudo, Shinichi; Fukuda, Minoru

    1989-01-01

    Glycophorins A (GPA) and B (GPB) are two major sialoglycoproteins of the human erythrocyte membrane. Here the authors present a comparison of the genomic structures of GPA and GPB developed by analyzing DNA clones isolated from a K562 genomic library. Nucleotide sequences of exon-intron junctions and 5' and 3' flanking sequences revealed that the GPA and GPB genes consist of 7 and 5 exons, respectively, and both genes have >95% identical sequence from the 5' flanking region to the region ∼ 1 kilobase downstream from the exon encoding the transmembrane regions. In this homologous part of the genes, GPB lacks one exon due to a point mutation at the 5' splicing site of the third intron, which inactivates the 5' cleavage event of splicing and leads to ligation of the second to the fourth exon. Following these very homologous sequences, the genomic sequences for GPA and GPB diverge significantly and no homology can be detected in their 3' end sequences. The analysis of the Alu sequences and their flanking direct repeat sequences suggest that an ancestral genomic structure has been maintained in the GPA gene, whereas the GPB gene has arisen from the acquisition of 3' sequences different from those of the GPA gene by homologous recombination at the Alu repeats during or after gene duplication

  11. Human Chromosome 7: DNA Sequence and Biology

    OpenAIRE

    Scherer, Stephen W.; Cheung, Joseph; MacDonald, Jeffrey R.; Osborne, Lucy R.; Nakabayashi, Kazuhiko; Herbrick, Jo-Anne; Carson, Andrew R.; Parker-Katiraee, Layla; Skaug, Jennifer; Khaja, Razi; Zhang, Junjun; Hudek, Alexander K.; Li, Martin; Haddad, May; Duggan, Gavin E.

    2003-01-01

    DNA sequence and annotation of the entire human chromosome 7, encompassing nearly 158 million nucleotides of DNA and 1917 gene structures, are presented. To generate a higher order description, additional structural features such as imprinted genes, fragile sites, and segmental duplications were integrated at the level of the DNA sequence with medical genetic data, including 440 chromosome rearrangement breakpoints associated with disease. This approach enabled the discovery of candidate gene...

  12. PREDICTION OF CHROMATIN STATES USING DNA SEQUENCE PROPERTIES

    KAUST Repository

    Bahabri, Rihab R.

    2013-06-01

    Activities of DNA are to a great extent controlled epigenetically through the internal structure of chromatin. This structure is dynamic and is influenced by different modifications of histone proteins. Various combinations of epigenetic modifications of histones point to different functional regions of the DNA, determining the so-called chromatin states. However, the characterization of chromatin states by DNA sequence properties remains largely unknown. In this study we aim to explore whether DNA sequence patterns in the human genome can characterize different chromatin states. Using DNA sequence motifs we built binary classifiers for each chromatin state to evaluate whether a given genomic sequence is a good candidate for belonging to a particular chromatin state. Of four classification algorithms (C4.5, Naive Bayes, Random Forest, and SVM) used for this purpose, the decision tree based classifiers (C4.5 and Random Forest) yielded the best results. Our results suggest that in general these models lack sufficient predictive power, although for four chromatin states (insulators, heterochromatin, and two types of copy number variation) we found that the presence of certain motifs in DNA sequences does imply an increased probability that such a sequence is one of these chromatin states.

  13. Genome-wide identification and validation of simple sequence repeats (SSRs) from Asparagus officinalis.

    Science.gov (United States)

    Li, Shufen; Zhang, Guojun; Li, Xu; Wang, Lianjun; Yuan, Jinhong; Deng, Chuanliang; Gao, Wujun

    2016-06-01

    Garden asparagus (Asparagus officinalis), an important vegetable cultivated worldwide, can also serve as a model dioecious plant species in the study of sex determination and sex chromosome evolution. However, limited DNA marker resources have been developed and used for this species. To expand these resources, we examined the DNA sequences for simple sequence repeats (SSRs) in 163,406 scaffolds representing approximately 400 Mbp of the A. officinalis genome. A total of 87,576 SSRs were identified in 59,565 scaffolds. The most abundant SSR types were trinucleotide and tetranucleotide repeats, accounting for 29.2 and 29.1% of the total SSRs, respectively, followed by di-, penta-, hexa-, hepta-, and octanucleotides. The AG motif was most common among dinucleotides and was also the most frequent motif in the entire A. officinalis genome, representing 14.7% of all SSRs. A total of 41,917 SSR primer pairs were designed to amplify SSRs. Twenty-two genomic SSR markers were tested in 39 asparagus accessions belonging to ten cultivars and one accession of Asparagus setaceus for determination of genetic diversity. The intra-species polymorphism information content (PIC) values of the 22 genomic SSR markers were intermediate, with an average of 0.41. The genetic diversity between the ten A. officinalis cultivars was low, and the UPGMA dendrogram was largely unrelated to cultivars. It is here suggested that the sex of individuals is an important factor influencing the clustering results. The data reported here provide new information about the organization of the microsatellites in the A. officinalis genome and lay a foundation for further genetic studies and breeding applications of A. officinalis and related species. Copyright © 2016 Elsevier Ltd. All rights reserved.
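
    The SSR survey described in this record can be illustrated with a small regular-expression scan. This is only a minimal sketch and not the tool used by the authors: the function names, the unit-size range of 2 to 8 nt, and the threshold of four tandem copies are assumptions chosen for illustration.

```python
import re


def is_primitive(unit):
    """True if the unit is not itself built from a shorter repeated motif (e.g. 'ATAT')."""
    return unit not in (unit + unit)[1:-1]


def find_ssrs(seq, min_unit=2, max_unit=8, min_copies=4):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    The thresholds are illustrative assumptions, not values from the study.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A unit of length k followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


# Hypothetical scaffold fragment: an (AG)5 dinucleotide and a (CAT)4 trinucleotide repeat.
scaffold = "TTGAC" + "AG" * 5 + "CC" + "CAT" * 4 + "GG"
for start, unit, copies in find_ssrs(scaffold):
    print(start, unit, copies)
```

    Counting the returned units by length gives the kind of di-, tri-, and tetranucleotide breakdown reported in the abstract; imperfect or compound repeats would need a more tolerant search.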

  14. Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations.

    Science.gov (United States)

    Oikonomopoulos, Spyros; Wang, Yu Chang; Djambazian, Haig; Badescu, Dunarel; Ragoussis, Jiannis

    2016-08-24

    To assess the performance of the Oxford Nanopore Technologies MinION sequencing platform, cDNAs from the External RNA Controls Consortium (ERCC) RNA Spike-In mix were sequenced. This mix mimics mammalian mRNA species and consists of 92 polyadenylated transcripts with known concentration. cDNA libraries were generated using a template switching protocol to facilitate the direct comparison between different sequencing platforms. The MinION performance was assessed for its ability to sequence the cDNAs directly with good accuracy in terms of abundance and full length. The abundance of the ERCC cDNA molecules sequenced by MinION agreed with their expected concentration. No length or GC content bias was observed. The majority of cDNAs were sequenced as full length. Additionally, a complex cDNA population derived from a human HEK-293 cell line was sequenced on Illumina HiSeq 2500, PacBio RS II and ONT MinION platforms. We observed that there was a good agreement in the measured cDNA abundance between PacBio RS II and ONT MinION (Pearson r = 0.82, isoforms longer than 700 bp) and between Illumina HiSeq 2500 and ONT MinION (Pearson r = 0.75). This indicates that the ONT MinION can sequence quantitatively both long and short full length cDNA molecules.
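
    The platform agreement in this record is reported as a Pearson correlation of measured cDNA abundances. A minimal sketch of that statistic follows; the function name and the abundance values are hypothetical, and the abstract does not say whether the authors transformed the abundances before correlating them.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-isoform abundance estimates from two platforms (made-up numbers).
platform_a = [10.0, 20.0, 30.0, 40.0]
platform_b = [12.0, 19.0, 33.0, 41.0]
print(round(pearson_r(platform_a, platform_b), 3))
```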

  15. Function and Regulation of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) / CRISPR Associated (Cas) Systems

    Directory of Open Access Journals (Sweden)

    Peter C. Fineran

    2012-10-01

    Phages are the most abundant biological entities on earth and pose a constant challenge to their bacterial hosts. Thus, bacteria have evolved numerous ‘innate’ mechanisms of defense against phage, such as abortive infection or restriction/modification systems. In contrast, the clustered regularly interspaced short palindromic repeats (CRISPR) systems provide acquired, yet heritable, sequence-specific ‘adaptive’ immunity against phage and other horizontally-acquired elements, such as plasmids. Resistance is acquired following viral infection or plasmid uptake when a short sequence of the foreign genome is added to the CRISPR array. CRISPRs are then transcribed and processed, generally by CRISPR associated (Cas) proteins, into short interfering RNAs (crRNAs), which form part of a ribonucleoprotein complex. This complex guides the crRNA to the complementary invading nucleic acid and targets this for degradation. Recently, there have been rapid advances in our understanding of CRISPR/Cas systems. In this review, we will present the current model(s) of the molecular events involved in both the acquisition of immunity and interference stages and will also address recent progress in our knowledge of the regulation of CRISPR/Cas systems.

  16. Function and regulation of clustered regularly interspaced short palindromic repeats (CRISPR) / CRISPR associated (Cas) systems.

    Science.gov (United States)

    Richter, Corinna; Chang, James T; Fineran, Peter C

    2012-10-19

    Phages are the most abundant biological entities on earth and pose a constant challenge to their bacterial hosts. Thus, bacteria have evolved numerous 'innate' mechanisms of defense against phage, such as abortive infection or restriction/modification systems. In contrast, the clustered regularly interspaced short palindromic repeats (CRISPR) systems provide acquired, yet heritable, sequence-specific 'adaptive' immunity against phage and other horizontally-acquired elements, such as plasmids. Resistance is acquired following viral infection or plasmid uptake when a short sequence of the foreign genome is added to the CRISPR array. CRISPRs are then transcribed and processed, generally by CRISPR associated (Cas) proteins, into short interfering RNAs (crRNAs), which form part of a ribonucleoprotein complex. This complex guides the crRNA to the complementary invading nucleic acid and targets this for degradation. Recently, there have been rapid advances in our understanding of CRISPR/Cas systems. In this review, we will present the current model(s) of the molecular events involved in both the acquisition of immunity and interference stages and will also address recent progress in our knowledge of the regulation of CRISPR/Cas systems.
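
    The two stages summarized in this record, acquisition of a spacer and sequence-guided interference, can be mimicked with a toy data structure. The sketch below is a deliberate simplification under stated assumptions: the class and method names are hypothetical, spacers are matched by exact string identity on either strand, and real requirements such as Cas proteins, crRNA processing, or tolerated mismatches are ignored.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


class ToyCrisprArray:
    """Toy model: direct repeats separated by spacers copied from invaders."""

    def __init__(self, repeat):
        self.repeat = repeat
        self.spacers = []

    def acquire(self, invader, start, length):
        """Acquisition: store a short piece of the foreign sequence as a new spacer."""
        spacer = invader[start:start + length]
        if len(spacer) != length:
            raise ValueError("spacer window runs past the end of the invader")
        if spacer not in self.spacers:
            self.spacers.append(spacer)
        return spacer

    def array(self):
        """Repeat-spacer-repeat layout of the array as a single string."""
        parts = [self.repeat]
        for spacer in self.spacers:
            parts += [spacer, self.repeat]
        return "".join(parts)

    def interferes(self, invader):
        """Interference: True if any stored spacer matches the invader on either strand."""
        return any(
            spacer in invader or reverse_complement(spacer) in invader
            for spacer in self.spacers
        )


# Hypothetical sequences, far shorter than real repeats and spacers.
crispr = ToyCrisprArray(repeat="GTTTC")
phage = "ATGCCGTAAGCTTGACCTGAATCG"
print(crispr.interferes(phage))   # no spacer stored yet
crispr.acquire(phage, start=4, length=10)
print(crispr.interferes(phage))   # the stored spacer now matches the phage
```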

  17. Subtyping Salmonella enterica serovar enteritidis isolates from different sources by using sequence typing based on virulence genes and clustered regularly interspaced short palindromic repeats (CRISPRs).

    Science.gov (United States)

    Liu, Fenyun; Kariyawasam, Subhashinie; Jayarao, Bhushan M; Barrangou, Rodolphe; Gerner-Smidt, Peter; Ribot, Efrain M; Knabel, Stephen J; Dudley, Edward G

    2011-07-01

    Salmonella enterica subsp. enterica serovar Enteritidis is a major cause of food-borne salmonellosis in the United States. Two major food vehicles for S. Enteritidis are contaminated eggs and chicken meat. Improved subtyping methods are needed to accurately track specific strains of S. Enteritidis related to human salmonellosis throughout the chicken and egg food system. A sequence typing scheme based on virulence genes (fimH and sseL) and clustered regularly interspaced short palindromic repeats (CRISPRs), designated CRISPR-including multi-virulence-locus sequence typing (CRISPR-MVLST), was used to characterize 35 human clinical isolates, 46 chicken isolates, 24 egg isolates, and 63 hen house environment isolates of S. Enteritidis. A total of 27 sequence types (STs) were identified among the 167 isolates. CRISPR-MVLST identified three persistent and predominant STs circulating among U.S. human clinical isolates and chicken, egg, and hen house environmental isolates in Pennsylvania, and an ST that was found only in eggs and humans. It also identified a potential environment-specific sequence type. Moreover, cluster analysis based on fimH and sseL identified a number of clusters, of which several were found in more than one outbreak, as well as 11 singletons. Further research is needed to determine if CRISPR-MVLST might help identify the ecological origins of S. Enteritidis strains that contaminate chickens and eggs.

  18. Development of simple sequence repeat (SSR) markers that are ...

    African Journals Online (AJOL)

    Simple sequence repeat (SSR) markers were developed through data mining of 3,803 previously published expressed sequence tags (ESTs). A total of 144 di- to penta-type SSRs were identified and screened for polymorphism between two turnip cultivars, 'Tsuda' and 'Yurugi Akamaru'. Out of 90 EST-SSRs for ...

  19. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    Directory of Open Access Journals (Sweden)

    Charlotte Rehm

    In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.

  20. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    Science.gov (United States)

    Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

    2015-01-01

    In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
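
    The degenerate heptamer GGGNATC and the distribution of unit counts discussed in this record lend themselves to a short scan. The following is a sketch under assumptions, not the authors' pipeline: it searches only the strand it is given, requires perfect tandem units, and the function names and the example sequence are hypothetical.

```python
import re
from collections import Counter

UNIT = "GGG[ACGT]ATC"   # GGGNATC with N expanded to any base
UNIT_LEN = 7


def find_heptamer_arrays(seq, min_units=2):
    """Return (start, number_of_units) for each tandem array of GGGNATC units."""
    pattern = re.compile("(?:%s){%d,}" % (UNIT, min_units))
    return [
        (m.start(), len(m.group(0)) // UNIT_LEN)
        for m in pattern.finditer(seq.upper())
    ]


def unit_count_histogram(seq, min_units=2):
    """Tally how many arrays contain 2, 3, 4, ... units."""
    return Counter(units for _, units in find_heptamer_arrays(seq, min_units))


# Hypothetical fragment with one four-unit array and one two-unit array.
fragment = ("AT" + "GGGAATCGGGTATCGGGCATCGGGGATC"
            + "TTACG" + "GGGAATCGGGAATC" + "CC")
print(find_heptamer_arrays(fragment))
print(unit_count_histogram(fragment))
```

    Running the same scan on the reverse complement would cover the opposite strand, which matters for the strand-specific positioning described in the abstract.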

  1. SSR_pipeline--computer software for the identification of microsatellite sequences from paired-end Illumina high-throughput DNA sequence data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.

  2. Googling DNA sequences on the World Wide Web.

    Science.gov (United States)

    Hajibabaei, Mehrdad; Singer, Gregory A C

    2009-11-10

    New web-based technologies provide an excellent opportunity for sharing and accessing information and using web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment independent character based algorithm based on dividing a sequence library (DNA barcodes) and query sequence to words. The actual search is conducted by conventional search tools such as freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows a high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.
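
    The alignment-independent idea in this record, splitting both the barcode library and the query into words and leaving the lookup to a generic text-search engine, can be sketched with an in-memory inverted index standing in for the search tool. Everything specific here is an assumption for illustration: the use of overlapping fixed-length words, the word length, the function names, and the two made-up library entries.

```python
from collections import Counter, defaultdict


def words(seq, k=8):
    """Set of overlapping k-letter words in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(library, k=8):
    """Map each word to the set of library entries that contain it."""
    index = defaultdict(set)
    for name, seq in library.items():
        for word in words(seq, k):
            index[word].add(name)
    return index


def search(index, query, k=8):
    """Rank library entries by the number of words shared with the query."""
    votes = Counter()
    for word in words(query, k):
        for name in index.get(word, ()):
            votes[name] += 1
    return votes.most_common()


# Hypothetical two-entry barcode library and a query fragment.
library = {
    "species_a": "ACGTACGTTAGCCTAGGATCCGATTACA",
    "species_b": "TTGACCGGTAAACCGGTTTACGGCATGC",
}
index = build_index(library, k=6)
print(search(index, "TAGCCTAGGATCC", k=6))
```

    Because each word is an exact token, the same lookup could in principle be delegated to any full-text search engine, which is the design choice the abstract describes.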

  3. Our love-hate relationship with DNA barcodes, the Y2K problem, and the search for next generation barcodes

    Directory of Open Access Journals (Sweden)

    Jeffrey M. Marcus

    2018-01-01

    DNA barcodes are very useful for species identification, especially when identification by traditional morphological characters is difficult. However, the short mitochondrial and chloroplast barcodes currently in use often fail to distinguish between closely related species, are prone to lateral transfer, and provide inadequate phylogenetic resolution, particularly at deeper nodes. The deficiencies of short barcode identifiers are similar to the deficiencies of the short year identifiers that caused the Y2K problem in computer science. The resolution of the Y2K problem was to increase the size of the year identifiers. The performance of conventional mitochondrial COI barcodes for phylogenetics was compared with the performance of complete mitochondrial genomes and nuclear ribosomal RNA repeats obtained by genome skimming for a set of caddisfly taxa (Insect Order Trichoptera). The analysis focused on Trichoptera Family Hydropsychidae, the net-spinning caddisflies, which demonstrates many of the frustrating limitations of current barcodes. To conduct phylogenetic comparisons, complete mitochondrial genomes (15 kb each) and nuclear ribosomal repeats (9 kb each) from six caddisfly species were sequenced, assembled, and are reported for the first time. These sequences were analyzed in comparison with eight previously published trichopteran mitochondrial genomes and two trichopteran rRNA repeats, plus outgroup sequences from sister clade Lepidoptera (butterflies and moths). COI trees were not well-resolved, had low bootstrap support, and differed in topology from prior phylogenetic analyses of the Trichoptera. Phylogenetic trees based on mitochondrial genomes or rRNA repeats were well-resolved with high bootstrap support and were largely congruent with each other. Because they are easily sequenced by genome skimming, provide robust phylogenetic resolution at various phylogenetic depths, can better distinguish between closely related species, and (in the

  4. Graphene nanodevices for DNA sequencing

    NARCIS (Netherlands)

    Heerema, S.J.; Dekker, C.

    2016-01-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with

  5. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  6. Characteristics of alternating current hopping conductivity in DNA sequences

    Institute of Scientific and Technical Information of China (English)

    Ma Song-Shan; Xu Hui; Wang Huan-You; Guo Rui

    2009-01-01

    This paper presents a model to describe the alternating current (AC) conductivity of DNA sequences, in which DNA is considered as a one-dimensional (1D) disordered system and electrons are transported via hopping between localized states. The model shows that the AC conductivity of DNA sequences increases as the frequency of the external electric field rises, taking the form $\sigma_{ac}(\omega) \sim \omega^{2}\ln^{2}(1/\omega)$. The AC conductivity also increases with temperature, but only weakly. Meanwhile, at low temperatures the AC conductivity in the off-diagonally correlated case is much larger than that in the uncorrelated case of the Anderson limit, which indicates that the off-diagonal correlations in DNA sequences have a great effect on the AC conductivity, while at high temperature the off-diagonal correlations no longer play a vital role in electric transport. In addition, the proportion of nucleotide pairs p also plays an important role in AC electron transport of DNA sequences. For p < 0.5, the conductivity of a DNA sequence decreases with increasing p, while for p > 0.5 the conductivity increases with increasing p.
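
    The stated frequency dependence, AC conductivity proportional to ω² ln²(1/ω), can be checked against the claim that conductivity rises with frequency. The short derivation below is a sketch that assumes ω is a dimensionless frequency (measured in units of a characteristic rate that the abstract does not specify) and that the proportionality constant is positive.

```latex
\begin{align*}
  \sigma_{ac}(\omega) &\propto \omega^{2}\ln^{2}(1/\omega), \qquad L \equiv \ln(1/\omega), \quad \frac{dL}{d\omega} = -\frac{1}{\omega},\\
  \frac{d}{d\omega}\left[\omega^{2}L^{2}\right]
    &= 2\omega L^{2} + \omega^{2}\cdot 2L\cdot\left(-\frac{1}{\omega}\right)
     = 2\omega L\,(L-1),\\
  2\omega L\,(L-1) &> 0 \iff L > 1 \iff \omega < 1/e .
\end{align*}
```

    Under these assumptions the expression grows monotonically with frequency throughout the low-frequency range ω < 1/e, which is consistent with the trend reported in the abstract.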

  7. Recombination-dependent replication and gene conversion homogenize repeat sequences and diversify plastid genome structure.

    Science.gov (United States)

    Ruhlman, Tracey A; Zhang, Jin; Blazier, John C; Sabir, Jamal S M; Jansen, Robert K

    2017-04-01

    There is a misinterpretation in the literature regarding the variable orientation of the small single copy region of plastid genomes (plastomes). The common phenomenon of small and large single copy inversion, hypothesized to occur through intramolecular recombination between inverted repeats (IR) in a circular, single unit-genome, in fact more likely occurs through recombination-dependent replication (RDR) of linear plastome templates. If RDR can be primed through both intra- and intermolecular recombination, then this mechanism could create not only inversion isomers of so-called single copy regions, but also an array of alternative sequence arrangements. We used Illumina paired-end and PacBio single-molecule real-time (SMRT) sequences to characterize repeat structure in the plastome of Monsonia emarginata (Geraniaceae). We used OrgConv and inspected nucleotide alignments to infer ancestral nucleotides and identify gene conversion among repeats, and mapped long (>1 kb) SMRT reads against the unit-genome assembly to identify alternative sequence arrangements. Although M. emarginata lacks the canonical IR, we found that large repeats (>1 kilobase; kb) represent ∼22% of the plastome nucleotide content. Among the largest repeats (>2 kb), we identified GC-biased gene conversion, and mapping filtered, long SMRT reads to the M. emarginata unit-genome assembly revealed alternative, substoichiometric sequence arrangements. We offer a model based on RDR and gene conversion between long repeated sequences in the M. emarginata plastome and provide support that both intra- and intermolecular recombination between large repeats, particularly in repeat-rich plastomes, varies unit-genome structure while homogenizing the nucleotide sequence of repeats. © 2017 Botanical Society of America.

  8. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    Science.gov (United States)

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  9. Selection pressure on human STR loci and its relevance in repeat expansion disease

    KAUST Repository

    Shimada, Makoto K.

    2016-06-11

    Short Tandem Repeats (STRs) comprise repeats of one to several base pairs. Because of the high mutability due to strand slippage during DNA synthesis, rapid evolutionary change in the number of repeating units directly shapes the range of repeat-number variation according to selection pressure. However, the remaining questions include: Why are STRs causing repeat expansion diseases maintained in the human population; and why are these limited to neurodegenerative diseases? By evaluating the genome-wide selection pressure on STRs using the database we constructed, we identified two different patterns of relationship in repeat-number polymorphisms between DNA and amino-acid sequences, although both patterns are evolutionary consequences of avoiding the formation of harmful long STRs. First, a mixture of degenerate codons is represented in poly-proline (poly-P) repeats. Second, long poly-glutamine (poly-Q) repeats are favored at the protein level; however, at the DNA level, STRs encoding long poly-Qs are frequently divided by synonymous SNPs. Furthermore, significant enrichments of apoptosis and neurodevelopment were biological processes found specifically in genes encoding poly-Qs with repeat polymorphism. This suggests the existence of a specific molecular function for polymorphic and/or long poly-Q stretches. Given that the poly-Qs causing expansion diseases were longer than other poly-Qs, even in healthy subjects, our results indicate that the evolutionary benefits of long and/or polymorphic poly-Q stretches outweigh the risks of long CAG repeats predisposing to pathological hyper-expansions. Molecular pathways in neurodevelopment requiring long and polymorphic poly-Q stretches may provide a clue to understanding why poly-Q expansion diseases are limited to neurodegenerative diseases. © 2016, Springer-Verlag Berlin Heidelberg.
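
    The distinction drawn above between poly-Q length at the protein level and uninterrupted repeat length at the DNA level can be illustrated with a minimal sketch. It relies only on the standard genetic code, in which both CAG and CAA encode glutamine; the function name and the example sequences are hypothetical and are not taken from the authors' database.

```python
# Both codons encode glutamine (Q) in the standard genetic code.
GLUTAMINE_CODONS = {"CAG", "CAA"}

def longest_polyq(dna):
    """Return (longest poly-Q run, longest pure CAG run), both counted in codons.

    Assumes `dna` is an in-frame coding sequence (hypothetical input).
    """
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    best_q = best_cag = run_q = run_cag = 0
    for codon in codons:
        run_q = run_q + 1 if codon in GLUTAMINE_CODONS else 0
        run_cag = run_cag + 1 if codon == "CAG" else 0
        best_q = max(best_q, run_q)
        best_cag = max(best_cag, run_cag)
    return best_q, best_cag

pure = "CAG" * 10                             # uninterrupted CAG tract
interrupted = "CAG" * 4 + "CAA" + "CAG" * 5   # one synonymous CAA interruption

print(longest_polyq(pure))         # (10, 10)
print(longest_polyq(interrupted))  # (10, 5)
```

    In this toy case a single synonymous substitution leaves the encoded poly-Q stretch at 10 residues while halving the longest uninterrupted CAG tract.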

  10. Inter Simple Sequence Repeat DNA (ISSR) Polymorphism Utility in Haploid Nicotiana Alata Irradiated Plants for Finding Markers Associated with Gamma Irradiation and Salinity

    International Nuclear Information System (INIS)

    El-Fiki, A.; Adly, M.; El-Metabteb, G.

    2017-01-01

    Nicotiana alata is an ornamental plant and a member of the family Solanaceae. Tobacco (Nicotiana spp.) is one of the most important commercial crops in the world. Wild Nicotiana species serve as a storehouse of genes for resistance to several diseases and pests, in addition to genes for several important phytochemicals and quality traits that are not present in cultivated varieties. Inter simple sequence repeat (ISSR) analysis was used to determine the degree of genetic variation in treated haploid Nicotiana alata plants. Total genomic DNA from the differently treated haploid plantlets was amplified using five specific primers. All primers were polymorphic. A total of 209 bands were amplified, of which 135 (59.47%) were polymorphic across the radiation treatments, whereas the level of polymorphism among the salinity treatments was 181 (85.6%) and that among the combined gamma radiation dose and salinity concentration treatments was 283 (73.95%). Relationships among treatments were estimated through cluster analysis (UPGMA) based on the ISSR data.

  11. [Knocking-out extra domain A alternative splice fragment of fibronectin using a clustered regularly interspaced short palindromic repeats/associated proteins 9 system].

    Science.gov (United States)

    Yang, Yue; Wang, Haicheng; Xu, Shuyu; Peng, Jing; Jiang, Jiuhui; Li, Cuiying

    2015-08-01

    To investigate the effect of the fibronectin extra domain A (EDA) on the aggressiveness of salivary adenoid cystic carcinoma (SACC) cells via the clustered regularly interspaced short palindromic repeats (CRISPR)/associated proteins (Cas) system. One sgRNA was designed to target the genomic sequences upstream and downstream of the EDA exon. The sgRNA was then linked into plasmid PX-330 and transfected into SACC-83 cells. PCR and DNA sequencing were used to verify the knockout cells, and monoclones of EDA-absent SACC cells were selected (A+C-2, A+C-6, B+C-10). Cell proliferation (CCK-8) and invasion were then tested in the control group and the experimental group. The sgRNA was successfully linked into the PX-330 plasmid. Part of the genomic EDA exon of SACC-83 adenoid cystic carcinoma cells was knocked out, and the knockdown efficiency was above 70%, but the total amount of fibronectin did not change significantly. Three monoclones of EDA-absent SACC-83 cells were successfully selected, with diminished migration and proliferation. The CRISPR/Cas9 system was a simplified system with relatively high knockout efficiency, and EDA knockout could inhibit the mobility and invasiveness of SACC cells.

  12. ACCELERATED EVOLUTION OF LAND SNAILS MANDARINA IN THE OCEANIC BONIN ISLANDS: EVIDENCE FROM MITOCHONDRIAL DNA SEQUENCES.

    Science.gov (United States)

    Chiba, Satoshi

    1999-04-01

    An endemic land snail genus Mandarina of the oceanic Bonin (Ogasawara) Islands shows exceptionally rapid evolution not only of morphological and ecological traits, but of DNA sequence. A phylogenetic relationship based on mitochondrial DNA (mtDNA) sequences suggests that morphological differences equivalent to the differences between families were produced between Mandarina and its ancestor during the Pleistocene. The inferred phylogeny shows that species with similar morphologies and life habitats appeared repeatedly and independently in different lineages and islands at different times. Sequential adaptive radiations occurred in different islands of the Bonin Islands, and species occupying arboreal, semiarboreal, and terrestrial habitats arose independently in each island. Because of a close relationship between shell morphology and life habitat, independent evolution of the same life habitat in different islands created species possessing the same shell morphology in different islands and lineages. This rapid evolution produced some incongruences between phylogenetic relationship and species taxonomy. Levels of mtDNA sequence divergence among the species of Mandarina are extremely high. The maximum levels of sequence divergence at the 16S and 12S ribosomal RNA sequences within Mandarina are 18.7% and 17.7%, respectively, which suggests that evolution of mtDNA in Mandarina is extremely rapid, more than 20 times faster than the standard rate in other animals. The present examination reveals that evolution of morphological and ecological traits occurs at extremely high rates during adaptive radiation, especially in fragmented environments. © 1999 The Society for the Study of Evolution.

  13. Interspecies hybridization on DNA resequencing microarrays: efficiency of sequence recovery and accuracy of SNP detection in human, ape, and codfish mitochondrial DNA genomes sequenced on a human-specific MitoChip

    Directory of Open Access Journals (Sweden)

    Carr Steven M

    2007-09-01

    Background: Iterative DNA "resequencing" on oligonucleotide microarrays offers a high-throughput method to measure intraspecific biodiversity, one that is especially suited to SNP-dense gene regions such as vertebrate mitochondrial (mtDNA) genomes. However, costs of single-species design and microarray fabrication are prohibitive. A cost-effective, multi-species strategy is to hybridize experimental DNAs from diverse species to a common microarray that is tiled with oligonucleotide sets from multiple, homologous reference genomes. Such a strategy requires that cross-hybridization between the experimental DNAs and reference oligos from the different species not interfere with the accurate recovery of species-specific data. To determine the pattern and limits of such interspecific hybridization, we compared the efficiency of sequence recovery and accuracy of SNP identification by a 15,452-base human-specific microarray challenged with human, chimpanzee, gorilla, and codfish mtDNA genomes. Results: In the human genome, 99.67% of the sequence was recovered with 100.0% accuracy. Accuracy of SNP identification declines log-linearly with sequence divergence from the reference, from 0.067 to 0.247 errors per SNP in the chimpanzee and gorilla genomes, respectively. Efficiency of sequence recovery declines with the increase of the number of interspecific SNPs in the 25b interval tiled by the reference oligonucleotides. In the gorilla genome, which differs from the human reference by 10%, and in which 46% of these 25b regions contain 3 or more SNP differences from the reference, only 88% of the sequence is recoverable. In the codfish genome, which differs from the reference by > 30%, less than 4% of the sequence is recoverable, in short islands ≥ 12b that are conserved between primates and fish. Conclusion: Experimental DNAs bind inefficiently to homologous reference oligonucleotide sets on a re-sequencing microarray when their sequences differ by

  14. An extended sequence specificity for UV-induced DNA damage.

    Science.gov (United States)

    Chung, Long H; Murray, Vincent

    2018-01-01

    The sequence specificity of UV-induced DNA damage was determined with a higher precision and accuracy than previously reported. UV light induces two major damage adducts: cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). Employing capillary electrophoresis with laser-induced fluorescence and taking advantage of the distinct properties of the CPDs and 6-4PPs, we studied the sequence specificity of UV-induced DNA damage in a purified DNA sequence using two approaches: end-labelling and a polymerase stop/linear amplification assay. A mitochondrial DNA sequence that contained a random nucleotide composition was employed as the target DNA sequence. With previous methodology, the UV sequence specificity was determined at a dinucleotide or trinucleotide level; however, in this paper, we have extended the UV sequence specificity to a hexanucleotide level. With the end-labelling technique (for 6-4PPs), the consensus sequence was found to be 5'-GCTC*AC (where C* is the breakage site), while with the linear amplification procedure it was 5'-TCTT*AC. With end-labelling, the dinucleotide frequency of occurrence was highest for 5'-TC*, 5'-TT* and 5'-CC*, whereas it was highest for 5'-TT* with linear amplification. The influence of neighbouring nucleotides on the degree of UV-induced DNA damage was also examined. The core sequences consisted of pyrimidine nucleotides 5'-CTC* and 5'-CTT*, while an A at position "1" and C at position "2" enhanced UV-induced DNA damage. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  15. Sequence dependence of electron-induced DNA strand breakage revealed by DNA nanoarrays

    DEFF Research Database (Denmark)

    Keller, Adrian; Rackwitz, Jenny; Cauët, Emilie

    2014-01-01

    The electronic structure of DNA is determined by its nucleotide sequence, which is for instance exploited in molecular electronics. Here we demonstrate that also the DNA strand breakage induced by low-energy electrons (18 eV) depends on the nucleotide sequence. To determine the absolute cross sec...

  16. Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) RNAs in the Porphyromonas gingivalis CRISPR-Cas I-C System.

    Science.gov (United States)

    Burmistrz, Michal; Rodriguez Martinez, Jose Ignacio; Krochmal, Daniel; Staniec, Dominika; Pyrc, Krzysztof

    2017-12-01

    The CRISPR-Cas (clustered regularly interspaced short palindromic repeat-CRISPR-associated protein) system is unique to prokaryotes and provides the majority of bacteria and archaea with immunity against nucleic acids of foreign origin. CRISPR RNAs (crRNAs) are the key element of this system, since they are responsible for its selectivity and effectiveness. Typical crRNAs consist of a spacer sequence flanked with 5' and 3' handles originating from repeat sequences that are important for recognition of these small RNAs by the Cas machinery. In this investigation, we studied the type I-C CRISPR-Cas system in Porphyromonas gingivalis, a human pathogen associated with periodontitis, rheumatoid arthritis, cardiovascular disease, and aspiration pneumonia. We demonstrated the importance of the 5' handle for crRNA recognition by the effector complex and consequently activity, as well as secondary trimming of the 3' handle, which was not affected by modifications of the repeat sequence. IMPORTANCE: Porphyromonas gingivalis, a clinically relevant Gram-negative, anaerobic bacterium, is one of the major etiologic agents of periodontitis and has been linked with the development of other clinical conditions, including rheumatoid arthritis, cardiovascular disease, and aspiration pneumonia. The presented results on the biogenesis and functions of crRNAs expand our understanding of CRISPR-Cas cellular defenses in P. gingivalis and of horizontal gene transfer in bacteria. Copyright © 2017 American Society for Microbiology.

  17. Recombinational DNA repair is regulated by compartmentalization of DNA lesions at the nuclear pore complex

    DEFF Research Database (Denmark)

    Géli, Vincent; Lisby, Michael

    2015-01-01

    and colleagues shows that also physiological threats to genome integrity such as DNA secondary structure-forming triplet repeat sequences relocalize to the NPC during DNA replication. Mutants that fail to reposition the triplet repeat locus to the NPC cause repeat instability. Here, we review the types of DNA...... lesions that relocalize to the NPC, the putative mechanisms of relocalization, and the types of recombinational repair that are stimulated by the NPC, and present a model for NPC-facilitated repair....

  18. Multiplexed detection of DNA sequences using a competitive displacement assay in a microfluidic SERRS-based device.

    Science.gov (United States)

    Yazdi, Soroush H; Giles, Kristen L; White, Ian M

    2013-11-05

    We demonstrate sensitive and multiplexed detection of DNA sequences through a surface enhanced resonance Raman spectroscopy (SERRS)-based competitive displacement assay in an integrated microsystem. The use of the competitive displacement scheme, in which the target DNA sequence displaces a Raman-labeled reporter sequence that has lower affinity for the immobilized probe, enables detection of unlabeled target DNA sequences with a simple single-step procedure. In our implementation, the displacement reaction occurs in a microporous packed column of silica beads prefunctionalized with probe-reporter pairs. The use of a functionalized packed-bead column in a microfluidic channel provides two major advantages: (i) immobilization surface chemistry can be performed as a batch process instead of on a chip-by-chip basis, and (ii) the microporous network eliminates the diffusion limitations of a typical biological assay, which increases the sensitivity. Packed silica beads are also leveraged to improve the SERRS detection of the Raman-labeled reporter. Following displacement, the reporter adsorbs onto aggregated silver nanoparticles in a microfluidic mixer; the nanoparticle-reporter conjugates are then trapped and concentrated in the silica bead matrix, which leads to a significant increase in plasmonic nanoparticles and adsorbed Raman reporters within the detection volume as compared to an open microfluidic channel. The experimental results reported here demonstrate detection down to 100 pM of the target DNA sequence, and the experiments are shown to be specific, repeatable, and quantitative. Furthermore, we illustrate the advantage of using SERRS by demonstrating multiplexed detection. The sensitivity of the assay, combined with the advantages of multiplexed detection and single-step operation with unlabeled target sequences makes this method attractive for practical applications. Importantly, while we illustrate DNA sequence detection, the SERRS-based competitive

  19. The future of forensic DNA analysis

    Science.gov (United States)

    Butler, John M.

    2015-01-01

    The author's thoughts and opinions on where the field of forensic DNA testing is headed for the next decade are provided in the context of where the field has come over the past 30 years. Similar to the Olympic motto of ‘faster, higher, stronger’, forensic DNA protocols can be expected to become more rapid and sensitive and provide stronger investigative potential. New short tandem repeat (STR) loci have expanded the core set of genetic markers used for human identification in Europe and the USA. Rapid DNA testing is on the verge of enabling new applications. Next-generation sequencing has the potential to provide greater depth of coverage for information on STR alleles. Familial DNA searching has expanded capabilities of DNA databases in parts of the world where it is allowed. Challenges and opportunities that will impact the future of forensic DNA are explored including the need for education and training to improve interpretation of complex DNA profiles. PMID:26101278

  20. Characteristics of alternating current hopping conductivity in DNA sequences

    International Nuclear Information System (INIS)

    Song-Shan, Ma; Hui, Xu; Huan-You, Wang; Rui, Guo

    2009-01-01

    This paper presents a model to describe the alternating current (AC) conductivity of DNA sequences, in which DNA is considered as a one-dimensional (1D) disordered system and electrons are transported via hopping between localized states. The model shows that the AC conductivity of DNA sequences increases as the frequency of the external electric field rises, taking the form $\sigma_{ac}(\omega) \sim \omega^{2} \ln^{2}(1/\omega)$. The AC conductivity of DNA sequences also increases with temperature, although the temperature dependence is weak. Meanwhile, at low temperatures the AC conductivity in the off-diagonally correlated case is much larger than that in the uncorrelated case of the Anderson limit, which indicates that off-diagonal correlations in DNA sequences have a great effect on the AC conductivity, while at high temperature the off-diagonal correlations no longer play a vital role in electric transport. In addition, the proportion of nucleotide pairs p also plays an important role in AC electron transport in DNA sequences. For p < 0.5, the conductivity of a DNA sequence decreases with increasing p, while for p ≥ 0.5, the conductivity increases with increasing p. (cross-disciplinary physics and related areas of science and technology)
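
    The frequency dependence quoted above, proportional to ω² ln²(1/ω), can be checked numerically with a short sketch. The prefactor is omitted and ω is treated as a dimensionless frequency below 1/e, where this expression rises monotonically; both choices are assumptions made for illustration and are not taken from the paper.

```python
import math

def sigma_ac(omega):
    """Scaling form omega^2 * ln^2(1/omega) in arbitrary units (prefactor omitted)."""
    return omega ** 2 * math.log(1.0 / omega) ** 2

# Dimensionless frequencies, all below 1/e (about 0.368), where the
# derivative 2*omega*L*(L - 1) with L = ln(1/omega) is positive.
freqs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
values = [sigma_ac(w) for w in freqs]

for w, s in zip(freqs, values):
    print(f"omega = {w:.0e}   sigma_ac ~ {s:.3e}")
```

    The printed values rise with frequency across this range, consistent with the trend described in the abstract.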

  1. Sequence-dependent DNA deformability studied using molecular dynamics simulations.

    Science.gov (United States)

    Fujii, Satoshi; Kono, Hidetoshi; Takenaka, Shigeori; Go, Nobuhiro; Sarai, Akinori

    2007-01-01

    Proteins recognize specific DNA sequences not only through direct contact between amino acids and bases, but also indirectly based on the sequence-dependent conformation and deformability of the DNA (indirect readout). We used molecular dynamics simulations to analyze the sequence-dependent DNA conformations of all 136 possible tetrameric sequences sandwiched between CGCG sequences. The deformability of dimeric steps obtained by the simulations is consistent with that by the crystal structures. The simulation results further showed that the conformation and deformability of the tetramers can highly depend on the flanking base pairs. The conformations of xATx tetramers show the most rigidity and are not affected by the flanking base pairs and the xYRx show by contrast the greatest flexibility and change their conformations depending on the base pairs at both ends, suggesting tetramers with the same central dimer can show different deformabilities. These results suggest that analysis of dimeric steps alone may overlook some conformational features of DNA and provide insight into the mechanism of indirect readout during protein-DNA recognition. Moreover, the sequence dependence of DNA conformation and deformability may be used to estimate the contribution of indirect readout to the specificity of protein-DNA recognition as well as nucleosome positioning and large-scale behavior of nucleic acids.
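
    The figure of 136 tetrameric sequences follows from treating a double-stranded sequence and its reverse complement as the same duplex. A minimal sketch of that count is given below; the function names are hypothetical and the enumeration is only an illustration, not the authors' simulation setup.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def unique_kmers(k):
    """List k-mers that are distinct once a sequence and its reverse complement are merged."""
    seen = set()
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        # Keep one canonical representative per duplex.
        seen.add(min(kmer, reverse_complement(kmer)))
    return sorted(seen)

print(len(unique_kmers(4)))  # 136 unique tetramers
print(len(unique_kmers(2)))  # 10 unique dimeric steps
```

    Of the 256 possible tetramers, 16 are their own reverse complement, so the count is (256 + 16) / 2 = 136; the same reasoning gives 10 unique dimeric steps.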

  2. DNA-directed alkylating ligands as potential antitumor agents: sequence specificity of alkylation by intercalating aniline mustards.

    Science.gov (United States)

    Prakash, A S; Denny, W A; Gourdie, T A; Valu, K K; Woodgate, P D; Wakelin, L P

    1990-10-23

    The sequence preferences for alkylation of a series of novel parasubstituted aniline mustards linked to the DNA-intercalating chromophore 9-aminoacridine by an alkyl chain of variable length were studied by using procedures analogous to Maxam-Gilbert reactions. The compounds alkylate DNA at both guanine and adenine sites. For mustards linked to the acridine by a short alkyl chain through a para O- or S-link group, 5'-GT sequences are the most preferred sites at which N7-guanine alkylation occurs. For analogues with longer chain lengths, the preference of 5'-GT sequences diminishes in favor of N7-adenine alkylation at the complementary 5'-AC sequence. Magnesium ions are shown to selectively inhibit alkylation at the N7 of adenine (in the major groove) by these compounds but not the alkylation at the N3 of adenine (in the minor groove) by the antitumor antibiotic CC-1065. Effects of chromophore variation were also studied by using aniline mustards linked to quinazoline and sterically hindered tert-butyl-9-aminoacridine chromophores. The results demonstrate that in this series of DNA-directed mustards the noncovalent interactions of the carrier chromophores with DNA significantly modify the sequence selectivity of alkylation by the mustard. Relationships between the DNA alkylation patterns of these compounds and their biological activities are discussed.

  3. Laser mass spectrometry for DNA sequencing, disease diagnosis, and fingerprinting

    Energy Technology Data Exchange (ETDEWEB)

    Winston Chen, C.H.; Taranenko, N.I.; Zhu, Y.F.; Chung, C.N.; Allman, S.L.

    1997-03-01

    Since laser mass spectrometry has the potential for achieving very fast DNA analysis, the authors recently applied it to DNA sequencing, DNA typing for fingerprinting, and DNA screening for disease diagnosis. Two different approaches for sequencing DNA have been successfully demonstrated. One is to sequence DNA with DNA ladders produced by Sanger's enzymatic method. The other is to do direct sequencing without DNA ladders. Quick DNA typing for identification purposes is critical for forensic applications. The preliminary results indicate that laser mass spectrometry can possibly be used for rapid DNA fingerprinting applications at a much lower cost than gel electrophoresis. Population screening for certain genetic diseases can be a very efficient step toward reducing medical costs through prevention. Since laser mass spectrometry can provide very fast DNA analysis, the authors applied laser mass spectrometry to disease diagnosis. Clinical samples with both base deletion and point mutation have been tested with complete success.

  4. Deep sequencing reveals distinct patterns of DNA methylation in prostate cancer.

    Science.gov (United States)

    Kim, Jung H; Dhanasekaran, Saravana M; Prensner, John R; Cao, Xuhong; Robinson, Daniel; Kalyana-Sundaram, Shanker; Huang, Christina; Shankar, Sunita; Jing, Xiaojun; Iyer, Matthew; Hu, Ming; Sam, Lee; Grasso, Catherine; Maher, Christopher A; Palanisamy, Nallasivam; Mehra, Rohit; Kominsky, Hal D; Siddiqui, Javed; Yu, Jindan; Qin, Zhaohui S; Chinnaiyan, Arul M

    2011-07-01

    Beginning with precursor lesions, aberrant DNA methylation marks the entire spectrum of prostate cancer progression. We mapped the global DNA methylation patterns in select prostate tissues and cell lines using MethylPlex-next-generation sequencing (M-NGS). Hidden Markov model-based next-generation sequence analysis identified ∼68,000 methylated regions per sample. While global CpG island (CGI) methylation was not differential between benign adjacent and cancer samples, overall promoter CGI methylation significantly increased from ~12.6% in benign samples to 19.3% and 21.8% in localized and metastatic cancer tissues, respectively (P-value prostate tissues, 2481 differentially methylated regions (DMRs) are cancer-specific, including numerous novel DMRs. A novel cancer-specific DMR in the WFDC2 promoter showed frequent methylation in cancer (17/22 tissues, 6/6 cell lines), but not in the benign tissues (0/10) and normal PrEC cells. Integration of LNCaP DNA methylation and H3K4me3 data suggested an epigenetic mechanism for alternate transcription start site utilization, and these modifications segregated into distinct regions when present on the same promoter. Finally, we observed differences in repeat element methylation, particularly LINE-1, between ERG gene fusion-positive and -negative cancers, and we confirmed this observation using pyrosequencing on a tissue panel. This comprehensive methylome map will further our understanding of epigenetic regulation in prostate cancer progression.

  5. Electrochemical detection of DNA triplet repeat expansion

    Czech Academy of Sciences Publication Activity Database

    Fojta, Miroslav; Havran, Luděk; Vojtíšková, Marie; Paleček, Emil

    2004-01-01

    Vol. 126, No. 21 (2004), pp. 6532-6533 ISSN 0002-7863 R&D Projects: GA AV ČR IAA4004402; GA AV ČR IBS5004355; GA AV ČR KJB4004302; GA AV ČR KSK4055109 Institutional research plan: CEZ:AV0Z5004920 Keywords: DNA triplet repeat expansion * PCR amplification * neurodegenerative diseases Subject RIV: BO - Biophysics Impact factor: 6.903, year: 2004

  6. New insights into Trypanosoma cruzi evolution, genotyping and molecular diagnostics from satellite DNA sequence analysis.

    Directory of Open Access Journals (Sweden)

    Juan C Ramírez

    2017-12-01

    Trypanosoma cruzi has been subdivided into seven Discrete Typing Units (DTUs), TcI-TcVI and Tcbat. Two major evolutionary models have been proposed to explain the origin of hybrid lineages, but while it is widely accepted that TcV and TcVI are the result of genetic exchange between TcII and TcIII strains, the origin of TcIII and TcIV is still a matter of debate. T. cruzi satellite DNA (SatDNA), comprised of 195 bp units organized in tandem repeats, from both TcV and TcVI stocks were found to have SatDNA copies type TcI and TcII; whereas contradictory results were observed for TcIII stocks and no TcIV sequence has been analyzed yet. Herein, we have gone deeper into this matter analyzing 335 distinct SatDNA sequences from 19 T. cruzi stocks representative of DTUs TcI-TcVI for phylogenetic inference. Bayesian phylogenetic tree showed that all sequences were grouped in three major clusters, which corresponded to sequences from DTUs TcI/III, TcII and TcIV; whereas TcV and TcVI stocks had two sets of sequences distributed into TcI/III and TcII clusters. As expected, the lowest genetic distances were found between TcI and TcIII, and between TcV and TcVI sequences; whereas the highest ones were observed between TcII and TcI/III, and among TcIV sequences and those from the remaining DTUs. In addition, signature patterns associated to specific T. cruzi lineages were identified and new primers that improved SatDNA-based qPCR sensitivity were designed. Our findings support the theory that TcIII is not the result of a hybridization event between TcI and TcII, and that TcIV had an independent origin from the other DTUs, contributing to clarifying the evolutionary history of T. cruzi lineages. Moreover, this work opens the possibility of typing samples from Chagas disease patients with low parasitic loads and improving molecular diagnostic methods of T. cruzi infection based on SatDNA sequence amplification.

  7. New insights into Trypanosoma cruzi evolution, genotyping and molecular diagnostics from satellite DNA sequence analysis.

    Science.gov (United States)

    Ramírez, Juan C; Torres, Carolina; Curto, María de Los A; Schijman, Alejandro G

    2017-12-01

    Trypanosoma cruzi has been subdivided into seven Discrete Typing Units (DTUs), TcI-TcVI and Tcbat. Two major evolutionary models have been proposed to explain the origin of hybrid lineages, but while it is widely accepted that TcV and TcVI are the result of genetic exchange between TcII and TcIII strains, the origin of TcIII and TcIV is still a matter of debate. T. cruzi satellite DNA (SatDNA), comprised of 195 bp units organized in tandem repeats, from both TcV and TcVI stocks were found to have SatDNA copies type TcI and TcII; whereas contradictory results were observed for TcIII stocks and no TcIV sequence has been analyzed yet. Herein, we have gone deeper into this matter analyzing 335 distinct SatDNA sequences from 19 T. cruzi stocks representative of DTUs TcI-TcVI for phylogenetic inference. Bayesian phylogenetic tree showed that all sequences were grouped in three major clusters, which corresponded to sequences from DTUs TcI/III, TcII and TcIV; whereas TcV and TcVI stocks had two sets of sequences distributed into TcI/III and TcII clusters. As expected, the lowest genetic distances were found between TcI and TcIII, and between TcV and TcVI sequences; whereas the highest ones were observed between TcII and TcI/III, and among TcIV sequences and those from the remaining DTUs. In addition, signature patterns associated to specific T. cruzi lineages were identified and new primers that improved SatDNA-based qPCR sensitivity were designed. Our findings support the theory that TcIII is not the result of a hybridization event between TcI and TcII, and that TcIV had an independent origin from the other DTUs, contributing to clarifying the evolutionary history of T. cruzi lineages. Moreover, this work opens the possibility of typing samples from Chagas disease patients with low parasitic loads and improving molecular diagnostic methods of T. cruzi infection based on SatDNA sequence amplification.

  8. A family of DNA repeats in Aspergillus nidulans has assimilated degenerated retrotransposons

    DEFF Research Database (Denmark)

    Nielsen, M.L.; Hermansen, T.D.; Aleksenko, Alexei Y.

    2001-01-01

    In the course of a chromosomal walk towards the centromere of chromosome IV of Aspergillus nidulans, several cross-hybridizing genomic cosmid clones were isolated. Restriction mapping of two such clones revealed that their restriction patterns were similar in a region of at least 15 kb, indicating the presence of a large repeat. The nature of the repeat was further investigated by sequencing and Southern analysis. The study revealed a family of long dispersed repeats with a high degree of sequence similarity. The number and location of the repeats vary between wild isolates. Two copies of the repeat ...) phenomenon, first described in Neurospora crassa, may have operated in A. nidulans. The data indicate that this family of repeats has assimilated mobile elements that subsequently degenerated but then underwent further duplications as a part of the host repeats.

  9. Sequence-specific DNA alkylation by tandem Py-Im polyamide conjugates.

    Science.gov (United States)

    Taylor, Rhys Dylan; Kawamoto, Yusuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi

    2014-09-01

    Tandem N-methylpyrrole-N-methylimidazole (Py-Im) polyamides with good sequence-specific DNA-alkylating activities have been designed and synthesized. Three alkylating tandem Py-Im polyamides with different linkers, which each contained the same moiety for the recognition of a 10 bp DNA sequence, were evaluated for their reactivity and selectivity by DNA alkylation, using high-resolution denaturing gel electrophoresis. All three conjugates displayed high reactivities for the target sequence. In particular, polyamide 1, which contained a β-alanine linker, displayed the most-selective sequence-specific alkylation towards the target 10 bp DNA sequence. The tandem Py-Im polyamide conjugates displayed greater sequence-specific DNA alkylation than conventional hairpin Py-Im polyamide conjugates (4 and 5). For further research, the design of tandem Py-Im polyamide conjugates could play an important role in targeting specific gene sequences. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Biomolecule Sequencer: Next-Generation DNA Sequencing Technology for In-Flight Environmental Monitoring, Research, and Beyond

    Science.gov (United States)

    Smith, David J.; Burton, Aaron; Castro-Wallace, Sarah; John, Kristen; Stahl, Sarah E.; Dworkin, Jason Peter; Lupisella, Mark L.

    2016-01-01

    On the International Space Station (ISS), technologies capable of rapid microbial identification and disease diagnostics are not currently available. NASA still relies upon sample return for comprehensive, molecular-based sample characterization. Next-generation DNA sequencing is a powerful approach for identifying microorganisms in air, water, and surfaces onboard spacecraft. The Biomolecule Sequencer payload, manifested to SpaceX-9 and scheduled on the Increment 4748 research plan (June 2016), will assess the functionality of a commercially-available next-generation DNA sequencer in the microgravity environment of ISS. The MinION device from Oxford Nanopore Technologies (Oxford, UK) measures picoamp changes in electrical current dependent on nucleotide sequences of the DNA strand migrating through nanopores in the system. The hardware is exceptionally small (9.5 x 3.2 x 1.6 cm), lightweight (120 grams), and powered only by a USB connection. For the ISS technology demonstration, the Biomolecule Sequencer will be powered by a Microsoft Surface Pro3. Ground-prepared samples containing lambda bacteriophage, Escherichia coli, and mouse genomic DNA will be launched and stored frozen on the ISS until experiment initiation. Immediately prior to sequencing, a crew member will collect and thaw frozen DNA samples, connect the sequencer to the Surface Pro3, inject thawed samples into a MinION flow cell, and initiate sequencing. At the completion of the sequencing run, data will be downlinked for ground analysis. Identical, synchronous ground controls will be used for data comparisons to determine sequencer functionality, run-time sequence, current dynamics, and overall accuracy. We will present our latest results from the ISS flight experiment, the first time DNA has ever been sequenced in space, and discuss the many potential applications of the Biomolecule Sequencer for environmental monitoring, medical diagnostics, higher fidelity and more adaptable Space Biology Human

  11. Y-Chromosome short tandem repeat, typing technology, locus ...

    African Journals Online (AJOL)

    Aghomotsegin

    2015-07-08

    Jul 8, 2015 ... Y-Chromosome short tandem repeat, typing technology, locus information and allele frequency in different population: A review. Muhanned Abdulhasan Kareem1, Ameera Omran Hussein2 and Imad Hadi Hameed2*. 1Babylon University, Centre of Environmental Research, Hilla City, Iraq. 2Department of ...

  12. X-Chromosome short tandem repeat, advantages and typing ...

    African Journals Online (AJOL)

    Microsatellites of the X-chromosome have been increasingly studied in recent years as a useful tool in forensic analysis. This review describes some details of X-chromosomal short tandem repeat (STR) analysis. Among them are: microsatellites, amplification using polymerase chain reaction (PCR) of STRs, PCR product ...

  13. Selection pressure on human STR loci and its relevance in repeat expansion disease

    KAUST Repository

    Shimada, Makoto K.; Sanbonmatsu, Ryoko; Yamaguchi-Kabata, Yumi; Yamasaki, Chisato; Suzuki, Yoshiyuki; Chakraborty, Ranajit; Gojobori, Takashi; Imanishi, Tadashi

    2016-01-01

    Short Tandem Repeats (STRs) comprise repeats of one to several base pairs. Because of the high mutability due to strand slippage during DNA synthesis, rapid evolutionary change in the number of repeating units directly shapes the range of repeat

  14. Identification and characterization of short tandem repeats in the Tibetan macaque genome based on resequencing data.

    Science.gov (United States)

    Liu, San-Xu; Hou, Wei; Zhang, Xue-Yan; Peng, Chang-Jun; Yue, Bi-Song; Fan, Zhen-Xin; Li, Jing

    2018-07-18

    The Tibetan macaque, which is endemic to China, is currently listed as a Near Endangered primate species by the International Union for Conservation of Nature (IUCN). Short tandem repeats (STRs) refer to repetitive elements of genome sequence that range in length from 1 to 6 bp. They are found in many organisms and are widely applied in population genetic studies. To clarify the distribution characteristics of genome-wide STRs and understand their variation among Tibetan macaques, we conducted a genome-wide survey of STRs with next-generation sequencing of five macaque samples. A total of 1 077 790 perfect STRs were mined from our assembly, with an N50 of 4 966 bp. Mono-nucleotide repeats were the most abundant, followed by tetra- and di-nucleotide repeats. Analysis of GC content and repeats showed consistent results with other macaques. Furthermore, using STR analysis software (lobSTR), we found that the proportion of base pair deletions in the STRs was greater than that of insertions in the five Tibetan macaque individuals (P ... genome showed good amplification efficiency and could be used to study population genetics in Tibetan macaques. The neighbor-joining tree classified the five macaques into two different branches according to their geographical origin, indicating high genetic differentiation between the Huangshan and Sichuan populations. We elucidated the distribution characteristics of STRs in the Tibetan macaque genome and provided an effective method for screening polymorphic STRs. Our results also lay a foundation for future genetic variation studies of macaques.
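The idea of mining perfect STRs with 1 to 6 bp motifs can be illustrated with a short Python sketch. It is not the pipeline used in the study (which is not described in this record); the function name `find_perfect_strs` and the minimum copy numbers per motif length are hypothetical values chosen only for illustration.

```python
import re

def find_perfect_strs(seq, min_repeats=None):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats
    with motif lengths of 1-6 bp. Thresholds are illustrative assumptions."""
    if min_repeats is None:
        # Hypothetical minimum copy number for each motif length (bp).
        min_repeats = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-bp motif followed by at least n-1 exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ACAC"),
            # so each repeat is reported only under its shortest motif.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

Reporting each repeat under its shortest motif keeps a mono-nucleotide run from also being counted as a di- or tri-nucleotide repeat, which matters when comparing motif-class abundances.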

  15. Identification of Meconopsis species by a DNA barcode sequence ...

    African Journals Online (AJOL)

    Deoxyribonucleic acid (DNA) barcoding is a novel technology that uses a standard DNA sequence to facilitate species identification. Species identification is necessary for the authentication of traditional plant based medicines. Although a consensus has not been agreed regarding which DNA sequences can be used as ...

  16. Double-stranded endonuclease activity in Bacillus halodurans clustered regularly interspaced short palindromic repeats (CRISPR)-associated Cas2 protein.

    Science.gov (United States)

    Nam, Ki Hyun; Ding, Fran; Haitjema, Charles; Huang, Qingqiu; DeLisa, Matthew P; Ke, Ailong

    2012-10-19

    The CRISPR (clustered regularly interspaced short palindromic repeats) system is a prokaryotic RNA-based adaptive immune system against extrachromosomal genetic elements. Cas2 is a universally conserved core CRISPR-associated protein required for the acquisition of new spacers for CRISPR adaptation. It was previously characterized as an endoribonuclease with preference for single-stranded (ss)RNA. Here, we show using crystallography, mutagenesis, and isothermal titration calorimetry that the Bacillus halodurans Cas2 (Bha_Cas2) from the subtype I-C/Dvulg CRISPR instead possesses metal-dependent endonuclease activity against double-stranded (ds)DNA. This activity is consistent with its putative function in producing new spacers for insertion into the 5'-end of the CRISPR locus. Mutagenesis and isothermal titration calorimetry studies revealed that a single divalent metal ion (Mg(2+) or Mn(2+)), coordinated by a symmetric Asp pair in the Bha_Cas2 dimer, is involved in the catalysis. We envision that a pH-dependent conformational change switches Cas2 into a metal-binding competent conformation for catalysis. We further propose that the distinct substrate preferences among Cas2 proteins may be determined by the sequence and structure in the β1-α1 loop.

  17. Levenshtein error-correcting barcodes for multiplexed DNA sequencing

    NARCIS (Netherlands)

    Buschmann, Tilo; Bystrykh, Leonid V.

    2013-01-01

    Background: High-throughput sequencing technologies are improving in quality, capacity and costs, providing versatile applications in DNA and RNA research. For small genomes or fraction of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called
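As the title indicates, the barcodes in this record are built around Levenshtein (edit) distance, which counts substitutions, insertions and deletions. The following Python sketch computes that distance and the minimum pairwise distance of a barcode set; it illustrates the metric only and is not the authors' code construction. The helper names are hypothetical.

```python
from itertools import combinations

def levenshtein(a, b):
    """Edit distance between strings a and b (substitution, insertion,
    deletion each cost 1), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion from a
                            curr[j - 1] + 1,            # insertion into a
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]

def min_pairwise_distance(barcodes):
    """Smallest edit distance between any two barcodes in the set."""
    return min(levenshtein(x, y) for x, y in combinations(barcodes, 2))
```

The minimum pairwise distance is one common summary of how well separated a barcode set is; whether a given separation suffices for error correction in a sequencing read depends on details not covered in this truncated abstract.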

  18. Mapping vaccinia virus DNA replication origins at nucleotide level by deep sequencing.

    Science.gov (United States)

    Senkevich, Tatiana G; Bruno, Daniel; Martens, Craig; Porcella, Stephen F; Wolf, Yuri I; Moss, Bernard

    2015-09-01

    Poxviruses reproduce in the host cytoplasm and encode most or all of the enzymes and factors needed for expression and synthesis of their double-stranded DNA genomes. Nevertheless, the mode of poxvirus DNA replication and the nature and location of the replication origins remain unknown. A current but unsubstantiated model posits only leading strand synthesis starting at a nick near one covalently closed end of the genome and continuing around the other end to generate a concatemer that is subsequently resolved into unit genomes. The existence of specific origins has been questioned because any plasmid can replicate in cells infected by vaccinia virus (VACV), the prototype poxvirus. We applied directional deep sequencing of short single-stranded DNA fragments enriched for RNA-primed nascent strands isolated from the cytoplasm of VACV-infected cells to pinpoint replication origins. The origins were identified as the switching points of the fragment directions, which correspond to the transition from continuous to discontinuous DNA synthesis. Origins containing a prominent initiation point mapped to a sequence within the hairpin loop at one end of the VACV genome and to the same sequence within the concatemeric junction of replication intermediates. These findings support a model for poxvirus genome replication that involves leading and lagging strand synthesis and is consistent with the requirements for primase and ligase activities as well as earlier electron microscopic and biochemical studies implicating a replication origin at the end of the VACV genome.

  19. Sequence Capture versus Restriction Site Associated DNA Sequencing for Shallow Systematics.

    Science.gov (United States)

    Harvey, Michael G; Smith, Brian Tilston; Glenn, Travis C; Faircloth, Brant C; Brumfield, Robb T

    2016-09-01

    Sequence capture and restriction site associated DNA sequencing (RAD-Seq) are two genomic enrichment strategies for applying next-generation sequencing technologies to systematics studies. At shallow timescales, such as within species, RAD-Seq has been widely adopted among researchers, although there has been little discussion of the potential limitations and benefits of RAD-Seq and sequence capture. We discuss a series of issues that may impact the utility of sequence capture and RAD-Seq data for shallow systematics in non-model species. We review prior studies that used both methods, and investigate differences between the methods by re-analyzing existing RAD-Seq and sequence capture data sets from a Neotropical bird (Xenops minutus). We suggest that the strengths of RAD-Seq data sets for shallow systematics are the wide dispersion of markers across the genome, the relative ease and cost of laboratory work, the deep coverage and read overlap at recovered loci, and the high overall information that results. Sequence capture's benefits include flexibility and repeatability in the genomic regions targeted, success using low-quality samples, more straightforward read orthology assessment, and higher per-locus information content. The utility of a method in systematics, however, rests not only on its performance within a study, but on the comparability of data sets and inferences with those of prior work. In RAD-Seq data sets, comparability is compromised by low overlap of orthologous markers across species and the sensitivity of genetic diversity in a data set to an interaction between the level of natural heterozygosity in the samples examined and the parameters used for orthology assessment. In contrast, sequence capture of conserved genomic regions permits interrogation of the same loci across divergent species, which is preferable for maintaining comparability among data sets and studies for the purpose of drawing general conclusions about the impact of

  20. Analysis of genetic diversity of Sclerotinia sclerotiorum from eggplant by mycelial compatibility, random amplification of polymorphic DNA (RAPD) and simple sequence repeat (SSR) analyses

    Directory of Open Access Journals (Sweden)

    Fatih Mehmet Tok

    2016-09-01

    The genetic diversity and pathogenicity/virulence among 60 eggplant Sclerotinia sclerotiorum isolates collected from six different geographic regions of Turkey were analysed using mycelial compatibility groupings (MCGs), random amplified polymorphic DNA (RAPD) and simple sequence repeat (SSR) polymorphism. By MCG tests, the isolates were classified into 22 groups. Out of 22 MCGs, 36% were represented each by a single isolate. The isolates showed great variability for virulence regardless of MCG and geographic origin. Based on the results of RAPD and SSR analyses, 60 S. sclerotiorum isolates representing 22 MCGs were grouped in 2 and 3 distinct clusters, respectively. Analyses using RAPD and SSR markers illustrated that cluster groupings or genetic distance of S. sclerotiorum populations from eggplant were not distinctly related to the MCG, geographical origin and virulence diversity. The patterns obtained revealed a high heterogeneity of genetic composition and suggested the occurrence of clonal and sexual reproduction of S. sclerotiorum on eggplant in the areas surveyed.

  1. High Sequence Variations in Mitochondrial DNA Control Region among Worldwide Populations of Flathead Mullet Mugil cephalus

    Directory of Open Access Journals (Sweden)

    Brian Wade Jamandre

    2014-01-01

    The sequence and structure of the complete mtDNA control region (CR) of M. cephalus from African, Pacific, and Atlantic populations are presented in this study to assess its usefulness in phylogeographic studies of this species. The mtDNA CR sequence variations among M. cephalus populations largely exceeded intraspecific polymorphisms that are generally observed in other vertebrates. The length of CR sequence varied among M. cephalus populations due to the presence of indels and variable number of tandem repeats at the 3′ hypervariable domain. The high evolutionary rate of the CR in this species probably originated from these mutations. However, no excessive homoplasic mutations were noticed. Finally, the star-shaped tree inferred from the CR polymorphism stresses a rapid worldwide radiation in this species. The CR still appears as a good marker for phylogeographic investigations and additional worldwide samples are warranted to further investigate the genetic structure and evolution in M. cephalus.

  2. SWORDS: A statistical tool for analysing large DNA sequences

    Indian Academy of Sciences (India)

    Unknown

    These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software called SWORDS. Using sequences available in ... tions with the cellular processes like recombination, replication .... in DNA sequences using certain specific probability laws. (Pevzner et al ...

  3. simple sequence repeat (SSR) markers in genetic analysis of

    African Journals Online (AJOL)

    Yomi

    2012-08-28

    1998). Cross-species amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera: implications for the transferability of SSRs in plants. Mol. Biol. Evol. 15:1275-1287.

  4. Isolation and sequence analysis of the wheat B genome subtelomeric DNA.

    Science.gov (United States)

    Salina, Elena A; Sergeeva, Ekaterina M; Adonina, Irina G; Shcherban, Andrey B; Afonnikov, Dmitry A; Belcram, Harry; Huneau, Cecile; Chalhoub, Boulos

    2009-09-05

    Telomeric and subtelomeric regions are essential for genome stability and regular chromosome replication. In this work, we have characterized the wheat BAC (bacterial artificial chromosome) clones containing Spelt1 and Spelt52 sequences, which belong to the subtelomeric repeats of the B/G genomes of wheats and Aegilops species from the section Sitopsis. The BAC library from Triticum aestivum cv. Renan was screened using Spelt1 and Spelt52 as probes. Nine positive clones were isolated; of them, clone 2050O8 was localized mainly to the distal parts of wheat chromosomes by in situ hybridization. The distribution of the other clones indicated the presence of different types of repetitive sequences in BACs. Use of different approaches allowed us to prove that seven of the nine isolated clones belonged to the subtelomeric chromosomal regions. Clone 2050O8 was sequenced and its sequence of 119,737 bp was annotated. It is composed of 33% transposable elements (TEs), 8.2% Spelt52 (namely, the subfamily Spelt52.2) and five non-TE-related genes. DNA transposons are predominant, making up 24.6% of the entire BAC clone, whereas retroelements account for 8.4% of the clone length. The full-length CACTA transposon Caspar covers 11,666 bp, encoding a transposase and CTG-2 proteins, and this transposon accounts for 40% of the DNA transposons. The in situ hybridization data for 2050O8-derived subclones in combination with the BLAST search against wheat mapped ESTs (expressed sequence tags) suggest that clone 2050O8 is located in the terminal bin 4BL-10 (0.95-1.0). Additionally, four of the predicted 2050O8 genes showed significant homology to four putative orthologous rice genes in the distal part of rice chromosome 3S and confirm the synteny to wheat 4BL. Satellite DNA sequences from the subtelomeric regions of diploid wheat progenitor can be used for selecting the BAC clones from the corresponding regions of hexaploid wheat chromosomes. It has been demonstrated for the first time

  5. Analysis of T-DNA/Host-Plant DNA Junction Sequences in Single-Copy Transgenic Barley Lines

    Directory of Open Access Journals (Sweden)

    Joanne G. Bartlett

    2014-01-01

    Sequencing across the junction between an integrated transfer DNA (T-DNA) and a host plant genome provides two important pieces of information. The junctions themselves provide information regarding the proportion of T-DNA which has integrated into the host plant genome, whilst the transgene flanking sequences can be used to study the local genetic environment of the integrated transgene. In addition, this information is important in the safety assessment of GM crops and essential for GM traceability. In this study, a detailed analysis was carried out on the right-border T-DNA junction sequences of single-copy independent transgenic barley lines. T-DNA truncations at the right border were found to be relatively common and affected 33.3% of the lines. In addition, 14.3% of lines had rearranged construct sequence after the right-border break-point. An in-depth analysis of the host-plant flanking sequences revealed that a significant proportion of the T-DNAs integrated into or close to known repetitive elements. However, this integration into repetitive DNA did not have a negative effect on transgene expression.

  6. Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches

    Science.gov (United States)

    McCutchen-Maloney, Sandra L.

    2002-01-01

    Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.

  7. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of a unique "electronic fingerprint" of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shifts depending on the pH, with the most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of the beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  8. The complete chloroplast genome sequence of Taxus chinensis var. mairei (Taxaceae): loss of an inverted repeat region and comparative analysis with related species.

    Science.gov (United States)

    Zhang, Yanzhen; Ma, Ji; Yang, Bingxian; Li, Ruyi; Zhu, Wei; Sun, Lianli; Tian, Jingkui; Zhang, Lin

    2014-05-01

    Taxus chinensis var. mairei (Taxaceae) is a domestic yew variety in China. This plant is one of the sources of paclitaxel, which has been a promising antineoplastic chemotherapy drug during the last decade. We have sequenced the complete nucleotide sequence of the chloroplast (cp) genome of T. chinensis var. mairei. The T. chinensis var. mairei cp genome is 129,513 bp in length, with 113 single copy genes and two duplicated genes (trnI-CAU, trnQ-UUG). Among the 113 single copy genes, 9 are intron-containing. Compared to other land plant cp genomes, the T. chinensis var. mairei cp genome has lost one of the large inverted repeats (IRs) found in angiosperms, ferns, liverworts, and gymnosperms such as Cycas revoluta and Ginkgo biloba L. Compared to related species, the gene order of T. chinensis var. mairei has a large inversion of ~110 kb including 91 genes (from rps18 to accD) with gene contents unarranged. Repeat analysis identified 48 direct and 2 inverted repeats 30 bp long or longer with a sequence identity greater than 90%. Repeated short segments were found in genes rps18, rps19 and clpP. Analysis also revealed 22 simple sequence repeat (SSR) loci, almost all of which are composed of A or T.

  9. Genome-wide cloning and sequence analysis of leucine-rich repeat receptor-like protein kinase genes in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Yuan Tong

    2010-01-01

    Background: Transmembrane receptor kinases play critical roles in both animal and plant signaling pathways regulating growth, development, differentiation, cell death, and pathogenic defense responses. In Arabidopsis thaliana, there are at least 223 Leucine-rich repeat receptor-like kinases (LRR-RLKs), representing one of the largest protein families. Although functional roles for a handful of LRR-RLKs have been revealed, the functions of the majority of members in this protein family have not been elucidated. Results: As a resource for the in-depth analysis of this important protein family, the complementary DNA sequences (cDNAs) of 194 LRR-RLKs were cloned into the Gateway® donor vector pDONR/ZeoR and analyzed by DNA sequencing. Among them, 157 clones showed sequences identical to the predictions in the Arabidopsis sequence resource, TAIR8. The other 37 cDNAs showed gene structures distinct from the predictions of TAIR8, which was mainly caused by alternative splicing of pre-mRNA. Most of the genes have been further cloned into Gateway® destination vectors with GFP or FLAG epitope tags and have been transformed into Arabidopsis for in planta functional analysis. All clones from this study have been submitted to the Arabidopsis Biological Resource Center (ABRC) at Ohio State University for full accessibility by the Arabidopsis research community. Conclusions: Most of the Arabidopsis LRR-RLK genes have been isolated and the sequence analysis showed a number of alternatively spliced variants. The generated resources, including cDNA entry clones, expression constructs and transgenic plants, will facilitate further functional analysis of the members of this important gene family.

  10. Fidelity of target site duplication and sequence preference during integration of xenotropic murine leukemia virus-related virus.

    Directory of Open Access Journals (Sweden)

    Sanggu Kim

    Xenotropic murine leukemia virus (MLV)-related virus (XMRV) is a new human retrovirus associated with prostate cancer and chronic fatigue syndrome. The causal relationship of XMRV infection to human disease and the mechanism of pathogenicity have not been established. During retrovirus replication, integration of the cDNA copy of the viral RNA genome into the host cell chromosome is an essential step and involves coordinated joining of the two ends of the linear viral DNA into staggered sites on target DNA. Correct integration produces proviruses that are flanked by a short direct repeat, which varies from 4 to 6 bp among the retroviruses but is invariant for each particular retrovirus. Uncoordinated joining of the two viral DNA ends into target DNA can cause insertions, deletions, or other genomic alterations at the integration site. To determine the fidelity of XMRV integration, cells infected with XMRV were clonally expanded and DNA sequences at the viral-host DNA junctions were determined and analyzed. We found that a majority of the provirus ends were correctly processed and flanked by a 4-bp direct repeat of host DNA. A weak consensus sequence was also detected at the XMRV integration sites. We conclude that integration of XMRV DNA involves a coordinated joining of two viral DNA ends that are spaced 4 bp apart on the target DNA and proceeds with high fidelity.

  11. Short read sequence typing (SRST): multi-locus sequence types from short reads

    Directory of Open Access Journals (Sweden)

    Inouye Michael

    2012-07-01

    Background: Multi-locus sequence typing (MLST) has become the gold standard for population analyses of bacterial pathogens. This method focuses on the sequences of a small number of loci (usually seven) to divide the population and is simple, robust and facilitates comparison of results between laboratories and over time. Over the last decade, researchers and population health specialists have invested substantial effort in building up public MLST databases for nearly 100 different bacterial species, and these databases contain a wealth of important information linked to MLST sequence types such as time and place of isolation, host or niche, serotype and even clinical or drug resistance profiles. Recent advances in sequencing technology mean it is increasingly feasible to perform bacterial population analysis at the whole genome level. This offers massive gains in resolving power and genetic profiling compared to MLST, and will eventually replace MLST for bacterial typing and population analysis. However, given the wealth of data currently available in MLST databases, it is crucial to maintain backwards compatibility with MLST schemes so that new genome analyses can be understood in their proper historical context. Results: We present a software tool, SRST, for quick and accurate retrieval of sequence types from short read sets, using inputs easily downloaded from public databases. SRST uses read mapping and an allele assignment score incorporating sequence coverage and variability to determine the most likely allele at each MLST locus. Analysis of over 3,500 loci in more than 500 publicly accessible Illumina read sets showed SRST to be highly accurate at allele assignment. SRST output is compatible with common analysis tools such as eBURST, Clonal Frame or PhyloViz, allowing easy comparison between novel genome data and MLST data. Alignment, fastq and pileup files can also be generated for novel alleles. Conclusions: SRST is a novel

  12. Torque measurements reveal sequence-specific cooperative transitions in supercoiled DNA

    Science.gov (United States)

    Oberstrass, Florian C.; Fernandes, Louis E.; Bryant, Zev

    2012-01-01

    B-DNA becomes unstable under superhelical stress and is able to adopt a wide range of alternative conformations including strand-separated DNA and Z-DNA. Localized sequence-dependent structural transitions are important for the regulation of biological processes such as DNA replication and transcription. To directly probe the effect of sequence on structural transitions driven by torque, we have measured the torsional response of a panel of DNA sequences using single molecule assays that employ nanosphere rotational probes to achieve high torque resolution. The responses of Z-forming d(pGpC)n sequences match our predictions based on a theoretical treatment of cooperative transitions in helical polymers. “Bubble” templates containing 50–100 bp mismatch regions show cooperative structural transitions similar to B-DNA, although less torque is required to disrupt strand–strand interactions. Our mechanical measurements, including direct characterization of the torsional rigidity of strand-separated DNA, establish a framework for quantitative predictions of the complex torsional response of arbitrary sequences in their biological context. PMID:22474350

  13. Aspects of coverage in medical DNA sequencing

    Directory of Open Access Journals (Sweden)

    Wilson Richard K

    2008-05-01

    Full Text Available Abstract Background DNA sequencing is now emerging as an important component in biomedical studies of diseases like cancer. Short-read, highly parallel sequencing instruments are expected to be used heavily for such projects, but many design specifications have yet to be conclusively established. Perhaps the most fundamental of these is the redundancy required to detect sequence variations, which bears directly upon genomic coverage and the consequent resolving power for discerning somatic mutations. Results We address the medical sequencing coverage problem via an extension of the standard mathematical theory of haploid coverage. The expected diploid multi-fold coverage, as well as its generalization for aneuploidy are derived and these expressions can be readily evaluated for any project. The resulting theory is used as a scaling law to calibrate performance to that of standard BAC sequencing at 8× to 10× redundancy, i.e. for expected coverages that exceed 99% of the unique sequence. A differential strategy is formalized for tumor/normal studies wherein tumor samples are sequenced more deeply than normal ones. In particular, both tumor alleles should be detected at least twice, while both normal alleles are detected at least once. Our theory predicts these requirements can be met for tumor and normal redundancies of approximately 26× and 21×, respectively. We explain why these values do not differ by a factor of 2, as might intuitively be expected. Future technology developments should prompt even deeper sequencing of tumors, but the 21× value for normal samples is essentially a constant. Conclusion Given the assumptions of standard coverage theory, our model gives pragmatic estimates for required redundancy. The differential strategy should be an efficient means of identifying potential somatic mutations for further study.
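    The coverage reasoning in this abstract can be illustrated with a minimal sketch. It assumes the textbook Poisson model of haploid coverage and, for the diploid case, that each allele independently receives half of the reads; both assumptions and all function names are illustrative and are not taken from the paper, so the sketch is not expected to reproduce the 26× and 21× figures.

```python
import math


def haploid_coverage(c: float) -> float:
    """Expected fraction of a haploid target covered at least once at redundancy c."""
    return 1.0 - math.exp(-c)


def poisson_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below


def both_alleles_detected(c: float, k: int) -> float:
    """Probability that both alleles at a diploid site are each seen >= k times.

    Assumption (not from the abstract): each allele independently
    receives Poisson(c / 2) reads.
    """
    return poisson_at_least(k, c / 2.0) ** 2


if __name__ == "__main__":
    for c in (8, 10):
        print(f"haploid coverage at {c}x: {haploid_coverage(c):.5f}")
    print(f"both alleles seen >= 2 times at 26x: {both_alleles_detected(26, 2):.6f}")
    print(f"both alleles seen >= 1 time at 21x: {both_alleles_detected(21, 1):.6f}")
```

    Under this simplified model, 8× haploid redundancy already covers more than 99% of the target, which matches the calibration point quoted in the abstract.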

  14. Assessing the 5S ribosomal RNA heterogeneity in Arabidopsis thaliana using short RNA next generation sequencing data.

    Science.gov (United States)

    Szymanski, Maciej; Karlowski, Wojciech M

    2016-01-01

    In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by the next generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and identification of potentially functional rRNA-derived fragments.

  15. Molecular identification and characterization of clustered regularly interspaced short palindromic repeat (CRISPR) gene cluster in Taylorella equigenitalis.

    Science.gov (United States)

    Hara, Yasushi; Hayashi, Kyohei; Nakajima, Takuya; Kagawa, Shizuko; Tazumi, Akihiro; Moore, John E; Matsuda, Motoo

    2013-09-01

    Clustered regularly interspaced short palindromic repeats (CRISPRs), of approximately 10,000 base pairs (bp) in length, were shown to occur in the Japanese Taylorella equigenitalis strain, EQ59. The locus was composed of the putative CRISPRs-associated with 5 (cas5), RAMP csd1, csd2, recB, cas1, a leader region, 13 CRISPR consensus sequence repeats (each 32 bp; 5'-TCAGCCACGTTCGCGTGGCTGTGTGTTTAAAG-3'). These were in turn separated by 12 non repetitive unique spacer regions of similar length. In addition, a leader region, a transposase/IS protein, a leader region, and cas3 were also seen. All seven putative open reading frames carry their ribosome binding sites. Promoter consensus sequences at the -35 and -10 regions and putative intrinsic ρ-independent transcription terminator regions also occurred. A possible long overlap of 170 bp in length occurred between the recB and cas1 loci. Positive reverse transcription PCR signals of cas5, RAMP csd1, csd2-recB/cas1, and cas3 were generated. A putative secondary structure of the CRISPR consensus repeats was constructed. Following this, CRISPR results of the T. equigenitalis EQ59 isolate were subsequently compared with those from the Taylorella asinigenitalis MCE3 isolate.
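    The array layout described here (13 copies of a 32 bp consensus repeat separated by 12 spacers) can be sketched as a simple spacer-extraction routine. The repeat sequence is the one quoted in the abstract; the function name and the exact-match strategy are assumptions for illustration, and real arrays often need tolerance for degenerate repeat copies.

```python
# 32 bp consensus repeat quoted in the abstract.
REPEAT = "TCAGCCACGTTCGCGTGGCTGTGTGTTTAAAG"


def extract_spacers(locus: str, repeat: str = REPEAT) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    positions = []
    start = locus.find(repeat)
    while start != -1:
        positions.append(start)
        # Resume the search after the copy just found (non-overlapping copies).
        start = locus.find(repeat, start + len(repeat))
    return [locus[a + len(repeat):b] for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    # Toy locus with hypothetical spacers, not real T. equigenitalis sequence.
    toy = "AAA" + REPEAT + "ACGT" + REPEAT + "GGCC" + REPEAT + "TT"
    print(extract_spacers(toy))
```

    With n repeat copies the routine returns n - 1 spacers, consistent with the 13 repeats and 12 spacers reported.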

  16. Genome-wide identification, sequence characterization, and protein-protein interaction properties of DDB1 (damaged DNA binding protein-1)-binding WD40-repeat family members in Solanum lycopersicum.

    Science.gov (United States)

    Zhu, Yunye; Huang, Shengxiong; Miao, Min; Tang, Xiaofeng; Yue, Junyang; Wang, Wenjie; Liu, Yongsheng

    2015-06-01

    One hundred DDB1 (damaged DNA binding protein-1)-binding WD40-repeat domain (DWD) family genes were identified in the S. lycopersicum genome. The DWD genes encode proteins presumably functioning as the substrate recognition subunits of the cullin4-ring ubiquitin E3 ligase complex. These findings provide candidate genes and a research platform for further gene functionality and molecular breeding study. A subclass of DDB1 (damaged DNA binding protein-1)-binding WD40-repeat domain (DWD) family proteins has been demonstrated to function as the substrate recognition subunits of the cullin4-ring ubiquitin E3 ligase complex. However, little information is available about the cognate subfamily genes in tomato (S. lycopersicum). In this study, based on the recently released tomato genome sequences, 100 tomato genes encoding DWD proteins that potentially interact with DDB1 were identified and characterized, including analyses of the detailed annotations, chromosome locations and compositions of conserved amino acid domains. In addition, a phylogenetic tree, which comprises of three main groups, of the subfamily genes was constructed. The physical interaction between tomato DDB1 and 14 representative DWD proteins was determined by yeast two-hybrid and co-immunoprecipitation assays. The subcellular localization of these 14 representative DWD proteins was determined. Six of them were localized in both nucleus and cytoplasm, seven proteins exclusively in cytoplasm, and one protein either in nucleus and cytoplasm, or exclusively in cytoplasm. Comparative genomic analysis demonstrated that the expansion of these subfamily members in tomato predominantly resulted from two whole-genome triplication events in the evolution history.

  17. Automated methods for single-stranded DNA isolation and dideoxynucleotide DNA sequencing reactions on a robotic workstation

    International Nuclear Information System (INIS)

    Mardis, E.R.; Roe, B.A.

    1989-01-01

    Automated procedures have been developed for both the simultaneous isolation of 96 single-stranded M13 chimeric template DNAs in less than two hours, and for simultaneously pipetting 24 dideoxynucleotide sequencing reactions on a commercially available laboratory workstation. The DNA sequencing results obtained by either radiolabeled or fluorescent methods are consistent with the premise that automation of these portions of DNA sequencing projects will improve the reproducibility of the DNA isolation, and the procedures for these normally labor-intensive steps provide an approach for rapid acquisition of large amounts of high-quality, reproducible DNA sequence data.

  18. Highly sensitive polymerase chain reaction-free quantum dot-based quantification of forensic genomic DNA

    International Nuclear Information System (INIS)

    Tak, Yu Kyung; Kim, Won Young; Kim, Min Jung; Han, Eunyoung; Han, Myun Soo; Kim, Jong Jin; Kim, Wook; Lee, Jong Eun; Song, Joon Myong

    2012-01-01

    Highlights: ► Genomic DNA quantification was performed using a quantum dot-labeled Alu sequence. ► This probe provided PCR-free determination of human genomic DNA. ► Qdot-labeled Alu probe-hybridized genomic DNAs had a 2.5-femtogram detection limit. ► Qdot-labeled Alu sequence was used to assess DNA samples for human identification. - Abstract: Forensic DNA samples can degrade easily due to exposure to light and moisture at the crime scene. In addition, the amount of DNA acquired at a criminal site is inherently limited. This limited amount of human DNA has to be quantified accurately after the process of DNA extraction. The accurately quantified extracted genomic DNA is then used as a DNA template in polymerase chain reaction (PCR) amplification for short tandem repeat (STR) human identification. Accordingly, highly sensitive and human-specific quantification of forensic DNA samples is an essential issue in forensic study. In this work, a quantum dot (Qdot)-labeled Alu sequence was developed as a probe to simultaneously satisfy both the high sensitivity and human genome selectivity for quantification of forensic DNA samples. This probe provided PCR-free determination of human genomic DNA and had a 2.5-femtogram detection limit due to the strong emission and photostability of the Qdot. The Qdot-labeled Alu sequence has been used successfully to assess 18 different forensic DNA samples for STR human identification.

  19. Exact Tandem Repeats Analyzer (E-TRA): A new program for DNA ...

    Indian Academy of Sciences (India)

    Unknown

    Advanced user defined parameters/options let the researchers use different minimum motif repeats ... E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 ..... repeat rates of T-cells, embryo and testis were higher.
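    As a rough illustration of what an exact tandem repeat search with user-defined minimum repeat counts can look like, here is a minimal regex-based sketch. It is not E-TRA's algorithm; the function name, parameters and defaults are hypothetical.

```python
import re


def find_exact_tandem_repeats(seq, min_motif=2, max_motif=6, min_repeats=3):
    """Return sorted (start, motif, copies) tuples for exact tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for size in range(min_motif, max_motif + 1):
        # A motif of `size` bases followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. ATAT).
            if any(motif == motif[:d] * (size // d)
                   for d in range(1, size) if size % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


if __name__ == "__main__":
    print(find_exact_tandem_repeats("GGCACACACATT"))
```

    Raising `min_repeats` or narrowing the motif size range makes the search stricter, which is the kind of tuning the snippet above alludes to.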

  20. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  1. Sequencing Intractable DNA to Close Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  2. Short tandem repeat profiling: part of an overall strategy for reducing the frequency of cell misidentification.

    Science.gov (United States)

    Nims, Raymond W; Sykes, Greg; Cottrill, Karin; Ikonomi, Pranvera; Elmore, Eugene

    2010-12-01

    The role of cell authentication in biomedical science has received considerable attention, especially within the past decade. This quality control attribute is now beginning to be given the emphasis it deserves by granting agencies and by scientific journals. Short tandem repeat (STR) profiling, one of a few DNA profiling technologies now available, is being proposed for routine identification (authentication) of human cell lines, stem cells, and tissues. The advantage of this technique over methods such as isoenzyme analysis, karyotyping, human leukocyte antigen typing, etc., is that STR profiling can establish identity to the individual level, provided that the appropriate number and types of loci are evaluated. To best employ this technology, a standardized protocol and a data-driven, quality-controlled, and publically searchable database will be necessary. This public STR database (currently under development) will enable investigators to rapidly authenticate human-based cultures to the individual from whom the cells were sourced. Use of similar approaches for non-human animal cells will require developing other suitable loci sets. While implementing STR analysis on a more routine basis should significantly reduce the frequency of cell misidentification, additional technologies may be needed as part of an overall authentication paradigm. For instance, isoenzyme analysis, PCR-based DNA amplification, and sequence-based barcoding methods enable rapid confirmation of a cell line's species of origin while screening against cross-contaminations, especially when the cells present are not recognized by the species-specific STR method. Karyotyping may also be needed as a supporting tool during establishment of an STR database. Finally, good cell culture practices must always remain a major component of any effort to reduce the frequency of cell misidentification.

  3. Mechanism of duplex DNA destabilization by RNA-guided Cas9 nuclease during target interrogation.

    Science.gov (United States)

    Mekler, Vladimir; Minakhin, Leonid; Severinov, Konstantin

    2017-05-23

    The prokaryotic clustered regularly interspaced short palindromic repeats (CRISPR)-associated 9 (Cas9) endonuclease cleaves double-stranded DNA sequences specified by guide RNA molecules and flanked by a protospacer adjacent motif (PAM) and is widely used for genome editing in various organisms. The RNA-programmed Cas9 locates the target site by scanning genomic DNA. We sought to elucidate the mechanism of initial DNA interrogation steps that precede the pairing of target DNA with guide RNA. Using fluorometric and biochemical assays, we studied Cas9/guide RNA complexes with model DNA substrates that mimicked early intermediates on the pathway to the final Cas9/guide RNA-DNA complex. The results show that Cas9/guide RNA binding to PAM favors separation of a few PAM-proximal protospacer base pairs allowing initial target interrogation by guide RNA. The duplex destabilization is mediated, in part, by Cas9/guide RNA affinity for unpaired segments of nontarget strand DNA close to PAM. Furthermore, our data indicate that the entry of double-stranded DNA beyond a short threshold distance from PAM into the Cas9/single-guide RNA (sgRNA) interior is hindered. We suggest that the interactions unfavorable for duplex DNA binding promote DNA bending in the PAM-proximal region during early steps of Cas9/guide RNA-DNA complex formation, thus additionally destabilizing the protospacer duplex. The mechanism that emerges from our analysis explains how the Cas9/sgRNA complex is able to locate the correct target sequence efficiently while interrogating numerous nontarget sequences associated with correct PAMs.

  4. Molecular structure and chromosome distribution of three repetitive DNA families in Anemone hortensis L. (Ranunculaceae).

    Science.gov (United States)

    Mlinarec, Jelena; Chester, Mike; Siljak-Yakovlev, Sonja; Papes, Drazena; Leitch, Andrew R; Besendorfer, Visnja

    2009-01-01

    The structure, abundance and location of repetitive DNA sequences on chromosomes can characterize the nature of higher plant genomes. Here we report on three new repeat DNA families isolated from Anemone hortensis L.; (i) AhTR1, a family of satellite DNA (stDNA) composed of a 554-561 bp long EcoRV monomer; (ii) AhTR2, a stDNA family composed of a 743 bp long HindIII monomer and; (iii) AhDR, a repeat family composed of a 945 bp long HindIII fragment that exhibits some sequence similarity to Ty3/gypsy-like retroelements. Fluorescence in-situ hybridization (FISH) to metaphase chromosomes of A. hortensis (2n = 16) revealed that both AhTR1 and AhTR2 sequences co-localized with DAPI-positive AT-rich heterochromatic regions. AhTR1 sequences occur at intercalary DAPI bands while AhTR2 sequences occur at 8-10 terminally located heterochromatic blocks. In contrast AhDR sequences are dispersed over all chromosomes as expected of a Ty3/gypsy-like element. AhTR2 and AhTR1 repeat families include polyA- and polyT-tracks, AT/TA-motifs and a pentanucleotide sequence (CAAAA) that may have consequences for chromatin packing and sequence homogeneity. AhTR2 repeats also contain TTTAGGG motifs and degenerate variants. We suggest that they arose by interspersion of telomeric repeats with subtelomeric repeats, before hybrid unit(s) amplified through the heterochromatic domain. The three repetitive DNA families together occupy approximately 10% of the A. hortensis genome. Comparative analyses of eight Anemone species revealed that the divergence of the A. hortensis genome was accompanied by considerable modification and/or amplification of repeats.

  5. Automated genotyping of dinucleotide repeat markers

    Energy Technology Data Exchange (ETDEWEB)

    Perlin, M.W.; Hoffman, E.P. [Carnegie Mellon Univ., Pittsburgh, PA (United States)]|[Univ. of Pittsburgh, PA (United States)

    1994-09-01

    The dinucleotide repeats (i.e., microsatellites) such as CA-repeats are a highly polymorphic, highly abundant class of PCR-amplifiable markers that have greatly streamlined genetic mapping experimentation. It is expected that over 30,000 such markers (including tri- and tetranucleotide repeats) will be characterized for routine use in the next few years. Since only size determination, and not sequencing, is required to determine alleles, in principle, dinucleotide repeat genotyping is easily performed on electrophoretic gels, and can be automated using DNA sequencers. Unfortunately, PCR stuttering with these markers generates not one band for each allele, but a pattern of bands. Since closely spaced alleles must be disambiguated by human scoring, this poses a key obstacle to full automation. We have developed methods that overcome this obstacle. Our model is that the observed data is generated by arithmetic superposition (i.e., convolution) of multiple allele patterns. By quantitatively measuring the size of each component band, and exploiting the unique stutter pattern associated with each marker, closely spaced alleles can be deconvolved; this unambiguously reconstructs the “true” allele bands, with stutter artifact removed. We used this approach in a system for automated diagnosis of (X-linked) Duchenne muscular dystrophy; four multiplexed CA-repeats within the dystrophin gene were assayed on a DNA sequencer. Our method accurately detected small variations in gel migration that shifted the allele size estimate. In 167 nonmutated alleles, 89% (149/167) showed no size variation, 9% (15/167) showed 1 bp variation, and 2% (3/167) showed 2 bp variation. We are currently developing a library of dinucleotide repeat patterns; together with our deconvolution methods, this library will enable fully automated genotyping of dinucleotide repeats from sizing data.
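    The superposition model in this abstract can be illustrated with a small sketch. It assumes that stutter only produces bands shorter than the true allele and that the marker's stutter pattern is known; the bin layout, the weights and both function names are hypothetical, and the authors' actual deconvolution procedure may differ.

```python
def convolve_stutter(alleles, stutter):
    """Superpose the stutter pattern of every allele.

    alleles[i]: true amount of allele in size bin i (bins ordered short to long).
    stutter[k]: fraction of signal appearing k bins shorter than the true
                allele (stutter[0] is the main band).
    """
    observed = [0.0] * len(alleles)
    for i, amount in enumerate(alleles):
        for k, weight in enumerate(stutter):
            if i - k >= 0:
                observed[i - k] += amount * weight
    return observed


def deconvolve_stutter(observed, stutter):
    """Recover allele amounts by back-substitution, starting at the longest bin."""
    n = len(observed)
    alleles = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Signal in bin i that leaked down from longer alleles already solved.
        leaked = sum(alleles[i + k] * stutter[k]
                     for k in range(1, len(stutter)) if i + k < n)
        alleles[i] = max(0.0, (observed[i] - leaked) / stutter[0])
    return alleles


if __name__ == "__main__":
    true_alleles = [0.0, 0.0, 1.0, 0.8, 0.0]   # two alleles one bin apart
    pattern = [0.6, 0.3, 0.1]                  # hypothetical stutter weights
    bands = convolve_stutter(true_alleles, pattern)
    print(bands)
    print(deconvolve_stutter(bands, pattern))
```

    Because the longest bin can only contain main-band signal under this assumption, the system is triangular and can be solved without a general least-squares fit.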

  6. Rate-determining Step of Flap Endonuclease 1 (FEN1) Reflects a Kinetic Bias against Long Flaps and Trinucleotide Repeat Sequences.

    Science.gov (United States)

    Tarantino, Mary E; Bilotti, Katharina; Huang, Ji; Delaney, Sarah

    2015-08-21

    Flap endonuclease 1 (FEN1) is a structure-specific nuclease responsible for removing 5'-flaps formed during Okazaki fragment maturation and long patch base excision repair. In this work, we use rapid quench flow techniques to examine the rates of 5'-flap removal on DNA substrates of varying length and sequence. Of particular interest are flaps containing trinucleotide repeats (TNR), which have been proposed to affect FEN1 activity and cause genetic instability. We report that FEN1 processes substrates containing flaps of 30 nucleotides or fewer at comparable single-turnover rates. However, for flaps longer than 30 nucleotides, FEN1 kinetically discriminates substrates based on flap length and flap sequence. In particular, FEN1 removes flaps containing TNR sequences at a rate slower than mixed sequence flaps of the same length. Furthermore, multiple-turnover kinetic analysis reveals that the rate-determining step of FEN1 switches as a function of flap length from product release to chemistry (or a step prior to chemistry). These results provide a kinetic perspective on the role of FEN1 in DNA replication and repair and contribute to our understanding of FEN1 in mediating genetic instability of TNR sequences. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  7. Short Interspersed Nuclear Element (SINE) Sequences in the Genome of the Human Pathogenic Fungus Aspergillus fumigatus Af293.

    Science.gov (United States)

    Kanhayuwa, Lakkhana; Coutts, Robert H A

    2016-01-01

    Novel families of short interspersed nuclear element (SINE) sequences in the human pathogenic fungus Aspergillus fumigatus, clinical isolate Af293, were identified and categorised into tRNA-related and 5S rRNA-related SINEs. Eight predicted tRNA-related SINE families originating from different tRNAs, and nominated as AfuSINE2 sequences, contained target site duplications of short direct repeat sequences (4-14 bp) flanking the elements, an extended tRNA-unrelated region and typical features of RNA polymerase III promoter sequences. The elements ranged in size from 140-493 bp and were present in low copy number in the genome and five out of eight were actively transcribed. One putative tRNAArg-derived sequence, AfuSINE2-1a possessed a unique feature of repeated trinucleotide ACT residues at its 3'-terminus. This element was similar in sequence to the I-4_AO element found in A. oryzae and an I-1_AF long nuclear interspersed element-like sequence identified in A. fumigatus Af293. Families of 5S rRNA-related SINE sequences, nominated as AfuSINE3, were also identified and their 5'-5S rRNA-related regions show 50-65% and 60-75% similarity to respectively A. fumigatus 5S rRNAs and SINE3-1_AO found in A. oryzae. A. fumigatus Af293 contains five copies of AfuSINE3 sequences ranging in size from 259-343 bp and two out of five AfuSINE3 sequences were actively transcribed. Investigations on AfuSINE distribution in the fungal genome revealed that the elements are enriched in pericentromeric and subtelomeric regions and inserted within gene-rich regions. We also demonstrated that some, but not all, AfuSINE sequences are targeted by host RNA silencing mechanisms. Finally, we demonstrated that infection of the fungus with mycoviruses had no apparent effects on SINE activity.

  8. The complete chloroplast genome sequence of the chlorophycean green alga Scenedesmus obliquus reveals a compact gene organization and a biased distribution of genes on the two DNA strands

    Science.gov (United States)

    de Cambiaire, Jean-Charles; Otis, Christian; Lemieux, Claude; Turmel, Monique

    2006-01-01

    Background The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. While the basal position of the Prasinophyceae is well established, the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae (UTC) remains uncertain. The five complete chloroplast DNA (cpDNA) sequences currently available for representatives of these classes display considerable variability in overall structure, gene content, gene density, intron content and gene order. Among these genomes, that of the chlorophycean green alga Chlamydomonas reinhardtii has retained the least ancestral features. The two single-copy regions, which are separated from one another by the large inverted repeat (IR), have similar sizes, rather than unequal sizes, and differ radically in both gene contents and gene organizations relative to the single-copy regions of prasinophyte and ulvophyte cpDNAs. To gain insights into the various changes that underwent the chloroplast genome during the evolution of chlorophycean green algae, we have sequenced the cpDNA of Scenedesmus obliquus, a member of a distinct chlorophycean lineage. Results The 161,452 bp IR-containing genome of Scenedesmus features single-copy regions of similar sizes, encodes 96 genes, i.e. only two additional genes (infA and rpl12) relative to its Chlamydomonas homologue and contains seven group I and two group II introns. It is clearly more compact than the four UTC algal cpDNAs that have been examined so far, displays the lowest proportion of short repeats among these algae and shows a stronger bias in clustering of genes on the same DNA strand compared to Chlamydomonas cpDNA. Like the latter genome, Scenedesmus cpDNA displays only a few ancestral gene clusters. The two chlorophycean genomes share 11 gene clusters that are not found in previously sequenced trebouxiophyte and ulvophyte cpDNAs as well as a few genes that have an unusual structure; however, their single-copy regions differ

  9. A single whole-body low dose X-irradiation does not affect L1, B1 and IAP repeat element DNA methylation longitudinally.

    Directory of Open Access Journals (Sweden)

    Michelle R Newman

    Full Text Available The low dose radioadaptive response has been shown to be protective against high doses of radiation as well as aging-induced genomic instability. We hypothesised that a single whole-body exposure of low dose radiation would induce a radioadaptive response thereby reducing or abrogating aging-related changes in repeat element DNA methylation in mice. Following sham or 10 mGy X-irradiation, serial peripheral blood sampling was performed and differences in Long Interspersed Nucleic Element 1 (L1), B1 and Intracisternal-A-Particle (IAP) repeat element methylation between samples were assessed using high resolution melt analysis of PCR amplicons. By 420 days post-irradiation, neither radiation- nor aging-related changes in the methylation of peripheral blood, spleen or liver L1, B1 and IAP elements were observed. Analysis of the spleen and liver tissues of cohorts of untreated aging mice showed that the 17-19 month age group exhibited higher repeat element methylation than younger or older mice, with no overall decline in methylation detected with age. This is the first temporal analysis of the effect of low dose radiation on repeat element methylation in mouse peripheral blood and the first to examine the long term effect of this dose on repeat element methylation in a radiosensitive tissue (spleen) and a tissue fundamental to the aging process (liver). Our data indicate that the methylation of murine DNA repeat elements can fluctuate with age, but, unlike human studies, do not demonstrate an overall aging-related decline. Furthermore, our results indicate that a low dose of ionising radiation does not induce detectable changes to murine repeat element DNA methylation in the tissues and at the time-points examined in this study. This radiation dose is relevant to human diagnostic radiation exposures and suggests that a dose of 10 mGy X-rays, unlike high dose radiation, does not cause significant short or long term changes to repeat element or global DNA

  10. The Paramecium germline genome provides a niche for intragenic parasitic DNA: evolutionary dynamics of internal eliminated sequences.

    Science.gov (United States)

    Arnaiz, Olivier; Mathy, Nathalie; Baudry, Céline; Malinsky, Sophie; Aury, Jean-Marc; Denby Wilkes, Cyril; Garnier, Olivier; Labadie, Karine; Lauderdale, Benjamin E; Le Mouël, Anne; Marmignon, Antoine; Nowacki, Mariusz; Poulain, Julie; Prajer, Malgorzata; Wincker, Patrick; Meyer, Eric; Duharcourt, Sandra; Duret, Laurent; Bétermier, Mireille; Sperling, Linda

    2012-01-01

    Insertions of parasitic DNA within coding sequences are usually deleterious and are generally counter-selected during evolution. Thanks to nuclear dimorphism, ciliates provide unique models to study the fate of such insertions. Their germline genome undergoes extensive rearrangements during development of a new somatic macronucleus from the germline micronucleus following sexual events. In Paramecium, these rearrangements include precise excision of unique-copy Internal Eliminated Sequences (IES) from the somatic DNA, requiring the activity of a domesticated piggyBac transposase, PiggyMac. We have sequenced Paramecium tetraurelia germline DNA, establishing a genome-wide catalogue of ~45,000 IESs, in order to gain insight into their evolutionary origin and excision mechanism. We obtained direct evidence that PiggyMac is required for excision of all IESs. Homology with known P. tetraurelia Tc1/mariner transposons, described here, indicates that at least a fraction of IESs derive from these elements. Most IES insertions occurred before a recent whole-genome duplication that preceded diversification of the P. aurelia species complex, but IES invasion of the Paramecium genome appears to be an ongoing process. Once inserted, IESs decay rapidly by accumulation of deletions and point substitutions. Over 90% of the IESs are shorter than 150 bp and present a remarkable size distribution with a ~10 bp periodicity, corresponding to the helical repeat of double-stranded DNA and suggesting DNA loop formation during assembly of a transpososome-like excision complex. IESs are equally frequent within and between coding sequences; however, excision is not 100% efficient and there is selective pressure against IES insertions, in particular within highly expressed genes. We discuss the possibility that ancient domestication of a piggyBac transposase favored subsequent propagation of transposons throughout the germline by allowing insertions in coding sequences, a fraction of the

  11. The Paramecium germline genome provides a niche for intragenic parasitic DNA: evolutionary dynamics of internal eliminated sequences.

    Directory of Open Access Journals (Sweden)

    Olivier Arnaiz

    Full Text Available Insertions of parasitic DNA within coding sequences are usually deleterious and are generally counter-selected during evolution. Thanks to nuclear dimorphism, ciliates provide unique models to study the fate of such insertions. Their germline genome undergoes extensive rearrangements during development of a new somatic macronucleus from the germline micronucleus following sexual events. In Paramecium, these rearrangements include precise excision of unique-copy Internal Eliminated Sequences (IES) from the somatic DNA, requiring the activity of a domesticated piggyBac transposase, PiggyMac. We have sequenced Paramecium tetraurelia germline DNA, establishing a genome-wide catalogue of ~45,000 IESs, in order to gain insight into their evolutionary origin and excision mechanism. We obtained direct evidence that PiggyMac is required for excision of all IESs. Homology with known P. tetraurelia Tc1/mariner transposons, described here, indicates that at least a fraction of IESs derive from these elements. Most IES insertions occurred before a recent whole-genome duplication that preceded diversification of the P. aurelia species complex, but IES invasion of the Paramecium genome appears to be an ongoing process. Once inserted, IESs decay rapidly by accumulation of deletions and point substitutions. Over 90% of the IESs are shorter than 150 bp and present a remarkable size distribution with a ~10 bp periodicity, corresponding to the helical repeat of double-stranded DNA and suggesting DNA loop formation during assembly of a transpososome-like excision complex. IESs are equally frequent within and between coding sequences; however, excision is not 100% efficient and there is selective pressure against IES insertions, in particular within highly expressed genes. We discuss the possibility that ancient domestication of a piggyBac transposase favored subsequent propagation of transposons throughout the germline by allowing insertions in coding sequences, a

  12. Simple sequence repeat (SSR)-based genetic variability among ...

    African Journals Online (AJOL)

    The objective of this study was to determine whether simple sequence repeat (SSR) markers could correctly identify peanut genotypes with differences in specific leaf weight (SLW) and relative water content (RWC). Four peanut genotypes and two water regimes (FC and 1/3 available water; 1/3 AW) were arranged in factorial ...

  13. Recurrence plot analysis of DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Wu Zuobing [State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080 (China)]. E-mail: wuzb@lnm.imech.ac.cn

    2004-11-15

    A recurrence plot technique for DNA sequences is established on a metric representation and employed to analyze the correlation structure of nucleotide strings. It is found that, in the transference of nucleotide strings, a human DNA fragment has a major correlation distance, whereas the correlation distance of a yeast chromosome increases steadily.

  14. Importance of the Sequence-Directed DNA Shape for Specific Binding Site Recognition by the Estrogen-Related Receptor

    Directory of Open Access Journals (Sweden)

    Kareem Mohideen-Abdul

    2017-06-01

    Full Text Available Most nuclear receptors (NRs) bind DNA as dimers, either as hetero- or as homodimers on DNA sequences organized as two half-sites with specific orientation and spacing. The dimerization of NRs on their cognate response elements (REs) involves specific protein–DNA and protein–protein interactions. The estrogen-related receptor (ERR) belongs to the steroid hormone nuclear receptor (SHR) family and shares strong similarity in its DNA-binding domain (DBD) with that of the estrogen receptor (ER). In vitro, ERR binds with high affinity inverted repeat REs with a 3-bp spacing (IR3), but in vivo, it preferentially binds to single half-site REs extended at the 5′-end by 3 bp [estrogen-related response elements (ERREs)], thus explaining why ERR was often inferred as a purely monomeric receptor. Since its C-terminal ligand-binding domain is known to homodimerize with a strong dimer interface, we investigated the binding behavior of the isolated DBDs to different REs using electrophoretic migration, multi-angle static laser light scattering (MALLS), non-denaturing mass spectrometry, and nuclear magnetic resonance. In contrast to ER DBD, ERR DBD binds as a monomer to EREs (IR3), such as the tff1 ERE-IR3, but we identified a DNA sequence composed of an extended half-site embedded within an IR3 element (embedded ERRE/IR3), where stable dimer binding is observed. Using a series of chimera and mutant DNA sequences of ERREs and IR3 REs, we have found the key determinants for the binding of ERR DBD as a dimer. Our results suggest that the sequence-directed DNA shape is more important than the exact nucleotide sequence for the binding of ERR DBD to DNA as a dimer. Our work underlines the importance of the shape-driven DNA readout mechanisms based on minor groove recognition and electrostatic potential. These conclusions may apply not only to ERR but also to other members of the SHR family, such as androgen or glucocorticoid, for which a strong well-conserved half

  15. Comparative Analysis of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) of Streptococcus thermophilus St-I and its Bacteriophage-Insensitive Mutants (BIM) Derivatives.

    Science.gov (United States)

    Li, Wan; Bian, Xin; Evivie, Smith Etareri; Huo, Gui-Cheng

    2016-09-01

    The CRISPR-Cas (CRISPR together with CRISPR-associated proteins) modules act as an adaptive and heritable immune system in bacteria and archaea. CRISPR-based immunity acts by integrating short virus sequences into the cell's CRISPR locus, allowing the cell to remember, recognize, and clear infections. In this study, the homology of CRISPR sequences in BIMs (bacteriophage-insensitive mutants) of Streptococcus thermophilus St-I was analyzed. Secondary structures of the repeats and the PAMs (protospacer-adjacent motifs) of each CRISPR locus were also predicted. Results showed that CRISPR1 has 27 repeat-spacer units, 5 of which had duplicates; CRISPR2 has one repeat-spacer unit; and CRISPR3 has 28 repeat-spacer units. Only BIM1 acquired a new spacer in CRISPR3, while BIM2 and BIM3 had no new spacer insertions, indicating that although CRISPR1 is generally more active than CRISPR3, new spacer acquisition occurred only in CRISPR3 in some situations. These findings will help establish the foundation for the study of CRISPR-Cas systems in lactic acid bacteria.
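
    The comparison of repeat-spacer units between a parent strain and its BIMs can be illustrated with a minimal sketch. It assumes every repeat in an array is an exact copy of one repeat sequence and that the array starts and ends with a repeat; the sequences below are toy strings, not S. thermophilus data, and the function names are hypothetical.

```python
def spacers(array, repeat):
    """Split a CRISPR array into its spacers.

    Assumes all repeats are exact copies of `repeat`; degenerate terminal
    repeats and leader sequence are not handled in this sketch.
    """
    return [s for s in array.split(repeat) if s]


def new_spacers(parent_array, mutant_array, repeat):
    """Spacers present in the mutant array but absent from the parent array."""
    known = set(spacers(parent_array, repeat))
    return [s for s in spacers(mutant_array, repeat) if s not in known]


# Toy data (hypothetical): one repeat and short made-up spacers.
REPEAT = "GTTTTAGA"
parent = REPEAT + "AAAC" + REPEAT + "CCGT" + REPEAT
mutant = REPEAT + "TTGA" + parent  # one spacer added at the leader end

print(len(spacers(parent, REPEAT)), "repeat-spacer units in parent")
print("newly acquired:", new_spacers(parent, mutant, REPEAT))
```

    Real arrays contain imperfect repeats, so a production analysis would rely on a dedicated CRISPR finder rather than exact string splitting.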

  16. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (total number of nucleotide bases and A, T, G, C base contents with their respective percentages) and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer - Random DNA Analyser; GUI - Graphical user interface; XAML - Extensible Application Markup Language.
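
    For orientation, the basic Nussinov recurrence and a reverse complement helper can be sketched as follows. This is not RDNAnalyzer's code (the tool is written in C# and extends the algorithm in ways the abstract does not describe); the pairing rules (A-T, G-C only) and the `min_loop` parameter are assumptions of this sketch.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested A-T / G-C pairs (basic Nussinov recurrence).

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (an assumption of this sketch, not a documented RDNAnalyzer setting).
    """
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: best score for seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


print(nussinov_max_pairs("GGGAAATCCC"))
print(reverse_complement("ATGC"))
```

    The table fill costs O(n^3) time and O(n^2) memory, which is acceptable for short fragments; recovering the actual pairings would need a traceback step that is omitted here.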

  17. Chromatid interchanges at intrachromosomal telomeric DNA sequences

    International Nuclear Information System (INIS)

    Fernandez, J.L.; Vazquez-Gundin, F.; Bilbao, A.; Gosalvez, J.; Goyanes, V.

    1997-01-01

    Chinese hamster Don cells were exposed to X-rays, mitomycin C and teniposide (VM-26) to induce chromatid exchanges (quadriradials and triradials). After fluorescence in situ hybridization (FISH) of telomere sequences it was found that interstitial telomere-like DNA sequence arrays presented around five times more breakage-rearrangements than the genome overall. This high recombinogenic capacity was independent of the clastogen, suggesting that this susceptibility is not related to the initial mechanisms of DNA damage. (author)

  18. Development of a defined-sequence DNA system for use in DNA misrepair studies

    International Nuclear Information System (INIS)

    Sutton, S.; Tobias, C.A.

    1984-01-01

    The authors have developed a system that allows them to study cellular DNA repair processes at the molecular level. In particular, the authors are using this system to examine the consequences of a misrepair of radiation-induced DNA damage, as a function of dose. The cells being used are specially engineered haploid yeast cells. Maintained in the cells, at one copy per cell, is a cen plasmid, a plasmid that behaves like a functional chromosome. This plasmid carries a small defined sequence of DNA from the E. coli lac z gene. It is this lac z region (called the alpha region) that serves as the target for radiation damage. Two copies of the complementary portion of the lac z gene are integrated into the yeast genome. Irradiated cells are screened for possible mutation in the alpha region by testing the cells' ability to hydrolyze X-gal, a lactose substrate. The DNA of interest is then extracted from the cells and sequenced, and the sequence is compared to that of the control. Unlike the usual defined-sequence DNA systems, theirs is an in vivo system. A disadvantage is the relatively high background mutation rate. Results achieved with this system, as well as future applications, are discussed.

  19. Nucleotide sequence of a cDNA for branched chain acyltransferase with analysis of the deduced protein structure

    International Nuclear Information System (INIS)

    Hummel, K.B.; Litwer, S.; Bradford, A.P.; Aitken, A.; Danner, D.J.; Yeaman, S.J.

    1988-01-01

    Nucleotide sequence was determined for a 1.6-kilobase human cDNA putative for the branched chain acyltransferase protein of the branched chain α-ketoacid dehydrogenase complex. Translation of the sequence reveals an open reading frame encoding a 315-amino acid protein of molecular weight 35,759 followed by 560 bases of 3'-untranslated sequence. Three repeats of the polyadenylation signal hexamer ATTAAA are present prior to the polyadenylate tail. Within the open reading frame is a 10-amino acid fragment which matches exactly the amino acid sequence around the lipoate-lysine residue in bovine kidney branched chain acyltransferase, thus confirming the identity of the cDNA. Analysis of the deduced protein structure for the human branched chain acyltransferase revealed an organization into domains similar to that reported for the acyltransferase proteins of the pyruvate and α-ketoglutarate dehydrogenase complexes. This similarity in organization suggests that a more detailed analysis of the proteins will be required to explain the individual substrate and multienzyme complex specificity shown by these acyltransferases

  20. Development of biometric DNA ink for authentication security.

    Science.gov (United States)

    Hashiyada, Masaki

    2004-10-01

    Among the various types of biometric personal identification systems, DNA provides the most reliable personal identification. It is intrinsically digital and unchangeable while the person is alive, and even after his/her death. Increasing the number of DNA loci examined can enhance the power of discrimination. This report describes the development of DNA ink, which contains synthetic DNA mixed with printing inks. Single-stranded DNA fragments encoding a personalized set of short tandem repeats (STR) were synthesized. The sequence was defined as follows. First, a decimal DNA personal identification (DNA-ID) was established based on the number of STRs in the locus. Next, this DNA-ID was encrypted using a binary, 160-bit algorithm, using a hashing function to protect privacy. Since this function is irreversible, no one can recover the original information from the encrypted code. Finally, the bit series generated above is transformed into base sequences, and double-stranded DNA fragments are amplified by the polymerase chain reaction (PCR) to protect against physical attacks. Synthesized DNA was detected successfully after samples printed in DNA ink were subjected to several resistance tests used to assess the stability of printing inks. Endurance test results showed that this DNA ink would be suitable for practical use as a printing ink and was resistant to 40 hours of ultraviolet exposure, performance commensurate with that of photogravure ink. Copyright 2004 Tohoku University Medical Press
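
    The hash-then-encode step can be sketched as below. The abstract only says that a 160-bit hashing function was used; SHA-1 is chosen here as a stand-in because it produces 160 bits, and both the two-bits-per-base mapping and the example DNA-ID string are hypothetical.

```python
import hashlib

# Hypothetical 2-bit code (00 -> A, 01 -> C, 10 -> G, 11 -> T);
# the mapping actually used for the DNA ink is not given in the abstract.
BASES = "ACGT"


def dna_id_to_bases(dna_id: str) -> str:
    """Hash a decimal DNA-ID string to 160 bits and encode them as 80 bases."""
    digest = hashlib.sha1(dna_id.encode("utf-8")).digest()  # 20 bytes = 160 bits
    out = []
    for byte in digest:
        for shift in (6, 4, 2, 0):  # four 2-bit groups per byte, high bits first
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


# Made-up DNA-ID built from STR repeat counts at several loci.
print(dna_id_to_bases("1516-0911-2930-1214"))
```

    Because the hash is one-way, the STR counts cannot be read back from the printed sequence; verification would recompute the hash from a fresh DNA profile and compare the resulting base strings.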

  1. Adenoviral DNA replication: DNA sequences and enzymes required for initiation in vitro

    International Nuclear Information System (INIS)

    Stillman, B.W.; Tamanoi, F.

    1983-01-01

    In this paper evidence is provided that the 140,000-dalton DNA polymerase is encoded by the adenoviral genome and is required for the initiation of DNA replication in vitro. The DNA sequences in the template DNA that are required for the initiation of replication have also been identified, using both plasmid DNAs and synthetic oligodeoxyribonucleotides. 48 references, 7 figures, 1 table

  2. DNA minor groove electrostatic potential: influence of sequence-specific transitions of the torsion angle gamma and deoxyribose conformations.

    Science.gov (United States)

    Zhitnikova, M Y; Shestopalova, A V

    2017-11-01

    The structural adjustments of the sugar-phosphate DNA backbone (switching of the γ angle (O5'-C5'-C4'-C3') from canonical to alternative conformations and/or C2'-endo → C3'-endo transition of deoxyribose) lead to the sequence-specific changes in accessible surface area of both polar and non-polar atoms of the grooves and the polar/hydrophobic profile of the latter ones. The distribution of the minor groove electrostatic potential is likely to be changing as a result of such conformational rearrangements in sugar-phosphate DNA backbone. Our analysis of the crystal structures of the short free DNA fragments and calculation of their electrostatic potentials allowed us to determine: (1) the number of classical and alternative γ angle conformations in the free B-DNA; (2) changes in the minor groove electrostatic potential, depending on the conformation of the sugar-phosphate DNA backbone; (3) the effect of the DNA sequence on the minor groove electrostatic potential. We have demonstrated that the structural adjustments of the DNA double helix (the conformations of the sugar-phosphate backbone and the minor groove dimensions) induce changes in the distribution of the minor groove electrostatic potential and are sequence-specific. Therefore, these features of the minor groove sizes and distribution of minor groove electrostatic potential can be used as a signal for recognition of the target DNA sequence by protein in the implementation of the indirect readout mechanism.

  3. PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

    Science.gov (United States)

    Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

    2011-01-01

    PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs 1-6 bp in size and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin the phase variation and antigenic variation seen in some bacteria. Although SSR-mediated phase variation and antigenic variation have been well studied in some bacteria, many other prokaryotic species remain to be investigated for SSR-mediated adaptive and other evolutionary advantages. As part of our ongoing studies on SSR polymorphism in prokaryotes, we compared the genome sequences of the various strains and isolates available for 85 different species of prokaryotes, extracted SSRs showing length variation, and created a relational database called PSSRdb. This database provides information such as the location of PSSRs in genomes, length variation across genomes, and the regions harboring PSSRs. The information provided in this database is useful for further research and analysis of SSRs in prokaryotes.
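
    A minimal sketch of how perfect SSRs with 1-6 bp motifs might be located and compared between two strains is given below. It is not the PSSRdb extraction pipeline; the minimum copy number and the toy sequences are assumptions.

```python
import re


def find_ssrs(seq, min_repeats=3, max_motif=6):
    """Return (start, motif, copies) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first, so
    'ACACAC' is reported as (AC)3 rather than as a longer compound motif.
    The min_repeats threshold is an arbitrary choice for this sketch.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Toy sequences standing in for the same locus in two strains.
strain_a = "GGTACACACACTTG"
strain_b = "GGTACACACTTG"
print(find_ssrs(strain_a))
print(find_ssrs(strain_b))
```

    A locus would count as polymorphic when the same motif at the corresponding position shows different copy numbers across strains, as with the AC repeat in the two toy sequences.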

  4. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful for verifying the significance of data. Genomic regularities, such as nucleotide correlations or the non-uniform distribution of motifs throughout genomic or mature mRNA sequences, exist, and their significance can be checked by means of the Monte Carlo test. The test needs good-quality random sequences in order to work; moreover, they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful for estimating the background score of an alignment, that is, a threshold below which the resulting score is merely due to chance. We have developed RANDNA, free software that produces random DNA or RNA sequences of a chosen length and nucleotide composition. Sequences having the same nucleotide distribution as exonic, intronic or intergenic sequences can be generated. Its graphical interface makes it easy to set the parameters that characterize the sequences, which are produced and saved in a text file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees good randomness, a long cycle length and high speed. We have checked the quality of the sequences generated by the software by means of well-known tests, both by themselves and against genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.
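
    The idea of drawing a random sequence with a chosen length and base composition can be sketched in a few lines. This uses Python's standard generator rather than the Borland Delphi 6 generator named above, and the function name and composition format are hypothetical.

```python
import random


def random_dna(length, composition, seed=None):
    """Draw a random sequence whose bases follow the given percentages.

    composition maps each base to a weight (for example percentages);
    each position is sampled independently, so the realized composition
    only approximates the target for short sequences.
    """
    rng = random.Random(seed)  # private generator, reproducible with a seed
    bases = list(composition)
    weights = [composition[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


# AT-rich background, as might be wanted for an intron-like null model.
at_rich = {"A": 30, "C": 20, "G": 20, "T": 30}
print(random_dna(60, at_rich, seed=1))
```

    For a Monte Carlo test, many such sequences would be generated and scored, and the observed score compared with the resulting null distribution.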

  5. Characterization of the major formamidopyrimidine-DNA glycosylase homolog in Mycobacterium tuberculosis and its linkage to variable tandem repeats.

    Science.gov (United States)

    Olsen, Ingrid; Balasingham, Seetha V; Davidsen, Tonje; Debebe, Ephrem; Rødland, Einar A; van Soolingen, Dick; Kremer, Kristin; Alseth, Ingrun; Tønjum, Tone

    2009-07-01

    The ability to repair DNA damage is likely to play an important role in the survival of facultative intracellular parasites because they are exposed to high levels of reactive oxygen species and nitrogen intermediates inside phagocytes. Correcting oxidative damage in purines and pyrimidines is the primary function of the enzymes formamidopyrimidine (faPy)-DNA glycosylase (Fpg) and endonuclease VIII (Nei) of the base excision repair pathway, respectively. Four gene homologs, belonging to the fpg/nei family, have been identified in Mycobacterium tuberculosis H37Rv. The recombinant protein encoded by M. tuberculosis Rv2924c, termed Mtb-Fpg1, was overexpressed, purified and biochemically characterized. The enzyme removed faPy and 5-hydroxycytosine lesions, as well as 8-oxo-7,8-dihydroguanine (8oxoG) opposite to C, T and G. Mtb-Fpg1 thus exhibited substrate specificities typical for Fpg enzymes. Although Mtb-fpg1 showed nearly complete nucleotide sequence conservation in 32 M. tuberculosis isolates, the region upstream of Mtb-fpg1 in these strains contained tandem repeat motifs of variable length. A relationship between repeat length and Mtb-fpg1 expression level was demonstrated in M. tuberculosis strains, indicating that an increased length of the tandem repeats positively influenced the expression levels of Mtb-fpg1. This is the first example of such a tandem repeat region of variable length being linked to the expression level of a bacterial gene.

  6. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix G of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of G is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.
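
    A small sketch of the construction may help: count transitions between consecutive words of m letters, normalize each column, apply the standard damping form G = alpha*S + (1 - alpha)/N, and iterate to obtain the PageRank vector. The non-overlapping word split, the damping value 0.85 and the handling of empty columns are assumptions of this sketch, not details taken from the abstract.

```python
from itertools import product


def google_matrix(seq, m=1, alpha=0.85):
    """Google matrix of transitions between consecutive m-letter words.

    G[i][j] is the probability of moving from word j to word i.
    """
    words = ["".join(p) for p in product("ACGT", repeat=m)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    counts = [[0] * n for _ in range(n)]  # counts[to][from]
    chunks = [seq[i:i + m] for i in range(0, len(seq) - m + 1, m)]
    for a, b in zip(chunks, chunks[1:]):
        if a in index and b in index:  # skip words with non-ACGT letters
            counts[index[b]][index[a]] += 1
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(counts[i][j] for i in range(n))
        for i in range(n):
            # columns with no outgoing transitions are replaced by 1/n
            s = counts[i][j] / col_sum if col_sum else 1.0 / n
            G[i][j] = alpha * s + (1 - alpha) / n
    return words, G


def pagerank(G, iters=100):
    """PageRank vector by power iteration, starting from the uniform vector."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


words, G = google_matrix("ACGTACGTTTGACA", m=1)
for word, prob in zip(words, pagerank(G)):
    print(word, round(prob, 3))
```

    With m letters per word the matrix has 4^m rows and columns, so dense lists like these are only practical for small m; larger word lengths would call for sparse storage.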

  7. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    Full Text Available For DNA sequences of various species we construct the Google matrix G of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of G is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  8. Chaos game representation (CGR)-walk model for DNA sequences

    International Nuclear Information System (INIS)

    Jie, Gao; Zhen-Yuan, Xu

    2009-01-01

    Chaos game representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to determine the coordinates of their positions in a continuous space. This distribution of positions has two properties: it is unique, and the source sequence can be recovered from the coordinates, so that the distance between positions may serve as a measure of similarity between the corresponding sequences. A CGR-walk model is proposed based on CGR coordinates for DNA sequences. The CGR coordinates are converted into a time series, and a long-memory ARFIMA (p, d, q) model, where ARFIMA stands for autoregressive fractionally integrated moving average, is introduced into the DNA sequence analysis. This model is applied to simulating real CGR-walk sequence data of ten genomic sequences. Remarkably long-range correlations are uncovered in the data, and the results are fitted reasonably well by the ARFIMA (p, d, q) model. (cross-disciplinary physics and related areas of science and technology)
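
    The CGR mapping and the recoverability of the source sequence can be sketched as follows. The corner assignment (A, C, G, T on the unit square) is one common convention and an assumption here; exact fractions are used so that decoding is not affected by rounding. The conversion of coordinates into a CGR-walk time series is not specified in the abstract and is not attempted.

```python
from fractions import Fraction

# One common corner convention (an assumption of this sketch).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr(seq):
    """CGR coordinates: start at the centre, move halfway to each base's corner."""
    x = y = Fraction(1, 2)
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def decode(point, length):
    """Recover the last `length` bases from a single CGR point."""
    x, y = point
    half = Fraction(1, 2)
    out = []
    for _ in range(length):
        cx, cy = int(x > half), int(y > half)  # quadrant identifies the base
        out.append(next(b for b, c in CORNERS.items() if c == (cx, cy)))
        x, y = 2 * x - cx, 2 * y - cy  # undo the halfway step
    return "".join(reversed(out))


points = cgr("ACGT")
print(points[-1])
print(decode(points[-1], 4))
```

    Each step halves the distance to a corner, so two sequences sharing a long suffix end up at nearby points, which is why distances between CGR positions can act as a similarity measure.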

  9. Sequence-Dependent Diastereospecific and Diastereodivergent Crosslinking of DNA by Decarbamoylmitomycin C.

    Science.gov (United States)

    Aguilar, William; Paz, Manuel M; Vargas, Anayatzinc; Clement, Cristina C; Cheng, Shu-Yuan; Champeil, Elise

    2018-04-20

    Mitomycin C (MC), a potent antitumor drug, and decarbamoylmitomycin C (DMC), a derivative lacking the carbamoyl group, form highly cytotoxic DNA interstrand crosslinks. The major interstrand crosslink formed by DMC is the C1'' epimer of the major crosslink formed by MC. The molecular basis for the stereochemical configuration exhibited by DMC was investigated using biomimetic synthesis. The formation of DNA-DNA crosslinks by DMC is diastereospecific and diastereodivergent: only the 1''S-diastereomer of the initially formed monoadduct can form crosslinks at GpC sequences, and only the 1''R-diastereomer of the monoadduct can form crosslinks at CpG sequences. We also show that CpG and GpC sequences react with divergent diastereoselectivity in the first alkylation step: 1''S stereochemistry is favored at GpC sequences and 1''R stereochemistry is favored at CpG sequences. Therefore, the first alkylation step results, at each sequence, in the selective formation of the diastereomer able to generate an interstrand DNA-DNA crosslink after the "second arm" alkylation. Examination of the known DNA adduct pattern obtained after treatment of cancer cell cultures with DMC indicates that the GpC sequence is the major target for the formation of DNA-DNA crosslinks in vivo by this drug. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Roles of repetitive sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bell, G.I.

    1991-12-31

    The DNA of higher eukaryotes contains many repetitive sequences. The study of repetitive sequences is important, not only because many have important biological functions, but also because they provide information on genome organization, evolution and dynamics. In this paper, I will first discuss some generic effects that repetitive sequences will have upon genome dynamics and evolution. In particular, it will be shown that repetitive sequences foster recombination among, and turnover of, the elements of a genome. I will then consider some examples of repetitive sequences, notably minisatellite and telomere sequences as examples of tandem repeats without and with known function, respectively, and Alu sequences as an example of interspersed repeats. Some other examples will also be considered in less detail.

  11. Phylogenetic analysis of Gossypium L. using restriction fragment length polymorphism of repeated sequences.

    Science.gov (United States)

    Zhang, Meiping; Rong, Ying; Lee, Mi-Kyung; Zhang, Yang; Stelly, David M; Zhang, Hong-Bin

    2015-10-01

    Cotton is the world's leading textile fiber crop and is also grown as a bioenergy and food crop. Knowledge of the phylogeny of closely related species and the genome origin and evolution of polyploid species is significant for advanced genomics research and breeding. We have reconstructed the phylogeny of the cotton genus, Gossypium L., and deciphered the genome origin and evolution of its five polyploid species by restriction fragment analysis of repeated sequences. Nuclear DNA of 84 accessions representing 35 species and all eight genomes of the genus were analyzed. The phylogenetic tree of the genus was reconstructed using the parsimony method on 1033 polymorphic repeated sequence restriction fragments. The genome origin of its polyploids was determined by calculating the diploid-polyploid restriction fragment correspondence (RFC). The tree is consistent with the morphological classification, genome designation and geographic distribution of the species at subgenus, section and subsection levels. Gossypium lobatum (D7) was unambiguously shown to have the highest RFC with the D-subgenomes of all five polyploids of the genus, while the common ancestor of Gossypium herbaceum (A1) and Gossypium arboreum (A2) likely contributed to the A-subgenomes of the polyploids. These results provide a comprehensive phylogenetic tree of the cotton genus and new insights into the genome origin and evolution of its polyploid species. The results also further demonstrate a simple, rapid and inexpensive method suitable for phylogenetic analysis of closely related species, especially congeneric species, and the inference of genome origin of polyploids that constitute over 70 % of flowering plants.

  12. Assessing the fidelity of ancient DNA sequences amplified from nuclear genes

    DEFF Research Database (Denmark)

    Binladen, Jonas; Wiuf, Carsten Henrik; Gilbert, M. Thomas P.

    2006-01-01

    To date, the field of ancient DNA has relied almost exclusively on mitochondrial DNA (mtDNA) sequences. However, a number of recent studies have reported the successful recovery of ancient nuclear DNA (nuDNA) sequences, thereby allowing the characterization of genetic loci directly involved...... in phenotypic traits of extinct taxa. It is well documented that postmortem damage in ancient mtDNA can lead to the generation of artifactual sequences. However, as yet no one has thoroughly investigated the damage spectrum in ancient nuDNA. By comparing clone sequences from 23 fossil specimens, recovered from...... adenine), respectively. Type 2 transitions are by far the most dominant and increase relative to those of type 1 with damage load. The results suggest that the deamination of cytosine (and 5-methyl cytosine) to uracil (and thymine) is the main cause of miscoding lesions in both ancient mtDNA and nu...

  13. Comparison of variable region 3 sequences of human immunodeficiency virus type 1 from infected children with the RNA and DNA sequences of the virus populations of their mothers.

    Science.gov (United States)

    Scarlatti, G; Leitner, T; Halapi, E; Wahlberg, J; Marchisio, P; Clerici-Schoeller, M A; Wigzell, H; Fenyö, E M; Albert, J; Uhlén, M

    1993-01-01

    We have compared the variable region 3 sequences from 10 human immunodeficiency virus type 1 (HIV-1)-infected infants to virus sequences from the corresponding mothers. The sequences were derived from DNA of uncultured peripheral blood mononuclear cells (PBMC), DNA of cultured PBMC, and RNA from serum collected at or shortly after delivery. The infected infants, in contrast to the mothers, harbored homogeneous virus populations. Comparison of sequences from the children and clones derived from DNA of the corresponding mothers showed that the transmitted virus represented either a minor or a major virus population of the mother. In contrast to an earlier study, we found no evidence of selection of minor virus variants during transmission. Furthermore, the transmitted virus variant did not show any characteristic molecular features. In some cases the transmitted virus was more related to the virus RNA population of the mother and in other cases it was more related to the virus DNA population. This suggests that either cell-free or cell-associated virus may be transmitted. These data will help AIDS researchers to understand the mechanism of transmission and to plan strategies for prevention of transmission. PMID:8446584

  14. Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries

    Directory of Open Access Journals (Sweden)

    Rodrigues NB

    2002-01-01

    Full Text Available In the last decade microsatellites have become one of the most useful genetic markers used in a large number of organisms due to their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance comes preferentially from microsatellite enriched libraries and DNA sequence databases. We have conducted a search in GenBank of more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC sequences. In addition, we obtained 300 sequences from CA and AT microsatellite enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3%) sequences that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compound). Of the 481 ESTs, 194 were grouped in 63 clusters containing 2 to 15 ESTs per cluster. Polymorphisms were observed in 16 clusters. The 287 remaining ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8%) contained microsatellites (2,335 perfect, 287 imperfect and 79 compound). Of the 1,598 BAC end sequences, 80 were grouped into 17 clusters containing 3 to 17 BAC end sequences per cluster. Microsatellites were present in 67 out of 300 sequences from microsatellite enriched libraries (55 perfect, 38 imperfect and 15 compound). From all of the observed loci, 55 were selected for having the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally, we describe two new polymorphic microsatellite loci.
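
    The reported proportions can be checked with a short calculation. The counts are taken from the abstract above; the variable and function names are only illustrative.

```python
# (sequences containing microsatellites, sequences searched), from the abstract
datasets = {
    "ESTs": (481, 16022),
    "BAC end sequences": (1598, 42017),
    "enriched libraries": (67, 300),
}

# (perfect, imperfect, compound) microsatellite counts, from the abstract
classes = {
    "ESTs": (434, 164, 24),
    "BAC end sequences": (2335, 287, 79),
    "enriched libraries": (55, 38, 15),
}


def percent_with_ssr(hits, total):
    """Percentage of sequences that contain at least one microsatellite."""
    return round(100.0 * hits / total, 1)


for name, (hits, total) in datasets.items():
    print(name, percent_with_ssr(hits, total), "% of sequences;",
          sum(classes[name]), "microsatellites in total")
```

    The EST class counts add up to the 622 microsatellites stated in the abstract, and the enriched libraries show a much higher hit rate than the database sequences, as expected for libraries selected for CA and AT repeats.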

  15. Multiple aspects of ATP-dependent nucleosome translocation by RSC and Mi-2 are directed by the underlying DNA sequence.

    Directory of Open Access Journals (Sweden)

    Joke J F A van Vugt

    Full Text Available BACKGROUND: Chromosome structure, DNA metabolic processes and cell type identity can all be affected by changing the positions of nucleosomes along chromosomal DNA, a reaction that is catalysed by SNF2-type ATP-driven chromatin remodelers. Recently it was suggested that in vivo, more than 50% of the nucleosome positions can be predicted simply by DNA sequence, especially within promoter regions. This seemingly contrasts with remodeler induced nucleosome mobility. The ability of remodeling enzymes to mobilise nucleosomes over short DNA distances is well documented. However, the nucleosome translocation processivity along DNA remains elusive. Furthermore, it is unknown what determines the initial direction of movement and how new nucleosome positions are adopted. METHODOLOGY/PRINCIPAL FINDINGS: We have used AFM imaging and high resolution PAGE of mononucleosomes on 600 and 2500 bp DNA molecules to analyze ATP-dependent nucleosome repositioning by native and recombinant SNF2-type enzymes. We report that the underlying DNA sequence can control the initial direction of translocation, translocation distance, as well as the new positions adopted by nucleosomes upon enzymatic mobilization. Within a strong nucleosomal positioning sequence both recombinant Drosophila Mi-2 (CHD-type and native RSC from yeast (SWI/SNF-type repositioned the nucleosome at 10 bp intervals, which are intrinsic to the positioning sequence. Furthermore, RSC-catalyzed nucleosome translocation was noticeably more efficient when beyond the influence of this sequence. Interestingly, under limiting ATP conditions RSC preferred to position the nucleosome with 20 bp intervals within the positioning sequence, suggesting that native RSC preferentially translocates nucleosomes with 15 to 25 bp DNA steps. CONCLUSIONS/SIGNIFICANCE: Nucleosome repositioning thus appears to be influenced by both remodeler intrinsic and DNA sequence specific properties that interplay to define ATPase

  16. Thermodynamics of sequence-specific binding of PNA to DNA

    DEFF Research Database (Denmark)

    Ratilainen, T; Holmén, A; Tuite, E

    2000-01-01

    For further characterization of the hybridization properties of peptide nucleic acids (PNAs), the thermodynamics of hybridization of mixed sequence PNA-DNA duplexes have been studied. We have characterized the binding of PNA to DNA in terms of binding affinity (perfectly matched duplexes) and seq...

  17. The complete chloroplast DNA sequence of the green alga Oltmannsiellopsis viridis reveals a distinctive quadripartite architecture in the chloroplast genome of early diverging ulvophytes

    Directory of Open Access Journals (Sweden)

    Lemieux Claude

    2006-02-01

    Background: The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. The basal position of the Prasinophyceae has been well documented, but the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae is currently debated. The four complete chloroplast DNA (cpDNA) sequences presently available for representatives of these classes have revealed extensive variability in overall structure, gene content, intron composition and gene order. The chloroplast genome of Pseudendoclonium (Ulvophyceae), in particular, is characterized by an atypical quadripartite architecture that deviates from the ancestral type by a large inverted repeat (IR) featuring an inverted rRNA operon and a small single-copy (SSC) region containing 14 genes normally found in the large single-copy (LSC) region. To gain insights into the nature of the events that led to the reorganization of the chloroplast genome in the Ulvophyceae, we have determined the complete cpDNA sequence of Oltmannsiellopsis viridis, a representative of a distinct, early diverging lineage. Results: The 151,933 bp IR-containing genome of Oltmannsiellopsis differs considerably from Pseudendoclonium and other chlorophyte cpDNAs in intron content and gene order, but shares close similarities with its ulvophyte homologue at the levels of quadripartite architecture, gene content and gene density. Oltmannsiellopsis cpDNA encodes 105 genes, contains five group I introns, and features many short dispersed repeats. As in Pseudendoclonium cpDNA, the rRNA genes in the IR are transcribed toward the single copy region featuring the genes typically found in the ancestral LSC region, and the opposite single copy region harbours genes characteristic of both the ancestral SSC and LSC regions. The 52 genes that were transferred from the ancestral LSC to SSC region include 12 of those observed in Pseudendoclonium cpDNA. Surprisingly, the overall gene organization of

  18. Enhanced throughput for infrared automated DNA sequencing

    Science.gov (United States)

    Middendorf, Lyle R.; Gartside, Bill O.; Humphrey, Pat G.; Roemer, Stephen C.; Sorensen, David R.; Steffens, David L.; Sutter, Scott L.

    1995-04-01

    Several enhancements have been developed and applied to infrared automated DNA sequencing, resulting in significantly higher throughput. A 41 cm sequencing gel (31 cm well-to-read distance) combines high resolution of DNA sequencing fragments with optimized run times, yielding two runs per day of 500 bases per sample. A 66 cm sequencing gel (56 cm well-to-read distance) produces sequence read lengths of up to 1000 bases for ds and ss templates using either T7 polymerase or cycle-sequencing protocols. Using a multichannel syringe to load 64 lanes allows 16 samples (compatible with 96-well format) to be visualized for each run. The 41 cm gel configuration allows 16,000 bases per day (16 samples × 500 bases/sample × 2 ten-hour runs/day) to be sequenced with the advantages of infrared technology. Enhancements to internal labeling techniques using an infrared-labeled dATP molecule (Boehringer Mannheim GmbH, Penzberg, Germany; Sequenase (U.S. Biochemical)) have also been made. The inclusion of glycerol in the sequencing reactions yields greatly improved results for some primer and template combinations. The inclusion of α-thio-dNTPs in the labeling reaction increases signal intensity two- to three-fold.
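
    The throughput figure quoted above can be reproduced with a short calculation. This is only a sketch of the arithmetic: the value of four lanes per sample is not stated in the abstract but is inferred from 64 lanes serving 16 samples, and the function name is hypothetical.

```python
# Back-of-the-envelope check of the throughput figures quoted in the abstract.

def bases_per_day(samples_per_run, bases_per_sample, runs_per_day):
    """Daily throughput as samples x bases per sample x runs per day."""
    return samples_per_run * bases_per_sample * runs_per_day

lanes_per_gel = 64
# Assumption: four lanes per sample, inferred from 64 lanes / 16 samples.
lanes_per_sample = 4
samples_per_run = lanes_per_gel // lanes_per_sample

# 41 cm gel: 500 bases per sample, two ten-hour runs per day.
daily = bases_per_day(samples_per_run, bases_per_sample=500, runs_per_day=2)
print(samples_per_run, daily)
```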

  19. Methylation patterns of repetitive DNA sequences in germ cells of Mus musculus.

    Science.gov (United States)

    Sanford, J; Forrester, L; Chapman, V; Chandley, A; Hastie, N

    1984-03-26

    The major and the minor satellite sequences of Mus musculus were undermethylated in both sperm and oocyte DNAs relative to the amount of undermethylation observed in adult somatic tissue DNA. This hypomethylation was specific for satellite sequences in sperm DNA. Dispersed repetitive and low copy sequences show a high degree of methylation in sperm DNA; however, a dispersed repetitive sequence was undermethylated in oocyte DNA. This finding suggests a difference in the amount of total genomic DNA methylation between sperm and oocyte DNA. The methylation levels of the minor satellite sequences did not change during spermiogenesis, and were not associated with the onset of meiosis or a specific stage in sperm development.

  20. Isolation and sequence analysis of the wheat B genome subtelomeric DNA

    Directory of Open Access Journals (Sweden)

    Huneau Cecile

    2009-09-01

    Background: Telomeric and subtelomeric regions are essential for genome stability and regular chromosome replication. In this work, we have characterized the wheat BAC (bacterial artificial chromosome) clones containing Spelt1 and Spelt52 sequences, which belong to the subtelomeric repeats of the B/G genomes of wheats and Aegilops species from the section Sitopsis. Results: The BAC library from Triticum aestivum cv. Renan was screened using Spelt1 and Spelt52 as probes. Nine positive clones were isolated; of them, clone 2050O8 was localized mainly to the distal parts of wheat chromosomes by in situ hybridization. The distribution of the other clones indicated the presence of different types of repetitive sequences in BACs. Use of different approaches allowed us to prove that seven of the nine isolated clones belonged to the subtelomeric chromosomal regions. Clone 2050O8 was sequenced and its sequence of 119 737 bp was annotated. It is composed of 33% transposable elements (TEs), 8.2% Spelt52 (namely, the subfamily Spelt52.2) and five non-TE-related genes. DNA transposons are predominant, making up 24.6% of the entire BAC clone, whereas retroelements account for 8.4% of the clone length. The full-length CACTA transposon Caspar covers 11 666 bp, encoding a transposase and CTG-2 proteins, and this transposon accounts for 40% of the DNA transposons. The in situ hybridization data for 2050O8 derived subclones in combination with the BLAST search against wheat mapped ESTs (expressed sequence tags) suggest that clone 2050O8 is located in the terminal bin 4BL-10 (0.95-1.0). Additionally, four of the predicted 2050O8 genes showed significant homology to four putative orthologous rice genes in the distal part of rice chromosome 3S and confirm the synteny to wheat 4BL. Conclusion: Satellite DNA sequences from the subtelomeric regions of diploid wheat progenitor can be used for selecting the BAC clones from the corresponding regions of hexaploid wheat
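
    The composition figures reported for clone 2050O8 are mutually consistent, as the following sketch shows. It uses only the numbers quoted in the abstract; the variable names are illustrative.

```python
# Consistency check of the composition figures reported for BAC clone 2050O8.

clone_length_bp = 119_737
dna_transposon_fraction = 0.246   # DNA transposons, share of the clone
retroelement_fraction = 0.084     # retroelements, share of the clone
caspar_bp = 11_666                # full-length CACTA transposon Caspar

# Total transposable-element share: 24.6% + 8.4% should give the quoted 33%.
te_fraction = dna_transposon_fraction + retroelement_fraction

# Share of Caspar among DNA transposons, quoted as 40%.
dna_transposon_bp = dna_transposon_fraction * clone_length_bp
caspar_share = caspar_bp / dna_transposon_bp

print(f"TE share of clone: {te_fraction:.1%}")
print(f"Caspar share of DNA transposons: {caspar_share:.1%}")
```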

  1. Structural and Functional Characterization of an Archaeal Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated Complex for Antiviral Defense (CASCADE)

    DEFF Research Database (Denmark)

    Lintner, Nathanael G; Kerou, Melina; Brumfield, Susan K

    2011-01-01

    In response to viral infection, many prokaryotes incorporate fragments of virus-derived DNA into loci called clustered regularly interspaced short palindromic repeats (CRISPRs). The loci are then transcribed, and the processed CRISPR transcripts are used to target invading viral DNA and RNA. The Escherichia coli "CRISPR-associated complex for antiviral defense" (CASCADE) is central in targeting invading DNA. Here we report the structural and functional characterization of an archaeal CASCADE (aCASCADE) from Sulfolobus solfataricus. Tagged Csa2 (Cas7) expressed in S. solfataricus co-purifies with Cas5a-, Cas6-, Csa5-, and Cas6-processed CRISPR-RNA (crRNA). Csa2, the dominant protein in aCASCADE, forms a stable complex with Cas5a. Transmission electron microscopy reveals a helical complex of variable length, perhaps due to substoichiometric amounts of other CASCADE components. A recombinant Csa2...

  2. DNA-PK dependent targeting of DNA-ends to a protein complex assembled on matrix attachment region DNA sequences

    International Nuclear Information System (INIS)

    Mauldin, S.K.; Getts, R.C.; Perez, M.L.; DiRienzo, S.; Stamato, T.D.

    2003-01-01

    Full text: We find that nuclear protein extracts from mammalian cells contain an activity that allows DNA ends to associate with circular pUC18 plasmid DNA. This activity requires the catalytic subunit of DNA-PK (DNA-PKcs) and Ku since it was not observed in mutants lacking Ku or DNA-PKcs but was observed when purified Ku/DNA-PKcs was added to these mutant extracts. Competition experiments between pUC18 and pUC18 plasmids containing various nuclear matrix attachment region (MAR) sequences suggest that DNA ends preferentially associate with plasmids containing MAR DNA sequences. At a 1:5 mass ratio of MAR to pUC18, approximately equal amounts of DNA end binding to the two plasmids were observed, while at a 1:1 ratio no pUC18 end-binding was observed. Calculation of relative binding activities indicates that DNA-end binding activities to MAR sequences was 7 to 21 fold higher than pUC18. Western analysis of proteins bound to pUC18 and MAR plasmids indicates that XRCC4, DNA ligase IV, scaffold attachment factor A, topoisomerase II, and poly(ADP-ribose) polymerase preferentially associate with the MAR plasmid in the absence or presence of DNA ends. In contrast, Ku and DNA-PKcs were found on the MAR plasmid only in the presence of DNA ends. After electroporation of a 32P-labeled DNA probe into human cells and cell fractionation, 87% of the total intercellular radioactivity remained in nuclei after a 0.5M NaCl extraction suggesting the probe was strongly bound in the nucleus. The above observations raise the possibility that DNA-PK targets DNA-ends to a repair and/or DNA damage signaling complex which is assembled on MAR sites in the nucleus

  3. Effective DNA Inhibitors of Cathepsin G by In Vitro Selection

    Science.gov (United States)

    Gatto, Barbara; Vianini, Elena; Lucatello, Lorena; Sissi, Claudia; Moltrasio, Danilo; Pescador, Rodolfo; Porta, Roberto; Palumbo, Manlio

    2008-01-01

    Cathepsin G (CatG) is a chymotrypsin-like protease released upon degranulation of neutrophils. In several inflammatory and ischaemic diseases the impaired balance between CatG and its physiological inhibitors leads to tissue destruction and platelet aggregation. Inhibitors of CatG are suitable for the treatment of inflammatory diseases and procoagulant conditions. DNA released upon the death of neutrophils at injury sites binds CatG. Moreover, short DNA fragments are more inhibitory than genomic DNA. Defibrotide, a single stranded polydeoxyribonucleotide with antithrombotic effect is also a potent CatG inhibitor. Given the above experimental evidences we employed a selection protocol to assess whether DNA inhibition of CatG may be ascribed to specific sequences present in defibrotide DNA. A Selex protocol was applied to identify the single-stranded DNA sequences exhibiting the highest affinity for CatG, the diversity of a combinatorial pool of oligodeoxyribonucleotides being a good representation of the complexity found in defibrotide. Biophysical and biochemical studies confirmed that the selected sequences bind tightly to the target enzyme and also efficiently inhibit its catalytic activity. Sequence analysis carried out to unveil a motif responsible for CatG recognition showed a recurrence of alternating TG repeats in the selected CatG binders, adopting an extended conformation that grants maximal interaction with the highly charged protein surface. This unprecedented finding is validated by our results showing high affinity and inhibition of CatG by specific DNA sequences of variable length designed to maximally reduce pairing/folding interactions. PMID:19325843

  4. Effective DNA Inhibitors of Cathepsin G by In Vitro Selection

    Directory of Open Access Journals (Sweden)

    Manlio Palumbo

    2008-06-01

    Cathepsin G (CatG) is a chymotrypsin-like protease released upon degranulation of neutrophils. In several inflammatory and ischaemic diseases the impaired balance between CatG and its physiological inhibitors leads to tissue destruction and platelet aggregation. Inhibitors of CatG are suitable for the treatment of inflammatory diseases and procoagulant conditions. DNA released upon the death of neutrophils at injury sites binds CatG. Moreover, short DNA fragments are more inhibitory than genomic DNA. Defibrotide, a single stranded polydeoxyribonucleotide with antithrombotic effect is also a potent CatG inhibitor. Given the above experimental evidences we employed a selection protocol to assess whether DNA inhibition of CatG may be ascribed to specific sequences present in defibrotide DNA. A Selex protocol was applied to identify the single-stranded DNA sequences exhibiting the highest affinity for CatG, the diversity of a combinatorial pool of oligodeoxyribonucleotides being a good representation of the complexity found in defibrotide. Biophysical and biochemical studies confirmed that the selected sequences bind tightly to the target enzyme and also efficiently inhibit its catalytic activity. Sequence analysis carried out to unveil a motif responsible for CatG recognition showed a recurrence of alternating TG repeats in the selected CatG binders, adopting an extended conformation that grants maximal interaction with the highly charged protein surface. This unprecedented finding is validated by our results showing high affinity and inhibition of CatG by specific DNA sequences of variable length designed to maximally reduce pairing/folding interactions.

  5. Contrasting Patterns of rDNA Homogenization within the Zygosaccharomyces rouxii Species Complex

    Science.gov (United States)

    Chand Dakal, Tikam; Giudici, Paolo; Solieri, Lisa

    2016-01-01

    Arrays of repetitive ribosomal DNA (rDNA) sequences are generally expected to evolve as a coherent family, where repeats within such a family are more similar to each other than to orthologs in related species. The continuous homogenization of repeats within individual genomes is a recombination process termed concerted evolution. Here, we investigated the extent and the direction of concerted evolution in 43 yeast strains of the Zygosaccharomyces rouxii species complex (Z. rouxii, Z. sapae, Z. mellis), by analyzing two portions of the 35S rDNA cistron, namely the D1/D2 domains at the 5’ end of the 26S rRNA gene and the segment including the internal transcribed spacers (ITS) 1 and 2 (ITS regions). We demonstrate that intra-genomic rDNA sequence variation is unusually frequent in this clade and that rDNA arrays in single genomes consist of an intermixing of Z. rouxii, Z. sapae and Z. mellis-like sequences, putatively evolved by reticulate evolutionary events that involved repeated hybridization between lineages. The levels and distribution of sequence polymorphisms vary across rDNA repeats in different individuals, reflecting four patterns of rDNA evolution: I) rDNA repeats that are homogeneous within a genome but are chimeras derived from two parental lineages via recombination: Z. rouxii in the ITS region and Z. sapae in the D1/D2 region; II) intra-genomic rDNA repeats that retain polymorphisms only in ITS regions; III) rDNA repeats that vary only in their D1/D2 domains; IV) heterogeneous rDNA arrays that have both polymorphic ITS and D1/D2 regions. We argue that an ongoing process of homogenization following allodiploidization or incomplete lineage sorting gave rise to divergent evolutionary trajectories in different strains, depending upon temporal, structural and functional constraints. We discuss the consequences of these findings for Zygosaccharomyces species delineation and, more generally, for yeast barcoding. PMID:27501051

  6. Detecting differential DNA methylation from sequencing of bisulfite converted DNA of diverse species.

    Science.gov (United States)

    Huh, Iksoo; Wu, Xin; Park, Taesung; Yi, Soojin V

    2017-07-21

    DNA methylation is one of the most extensively studied epigenetic modifications of genomic DNA. In recent years, sequencing of bisulfite-converted DNA, particularly via next-generation sequencing technologies, has become a widely popular method to study DNA methylation. This method can be readily applied to a variety of species, dramatically expanding the scope of DNA methylation studies beyond the traditionally studied human and mouse systems. In parallel to the increasing wealth of genomic methylation profiles, many statistical tools have been developed to detect differentially methylated loci (DMLs) or differentially methylated regions (DMRs) between biological conditions. We discuss and summarize several key properties of currently available tools to detect DMLs and DMRs from sequencing of bisulfite-converted DNA. However, the majority of the statistical tools developed for DML/DMR analyses have been validated using only mammalian data sets, and less priority has been placed on the analyses of invertebrate or plant DNA methylation data. We demonstrate that genomic methylation profiles of non-mammalian species are often highly distinct from those of mammalian species using examples of honey bees and humans. We then discuss how such differences in data properties may affect statistical analyses. Based on these differences, we provide three specific recommendations to improve the power and accuracy of DML and DMR analyses of invertebrate data when using currently available statistical tools. These considerations should facilitate systematic and robust analyses of DNA methylation from diverse species, thus advancing our understanding of DNA methylation. © The Author 2017. Published by Oxford University Press.

  7. Next Generation DNA Sequencing and the Future of Genomic Medicine

    OpenAIRE

    Anderson, Matthew W.; Schrijver, Iris

    2010-01-01

    In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpreta...

  8. Comparative genomics and repetitive sequence divergence in the species of diploid Nicotiana section Alatae.

    Science.gov (United States)

    Lim, K Yoong; Kovarik, Ales; Matyasek, Roman; Chase, Mark W; Knapp, Sandra; McCarthy, Elizabeth; Clarkson, James J; Leitch, Andrew R

    2006-12-01

    Combining phylogenetic reconstructions of species relationships with comparative genomic approaches is a powerful way to decipher evolutionary events associated with genome divergence. Here, we reconstruct the history of karyotype and tandem repeat evolution in species of diploid Nicotiana section Alatae. By analysis of plastid DNA, we resolved two clades with high bootstrap support, one containing N. alata, N. langsdorffii, N. forgetiana and N. bonariensis (called the n = 9 group) and another containing N. plumbaginifolia and N. longiflora (called the n = 10 group). Despite little plastid DNA sequence divergence, we observed, via fluorescent in situ hybridization, substantial chromosomal repatterning, including altered chromosome numbers, structure and distribution of repeats. Effort was focussed on 35S and 5S nuclear ribosomal DNA (rDNA) and the HRS60 satellite family of tandem repeats comprising the elements HRS60, NP3R and NP4R. We compared divergence of these repeats in diploids and polyploids of Nicotiana. There are dramatic shifts in the distribution of the satellite repeats and complete replacement of intergenic spacers (IGSs) of 35S rDNA associated with divergence of the species in section Alatae. We suggest that sequence homogenization has replaced HRS60 family repeats at sub-telomeric regions, but that this process may not occur, or occurs more slowly, when the repeats are found at intercalary locations. Sequence homogenization acts more rapidly (at least two orders of magnitude) on 35S rDNA than 5S rDNA and sub-telomeric satellite sequences. This rapid rate of divergence is analogous to that found in polyploid species, and is therefore, in plants, not only associated with polyploidy.

  9. repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects.

    Science.gov (United States)

    Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen

    2015-04-15

    In order to develop powerful computational predictors for identifying the biological features or attributes of DNAs, one of the most challenging problems is to find a suitable approach to effectively represent the DNA sequences. To facilitate the studies of DNAs and nucleotides, we developed a Python package called representations of DNAs (repDNA) for generating the widely used features reflecting the physicochemical properties and sequence-order effects of DNAs and nucleotides. There are three feature groups composed of 15 features. The first group calculates three nucleic acid composition features describing the local sequence information by means of kmers; the second group calculates six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties; the third group calculates six pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector yet still keep considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. In addition, these features can be easily calculated from both the built-in and user-defined properties using repDNA. The repDNA Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repDNA/. Contact: bliu@insun.hit.edu.cn or kcchou@gordonlifescience.org. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
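
    As an illustration of the first feature group, a k-mer composition vector can be computed as sketched below. This is not the repDNA API; the function name `kmer_composition` and its behaviour (skipping windows that contain characters other than A, C, G, T) are assumptions made for this example.

```python
from collections import Counter
from itertools import product

def kmer_composition(seq, k):
    """Return normalized k-mer frequencies in lexicographic ACGT order.

    Hypothetical helper for illustration only; not part of repDNA.
    Windows containing characters outside ACGT are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]

# A sequence is mapped to a fixed-length vector of 4**k frequencies.
vec = kmer_composition("ACGTACGT", 2)
print(len(vec), vec[1])  # vec[1] is the frequency of the 2-mer "AC"
```

    The vector length depends only on k, so sequences of different lengths become directly comparable, which is the property such composition features rely on.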

  10. Next Generation Sequencing of Ancient DNA: Requirements, Strategies and Perspectives

    Directory of Open Access Journals (Sweden)

    Michael Knapp

    2010-07-01

    The invention of next-generation-sequencing has revolutionized almost all fields of genetics, but few have profited from it as much as the field of ancient DNA research. From its beginnings as an interesting but rather marginal discipline, ancient DNA research is now on its way into the centre of evolutionary biology. In less than a year from its invention next-generation-sequencing had increased the amount of DNA sequence data available from extinct organisms by several orders of magnitude. Ancient DNA research is now not only adding a temporal aspect to evolutionary studies and allowing for the observation of evolution in real time, it also provides important data to help understand the origins of our own species. Here we review progress that has been made in next-generation-sequencing of ancient DNA over the past five years and evaluate sequencing strategies and future directions.

  11. Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

    Science.gov (United States)

    Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

    2017-07-01

    A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, using the four DNA base codes: adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of a sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become highly advanced, high-throughput technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing produces ambiguous results, making it difficult to determine whether a given code is A, T, G, or C. To address this problem, in this study we introduce another representation of DNA codes, namely the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probabilities that the bases A, T, G, C appear in Q, and PA + PT + PG + PC = 1. Using Quaternion representations we are able to construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces better-quality match and mismatch scores between two DNA base codes. In our implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from GenBank, while the target DNA sequence was received from our collaborator's database. We found that the Quaternion representations improve the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
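
    The idea of scoring ambiguous bases through a dot product of probability 4-tuples can be sketched as follows. The abstract reports an Octave implementation; this Python version is only an illustration, and the linear mapping from dot product to score, the match, mismatch and gap values, and all function names are assumptions not taken from the paper.

```python
# Sketch: Needleman-Wunsch global alignment score over probability 4-tuples
# (P_A, P_T, P_G, P_C), with pair scores derived from a dot product.

ORDER = "ATGC"  # component order of each 4-tuple

def base_vector(base):
    """Unambiguous base as a one-hot probability tuple."""
    return tuple(1.0 if b == base else 0.0 for b in ORDER)

def pair_score(p, q, match=1.0, mismatch=-1.0):
    """Assumed scoring: interpolate between mismatch and match by p.q."""
    d = sum(a * b for a, b in zip(p, q))
    return match * d + mismatch * (1.0 - d)

def needleman_wunsch(xs, ys, gap=-1.0):
    """Return the optimal global alignment score of two tuple sequences."""
    n, m = len(xs), len(ys)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + pair_score(xs[i - 1], ys[j - 1]),
                F[i - 1][j] + gap,   # gap in ys
                F[i][j - 1] + gap,   # gap in xs
            )
    return F[n][m]

subject = [base_vector(b) for b in "AGT"]
# Target with an ambiguous middle base: A or G with equal probability.
target = [base_vector("A"), (0.5, 0.0, 0.5, 0.0), base_vector("T")]
print(needleman_wunsch(target, subject))
```

    With one-hot tuples the dot product is 1 for identical bases and 0 otherwise, so the scheme reduces to ordinary match/mismatch scoring; an ambiguous position contributes a score weighted by how much probability it shares with the subject base.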

  12. DNA watermarks in non-coding regulatory sequences

    Directory of Open Access Journals (Sweden)

    Pyka Martin

    2009-07-01

    Background: DNA watermarks can be applied to identify the unauthorized use of genetically modified organisms. It has been shown that coding regions can be used to encrypt information into living organisms by using the DNA-Crypt algorithm. Yet, if the sequence of interest presents a non-coding DNA sequence, either the function of a resulting functional RNA molecule or a regulatory sequence, such as a promoter, could be affected. For our studies we used the small cytoplasmic RNA 1 in yeast and the lac promoter region of Escherichia coli. Findings: The lac promoter was deactivated by the integrated watermark. In addition, the RNA molecules displayed altered configurations after introducing a watermark, but surprisingly were functionally intact, which has been verified by analyzing the growth characteristics of both wild type and watermarked scR1 transformed yeast cells. In a third approach we introduced a second overlapping watermark into the lac promoter, which did not affect the promoter activity. Conclusion: Even though the watermarked RNA and one of the watermarked promoters did not show any significant differences compared to the wild type RNA and wild type promoter region, respectively, it cannot be generalized that other RNA molecules or regulatory sequences behave accordingly. Therefore, we do not recommend integrating watermark sequences into regulatory regions.

  13. [Structural organization of 5S ribosomal DNA of Rosa rugosa].

    Science.gov (United States)

    Tynkevych, Iu O; Volkov, R A

    2014-01-01

    In order to clarify molecular organization of the genomic region encoding 5S rRNA in diploid species Rosa rugosa several 5S rDNA repeated units were cloned and sequenced. Analysis of the obtained sequences revealed that only one length variant of 5S rDNA repeated units, which contains intact promoter elements in the intergenic spacer region (IGS) and appears to be transcriptionally active is present in the genome. Additionally, a limited number of 5S rDNA pseudogenes lacking a portion of coding sequence and the complete IGS was detected. A high level of sequence similarity (from 93.7 to 97.5%) between the IGS of major 5S rDNA variants of East Asian R. rugosa and North American R. nitida was found indicating comparatively recent divergence of these species.

  14. Screening of SHOX gene sequence variants in Saudi Arabian children with idiopathic short stature.

    Science.gov (United States)

    Alharthi, Abdulla A; El-Hallous, Ehab I; Talaat, Iman M; Alghamdi, Hamed A; Almalki, Matar I; Gaber, Ahmed

    2017-10-01

    Short stature affects approximately 2%-3% of children, representing one of the most frequent disorders for which clinical attention is sought during childhood. Despite assumed genetic heterogeneity, mutations or deletions in the short stature homeobox-containing gene (SHOX) are frequently detected in subjects with short stature. Idiopathic short stature (ISS) refers to patients with short stature for various unknown reasons. The goal of this study was to screen all the exons of SHOX to identify related mutations. We screened all the exons of SHOX for mutation analysis in 105 ISS child patients (57 girls and 48 boys) living in Taif governorate, KSA, using a direct DNA sequencing method. Height, arm span, and sitting height were recorded, and subischial leg length was calculated. A total of 30 of 105 ISS patients (28%) contained six polymorphic variants in exons 1, 2, 4, and 6. One mutation was found in the DNA domain binding region of exon 4. Three of these polymorphic variants were novel, while the others were reported previously. There were no significant differences in anthropometric measures in ISS patients with and without identifiable polymorphic variants in SHOX. In Saudi Arabian ISS patients, it is possible that new genes, rather than SHOX, are involved in longitudinal growth. Additional molecular analysis is required to diagnose and understand the etiology of this disease.

  15. Mitochondrial DNA sequence evolution in shorebird populations

    NARCIS (Netherlands)

    Wenink, P.W.

    1994-01-01

    This thesis describes the global molecular population structure of two shorebird species, in particular of the dunlin, Calidris alpina, by means of comparative sequence analysis of the most variable part of the mitochondrial DNA (mtDNA) genome. There are several reasons

  16. Anaplasma phagocytophilum in Danish sheep: confirmation by DNA sequencing

    Directory of Open Access Journals (Sweden)

    Thamsborg Stig M

    2009-12-01

    Background: The presence of Anaplasma phagocytophilum, an Ixodes ricinus transmitted bacterium, was investigated in two flocks of Danish grazing lambs. Direct PCR detection was performed on DNA extracted from blood and serum with subsequent confirmation by DNA sequencing. Methods: 31 samples obtained from clinically normal lambs in 2000 from Fussingø, Jutland and 12 samples from ten lambs and two ewes from a clinical outbreak at Feddet, Zealand in 2006 were included in the study. Some of the animals from Feddet had shown clinical signs of polyarthritis and general unthriftiness prior to sampling. DNA extraction was optimized from blood and serum and detection achieved by a 16S rRNA targeted PCR with verification of the product by DNA sequencing. Results: Five DNA extracts were found positive by PCR, including two samples from 2000 and three from 2006. For both series of samples the product was verified as A. phagocytophilum by DNA sequencing. Conclusions: A. phagocytophilum was detected by molecular methods for the first time in Danish grazing lambs during the two seasons investigated (2000 and 2006).

  17. Isolation of a sex-linked DNA sequence in cranes.

    Science.gov (United States)

    Duan, W; Fuerst, P A

    2001-01-01

    A female-specific DNA fragment (CSL-W; crane sex-linked DNA on W chromosome) was cloned from female whooping cranes (Grus americana). From the nucleotide sequence of CSL-W, a set of polymerase chain reaction (PCR) primers was identified which amplify a 227-230 bp female-specific fragment from all existing crane species and some other noncrane species. A duplicated version of the DNA segment, which is found to have a larger size (231-235 bp) than CSL-W in both sexes, was also identified, and was designated CSL-NW (crane sex-linked DNA on non-W chromosome). The nucleotide similarity between the sequences of CSL-W and CSL-NW from whooping cranes was 86.3%. The CSL primers do not amplify any sequence from mammalian DNA, limiting the potential for contamination from human sources. Using the CSL primers in combination with a quick DNA extraction method allows the noninvasive identification of crane gender in less than 10 h. A test of the methodology was carried out on fully developed body feathers from 18 captive cranes and resulted in 100% successful identification.

  18. Detection of genomic variation by selection of a 9 Mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥ 4-fold, and 97.9% concordant in regions with coverage ≥ 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  19. A specific family of interspersed repeats (SINEs) facilitates meiotic synapsis in mammals

    Directory of Open Access Journals (Sweden)

    Johnson Matthew E

    2013-01-01

    Background: Errors during meiosis that affect synapsis and recombination between homologous chromosomes contribute to aneuploidy and infertility in humans. Despite the clinical relevance of these defects, we know very little about the mechanisms by which homologous chromosomes interact with one another during mammalian meiotic prophase. Further, we remain ignorant of the way in which chromosomal DNA complexes with the meiosis-specific structure that tethers homologs, the synaptonemal complex (SC), and whether specific DNA elements are necessary for this interaction. Results: In the present study we utilized chromatin immunoprecipitation (ChIP) and DNA sequencing to demonstrate that the axial elements of the mammalian SC are markedly enriched for a specific family of interspersed repeats, short interspersed elements (SINEs). Further, we refine the role of the repeats to specific sub-families of SINEs, B1 in mouse and AluY in old world monkey (Macaca mulatta). Conclusions: Because B1 and AluY elements are the most actively retrotransposing SINEs in mice and rhesus monkeys, respectively, our observations imply that they may serve a dual function in axial element binding; i.e., as the anchoring point for the SC but possibly also as a suppressor/regulator of retrotransposition.

  20. Methylation patterns of repetitive DNA sequences in germ cells of Mus musculus.

    OpenAIRE

    Sanford, J; Forrester, L; Chapman, V; Chandley, A; Hastie, N

    1984-01-01

    The major and the minor satellite sequences of Mus musculus were undermethylated in both sperm and oocyte DNAs relative to the amount of undermethylation observed in adult somatic tissue DNA. This hypomethylation was specific for satellite sequences in sperm DNA. Dispersed repetitive and low copy sequences show a high degree of methylation in sperm DNA; however, a dispersed repetitive sequence was undermethylated in oocyte DNA. This finding suggests a difference in the amount of total genomic...

  1. A 28,000 Years Old Cro-Magnon mtDNA Sequence Differs from All Potentially Contaminating Modern Sequences

    Science.gov (United States)

    Caramelli, David; Milani, Lucio; Vai, Stefania; Modi, Alessandra; Pecchioli, Elena; Girardi, Matteo; Pilli, Elena; Lari, Martina; Lippi, Barbara; Ronchitelli, Annamaria; Mallegni, Francesco; Casoli, Antonella; Bertorelle, Giorgio; Barbujani, Guido

    2008-01-01

    Background: DNA sequences from ancient specimens may in fact result from undetected contamination of the ancient specimens by modern DNA, and the problem is particularly challenging in studies of human fossils. Doubts on the authenticity of the available sequences have so far hampered genetic comparisons between anatomically archaic (Neandertal) and early modern (Cro-Magnoid) Europeans. Methodology/Principal Findings: We typed the mitochondrial DNA (mtDNA) hypervariable region I in a 28,000 years old Cro-Magnoid individual from the Paglicci cave, in Italy (Paglicci 23) and in all the people who had contact with the sample since its discovery in 2003. The Paglicci 23 sequence, determined through the analysis of 152 clones, is the Cambridge reference sequence, and cannot possibly reflect contamination because it differs from all potentially contaminating modern sequences. Conclusions/Significance: The Paglicci 23 individual carried a mtDNA sequence that is still common in Europe, and which radically differs from those of the almost contemporary Neandertals, demonstrating a genealogical continuity across 28,000 years, from Cro-Magnoid to modern Europeans. Because all potential sources of modern DNA contamination are known, the Paglicci 23 sample will offer a unique opportunity to get insight for the first time into the nuclear genes of early modern Europeans. PMID:18628960

  2. A 28,000 years old Cro-Magnon mtDNA sequence differs from all potentially contaminating modern sequences.

    Directory of Open Access Journals (Sweden)

    David Caramelli

    BACKGROUND: DNA sequences from ancient specimens may in fact result from undetected contamination of the ancient specimens by modern DNA, and the problem is particularly challenging in studies of human fossils. Doubts on the authenticity of the available sequences have so far hampered genetic comparisons between anatomically archaic (Neandertal) and early modern (Cro-Magnoid) Europeans. METHODOLOGY/PRINCIPAL FINDINGS: We typed the mitochondrial DNA (mtDNA) hypervariable region I in a 28,000 years old Cro-Magnoid individual from the Paglicci cave, in Italy (Paglicci 23) and in all the people who had contact with the sample since its discovery in 2003. The Paglicci 23 sequence, determined through the analysis of 152 clones, is the Cambridge reference sequence, and cannot possibly reflect contamination because it differs from all potentially contaminating modern sequences. CONCLUSIONS/SIGNIFICANCE: The Paglicci 23 individual carried a mtDNA sequence that is still common in Europe, and which radically differs from those of the almost contemporary Neandertals, demonstrating a genealogical continuity across 28,000 years, from Cro-Magnoid to modern Europeans. Because all potential sources of modern DNA contamination are known, the Paglicci 23 sample will offer a unique opportunity to get insight for the first time into the nuclear genes of early modern Europeans.

  3. DNA replication stress restricts ribosomal DNA copy number.

    Science.gov (United States)

    Salim, Devika; Bradford, William D; Freeland, Amy; Cady, Gillian; Wang, Jianmin; Pruitt, Steven C; Gerton, Jennifer L

    2017-09-01

    Ribosomal RNAs (rRNAs) in budding yeast are encoded by ~100-200 repeats of a 9.1kb sequence arranged in tandem on chromosome XII, the ribosomal DNA (rDNA) locus. Copy number of rDNA repeat units in eukaryotic cells is maintained far in excess of the requirement for ribosome biogenesis. Despite the importance of the repeats for both ribosomal and non-ribosomal functions, it is currently not known how "normal" copy number is determined or maintained. To identify essential genes involved in the maintenance of rDNA copy number, we developed a droplet digital PCR based assay to measure rDNA copy number in yeast and used it to screen a yeast conditional temperature-sensitive mutant collection of essential genes. Our screen revealed that low rDNA copy number is associated with compromised DNA replication. Further, subculturing yeast under two separate conditions of DNA replication stress selected for a contraction of the rDNA array independent of the replication fork blocking protein, Fob1. Interestingly, cells with a contracted array grew better than their counterparts with normal copy number under conditions of DNA replication stress. Our data indicate that DNA replication stresses select for a smaller rDNA array. We speculate that this liberates scarce replication factors for use by the rest of the genome, which in turn helps cells complete DNA replication and continue to propagate. Interestingly, tumors from mini chromosome maintenance 2 (MCM2)-deficient mice also show a loss of rDNA repeats. Our data suggest that a reduction in rDNA copy number may indicate a history of DNA replication stress, and that rDNA array size could serve as a diagnostic marker for replication stress. Taken together, these data begin to suggest the selective pressures that combine to yield a "normal" rDNA copy number.

  4. DNA replication stress restricts ribosomal DNA copy number

    Science.gov (United States)

    Salim, Devika; Bradford, William D.; Freeland, Amy; Cady, Gillian; Wang, Jianmin

    2017-01-01

    Ribosomal RNAs (rRNAs) in budding yeast are encoded by ~100–200 repeats of a 9.1kb sequence arranged in tandem on chromosome XII, the ribosomal DNA (rDNA) locus. Copy number of rDNA repeat units in eukaryotic cells is maintained far in excess of the requirement for ribosome biogenesis. Despite the importance of the repeats for both ribosomal and non-ribosomal functions, it is currently not known how “normal” copy number is determined or maintained. To identify essential genes involved in the maintenance of rDNA copy number, we developed a droplet digital PCR based assay to measure rDNA copy number in yeast and used it to screen a yeast conditional temperature-sensitive mutant collection of essential genes. Our screen revealed that low rDNA copy number is associated with compromised DNA replication. Further, subculturing yeast under two separate conditions of DNA replication stress selected for a contraction of the rDNA array independent of the replication fork blocking protein, Fob1. Interestingly, cells with a contracted array grew better than their counterparts with normal copy number under conditions of DNA replication stress. Our data indicate that DNA replication stresses select for a smaller rDNA array. We speculate that this liberates scarce replication factors for use by the rest of the genome, which in turn helps cells complete DNA replication and continue to propagate. Interestingly, tumors from mini chromosome maintenance 2 (MCM2)-deficient mice also show a loss of rDNA repeats. Our data suggest that a reduction in rDNA copy number may indicate a history of DNA replication stress, and that rDNA array size could serve as a diagnostic marker for replication stress. Taken together, these data begin to suggest the selective pressures that combine to yield a “normal” rDNA copy number. PMID:28915237

  5. DNA replication stress restricts ribosomal DNA copy number.

    Directory of Open Access Journals (Sweden)

    Devika Salim

    2017-09-01

    Ribosomal RNAs (rRNAs) in budding yeast are encoded by ~100-200 repeats of a 9.1kb sequence arranged in tandem on chromosome XII, the ribosomal DNA (rDNA) locus. Copy number of rDNA repeat units in eukaryotic cells is maintained far in excess of the requirement for ribosome biogenesis. Despite the importance of the repeats for both ribosomal and non-ribosomal functions, it is currently not known how "normal" copy number is determined or maintained. To identify essential genes involved in the maintenance of rDNA copy number, we developed a droplet digital PCR based assay to measure rDNA copy number in yeast and used it to screen a yeast conditional temperature-sensitive mutant collection of essential genes. Our screen revealed that low rDNA copy number is associated with compromised DNA replication. Further, subculturing yeast under two separate conditions of DNA replication stress selected for a contraction of the rDNA array independent of the replication fork blocking protein, Fob1. Interestingly, cells with a contracted array grew better than their counterparts with normal copy number under conditions of DNA replication stress. Our data indicate that DNA replication stresses select for a smaller rDNA array. We speculate that this liberates scarce replication factors for use by the rest of the genome, which in turn helps cells complete DNA replication and continue to propagate. Interestingly, tumors from mini chromosome maintenance 2 (MCM2)-deficient mice also show a loss of rDNA repeats. Our data suggest that a reduction in rDNA copy number may indicate a history of DNA replication stress, and that rDNA array size could serve as a diagnostic marker for replication stress. Taken together, these data begin to suggest the selective pressures that combine to yield a "normal" rDNA copy number.

  6. Mouse tetranectin: cDNA sequence, tissue-specific expression, and chromosomal mapping

    DEFF Research Database (Denmark)

    Ibaraki, K; Kozak, C A; Wewer, U M

    1995-01-01

    regulation, mouse tetranectin cDNA was cloned from a 16-day-old mouse embryo library. Sequence analysis revealed a 992-bp cDNA with an open reading frame of 606 bp, which is identical in length to the human tetranectin cDNA. The deduced amino acid sequence showed high homology to the human cDNA with 76......(s) of tetranectin. The sequence analysis revealed a difference in both sequence and size of the noncoding regions between mouse and human cDNAs. Northern analysis of the various tissues from mouse, rat, and cow showed the major transcript(s) to be approximately 1 kb, which is similar in size to that observed...

  7. Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing

    Directory of Open Access Journals (Sweden)

    Zdepski Anna

    2011-05-01

    Background: High throughput sequencing (HTS) technologies have revolutionized the field of genomics by drastically reducing the cost of sequencing, making it feasible for individual labs to sequence or resequence plant genomes. Obtaining high quality, high molecular weight DNA from plants poses significant challenges due to the high copy number of chloroplast and mitochondrial DNA, as well as high levels of phenolic compounds and polysaccharides. Multiple methods have been used to isolate DNA from plants; the CTAB method is commonly used to isolate total cellular DNA from plants that contain nuclear DNA, as well as chloroplast and mitochondrial DNA. Alternatively, DNA can be isolated from nuclei to minimize chloroplast and mitochondrial DNA contamination. Results: We describe optimized protocols for isolation of nuclear DNA from eight different plant species encompassing both monocot and eudicot species. These protocols use nuclei isolation to minimize chloroplast and mitochondrial DNA contamination. We also developed a protocol to determine the number of chloroplast and mitochondrial DNA copies relative to the nuclear DNA using quantitative real time PCR (qPCR). We compared DNA isolated from nuclei to total cellular DNA isolated with the CTAB method. As expected, DNA isolated from nuclei consistently yielded nuclear DNA with fewer chloroplast and mitochondrial DNA copies, as compared to the total cellular DNA prepared with the CTAB method. This protocol will allow for analysis of the quality and quantity of nuclear DNA before starting a plant whole genome sequencing or resequencing experiment. Conclusions: Extracting high quality, high molecular weight nuclear DNA in plants has the potential to be a bottleneck in the era of whole genome sequencing and resequencing. The methods that are described here provide a framework for researchers to extract and quantify nuclear DNA in multiple types of plants.

  8. Statistical assignment of DNA sequences using Bayesian phylogenetics

    DEFF Research Database (Denmark)

    Terkelsen, Kasper Munch; Boomsma, Wouter Krogh; Huelsenbeck, John P.

    2008-01-01

    We provide a new automated statistical method for DNA barcoding based on a Bayesian phylogenetic analysis. The method is based on automated database sequence retrieval, alignment, and phylogenetic analysis using a custom-built program for Bayesian phylogenetic analysis. We show on real data...... that the method outperforms Blast searches as a measure of confidence and can help eliminate 80% of all false assignment based on best Blast hit. However, the most important advance of the method is that it provides statistically meaningful measures of confidence. We apply the method to a re......-analysis of previously published ancient DNA data and show that, with high statistical confidence, most of the published sequences are in fact of Neanderthal origin. However, there are several cases of chimeric sequences that are comprised of a combination of both Neanderthal and modern human DNA....

  9. Intricate interactions between the bloom-forming cyanobacterium Microcystis aeruginosa and foreign genetic elements, revealed by diversified clustered regularly interspaced short palindromic repeat (CRISPR) signatures.

    Science.gov (United States)

    Kuno, Sotaro; Yoshida, Takashi; Kaneko, Takakazu; Sako, Yoshihiko

    2012-08-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) confer sequence-dependent, adaptive resistance in prokaryotes against viruses and plasmids via incorporation of short sequences, called spacers, derived from foreign genetic elements. CRISPR loci are thus considered to provide records of past infections. To describe the host-parasite (i.e., cyanophages and plasmids) interactions involving the bloom-forming freshwater cyanobacterium Microcystis aeruginosa, we investigated CRISPR in four M. aeruginosa strains and in two previously sequenced genomes. The number of spacers in each locus was larger than the average among prokaryotes. All spacers were strain specific, except for a string of 11 spacers shared in two closely related strains, suggesting diversification of the loci. Using CRISPR repeat-based PCR, 24 CRISPR genotypes were identified in a natural cyanobacterial community. Among 995 unique spacers obtained, only 10 sequences showed similarity to M. aeruginosa phage Ma-LMM01. Of these, six spacers showed only silent or conservative nucleotide mutations compared to Ma-LMM01 sequences, suggesting a strategy by the cyanophage to avert CRISPR immunity dependent on nucleotide identity. These results imply that host-phage interactions can be divided into M. aeruginosa-cyanophage combinations rather than pandemics of population-wide infectious cyanophages. Spacer similarity also showed frequent exposure of M. aeruginosa to small cryptic plasmids that were observed only in a few strains. Thus, the diversification of CRISPR implies that M. aeruginosa has been challenged by diverse communities (almost entirely uncharacterized) of cyanophages and plasmids.

  10. Sequence of a cDNA encoding turtle high mobility group 1 protein.

    Science.gov (United States)

    Zheng, Jifang; Hu, Bi; Wu, Duansheng

    2005-07-01

    To obtain sequence information about the turtle HMG1 gene, a cDNA encoding the HMG1 protein of the Chinese soft-shell turtle (Pelodiscus sinensis) was amplified by RT-PCR from kidney total RNA, and was cloned, sequenced and analyzed. The results revealed that the open reading frame (ORF) of turtle HMG1 cDNA is 606 bp long. The ORF encodes 202 amino acid residues, from which two DNA-binding domains and one polyacidic region are derived. The DNA-binding domains share higher amino acid identity with the homologous sequences of chicken (96.5%) and mammals (74%) than with the homologous sequence of rainbow trout (67%). The polyacidic region shows 84.6% amino acid homology with the equivalent region of chicken HMG1 cDNA. Turtle HMG1 protein contains 3 Cys residues located at completely conserved positions. Conservation in sequence and structure suggests that the functions of turtle HMG1 cDNA may be highly conserved during evolution. To our knowledge, this is the first report of an HMG1 cDNA sequence in any reptile.
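
    The relation between the ORF length and the residue count reported in the abstract above can be checked with a short sketch. The function name and the `includes_stop_codon` flag are hypothetical; the abstract does not say whether the 606 bp figure counts a stop codon, so the default below simply reproduces the reported numbers:

```python
def codons_in_orf(orf_length_bp: int, includes_stop_codon: bool = False) -> int:
    """Number of amino-acid-coding codons in an ORF of the given length.

    Assumption: if includes_stop_codon is True, the final triplet is a stop
    codon and does not contribute a residue.
    """
    if orf_length_bp <= 0 or orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_length_bp // 3
    return codons - 1 if includes_stop_codon else codons


print(codons_in_orf(606))  # 202
```

    Under the alternative assumption that the 606 bp includes a stop codon, the same call with `includes_stop_codon=True` would give one residue fewer.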

  11. Applications of Engineered DNA-Binding Molecules Such as TAL Proteins and the CRISPR/Cas System in Biology Research

    Directory of Open Access Journals (Sweden)

    Toshitsugu Fujita

    2015-09-01

    Engineered DNA-binding molecules such as transcription activator-like effector (TAL or TALE) proteins and the clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated proteins (Cas) (CRISPR/Cas) system have been used extensively for genome editing in cells of various types and species. The sequence-specific DNA-binding activities of these engineered DNA-binding molecules can also be utilized for other purposes, such as transcriptional activation, transcriptional repression, chromatin modification, visualization of genomic regions, and isolation of chromatin in a locus-specific manner. In this review, we describe applications of these engineered DNA-binding molecules for biological purposes other than genome editing.

  12. Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing.

    Science.gov (United States)

    Hribová, Eva; Neumann, Pavel; Matsumoto, Takashi; Roux, Nicolas; Macas, Jirí; Dolezel, Jaroslav

    2010-09-16

    Bananas and plantains (Musa spp.) are grown in more than a hundred tropical and subtropical countries and provide staple food for hundreds of millions of people. They are seed-sterile crops propagated clonally and this makes them vulnerable to a rapid spread of devastating diseases and at the same time hampers breeding improved cultivars. Although the socio-economic importance of bananas and plantains cannot be overestimated, they remain outside the focus of major research programs. This slows down the study of nuclear genome and the development of molecular tools to facilitate banana improvement. In this work, we report on the first thorough characterization of the repeat component of the banana (M. acuminata cv. 'Calcutta 4') genome. Analysis of almost 100 Mb of sequence data (0.15× genome coverage) permitted partial sequence reconstruction and characterization of repetitive DNA, making up about 30% of the genome. The results showed that the banana repeats are predominantly made of various types of Ty1/copia and Ty3/gypsy retroelements representing 16 and 7% of the genome respectively. On the other hand, DNA transposons were found to be rare. In addition to new families of transposable elements, two new satellite repeats were discovered and found useful as cytogenetic markers. To help in banana sequence annotation, a specific Musa repeat database was created, and its utility was demonstrated by analyzing the repeat composition of 62 genomic BAC clones. A low-depth 454 sequencing of banana nuclear genome provided the largest amount of DNA sequence data available until now for Musa and permitted reconstruction of most of the major types of DNA repeats. The information obtained in this study improves the knowledge of the long-range organization of banana chromosomes, and provides sequence resources needed for repeat masking and annotation during the Musa genome sequencing project. It also provides sequence data for isolation of DNA markers to be used in genetic
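
    The coverage figure quoted in the banana record above follows from simple arithmetic: coverage is the total sequenced bases divided by the genome size. A minimal sketch; the function names are hypothetical, and because the abstract says "almost 100 Mb", the implied genome size is only a rough estimate, not a value reported in the source:

```python
def genome_coverage(total_sequenced_bp: float, genome_size_bp: float) -> float:
    """Average depth of coverage: sequenced bases per genome base."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_sequenced_bp / genome_size_bp


def implied_genome_size(total_sequenced_bp: float, coverage: float) -> float:
    """Genome size implied by a data volume and a stated coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_sequenced_bp / coverage


# About 100 Mb of reads at 0.15x coverage, as quoted in the abstract.
size_bp = implied_genome_size(100e6, 0.15)
print(round(size_bp / 1e6))  # 667 (Mb, approximate)
```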

  13. Short Interspersed Nuclear Element (SINE) Sequences in the Genome of the Human Pathogenic Fungus Aspergillus fumigatus Af293.

    Directory of Open Access Journals (Sweden)

    Lakkhana Kanhayuwa

    Novel families of short interspersed nuclear element (SINE) sequences in the human pathogenic fungus Aspergillus fumigatus, clinical isolate Af293, were identified and categorised into tRNA-related and 5S rRNA-related SINEs. Eight predicted tRNA-related SINE families originating from different tRNAs, and nominated as AfuSINE2 sequences, contained target site duplications of short direct repeat sequences (4-14 bp) flanking the elements, an extended tRNA-unrelated region and typical features of RNA polymerase III promoter sequences. The elements ranged in size from 140 to 493 bp and were present in low copy number in the genome, and five out of eight were actively transcribed. One putative tRNAArg-derived sequence, AfuSINE2-1a, possessed a unique feature of repeated trinucleotide ACT residues at its 3'-terminus. This element was similar in sequence to the I-4_AO element found in A. oryzae and an I-1_AF long nuclear interspersed element-like sequence identified in A. fumigatus Af293. Families of 5S rRNA-related SINE sequences, nominated as AfuSINE3, were also identified, and their 5'-5S rRNA-related regions show 50-65% and 60-75% similarity to A. fumigatus 5S rRNAs and to SINE3-1_AO found in A. oryzae, respectively. A. fumigatus Af293 contains five copies of AfuSINE3 sequences ranging in size from 259 to 343 bp, and two out of five AfuSINE3 sequences were actively transcribed. Investigations on AfuSINE distribution in the fungal genome revealed that the elements are enriched in pericentromeric and subtelomeric regions and inserted within gene-rich regions. We also demonstrated that some, but not all, AfuSINE sequences are targeted by host RNA silencing mechanisms. Finally, we demonstrated that infection of the fungus with mycoviruses had no apparent effects on SINE activity.

  14. Phylogenomics of Phrynosomatid Lizards: Conflicting Signals from Sequence Capture versus Restriction Site Associated DNA Sequencing

    Science.gov (United States)

    Leaché, Adam D.; Chavez, Andreas S.; Jones, Leonard N.; Grummer, Jared A.; Gottscho, Andrew D.; Linkem, Charles W.

    2015-01-01

    Sequence capture and restriction site associated DNA sequencing (RADseq) are popular methods for obtaining large numbers of loci for phylogenetic analysis. These methods are typically used to collect data at different evolutionary timescales; sequence capture is primarily used for obtaining conserved loci, whereas RADseq is designed for discovering single nucleotide polymorphisms (SNPs) suitable for population genetic or phylogeographic analyses. Phylogenetic questions that span both “recent” and “deep” timescales could benefit from either type of data, but studies that directly compare the two approaches are lacking. We compared phylogenies estimated from sequence capture and double digest RADseq (ddRADseq) data for North American phrynosomatid lizards, a species-rich and diverse group containing nine genera that began diversifying approximately 55 Ma. Sequence capture resulted in 584 loci that provided a consistent and strong phylogeny using concatenation and species tree inference. However, the phylogeny estimated from the ddRADseq data was sensitive to the bioinformatics steps used for determining homology, detecting paralogs, and filtering missing data. The topological conflicts among the SNP trees were not restricted to any particular timescale, but instead were associated with short internal branches. Species tree analysis of the largest SNP assembly, which also included the most missing data, supported a topology that matched the sequence capture tree. This preferred phylogeny provides strong support for the paraphyly of the earless lizard genera Holbrookia and Cophosaurus, suggesting that the earless morphology either evolved twice or evolved once and was subsequently lost in Callisaurus. PMID:25663487

  15. Dialects of the DNA Uptake Sequence in Neisseriaceae

    Science.gov (United States)

    Frye, Stephan A.; Nilsen, Mariann; Tønjum, Tone; Ambur, Ole Herman

    2013-01-01

    In all sexual organisms, adaptations exist that secure the safe reassortment of homologous alleles and prevent the intrusion of potentially hazardous alien DNA. Some bacteria engage in a simple form of sex known as transformation. In the human pathogen Neisseria meningitidis and in related bacterial species, transformation by exogenous DNA is regulated by the presence of a specific DNA Uptake Sequence (DUS), which is present in thousands of copies in the respective genomes. DUS affects transformation by limiting DNA uptake and recombination in favour of homologous DNA. The specific mechanisms of DUS–dependent genetic transformation have remained elusive. Bioinformatic analyses of family Neisseriaceae genomes reveal eight distinct variants of DUS. These variants are here termed DUS dialects, and their effect on interspecies commutation is demonstrated. Each of the DUS dialects is remarkably conserved within each species and is distributed consistent with a robust Neisseriaceae phylogeny based on core genome sequences. The impact of individual single nucleotide transversions in DUS on meningococcal transformation and on DNA binding and uptake is analysed. The results show that a DUS core 5′-CTG-3′ is required for transformation and that transversions in this core reduce DNA uptake more than two orders of magnitude although the level of DNA binding remains less affected. Distinct DUS dialects are efficient barriers to interspecies recombination in N. meningitidis, N. elongata, Kingella denitrificans, and Eikenella corrodens, despite the presence of the core sequence. The degree of similarity between the DUS dialect of the recipient species and the donor DNA directly correlates with the level of transformation and DNA binding and uptake. Finally, DUS–dependent transformation is documented in the genera Eikenella and Kingella for the first time. The results presented here advance our understanding of the function and evolution of DUS and genetic transformation

  16. Dialects of the DNA uptake sequence in Neisseriaceae.

    Directory of Open Access Journals (Sweden)

    Stephan A Frye

    2013-04-01

    In all sexual organisms, adaptations exist that secure the safe reassortment of homologous alleles and prevent the intrusion of potentially hazardous alien DNA. Some bacteria engage in a simple form of sex known as transformation. In the human pathogen Neisseria meningitidis and in related bacterial species, transformation by exogenous DNA is regulated by the presence of a specific DNA Uptake Sequence (DUS), which is present in thousands of copies in the respective genomes. DUS affects transformation by limiting DNA uptake and recombination in favour of homologous DNA. The specific mechanisms of DUS-dependent genetic transformation have remained elusive. Bioinformatic analyses of family Neisseriaceae genomes reveal eight distinct variants of DUS. These variants are here termed DUS dialects, and their effect on interspecies commutation is demonstrated. Each of the DUS dialects is remarkably conserved within each species and is distributed consistent with a robust Neisseriaceae phylogeny based on core genome sequences. The impact of individual single nucleotide transversions in DUS on meningococcal transformation and on DNA binding and uptake is analysed. The results show that a DUS core 5'-CTG-3' is required for transformation and that transversions in this core reduce DNA uptake more than two orders of magnitude although the level of DNA binding remains less affected. Distinct DUS dialects are efficient barriers to interspecies recombination in N. meningitidis, N. elongata, Kingella denitrificans, and Eikenella corrodens, despite the presence of the core sequence. The degree of similarity between the DUS dialect of the recipient species and the donor DNA directly correlates with the level of transformation and DNA binding and uptake. Finally, DUS-dependent transformation is documented in the genera Eikenella and Kingella for the first time. The results presented here advance our understanding of the function and evolution of DUS and genetic

  17. SAAS: Short Amino Acid Sequence - A Promising Protein Secondary Structure Prediction Method of Single Sequence

    Directory of Open Access Journals (Sweden)

    Zhou Yuan Wu

    2013-07-01

    Full Text Available In statistical methods for predicting protein secondary structure, many researchers focus on the frequencies of single amino acids in α-helices, β-sheets, and so on, or on the influence of neighbouring amino acids on whether a given amino acid forms a secondary structure. This paper instead treats a short sequence of amino acids (3, 4, 5 or 6 residues) as a single unit and compiles statistics on the probability that each short sequence forms a given secondary structure. Also, many researchers select low-homology sequences as the statistical database, whereas this paper uses the whole PDB database. We propose a strategy for predicting protein secondary structure using a simple statistical method. Numerical computation shows that treating short amino acid sequences as single units makes the tendency of a short sequence to form a secondary structure easy to see, and that a large statistical database (the whole PDB, without regard to homology) works well. The Q3 accuracy of the proposed method is ca. 74%, whereas the accuracy of other statistical methods is less than 70%.
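
    The counting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the three-state alphabet (H, E, C), the choice to assign each window's label to its central residue, and the toy training pair are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train(pairs, k=3):
    """For every k-residue window, count the state (H, E or C) of its central residue."""
    counts = defaultdict(Counter)
    for seq, ss in pairs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]][ss[i + k // 2]] += 1
    return counts

def predict(seq, counts, k=3, default="C"):
    """Label each residue with the most frequent state seen for the window centred on it."""
    pred = [default] * len(seq)  # residues without a known window fall back to coil
    for i in range(len(seq) - k + 1):
        window = counts.get(seq[i:i + k])
        if window:
            pred[i + k // 2] = window.most_common(1)[0][0]
    return "".join(pred)

def q3(pred, true):
    """Fraction of residues whose three-state label is predicted correctly."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)

# Hypothetical toy data: one sequence with its known secondary structure string.
training = [("AAALLKKVVV", "HHHHHCCEEE")]
counts = train(training)
prediction = predict("AAALLKKVVV", counts)
score = q3(prediction, "HHHHHCCEEE")
```

    In this sketch the two terminal residues have no centred window and default to coil, which is why even the training sequence is not recovered perfectly.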

  18. Cloning, sequencing, and expression of cDNA for human β-glucuronidase

    International Nuclear Information System (INIS)

    Oshima, A.; Kyle, J.W.; Miller, R.D.

    1987-01-01

    The authors report here the cDNA sequence for human placental β-glucuronidase (β-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH2-terminal amino acid sequence determined for human spleen β-glucuronidase agreed with that inferred from the DNA sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human β-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human β-glucuronidase, demonstrate the existence of two populations of mRNA for β-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length.

  19. Unique CCT repeats mediate transcription of the TWIST1 gene in mesenchymal cell lines

    International Nuclear Information System (INIS)

    Ohkuma, Mizue; Funato, Noriko; Higashihori, Norihisa; Murakami, Masanori; Ohyama, Kimie; Nakamura, Masataka

    2007-01-01

    TWIST1, a basic helix-loop-helix transcription factor, plays critical roles in embryo development, cancer metastasis and mesenchymal progenitor differentiation. Little is known about transcriptional regulation of TWIST1 expression. Here we identified DNA sequences responsible for TWIST1 expression in mesenchymal lineage cell lines. Reporter assays with TWIST1 promoter mutants defined the -102 to -74 sequences that are essential for TWIST1 expression in human and mouse mesenchymal cell lines. Tandem repeats of CCT, but not putative CREB and NF-κB sites in the sequences substantially supported activity of the TWIST1 promoter. Electrophoretic mobility shift assay demonstrated that the DNA sequences with the CCT repeats formed complexes with nuclear factors, containing, at least, Sp1 and Sp3. These results suggest critical implication of the CCT repeats in association with Sp1 and Sp3 factors in sustaining expression of the TWIST1 gene in mesenchymal cells

  20. Comparison of cDNA-derived protein sequences of the human fibronectin and vitronectin receptor α-subunits and platelet glycoprotein IIb

    International Nuclear Information System (INIS)

    Fitzgerald, L.A.; Poncz, M.; Steiner, B.; Rall, S.C. Jr.; Bennett, J.S.; Phillips, D.R.

    1987-01-01

    The fibronectin receptor (FnR), the vitronectin receptor (VnR), and the platelet membrane glycoprotein (GP) IIb-IIIa complex are members of a family of cell adhesion receptors, which consist of noncovalently associated α- and β-subunits. The present study was designed to compare the cDNA-derived protein sequences of the α-subunits of human FnR, VnR, and platelet GP IIb. cDNA clones for the α-subunit of the FnR (FnRα) were obtained from a human umbilical vein endothelial (HUVE) cell library by using an oligonucleotide probe designed from a peptide sequence of platelet GP IIb. cDNA clones for platelet GP IIb were isolated from a cDNA expression library of human erythroleukemia cells by using antibodies. cDNA clones of the VnR α-subunit (VnRα) were obtained from the HUVE cell library by using an oligonucleotide probe from the partial cDNA sequence for the VnRα. Translation of these sequences showed that the FnRα, the VnRα, and GP IIb are composed of disulfide-linked large (858-871 amino acids) and small (137-158 amino acids) chains that are posttranslationally processed from a single mRNA. A single hydrophobic segment located near the carboxyl terminus of each small chain appears to be a transmembrane domain. The large chains appear to be entirely extracellular, and each contains four repeated putative Ca2+-binding domains of about 30 amino acids that have sequence similarities to other Ca2+-binding proteins. The identity among the protein sequences of the three receptor α-subunits ranges from 36.1% to 44.5%, with the Ca2+-binding domains having the greatest homology. These proteins apparently evolved by a process of gene duplication.

  1. Isolation of human simple repeat loci by hybridization selection.

    Science.gov (United States)

    Armour, J A; Neumann, R; Gobert, S; Jeffreys, A J

    1994-04-01

    We have isolated short tandem repeat arrays from the human genome, using a rapid method involving filter hybridization to enrich for tri- or tetranucleotide tandem repeats. About 30% of clones from the enriched library cross-hybridize with probes containing trimeric or tetrameric tandem arrays, facilitating the rapid isolation of large numbers of clones. In an initial analysis of 54 clones, 46 different tandem arrays were identified. Analysis of these tandem repeat loci by PCR showed that 24 were polymorphic in length; substantially higher levels of polymorphism were displayed by the tetrameric repeat loci isolated than by the trimeric repeats. Primary mapping of these loci by linkage analysis showed that they derive from 17 chromosomes, including the X chromosome. We anticipate the use of this strategy for the efficient isolation of tandem repeats from other sources of genomic DNA, including DNA from flow-sorted chromosomes, and from other species.

  2. Obesity-induced sperm DNA methylation changes at satellite repeats are reprogrammed in rat offspring

    Directory of Open Access Journals (Sweden)

    Neil A Youngson

    2016-01-01

    Full Text Available There is now strong evidence that the paternal contribution to offspring phenotype at fertilisation is more than just DNA. However, the identity and mechanisms of this nongenetic inheritance are poorly understood. One of the more important questions in this research area is: do changes in sperm DNA methylation have phenotypic consequences for offspring? We have previously reported that offspring of obese male rats have altered glucose metabolism compared with controls and that this effect was inherited through nongenetic means. Here, we describe investigations into sperm DNA methylation in a new cohort using the same protocol. Male rats on a high-fat diet were 30% heavier than control-fed males at the time of mating (16-19 weeks old, n = 14/14). A small (0.25%) increase in total 5-methyl-2′-deoxycytidine was detected in obese rat spermatozoa by liquid chromatography tandem mass spectrometry. Examination of the repetitive fraction of the genome with methyl-CpG binding domain protein-enriched genome sequencing (MBD-Seq) and pyrosequencing revealed that retrotransposon DNA methylation states in spermatozoa were not affected by obesity, but methylation at satellite repeats throughout the genome was increased. However, examination of muscle, liver, and spermatozoa from male 27-week-old offspring from obese and control fathers (both groups from n = 8 fathers) revealed that normal DNA methylation levels were restored during offspring development. Furthermore, no changes were found in three genomic imprints in obese rat spermatozoa. Our findings have implications for transgenerational epigenetic reprogramming. They suggest that postfertilization mechanisms exist for normalising some environmentally-induced DNA methylation changes in sperm cells.

  3. Twisting right to left: A…A mismatch in a CAG trinucleotide repeat overexpansion provokes left-handed Z-DNA conformation.

    Directory of Open Access Journals (Sweden)

    Noorain Khan

    2015-04-01

    Full Text Available Conformational polymorphism of DNA is a major causative factor behind several incurable trinucleotide repeat expansion disorders that arise from overexpansion of trinucleotide repeats located in coding/non-coding regions of specific genes. Hairpin DNA structures that are formed due to overexpansion of CAG repeat lead to Huntington's disorder and spinocerebellar ataxias. Nonetheless, DNA hairpin stem structure that generally embraces B-form with canonical base pairs is poorly understood in the context of periodic noncanonical A…A mismatch as found in CAG repeat overexpansion. Molecular dynamics simulations on DNA hairpin stems containing A…A mismatches in a CAG repeat overexpansion show that A…A dictates local Z-form irrespective of starting glycosyl conformation, in sharp contrast to canonical DNA duplex. Transition from B-to-Z is due to the mechanistic effect that originates from its pronounced nonisostericity with flanking canonical base pairs facilitated by base extrusion, backbone and/or base flipping. Based on these structural insights we envisage that such an unusual DNA structure of the CAG hairpin stem may have a role in disease pathogenesis. As this is the first study that delineates the influence of a single A…A mismatch in reversing DNA helicity, it would further have an impact on understanding DNA mismatch repair.

  4. Triplet repeat DNA structures and human genetic disease: dynamic ...

    Indian Academy of Sciences (India)

    Unknown

    formed at the loop-outs. [Sinden R R, Potaman V N, Oussatcheva E A, Pearson C E, Lyubchenko Y L and Shlyakhtenko L S 2002 Triplet repeat DNA structures .... 36–39. 40–121 Huntingtin/polyglutamine expansion. Spinocerebellar ataxia 1. SCA1. 6p23. (CAG)n. 6–44. –. 39–82 (pure) Ataxin-1/polyglutamine expansion.

  5. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism genome through a triplet network. In this network, triplets in DNA sequence are vertices and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main biases: the guanine-cytosine (GC) content and the 3-bp (base pairs) periodicity of DNA sequence. We perform the test by constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in the 3-bp periodicity or in the variable GC content.
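
    The triplet network described in this abstract can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes non-overlapping triplets read in a single frame starting at position 0, an undirected graph without self-loops, and the average of the local clustering coefficients as the summary statistic. The function names are hypothetical.

```python
from itertools import combinations

def triplet_network(seq):
    """Build an undirected graph whose vertices are triplets and whose edges
    join triplets that occur next to each other in the sequence."""
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    adj = {t: set() for t in triplets}
    for a, b in zip(triplets, triplets[1:]):
        if a != b:  # assumption: ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    return adj

def average_clustering(adj):
    """Average, over all vertices, of the fraction of neighbour pairs that are linked."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue  # a vertex with fewer than two neighbours contributes 0
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

# Toy sequence: AAA-CCC-GGG-AAA-TTT gives one triangle plus a pendant vertex.
network = triplet_network("AAACCCGGGAAATTT")
coefficient = average_clustering(network)
```

    For the toy sequence the local coefficients are 1/3 (AAA), 1 (CCC), 1 (GGG) and 0 (TTT), so the average is 7/12.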

  6. Mapping Base Modifications in DNA by Transverse-Current Sequencing

    Science.gov (United States)

    Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

    2018-02-01

    Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.

  7. Survey and analysis of simple sequence repeats in the Laccaria bicolor genome, with development of microsatellite markers

    Energy Technology Data Exchange (ETDEWEB)

    Labbe, Jessy L [ORNL; Murat, Claude [INRA, Nancy, France; Morin, Emmanuelle [INRA, Nancy, France; Le Tacon, F [UMR, France; Martin, Francis [INRA, Nancy, France

    2011-01-01

    It is becoming clear that simple sequence repeats (SSRs) play a significant role in fungal genome organization, and they are a large source of genetic markers for population genetics and meiotic maps. We identified SSRs in the Laccaria bicolor genome by in silico survey and analyzed their distribution in the different genomic regions. We also compared the abundance and distribution of SSRs in L. bicolor with those of the following fungal genomes: Phanerochaete chrysosporium, Coprinopsis cinerea, Ustilago maydis, Cryptococcus neoformans, Aspergillus nidulans, Magnaporthe grisea, Neurospora crassa and Saccharomyces cerevisiae. Using the MISA computer program, we detected 277,062 SSRs in the L. bicolor genome representing 8% of the assembled genomic sequence. Among the analyzed basidiomycetes, L. bicolor exhibited the highest SSR density although no correlation between relative abundance and the genome sizes was observed. In most genomes the short motifs (mono- to trinucleotides) were more abundant than the longer repeated SSRs. Generally, in each organism, the occurrence, relative abundance, and relative density of SSRs decreased as the repeat unit increased. Furthermore, each organism had its own common and longest SSRs. In the L. bicolor genome, most of the SSRs were located in intergenic regions (73.3%) and the highest SSR density was observed in transposable elements (TEs; 6,706 SSRs/Mb). However, 81% of the protein-coding genes contained SSRs in their exons, suggesting that SSR polymorphism may alter gene phenotypes. Within a L. bicolor offspring, sequence polymorphism of 78 SSRs was mainly detected in non-TE intergenic regions. Unlike previously developed microsatellite markers, these new ones are spread throughout the genome; these markers could have immediate applications in population genetics.
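
    An in silico SSR survey of this kind can be approximated with a regular-expression scan. The sketch below is not MISA and does not reproduce its output; the minimum repeat counts per motif length are assumed thresholds chosen for illustration, and the function name is hypothetical.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect tandem repeat found."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A captured motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "ATAT").
            if any(motif == motif[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)

# Toy sequence containing a dinucleotide and a trinucleotide repeat.
example = "GGC" + "AT" * 7 + "CCT" + "CAG" * 5 + "T"
ssrs = find_ssrs(example)
```

    Dividing the number of hits by the sequence length in Mb would give a density comparable in spirit to the SSRs/Mb figures quoted above, although the values depend entirely on the thresholds chosen.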

  8. Non-radioactive detection of trinucleotide repeat size variability.

    Science.gov (United States)

    Tomé, Stéphanie; Nicole, Annie; Gomes-Pereira, Mario; Gourdon, Genevieve

    2014-03-06

    Many human diseases are associated with the abnormal expansion of unstable trinucleotide repeat sequences. The mechanisms of trinucleotide repeat size mutation have not been fully dissected, and their understanding must be grounded on the detailed analysis of repeat size distributions in human tissues and animal models. Small-pool PCR (SP-PCR) is a robust, highly sensitive and efficient PCR-based approach to assess the levels of repeat size variation, providing both quantitative and qualitative data. The method relies on the amplification of a very low number of DNA molecules, through successive dilution of a stock genomic DNA solution. Radioactive Southern blot hybridization is sensitive enough to detect SP-PCR products derived from single template molecules, separated by agarose gel electrophoresis and transferred onto DNA membranes. We describe a variation of the detection method that uses digoxigenin-labelled locked nucleic acid probes. This protocol keeps the sensitivity of the original method, while eliminating the health risks associated with the manipulation of radiolabelled probes, and the burden associated with their regulation, manipulation and waste disposal.

  9. Characterization of Satellite DNA Sequences from the Commercially Important Marine Rotifers Brachionus rotundiformis and Brachionus plicatilis.

    Science.gov (United States)

    Boehm; Gibson; Lubzens

    2000-01-01

    This study was initiated to search for species-specific and strain-specific satellite DNA sequences for which oligonucleotide primers could be designed to differentiate between various commercially important strains of the marine monogonont rotifers Brachionus rotundiformis and Brachionus plicatilis. Two unrelated, highly reiterated satellite sequences were cloned and characterized. The eight sequenced monomers from B. rotundiformis and six from B. plicatilis had low intrarepeat variability and were similar in their overall lengths, A + T compositions, and high degrees of repeated motif substructure. However, hybridizations to 19 representative strains, sequence characterizations, and GenBank searches indicated that these two satellites are morphotype-specific and population-specific, respectively, and share little homology to each other or to other characterized sequences in the database. Primer pairs designed for the B. rotundiformis satellite confirmed hybridization specificities on polymerase chain reaction and could serve as a useful molecular diagnostic tool to identify strains belonging to the SS morphotype, which are gaining widespread usage as first feeds for marine fish in commercial production.

  10. Automated extraction of DNA from clothing

    DEFF Research Database (Denmark)

    Stangegaard, Michael; Hjort, Benjamin Benn; Nøhr Hansen, Thomas

    2011-01-01

    Presence of PCR inhibitors in extracted DNA may interfere with the subsequent quantification and short tandem repeat (STR) reactions used in forensic genetic DNA typing. We have compared three automated DNA extraction methods based on magnetic beads with a manual method with the aim of reducing...

  11. Repeat Sequence Proteins as Matrices for Nanocomposites

    Energy Technology Data Exchange (ETDEWEB)

    Drummy, L.; Koerner, H; Phillips, D; McAuliffe, J; Kumar, M; Farmer, B; Vaia, R; Naik, R

    2009-01-01

    Recombinant protein-inorganic nanocomposites comprised of exfoliated Na+ montmorillonite (MMT) in a recombinant protein matrix based on silk-like and elastin-like amino acid motifs (silk elastin-like protein (SELP)) were formed via a solution blending process. Charged residues along the protein backbone are shown to dominate long-range interactions, whereas the SELP repeat sequence leads to local protein/MMT compatibility. Up to a 50% increase in room temperature modulus and a comparable decrease in high temperature coefficient of thermal expansion occur for cast films containing 2-10 wt.% MMT.

  12. Evaluation of Mammalian Interspersed Repeats to investigate the goat genome

    Directory of Open Access Journals (Sweden)

    P. Mariani

    2010-01-01

    Full Text Available Among the repeated sequences present in most eukaryotic genomes, SINEs (Short Interspersed Nuclear Elements) are widely used to investigate evolution in the mammalian order (Buchanan et al., 1999). One family of these repetitive sequences, the MIR (Mammalian Interspersed Repeats; Jurka et al., 1995), is ubiquitous in all mammals. MIR elements are tRNA-derived SINEs and are identifiable by a conserved core region of about 70 nucleotides.

  13. DNA sequences from the quagga, an extinct member of the horse family.

    Science.gov (United States)

    Higuchi, R; Bowman, B; Freiberger, M; Ryder, O A; Wilson, A C

    To determine whether DNA survives and can be recovered from the remains of extinct creatures, we have examined dried muscle from a museum specimen of the quagga, a zebra-like species (Equus quagga) that became extinct in 1883 (ref. 1). We report that DNA was extracted from this tissue in amounts approaching 1% of that expected from fresh muscle, and that the DNA was of relatively low molecular weight. Among the many clones obtained from the quagga DNA, two containing pieces of mitochondrial DNA (mtDNA) were sequenced. These sequences, comprising 229 nucleotide pairs, differ by 12 base substitutions from the corresponding sequences of mtDNA from a mountain zebra, an extant member of the genus Equus. The number, nature and locations of the substitutions imply that there has been little or no postmortem modification of the quagga DNA sequences, and that the two species had a common ancestor 3-4 Myr ago, consistent with fossil evidence concerning the age of the genus Equus.

  14. High Throughput Sample Preparation and Analysis for DNA Sequencing, PCR and Combinatorial Screening of Catalysis Based on Capillary Array Technique

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Yonghua [Iowa State Univ., Ames, IA (United States)

    2000-01-01

    Sample preparation has been one of the major bottlenecks for many high throughput analyses. The purpose of this research was to develop new sample preparation and integration approaches for DNA sequencing, PCR based DNA analysis and combinatorial screening of homogeneous catalysis based on multiplexed capillary electrophoresis with laser induced fluorescence or imaging UV absorption detection. The author first introduced a method to integrate the front-end tasks to DNA capillary-array sequencers. Protocols for directly sequencing the plasmids from a single bacterial colony in fused-silica capillaries were developed. After the colony was picked, lysis was accomplished in situ in the plastic sample tube using either a thermocycler or heating block. Upon heating, the plasmids were released while chromosomal DNA and membrane proteins were denatured and precipitated to the bottom of the tube. After adding enzyme and Sanger reagents, the resulting solution was aspirated into the reaction capillaries by a syringe pump, and cycle sequencing was initiated. No deleterious effect upon the reaction efficiency, the on-line purification system, or the capillary electrophoresis separation was observed, even though the crude lysate was used as the template. Multiplexed on-line DNA sequencing data from 8 parallel channels allowed base calling up to 620 bp with an accuracy of 98%. The entire system can be automatically regenerated for repeated operation. For PCR based DNA analysis, they demonstrated that capillary electrophoresis with UV detection can be used for DNA analysis starting from clinical sample without purification. After PCR reaction using cheek cell, blood or HIV-1 gag DNA, the reaction mixtures were injected into the capillary either on-line or off-line by base stacking. The protocol was also applied to capillary array electrophoresis. The use of cheaper detection, and the elimination of purification of DNA sample before or after PCR reaction, will make this approach an

  15. Substrate sequence selectivity of APOBEC3A implicates intra-DNA interactions.

    Science.gov (United States)

    Silvas, Tania V; Hou, Shurong; Myint, Wazo; Nalivaika, Ellen; Somasundaran, Mohan; Kelch, Brian A; Matsuo, Hiroshi; Kurt Yilmaz, Nese; Schiffer, Celia A

    2018-05-14

    The APOBEC3 (A3) family of human cytidine deaminases is renowned for providing a first line of defense against many exogenous and endogenous retroviruses. However, the ability of these proteins to deaminate deoxycytidines in ssDNA makes A3s a double-edged sword. When overexpressed, A3s can mutate endogenous genomic DNA resulting in a variety of cancers. Although the sequence context for mutating DNA varies among A3s, the mechanism for substrate sequence specificity is not well understood. To characterize substrate specificity of A3A, a systematic approach was used to quantify the affinity for substrate as a function of sequence context, length, secondary structure, and solution pH. We identified the A3A ssDNA binding motif as (T/C)TC(A/G), which correlated with enzymatic activity. We also validated that A3A binds RNA in a sequence specific manner. A3A bound tighter to substrate binding motif within a hairpin loop compared to linear oligonucleotide, suggesting A3A affinity is modulated by substrate structure. Based on these findings and previously published A3A-ssDNA co-crystal structures, we propose a new model with intra-DNA interactions for the molecular mechanism underlying A3A sequence preference. Overall, the sequence and structural preferences identified for A3A lead to a new paradigm for identifying A3A's involvement in mutation of endogenous or exogenous DNA.
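
    The reported binding motif (T/C)TC(A/G) translates directly into a character-class pattern, which gives a simple way to list candidate sites in a single-stranded sequence. The sketch below only scans the strand as given and says nothing about affinity, hairpin context or pH, all of which the abstract identifies as relevant; the function name and the example sequence are hypothetical.

```python
import re

# (T/C)TC(A/G) written as a regex; the lookahead lets overlapping sites be reported.
A3A_MOTIF = re.compile(r"(?=([TC]TC[AG]))")

def find_motif_sites(seq):
    """Return (start, four-base match) for every occurrence of the motif."""
    return [(m.start(), m.group(1)) for m in A3A_MOTIF.finditer(seq.upper())]

# Hypothetical oligonucleotide used only to exercise the pattern.
sites = find_motif_sites("AATTCAGGCTCGTT")
```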

  16. DNA cross-linking by dehydromonocrotaline lacks apparent base sequence preference.

    Science.gov (United States)

    Rieben, W Kurt; Coulombe, Roger A

    2004-12-01

    Pyrrolizidine alkaloids (PAs) are ubiquitous plant toxins, many of which, upon oxidation by hepatic mixed-function oxidases, become reactive bifunctional pyrrolic electrophiles that form DNA-DNA and DNA-protein cross-links. The anti-mitotic, toxic, and carcinogenic action of PAs is thought to be caused, at least in part, by these cross-links. We wished to determine whether the activated PA pyrrole dehydromonocrotaline (DHMO) exhibits base sequence preferences when cross-linked to a set of model duplex poly A-T 14-mer oligonucleotides with varying internal and/or end 5'-d(CG), 5'-d(GC), 5'-d(TA), 5'-d(CGCG), or 5'-d(GCGC) sequences. DHMO-DNA cross-links were assessed by electrophoretic mobility shift assay (EMSA) of 32P end-labeled oligonucleotides and by HPLC analysis of cross-linked DNAs enzymatically digested to their constituent deoxynucleosides. The degree of DNA cross-linking depended upon the concentration of the pyrrole, but not on the base sequence of the oligonucleotide target. Likewise, HPLC chromatograms of cross-linked and digested DNAs showed no discernible sequence preference for any nucleotide. Added glutathione, tyrosine, cysteine, and aspartic acid, but not phenylalanine, threonine, serine, lysine, or methionine, competed with DNA as alternate nucleophiles for cross-linking by DHMO. From these data it appears that DHMO exhibits no strong base preference when forming cross-links with DNA, and that some cellular nucleophiles can inhibit DNA cross-link formation.

  17. [Whole Genome Sequencing of Human mtDNA Based on Ion Torrent PGM™ Platform].

    Science.gov (United States)

    Cao, Y; Zou, K N; Huang, J P; Ma, K; Ping, Y

    2017-08-01

    The aims were to sequence the whole genome of human mitochondrial DNA (mtDNA) on the Ion Torrent PGM™ platform and to study differences in mtDNA sequence between tissues. Samples were collected from 6 unrelated individuals at forensic postmortem examination, including chest blood, hair, costal cartilage, nail, skeletal muscle and oral epithelium. The whole mtDNA genome was amplified with 4 primer pairs. Libraries were constructed with the Ion Shear™ Plus Reagents kit and the Ion Plus Fragment Library kit. Whole genome sequencing of mtDNA was performed on the Ion Torrent PGM™ platform. Sanger sequencing was used to determine the heteroplasmy positions and the mutation positions in the HVⅠ region. The whole mtDNA genome was amplified successfully from all samples. The six unrelated individuals belonged to 6 different haplotypes. Different tissues from the same individual showed heteroplasmy differences. The heteroplasmy positions and the mutation positions in the HVⅠ region were verified by Sanger sequencing. A consistency check by the Kappa method showed that the mtDNA sequencing results were highly consistent across tissues. The method used in the present study for sequencing the whole genome of human mtDNA can detect heteroplasmy differences between tissues, and its results show good consistency. The results provide guidance for the further application of mtDNA in forensic science. Copyright© by the Editorial Department of Journal of Forensic Medicine

  18. Exposing Students to Repeat Photography: Increasing Cultural Understanding on a Short-Term Study Abroad

    Science.gov (United States)

    Lemmons, Kelly K.; Brannstrom, Christian; Hurd, Danielle

    2014-01-01

    Traditionally, repeat photography has been used to analyze land cover change. This paper describes how repeat photography may be used as a tool to enhance the short-term study abroad experience by facilitating cultural interaction and understanding. We present evidence from two cases and suggest a five-step repeat photography method for educators…

  19. Structural and biochemical analysis of nuclease domain of clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein 3 (Cas3).

    Science.gov (United States)

    Mulepati, Sabin; Bailey, Scott

    2011-09-09

    RNA transcribed from clustered regularly interspaced short palindromic repeats (CRISPRs) protects many prokaryotes from invasion by foreign DNA such as viruses, conjugative plasmids, and transposable elements. Cas3 (CRISPR-associated protein 3) is essential for this CRISPR protection and is thought to mediate cleavage of the foreign DNA through its N-terminal histidine-aspartate (HD) domain. We report here the 1.8 Å crystal structure of the HD domain of Cas3 from Thermus thermophilus HB8. Structural and biochemical studies predict that this enzyme binds two metal ions at its active site. We also demonstrate that the single-stranded DNA endonuclease activity of this T. thermophilus domain is activated not by magnesium but by transition metal ions such as manganese and nickel. Structure-guided mutagenesis confirms the importance of the metal-binding residues for the nuclease activity and identifies other active site residues. Overall, these results provide a framework for understanding the role of Cas3 in the CRISPR system.

  20. DNA sequence responsible for the amplification of adjacent genes.

    Science.gov (United States)

    Pasion, S G; Hartigan, J A; Kumar, V; Biswas, D K

    1987-10-01

    A 10.3-kb DNA fragment in the 5'-flanking region of the rat prolactin (rPRL) gene was isolated from F1BGH(1)2C1, a strain of rat pituitary tumor cells (GH cells) that produces prolactin in response to 5-bromodeoxyuridine (BrdU). Following transfection and integration into genomic DNA of recipient mouse L cells, this DNA induced amplification of the adjacent thymidine kinase gene from Herpes simplex virus type 1 (HSV1TK). We confirmed the ability of this "Amplicon" sequence to induce amplification of other linked or unlinked genes in DNA-mediated gene transfer studies. When transferred into the mouse L cells with the 10.3-5'rPRL gene sequence of BrdU-responsive cells, both the human growth hormone and the HSV1TK genes are amplified in response to 5-bromodeoxyuridine. This observation is substantiated by BrdU-induced amplification of the cotransferred bacterial Neo gene. Cotransfection studies reveal that the BrdU-induced amplification capability is associated with a 4-kb DNA sequence in the 5'-flanking region of the rPRL gene of BrdU-responsive cells. These results demonstrate that genes of heterologous origin, linked or unlinked, and selected or unselected, can be coamplified when located within the amplification boundary of the Amplicon sequence.

  1. PCR-Free Enrichment of Mitochondrial DNA from Human Blood and Cell Lines for High Quality Next-Generation DNA Sequencing.

    Directory of Open Access Journals (Sweden)

    Meetha P Gould

    Recent advances in sequencing technology allow for accurate detection of mitochondrial sequence variants, even those in low abundance at heteroplasmic sites. Considerable sequencing cost savings can be achieved by enriching samples for mitochondrial (relative to nuclear) DNA. Reduction in nuclear DNA (nDNA) content can also help to avoid false positive variants resulting from nuclear mitochondrial sequences (numts). We isolate intact mitochondrial organelles from both human cell lines and blood components using two separate methods: a magnetic bead binding protocol and differential centrifugation. DNA is extracted and further enriched for mitochondrial DNA (mtDNA) by an enzyme digest. Only 1 ng of the purified DNA is necessary for library preparation and next generation sequence (NGS) analysis. Enrichment methods are assessed and compared using mtDNA (versus nDNA) content as a metric, measured by using real-time quantitative PCR and NGS read analysis. Among the various strategies examined, the optimal is differential centrifugation isolation followed by exonuclease digest. This strategy yields >35% mtDNA reads in blood and cell lines, which corresponds to hundreds-fold enrichment over baseline. The strategy also avoids false variant calls that, as we show, can be induced by the long-range PCR approaches that are the current standard in enrichment procedures. This optimization procedure allows mtDNA enrichment for efficient and accurate massively parallel sequencing, enabling NGS from samples with small amounts of starting material. This will decrease costs by increasing the number of samples that may be multiplexed, ultimately facilitating efforts to better understand mitochondria-related diseases.

  2. The complete chloroplast genome sequence of Abies nephrolepis (Pinaceae: Abietoideae)

    Directory of Open Access Journals (Sweden)

    Dong-Keun Yi

    2016-06-01

    The plant chloroplast (cp) genome has maintained a relatively conserved structure and gene content throughout evolution. Cp genome sequences have been used widely for resolving evolutionary and phylogenetic issues at various taxonomic levels of plants. Here, we report the complete cp genome of Abies nephrolepis. The A. nephrolepis cp genome is 121,336 base pairs (bp) in length, including a pair of short inverted repeat regions (IRa and IRb) of 139 bp each separated by a small single copy region of 54,323 bp (SSC) and a large single copy region of 66,735 bp (LSC). It contains 114 genes, 68 of which are protein coding genes, 35 tRNA and four rRNA genes, six open reading frames, and one pseudogene. Seventeen repeat units and 64 simple sequence repeats (SSR) have been detected in the A. nephrolepis cp genome. Large IR sequences are located at the 42-kb inversion points (1186 bp). The A. nephrolepis cp genome is identical to that of Abies koreana, a closely related taxon. Pairwise comparison between the two cp genomes revealed 140 polymorphic sites in each. The complete cp genome sequence of A. nephrolepis has significant potential to provide information on the evolutionary pattern of Abietoideae and valuable data for the development of DNA markers for easy identification and classification.

  3. High-Resolution Melting (HRM) of Hypervariable Mitochondrial DNA Regions for Forensic Science.

    Science.gov (United States)

    Dos Santos Rocha, Alípio; de Amorim, Isis Salviano Soares; Simão, Tatiana de Almeida; da Fonseca, Adenilson de Souza; Garrido, Rodrigo Grazinoli; Mencalha, Andre Luiz

    2018-03-01

    Forensic strategies commonly proceed by analysis of short tandem repeats (STRs); however, additional strategies have been proposed for forensic science. This article therefore standardized high-resolution melting (HRM) of DNA for forensic analyses. For HRM, mitochondrial DNA (mtDNA) from eight individuals was extracted from mucosa swabs with DNAzol reagent; samples were amplified by PCR and submitted to HRM analysis to identify differences in hypervariable (HV) regions I and II. To confirm the HRM results, all PCR products were sequenced. The data suggest that it is possible to discriminate DNA from different samples by HRM curves. Also, an uncommon dual dissociation was identified in a single PCR product, extending HRM analyses through evaluation of melting peaks. Thus, HRM is accurate and useful for screening small differences in the HVI and HVII regions of mtDNA and can increase the efficiency of laboratory routines based on forensic genetics. © 2017 American Academy of Forensic Sciences.

  4. The impact of targeting repetitive BamHI-W sequences on the sensitivity and precision of EBV DNA quantification.

    Directory of Open Access Journals (Sweden)

    Armen Sanosyan

    Viral load monitoring and early Epstein-Barr virus (EBV) DNA detection are essential in routine laboratory testing, especially in preemptive management of Post-transplant Lymphoproliferative Disorder. Targeting the repetitive BamHI-W sequence was shown to increase the sensitivity of EBV DNA quantification, but the variability of BamHI-W reiterations was suggested to be a source of quantification bias. We aimed to assess the extent of variability associated with BamHI-W PCR and its impact on the sensitivity of EBV DNA quantification using the 1st WHO international standard, EBV strains and clinical samples. Repetitive BamHI-W- and LMP2 single- sequences were amplified by in-house qPCRs and BXLF-1 sequence by a commercial assay (EBV R-gene™, BioMerieux). Linearity and limits of detection of in-house methods were assessed. The impact of repeated versus single target sequences on EBV DNA quantification precision was tested on B95.8 and Raji cell lines, possessing 11 and 7 copies of the BamHI-W sequence, respectively, and on clinical samples. BamHI-W qPCR demonstrated a lower limit of detection compared to LMP2 qPCR (2.33 log10 versus 3.08 log10 IU/mL; P = 0.0002). BamHI-W qPCR underestimated the EBV DNA load on Raji strain which contained fewer BamHI-W copies than the WHO standard derived from the B95.8 EBV strain (mean bias: -0.21 log10; 95% CI, -0.54 to 0.12). Comparison of BamHI-W qPCR versus LMP2 and BXLF-1 qPCR showed an acceptable variability between EBV DNA levels in clinical samples with the mean bias being within 0.5 log10 IU/mL EBV DNA, whereas a better quantitative concordance was observed between LMP2 and BXLF-1 assays. Targeting BamHI-W resulted in a higher sensitivity compared to LMP2 but the variable reiterations of BamHI-W segment are associated with higher quantification variability. BamHI-W can be considered for clinical and therapeutic monitoring to detect early EBV DNA and dynamic changes in viral load.

  5. The impact of targeting repetitive BamHI-W sequences on the sensitivity and precision of EBV DNA quantification.

    Science.gov (United States)

    Sanosyan, Armen; Fayd'herbe de Maudave, Alexis; Bollore, Karine; Zimmermann, Valérie; Foulongne, Vincent; Van de Perre, Philippe; Tuaillon, Edouard

    2017-01-01

    Viral load monitoring and early Epstein-Barr virus (EBV) DNA detection are essential in routine laboratory testing, especially in preemptive management of Post-transplant Lymphoproliferative Disorder. Targeting the repetitive BamHI-W sequence was shown to increase the sensitivity of EBV DNA quantification, but the variability of BamHI-W reiterations was suggested to be a source of quantification bias. We aimed to assess the extent of variability associated with BamHI-W PCR and its impact on the sensitivity of EBV DNA quantification using the 1st WHO international standard, EBV strains and clinical samples. Repetitive BamHI-W- and LMP2 single- sequences were amplified by in-house qPCRs and BXLF-1 sequence by a commercial assay (EBV R-gene™, BioMerieux). Linearity and limits of detection of in-house methods were assessed. The impact of repeated versus single target sequences on EBV DNA quantification precision was tested on B95.8 and Raji cell lines, possessing 11 and 7 copies of the BamHI-W sequence, respectively, and on clinical samples. BamHI-W qPCR demonstrated a lower limit of detection compared to LMP2 qPCR (2.33 log10 versus 3.08 log10 IU/mL; P = 0.0002). BamHI-W qPCR underestimated the EBV DNA load on Raji strain which contained fewer BamHI-W copies than the WHO standard derived from the B95.8 EBV strain (mean bias: -0.21 log10; 95% CI, -0.54 to 0.12). Comparison of BamHI-W qPCR versus LMP2 and BXLF-1 qPCR showed an acceptable variability between EBV DNA levels in clinical samples with the mean bias being within 0.5 log10 IU/mL EBV DNA, whereas a better quantitative concordance was observed between LMP2 and BXLF-1 assays. Targeting BamHI-W resulted in a higher sensitivity compared to LMP2 but the variable reiterations of BamHI-W segment are associated with higher quantification variability. BamHI-W can be considered for clinical and therapeutic monitoring to detect early EBV DNA and dynamic changes in viral load.

  6. Plastome Sequencing of Ten Nonmodel Crop Species Uncovers a Large Insertion of Mitochondrial DNA in Cashew.

    Science.gov (United States)

    Rabah, Samar O; Lee, Chaehee; Hajrah, Nahid H; Makki, Rania M; Alharby, Hesham F; Alhebshi, Alawiah M; Sabir, Jamal S M; Jansen, Robert K; Ruhlman, Tracey A

    2017-11-01

    In plant evolution, intracellular gene transfer (IGT) is a prevalent, ongoing process. While nuclear and mitochondrial genomes are known to integrate foreign DNA via IGT and horizontal gene transfer (HGT), plastid genomes (plastomes) have resisted foreign DNA incorporation and only recently has IGT been uncovered in the plastomes of a few land plants. In this study, we completed plastome sequences for 10 crop species and describe a number of structural features including variation in gene and intron content, inversions, and expansion and contraction of the inverted repeat (IR). We identified a putative in cinnamon ( J. Presl) and other sequenced Lauraceae and an apparent functional transfer of to the nucleus of quinoa ( Willd.). In the orchard tree cashew ( L.), we report the insertion of an ∼6.7-kb fragment of mitochondrial DNA into the plastome IR. BLASTn analyses returned high identity hits to mitogenome sequences including an intact open reading frame. Using three plastome markers for five species of , we generated a phylogeny to investigate the distribution and timing of the insertion. Four species share the insertion, suggesting that this event occurred <20 million yr ago in a single clade in the genus. Our study extends the observation of mitochondrial to plastome IGT to include long-lived tree species. While previous studies have suggested possible mechanisms facilitating IGT to the plastome, more examples of this phenomenon, along with more complete mitogenome sequences, will be required before a common, or variable, mechanism can be elucidated. Copyright © 2017 Crop Science Society of America.

  7. Heterogeneity of rat tropoelastin mRNA revealed by cDNA cloning

    International Nuclear Information System (INIS)

    Pierce, R.A.; Deak, S.B.; Stolle, C.A.; Boyd, C.D.

    1990-01-01

    A λgt11 library constructed from poly(A+) RNA isolated from aortic tissue of neonatal rats was screened for rat tropoelastin cDNAs. The first screen, utilizing a human tropoelastin cDNA clone, provided rat tropoelastin cDNAs spanning 2.3 kb of carboxy-terminal coding sequence and extending into the 3'-untranslated region. A subsequent screen using a 5' rat tropoelastin cDNA clone yielded clones extending into the amino-terminal signal sequence coding region. Sequence analysis of these clones has provided the complete derived amino acid sequence of rat tropoelastin and allowed alignment and comparison with published bovine cDNA sequence. While the overall structure of rat tropoelastin is similar to bovine sequence, numerous substitutions, deletions, and insertions demonstrated considerable heterogeneity between species. In particular, the pentapeptide repeat VPGVG, characteristic of all tropoelastins analyzed to date, is replaced in rat tropoelastin by a repeating pentapeptide, IPGVG. The hexapeptide repeat VGVAPG, the bovine elastin receptor binding peptide, is not encoded by rat tropoelastin cDNAs. Variations in coding sequence between rat tropoelastin cDNA clones were also found which may represent mRNA heterogeneity produced by alternative splicing of the rat tropoelastin pre-mRNA.

  8. Short communication Sperm DNA damage in relation to lipid ...

    African Journals Online (AJOL)

    Leyland Fraser

    Short communication. Sperm DNA ... (Received 21 January 2017; Accepted 28 February2017; First published online 8 March 2017) ... This study investigated the relationships between lipid peroxidation (LPO) and sperm DNA damage.

  9. An automated annotation tool for genomic DNA sequences using

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  10. Novel DNA sequence detection method based on fluorescence energy transfer

    International Nuclear Information System (INIS)

    Kobayashi, S.; Tamiya, E.; Karube, I.

    1987-01-01

    The detection of specific DNA sequences (DNA analysis) has recently become more important for the diagnosis of viral genomes causing infectious disease and of human sequences related to inherited disorders. These methods typically involve electrophoresis, the immobilization of DNA on a solid support, hybridization to a complementary probe, detection using probes labeled with ³²P or nonisotopically with a biotin-avidin-enzyme system, and so on. These techniques are highly effective, but they are very time-consuming and expensive. The principle of fluorescence energy transfer is that the light energy from an excited donor (fluorophore) is transferred to an acceptor (fluorophore) if the acceptor is in the vicinity of the donor and the emission spectrum of the donor overlaps the excitation spectrum of the acceptor. In this study, fluorescence energy transfer was applied to the detection of specific DNA sequences using the hybridization method. The analyte, single-stranded DNA labeled with the donor fluorophore, is hybridized to a probe DNA labeled with the acceptor. Because of the complementary DNA duplex formation, the two fluorophores came close to each other, and fluorescence energy transfer occurred.
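
    The distance dependence that makes this detection scheme work can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R0)^6). The sketch below is not taken from the record above: the function name and the Förster radius of 5 nm are assumptions chosen only for illustration.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)^6).

    r_nm  : donor-acceptor separation in nanometres
    r0_nm : Förster radius (separation at which E = 0.5), assumed known
    """
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("r_nm must be >= 0 and r0_nm must be > 0")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


if __name__ == "__main__":
    R0 = 5.0  # hypothetical Förster radius in nm, for illustration only
    for r in (2.5, 5.0, 10.0):
        # Efficiency falls steeply once the separation exceeds R0.
        print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r, R0):.3f}")
```

    Because the efficiency falls with the sixth power of the separation, hybridization that brings the two labels within a few nanometres gives a strong signal, while unhybridized strands in solution give almost none.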

  11. mtDNA sequence diversity of Hazara ethnic group from Pakistan.

    Science.gov (United States)

    Rakha, Allah; Fatima; Peng, Min-Sheng; Adan, Atif; Bi, Rui; Yasmin, Memona; Yao, Yong-Gang

    2017-09-01

    The present study was undertaken to investigate mitochondrial DNA (mtDNA) control region sequences of Hazaras from Pakistan, so as to generate mtDNA reference database for forensic casework in Pakistan and to analyze phylogenetic relationship of this particular ethnic group with geographically proximal populations. Complete mtDNA control region (nt 16024-576) sequences were generated through Sanger Sequencing for 319 Hazara individuals from Quetta, Baluchistan. The population sample set showed a total of 189 distinct haplotypes, belonging mainly to West Eurasian (51.72%), East & Southeast Asian (29.78%) and South Asian (18.50%) haplogroups. Compared with other populations from Pakistan, the Hazara population had a relatively high haplotype diversity (0.9945) and a lower random match probability (0.0085). The dataset has been incorporated into EMPOP database under accession number EMP00680. The data herein comprises the largest, and likely most thoroughly examined, control region mtDNA dataset from Hazaras of Pakistan. Copyright © 2017 Elsevier B.V. All rights reserved.
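
    The two summary statistics quoted above are commonly computed from haplotype frequencies as follows: random match probability as the sum of squared frequencies, and haplotype diversity as n/(n-1) times one minus that sum. The sketch below assumes these standard definitions; the record does not state which estimator the authors used, and the function names and toy haplotype labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def random_match_probability(haplotypes: Sequence[Hashable]) -> float:
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("sample is empty")
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes: Sequence[Hashable]) -> float:
    """Unbiased haplotype diversity: n / (n - 1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two samples")
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))


if __name__ == "__main__":
    toy_sample = ["H1", "H1", "H2", "H3"]  # hypothetical haplotype labels
    print(random_match_probability(toy_sample))
    print(haplotype_diversity(toy_sample))
```

    Under these definitions, the reported sample size of 319 and random match probability of 0.0085 give a diversity of about 0.9946, which agrees with the reported 0.9945 to within rounding.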

  12. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  13. Characterization and DNA-binding specificities of Ralstonia TAL-like effectors

    KAUST Repository

    Li, Lixin

    2013-07-01

    Transcription activator-like effectors (TALEs) from Xanthomonas sp. have been used as customizable DNA-binding modules for genome-engineering applications. Ralstonia solanacearum TALE-like proteins (RTLs) exhibit similar structural features to TALEs, including a central DNA-binding domain composed of 35 amino acid-long repeats. Here, we characterize the RTLs and show that they localize in the plant cell nucleus, mediate DNA binding, and might function as transcriptional activators. RTLs have a unique DNA-binding architecture and are enriched in repeat variable di-residues (RVDs), which determine repeat DNA-binding specificities. We determined the DNA-binding specificities for the RVD sequences ND, HN, NP, and NT. The RVD ND mediates highly specific interactions with C nucleotide, HN interacts specifically with A and G nucleotides, and NP binds to C, A, and G nucleotides. Moreover, we developed a highly efficient repeat assembly approach for engineering RTL effectors. Taken together, our data demonstrate that RTLs are unique DNA-targeting modules that are excellent alternatives to be tailored to bind to user-selected DNA sequences for targeted genomic and epigenomic modifications. These findings will facilitate research concerning RTL molecular biology and RTL roles in the pathogenicity of Ralstonia spp. © 2013 The Author.
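
    The RVD specificities reported above (ND for C; HN for A or G; NP for C, A, or G) can be read as a simple lookup from a repeat array to the DNA sequences it is expected to bind. The sketch below is only an illustration of that reading: the function names are hypothetical, NT is left out because the record does not state its specificity, and real effector-target recognition involves further factors not modelled here.

```python
import re
from typing import Sequence

# RVD -> permitted nucleotides, taken only from the specificities stated above.
RVD_TO_BASES = {
    "ND": "C",
    "HN": "AG",
    "NP": "ACG",
}


def rvd_target_pattern(rvds: Sequence[str]) -> str:
    """Build a regular expression for the DNA sequences a repeat array may bind."""
    parts = []
    for rvd in rvds:
        bases = RVD_TO_BASES[rvd]  # KeyError for RVDs without a stated specificity
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def may_bind(rvds: Sequence[str], dna: str) -> bool:
    """True if every position of `dna` is permitted by the corresponding RVD."""
    return re.fullmatch(rvd_target_pattern(rvds), dna.upper()) is not None


if __name__ == "__main__":
    array = ["ND", "HN", "NP"]  # hypothetical three-repeat array
    print(rvd_target_pattern(array))
    print(may_bind(array, "CGA"), may_bind(array, "TGA"))
```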

  14. Oxford Nanopore MinION Sequencing and Genome Assembly

    Directory of Open Access Journals (Sweden)

    Hengyun Lu

    2016-10-01

    The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS) technology. The third-generation sequencing (TGS) technology, led by Pacific Biosciences (PacBio), is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT). MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production make it suitable for real-time applications; the release of the long-read sequencer MinION has thus generated much excitement and interest in the genomics community. While de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since repetitive regions can be more easily spanned by longer reads, despite their higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.

  15. Linkage map of the fragments of herpesvirus papio DNA.

    Science.gov (United States)

    Lee, Y S; Tanaka, A; Lau, R Y; Nonoyama, M; Rabin, H

    1981-01-01

    Herpesvirus papio (HVP), an Epstein-Barr-like virus, causes lymphoblastoid disease in baboons. The physical map of HVP DNA was constructed for the fragments produced by cleavage of HVP DNA with restriction endonucleases EcoRI, HindIII, SalI, and PvuI, which produced 12, 12, 10, and 4 fragments, respectively. The total molecular size of HVP DNA was calculated as close to 110 megadaltons. The following methods were used for construction of the map: (i) fragments near the ends of HVP DNA were identified by treating viral DNA with lambda exonuclease before restriction enzyme digestion; (ii) fragments containing nucleotide sequences in common with fragments from the second enzyme digest of HVP DNA were examined by Southern blot hybridization; and (iii) the location of some fragments was determined by isolating individual fragments from agarose gels and redigesting the isolated fragments with a second restriction enzyme. Terminal heterogeneity and internal repeats were found to be unique features of the HVP DNA molecule. One to five repeats of 0.8 megadaltons were found at both terminal ends. Although the repeats of both ends shared a certain degree of homology, it was not determined whether they were identical repeats. The internal repeat sequence of HVP DNA was found in the EcoRI-C region, which extended from 8.4 to 23 megadaltons from the left end of the molecule. The average number of the repeats was calculated to be seven, and the molecular size was determined to be 1.8 megadaltons. Similar unique features have been reported in EBV DNA (D. Given and E. Kieff, J. Virol. 28:524-542, 1978). PMID:6261015

  16. Fingerprinting for discriminating tea germplasm using inter-simple sequence repeat (ISSR) markers

    International Nuclear Information System (INIS)

    Liu, B.Y.; Li, Y.Y.; Wang, P.S.; Wang, L.Y.; Wang, P.S.

    2012-01-01

    For the discrimination of tea germplasm at the inter-specific level, 134 tea varieties preserved in the China National Germplasm Tea Repositories (CNGTR) were analyzed using inter-simple sequence repeat (ISSR) markers. Eighteen primers were chosen from 60 screened for ISSR amplification, generating 99.4% polymorphic bands. The mean Nei's gene diversity (H) and the overall mean Shannon's Information index (I) were 0.396 and 0.578, respectively, indicating a wide gene pool. Using the presence, and sometimes the absence, of unique ISSR markers, it was possible to discriminate 32 of the genotypes tested. No single primer could discriminate all 134 genotypes. However, UBC811 provided rich band patterns and could discriminate 35 genotypes. The combination of two and three primers could discriminate 99 and 121 genotypes, respectively. Furthermore, the combination of band patterns or the DNA fingerprinting based on specific ISSR markers generated by UBC811, UBC835, ISSR2 and ISSR3 could discriminate all 134 genotypes tested. ISSR markers also provide a powerful tool to discriminate tea germplasm at the inter-specific level. (author)

  17. VoSeq: a voucher and DNA sequence web application.

    Directory of Open Access Journals (Sweden)

    Carlos Peña

    There is an ever growing number of molecular phylogenetic studies published, due to, in part, the advent of new techniques that allow cheap and quick DNA sequencing. Hence, the demand for relational databases with which to manage and annotate the amassing DNA sequences, genes, voucher specimens and associated biological data is increasing. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically constructed with the aim of being an organizational tool for researchers working in phylogenetic inference. We here report a new software facilitating easy management of voucher and sequence data, consisting of a relational database as back-end for a graphic user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready to be used in commonly used phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publishing. It also has inbuilt BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL) and GBIF (by generating data-dumps that can be processed with GBIF's Integrated Publishing Toolkit).

  18. Forensic DNA testing.

    Science.gov (United States)

    Butler, John M

    2011-12-01

    Forensic DNA testing has a number of applications, including parentage testing, identifying human remains from natural or man-made disasters or terrorist attacks, and solving crimes. This article provides background information followed by an overview of the process of forensic DNA testing, including sample collection, DNA extraction, PCR amplification, short tandem repeat (STR) allele separation and sizing, typing and profile interpretation, statistical analysis, and quality assurance. The article concludes with discussions of possible problems with the data and other forensic DNA testing techniques.

  19. Asymmetric epigenetic modification and elimination of rDNA sequences by polyploidization in wheat.

    Science.gov (United States)

    Guo, Xiang; Han, Fangpu

    2014-11-01

    rRNA genes consist of long tandem repeats clustered on chromosomes, and their products are important functional components of the ribosome. In common wheat (Triticum aestivum), rDNA loci from the A and D genomes were largely lost during the evolutionary process. This biased DNA elimination may be related to asymmetric transcription and epigenetic modifications caused by the polyploid formation. Here, we observed both sets of parental nucleolus organizing regions (NORs) were expressed after hybridization, but asymmetric silencing of one parental NOR was immediately induced by chromosome doubling, and reversing the ploidy status could not reactivate silenced NORs. Furthermore, increased CHG and CHH DNA methylation on promoters was accompanied by asymmetric silencing of NORs. Enrichment of H3K27me3 and H3K9me2 modifications was also observed to be a direct response to increased DNA methylation and transcriptional inactivation of NOR loci. Both A and D genome NOR loci with these modifications started to disappear in the S4 generation and were completely eliminated by the S7 generation in synthetic tetraploid wheat. Our results indicated that asymmetric epigenetic modification and elimination of rDNA sequences between different donor genomes may lead to stable allopolyploid wheat with increased differentiation and diversity. © 2014 American Society of Plant Biologists. All rights reserved.

  20. Electronic Transport in Single-Stranded DNA Molecule Related to Huntington's Disease

    Science.gov (United States)

    Sarmento, R. G.; Silva, R. N. O.; Madeira, M. P.; Frazão, N. F.; Sousa, J. O.; Macedo-Filho, A.

    2018-04-01

    We report a numerical analysis of electronic transport in a single-chain DNA molecule consisting of 182 nucleotides. The DNA chains studied were extracted from a segment of human chromosome 4p16.3 and modified by expansion of CAG (cytosine-adenine-guanine) triplet repeats to mimic Huntington's disease. The mutated DNA chains were connected between two platinum electrodes to analyze the relationship between charge propagation in the molecule and Huntington's disease. The computations were performed within a tight-binding model, together with a transfer matrix technique, to investigate the current-voltage (I-V) characteristics of 23 types of DNA sequence and compare them with the distributions of the CAG repeat numbers associated with the disease. All DNA sequences studied show the characteristic behavior of a semiconductor. In addition, the results showed a direct correlation between the current-voltage curves and the distributions of the CAG repeat numbers, suggesting possible applications in the development of DNA-based biosensors for molecular diagnostics.

  1. Identification of multiple binding sites for the THAP domain of the Galileo transposase in the long terminal inverted-repeats.

    Science.gov (United States)

    Marzo, Mar; Liu, Danxu; Ruiz, Alfredo; Chalmers, Ronald

    2013-08-01

    Galileo is a DNA transposon responsible for the generation of several chromosomal inversions in Drosophila. In contrast to other members of the P-element superfamily, it has unusually long terminal inverted-repeats (TIRs) that resemble those of Foldback elements. To investigate the function of the long TIRs we derived consensus and ancestral sequences for the Galileo transposase in three species of Drosophilids. Following gene synthesis, we expressed and purified their constituent THAP domains and tested their binding activity towards the respective Galileo TIRs. DNase I footprinting located the most proximal DNA binding site about 70 bp from the transposon end. Using this sequence we identified further binding sites in the tandem repeats that are found within the long TIRs. This suggests that the synaptic complex between Galileo ends may be a complicated structure containing higher-order multimers of the transposase. We also attempted to reconstitute Galileo transposition in Drosophila embryos but no events were detected. Thus, although the limited numbers of Galileo copies in each genome were sufficient to provide functional consensus sequences for the THAP domains, they do not specify a fully active transposase. Since the THAP recognition sequence is short, and will occur many times in a large genome, it seems likely that the multiple binding sites within the long, internally repetitive, TIRs of Galileo and other Foldback-like elements may provide the transposase with its binding specificity. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.

  2. Fidelity and mutational spectrum of Pfu DNA polymerase on a human mitochondrial DNA sequence.

    Science.gov (United States)

    André, P; Kim, A; Khrapko, K; Thilly, W G

    1997-08-01

    The study of rare genetic changes in human tissues requires specialized techniques. Point mutations at fractions at or below 10^-6 must be observed to discover even the most prominent features of the point mutational spectrum. PCR permits the increase in number of mutant copies but does so at the expense of creating many additional mutations or "PCR noise". Thus, each DNA sequence studied must be characterized with regard to the DNA polymerase and conditions used to avoid interpreting a PCR-generated mutation as one arising in human tissue. The thermostable DNA polymerase derived from Pyrococcus furiosus designated Pfu has the highest fidelity of any thermostable DNA polymerase studied to date, and this property recommends it for analyses of tissue mutational spectra. Here, we apply constant denaturant capillary electrophoresis (CDCE) to separate and isolate the products of DNA amplification. This new strategy permitted direct enumeration and identification of point mutations created by Pfu DNA polymerase in a 96-bp low melting domain of a human mitochondrial sequence despite the very low mutant fractions generated in the PCR process. This sequence, containing part of the tRNA glycine and NADH dehydrogenase subunit 3 genes, is the target of our studies of mitochondrial mutagenesis in human cells and tissues. Incorrectly synthesized sequences were separated from the wild type as mutant/wild-type heteroduplexes by sequential enrichment on CDCE. An artificially constructed mutant was used as an internal standard to permit calculation of the mutant fraction. Our study found that the average error rate (mutations per base pair duplication) of Pfu was 6.5 x 10^-7, and five of its more frequent mutations (hot spots) consisted of three transversions (GC-->TA, AT-->TA, and AT-->CG), one transition (AT-->GC), and one 1-bp deletion (in an AAAAAA sequence). To achieve an even higher sensitivity, the amount of Pfu-induced mutants must be reduced.

  3. Spectral entropy criteria for structural segmentation in genomic DNA sequences

    International Nuclear Information System (INIS)

    Chechetkin, V.R.; Lobzin, V.V.

    2004-01-01

    The spectral entropy is calculated with Fourier structure factors and characterizes the level of structural ordering in a sequence of symbols. It may efficiently be applied to the assessment and reconstruction of the modular structure in genomic DNA sequences. We present the relevant spectral entropy criteria for the local and non-local structural segmentation in DNA sequences. The results are illustrated with model examples and analysis of intervening exon-intron segments in the protein-coding regions.
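
    As a rough illustration of the idea, the sketch below computes a spectral entropy for a short DNA string. It assumes one plausible definition: structure factors are the squared Fourier amplitudes of the four nucleotide indicator sequences, summed over A, C, G and T, and the entropy is the Shannon entropy of the normalized spectrum. The authors' exact normalization and criteria may differ, and the function names are hypothetical.

```python
import cmath
import math


def structure_factors(seq):
    """Squared DFT amplitudes summed over A, C, G, T for harmonics n = 1..M-1."""
    m = len(seq)
    factors = []
    for n in range(1, m):
        total = 0.0
        for base in "ACGT":
            # Fourier amplitude of the 0/1 indicator sequence for this base
            amp = sum(cmath.exp(-2j * math.pi * n * k / m)
                      for k, s in enumerate(seq) if s == base)
            total += abs(amp) ** 2 / m
        factors.append(total)
    return factors


def spectral_entropy(seq):
    """Shannon entropy (natural log) of the normalized structure factors."""
    f = structure_factors(seq)
    norm = sum(f)
    if norm < 1e-12:  # flat spectrum, e.g. a homopolymer: nothing to normalize
        return 0.0
    return -sum((x / norm) * math.log(x / norm) for x in f if x > 0)


print(spectral_entropy("ACGT" * 8))
print(spectral_entropy("ACGTTGCAAGCTTAGGCTAACGTTAGCCATGA"))
```

    With this definition, a perfectly period-4 sequence concentrates its spectrum in three harmonics, so its entropy is close to ln 3; less ordered sequences spread their spectrum over more harmonics.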

  4. Functional role of a highly repetitive DNA sequence in anchorage of the mouse genome.

    Science.gov (United States)

    Neuer-Nitsche, B; Lu, X N; Werner, D

    1988-09-12

    The major portion of the eukaryotic genome consists of various categories of repetitive DNA sequences which have been studied with respect to their base compositions, organizations, copy numbers, transcription and species specificities; their biological roles, however, are still unclear. A novel quality of a highly repetitive mouse DNA sequence is described which points to a functional role: All copies (approximately 50,000 per haploid genome) of this DNA sequence reside on genomic Alu I DNA fragments each associated with nuclear polypeptides that are not released from DNA by proteinase K, SDS and phenol extraction. By this quality the repetitive DNA sequence is classified as a member of the sub-set of DNA sequences involved in tight DNA-polypeptide complexes which have been previously shown to be components of the subnuclear structure termed 'nuclear matrix'. From these results it has to be concluded that the repetitive DNA sequence characterized in this report represents or comprises a signal for a large number of site specific attachment points of the mouse genome in the nuclear matrix.

  5. Mutations in Cas9 Enhance the Rate of Acquisition of Viral Spacer Sequences during the CRISPR-Cas Immune Response.

    Science.gov (United States)

    Heler, Robert; Wright, Addison V; Vucelja, Marija; Bikard, David; Doudna, Jennifer A; Marraffini, Luciano A

    2017-01-05

    CRISPR loci and their associated (Cas) proteins encode a prokaryotic immune system that protects against viruses and plasmids. Upon infection, a low fraction of cells acquire short DNA sequences from the invader. These sequences (spacers) are integrated in between the repeats of the CRISPR locus and immunize the host against the matching invader. Spacers specify the targets of the CRISPR immune response through transcription into short RNA guides that direct Cas nucleases to the invading DNA molecules. Here we performed random mutagenesis of the RNA-guided Cas9 nuclease to look for variants that provide enhanced immunity against viral infection. We identified a mutation, I473F, that increases the rate of spacer acquisition by more than two orders of magnitude. Our results highlight the role of Cas9 during CRISPR immunization and provide a useful tool to study this rare process and develop it as a biotechnological application. Copyright © 2017 Elsevier Inc. All rights reserved.
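
    The repeat-spacer layout described above can be sketched in a few lines: given an array sequence and its repeat, the spacers are whatever lies between consecutive repeat copies. This is a toy illustration that assumes exact, identical repeats; the sequences and the function name are made up for the example and are not taken from the study.

```python
def extract_spacers(array_seq, repeat):
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    parts = array_seq.split(repeat)
    # parts[0] is the leader before the first repeat and parts[-1] the trailer
    # after the last one; only the pieces in between are spacers.
    return [p for p in parts[1:-1] if p]


# Hypothetical toy array: leader, then repeat/spacer units, then trailer.
REPEAT = "GTTTTAGA"
array = "AT" + REPEAT + "CCCAAA" + REPEAT + "GGGTTT" + REPEAT + "TA"

print(extract_spacers(array, REPEAT))
```

    A newly acquired spacer would show up as an extra element in the returned list when arrays sampled before and after infection are compared.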

  6. DNA fingerprinting of Mycobacterium tuberculosis: from phage typing to whole-genome sequencing.

    Science.gov (United States)

    Schürch, Anita C; van Soolingen, Dick

    2012-06-01

    Current typing methods for Mycobacterium tuberculosis complex evolved from simple phenotypic approaches like phage typing and drug susceptibility profiling to DNA-based strain typing methods, such as IS6110-restriction fragment length polymorphisms (RFLP) and variable number of tandem repeats (VNTR) typing. Examples of the usefulness of molecular typing are source case finding and epidemiological linkage of tuberculosis (TB) cases, international transmission of MDR/XDR-TB, the discrimination between endogenous reactivation and exogenous re-infection as a cause of relapses after curative treatment of tuberculosis, the evidence of multiple M. tuberculosis infections, and the disclosure of laboratory cross-contaminations. Simultaneously, phylogenetic analyses were developed based on single nucleotide polymorphisms (SNPs), genomic deletions usually referred to as regions of difference (RDs) and spoligotyping which served both strain typing and phylogenetic analysis. National and international initiatives that rely on the application of these typing methods have brought significant insight into the molecular epidemiology of tuberculosis. However, current DNA fingerprinting methods have important limitations. They often cannot distinguish between genetically closely related strains, and the turn-over of these markers is variable. Moreover, the suitability of most DNA typing methods for phylogenetic reconstruction is limited as they show a high propensity for convergent evolution or misinfer genetic distances. In order to fully explore the possibilities of genotyping in the molecular epidemiology of tuberculosis and to study the phylogeny of the causative bacteria reliably, the application of whole-genome sequencing (WGS) analysis for all M. tuberculosis isolates is the optimal, although currently still costly, solution. In recent years WGS for typing of pathogens has been explored and yielded important additional information on strain diversity in comparison to the

  7. Virtual Genome Walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence.

    Science.gov (United States)

    Evans, Teri; Johnson, Andrew D; Loose, Matthew

    2018-01-12

    Large repeat rich genomes present challenges for assembly using short read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking as it locally assembles whole genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models including upstream and downstream genomic, and intronic, sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR and aid in future whole genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n . The software pipeline is available from https://github.com/LooseLab/iterassemble .

  8. Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads.

    Science.gov (United States)

    Lima, Leandro; Sinaimeri, Blerina; Sacomoto, Gustavo; Lopez-Maestre, Helene; Marchet, Camille; Miele, Vincent; Sagot, Marie-France; Lacroix, Vincent

    2017-01-01

    The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and when assemblers try to traverse these regions, they can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work when
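
    To make the role of repeats in a de Bruijn graph concrete, the sketch below builds a small DBG from reads and lists the nodes where a path branches. It is a generic textbook construction, not the KisSplice algorithm or the formal repeat model of the paper; the reads, the value of k and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor, where an assembler must pick a path."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


# Two hypothetical reads from different transcripts sharing the segment GATC.
reads = ["AAGATCGG", "TTGATCCC"]
graph = build_dbg(reads, k=4)
print(branching_nodes(graph))
```

    The shared segment merges the two reads into a common path, and the node at its end has two successors; with many copies of a repeat, such ambiguous nodes multiply and form the complicated regions discussed above.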

  9. Colorimetric and dynamic light scattering detection of DNA sequences by using positively charged gold nanospheres: a comparative study with gold nanorods

    Science.gov (United States)

    Pylaev, T. E.; Khanadeev, V. A.; Khlebtsov, B. N.; Dykman, L. A.; Bogatyrev, V. A.; Khlebtsov, N. G.

    2011-07-01

    We introduce a new genosensing approach employing CTAB (cetyltrimethylammonium bromide)-coated positively charged colloidal gold nanoparticles (GNPs) to detect target DNA sequences by using absorption spectroscopy and dynamic light scattering. The approach is compared with a previously reported method employing unmodified CTAB-coated gold nanorods (GNRs). Both approaches are based on the observation that whereas the addition of probe and target ssDNA to CTAB-coated particles results in particle aggregation, no aggregation is observed after addition of probe and nontarget DNA sequences. Our goal was to compare the feasibility and sensitivity of both methods. A 21-mer ssDNA from the human immunodeficiency virus type 1 (HIV-1) U5 long terminal repeat (LTR) sequence and a 23-mer ssDNA from the Bacillus anthracis cryptic protein and protective antigen precursor (pagA) genes were used as ssDNA models. In the case of GNRs, unexpectedly, the colorimetric test failed with perfect cigar-like particles but could be performed with dumbbell and dog-bone rods. By contrast, our approach with cationic CTAB-coated GNPs is easy to implement and possesses excellent feasibility with retention of comparable sensitivity: a 0.1 nM concentration of target cDNA can be detected with the naked eye and 10 pM by dynamic light scattering (DLS) measurements. The specificity of our method is illustrated by successful DLS detection of one to three base mismatches in cDNA sequences for both DNA models. These results suggest that the cationic GNPs and DLS can be used for genosensing under optimal DNA hybridization conditions without any chemical modifications of the particle surface with ssDNA molecules and signal amplification. Finally, we discuss a difference of more than two to three orders of magnitude in the reported estimations of the detection sensitivity of colorimetric methods (0.1 to 10-100 pM) to show that the existing aggregation models are inconsistent with the detection limits of about 0.1-1 pM DNA and that

  10. Capillary gel electrophoresis for rapid, high resolution DNA sequencing.

    OpenAIRE

    Swerdlow, H; Gesteland, R

    1990-01-01

    Capillary gel electrophoresis has been demonstrated for the separation and detection of DNA sequencing samples. Enzymatic dideoxy nucleotide chain termination was employed, using fluorescently tagged oligonucleotide primers and laser based on-column detection (limit of detection is 6,000 molecules per peak). Capillary gel separations were shown to be three times faster, with better resolution (2.4 x), and higher separation efficiency (5.4 x) than a conventional automated slab gel DNA sequenci...

  11. X-Chromosomal short tandem repeat loci in the Turkish population ...

    African Journals Online (AJOL)

    In this study, we aimed to demonstrate the importance and utility of polymorphic short tandem repeat (STR) found on the human X chromosome and to provide the first allelic frequency data of X-STR (X chromosomal) loci in the Turkish population. Blood samples were taken from unrelated individuals (135 males and 129 ...

  12. Method for priming and DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Mugasimangalam, R.C.; Ulanovsky, L.E.

    1997-12-01

    A method is presented for improving the priming specificity of an oligonucleotide primer that is non-unique in a nucleic acid template which includes selecting a continuous stretch of several nucleotides in the template DNA where one of the four bases does not occur in the stretch. This also includes bringing the template DNA in contact with a non-unique primer partially or fully complementary to the sequence immediately upstream of the selected sequence stretch. This results in polymerase-mediated differential extension of the primer in the presence of a subset of deoxyribonucleotide triphosphates that does not contain the base complementary to the base absent in the selected sequence stretch. These reactions occur at a temperature sufficiently low for allowing the extension of the non-unique primer. The method causes polymerase-mediated extension reactions in the presence of all four natural deoxyribonucleotide triphosphates or modifications. At this high temperature discrimination occurs against priming sites of the non-unique primer where the differential extension has not made the primer sufficiently stable to prime. However, the primer extended at the selected stretch is sufficiently stable to prime.
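
    The first step of the method, selecting a stretch of template in which one of the four bases does not occur, can be sketched as a simple scan. The code below is an illustrative assumption about how such a stretch might be located computationally; the function name and the example template are hypothetical, and the abstract does not describe any software.

```python
def longest_stretch_missing_base(template):
    """For each base, return (start, end) of the longest run that lacks it.

    Coordinates are 0-based and half-open, so template[start:end] is the run.
    """
    best = {}
    for base in "ACGT":
        start = 0
        best_span = (0, 0)
        # Appending the base itself acts as a sentinel that closes the last run.
        for i, s in enumerate(template + base):
            if s == base:
                if i - start > best_span[1] - best_span[0]:
                    best_span = (start, i)
                start = i + 1
        best[base] = best_span
    return best


template = "ACGTTGGTGTCA"  # hypothetical template fragment
for base, (start, end) in longest_stretch_missing_base(template).items():
    print(f"no {base}: {template[start:end]} at [{start}, {end})")
```

    For a stretch lacking a given base, the differential extension step would then omit the triphosphate complementary to that base, as the abstract describes.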

  13. OPTSDNA: Performance evaluation of an efficient distributed bioinformatics system for DNA sequence analysis.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Sheel, Chotan

    2013-01-01

    Storage of sequence data is a major concern because the amount of data generated at many locations is growing exponentially. Therefore, there is a need to develop techniques to store data using compression algorithms. Here we describe an optimal storage algorithm (OPTSDNA) for storing large amounts of DNA sequences of varying length. This paper provides a performance analysis of OPTSDNA within a distributed bioinformatics computing system for analysis of DNA sequences. The OPTSDNA algorithm is used for storing DNA sequences of various sizes, from very small to very large, in a database. Storage size and response time are both calculated in this work. The efficiency and performance of the algorithm are high (in terms of storage size, expressed as a percentage) when compared with other known sequential approaches.
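
    The abstract does not describe how OPTSDNA encodes sequences, so the sketch below is not that algorithm. It shows a generic, widely used baseline for compact DNA storage, packing each of the four bases into 2 bits, simply to illustrate how storage size can be reduced relative to one byte per base. All names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


seq = "ACGTACGTAC"
packed = pack(seq)
print(len(seq), "bases stored in", len(packed), "bytes")
print(unpack(packed, len(seq)) == seq)
```

    The original length has to be stored alongside the packed bytes, because the final byte may contain padding; ambiguous symbols such as N would need a separate scheme.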

  14. Premutation huntingtin allele adopts a non-B conformation and contains a hot spot for DNA damage

    International Nuclear Information System (INIS)

    Jarem, Daniel A.; Delaney, Sarah

    2011-01-01

    Highlights: ► First structural and thermodynamic analysis of premutation allele of HD. ► Premutation allele of HD adopts a stem-loop non-B conformation. ► Healthy and premutation length stem-loops are hyper-susceptible to oxidative damage. ► Stability of stem-loop structures increases linearly with repeat length. ► Thermodynamic stability, not the ability to adopt non-B conformation, distinguishes DNA prone to expansion from stable DNA. -- Abstract: The expansion of a CAG trinucleotide repeat (TNR) sequence has been linked to several neurological disorders, for example, Huntington’s disease (HD). In HD, healthy individuals have 5–35 CAG repeats. Those with 36–39 repeats have the premutation allele, which is known to be prone to expansion. In the disease state, greater than 40 repeats are present. Interestingly, the formation of non-B DNA conformations by the TNR sequence is proposed to contribute to the expansion. Here we provide the first structural and thermodynamic analysis of a premutation length TNR sequence. Using chemical probes of nucleobase accessibility, we found that, similar to (CAG)10, the premutation length sequence (CAG)36 forms a stem-loop hairpin and contains a hot spot for DNA damage. Additionally, calorimetric analysis of a series of (CAG)n sequences that includes repeat tracts in both the healthy and premutation ranges reveals that thermodynamic stability increases linearly with the number of repeats. Based on these data, we propose that while non-B conformations can be formed by TNR tracts found in both the healthy and premutation allele, only sequences containing at least 36 repeats have sufficient thermodynamic stability to contribute to expansion.
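
    The repeat-length ranges quoted in the abstract lend themselves to a small worked example: count the longest uninterrupted CAG tract in a sequence and map it to a range. This is a minimal sketch using only the thresholds stated above; counts the abstract does not assign to a category (fewer than 5, or exactly 40) are reported as "unclassified" rather than guessed. The function names and the test sequence are hypothetical.

```python
import re


def longest_cag_tract(seq):
    """Number of repeats in the longest uninterrupted run of CAG."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify(repeats):
    """Map a repeat count to the ranges quoted in the abstract."""
    if 5 <= repeats <= 35:
        return "healthy"
    if 36 <= repeats <= 39:
        return "premutation"
    if repeats > 40:
        return "disease"
    return "unclassified"  # counts the abstract does not assign to a range


# Hypothetical sequence: a 36-repeat tract, an interruption, then 3 more repeats.
seq = "TT" + "CAG" * 36 + "CAA" + "CAG" * 3
n = longest_cag_tract(seq)
print(n, classify(n))
```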

  15. The cDNA sequence of a neutral horseradish peroxidase.

    Science.gov (United States)

    Bartonek-Roxå, E; Eriksson, H; Mattiasson, B

    1991-02-16

    A cDNA clone encoding a horseradish (Armoracia rusticana) peroxidase has been isolated and characterized. The cDNA contains 1378 nucleotides excluding the poly(A) tail and the deduced protein contains 327 amino acids, including a 28 amino acid leader sequence. The predicted amino acid sequence is nine amino acids shorter than the major isoenzyme belonging to the horseradish peroxidase C group (HRP-C) and the sequence shows 53.7% identity with this isoenzyme. The described clone encodes nine cysteines of which eight correspond well with the cysteines found in HRP-C. Five potential N-glycosylation sites with the general sequence Asn-X-Thr/Ser are present in the deduced sequence. This is three fewer glycosylation sites than in the previously described HRP-C. The shorter sequence and fewer N-glycosylation sites give the native isoenzyme a molecular weight several thousand lower than that of the horseradish peroxidase C isoenzymes. Comparison with the net charge value of HRP-C indicates that the described cDNA clone encodes a peroxidase which has either the same or a slightly less basic pI value, depending on whether the encoded protein is N-terminally blocked or not. This excludes the possibility that HRP-n could belong to either the HRP-A, -D or -E groups. The low sequence identity (53.7%) with HRP-C indicates that the described clone does not belong to the HRP-C isoenzyme group and comparison of the total amino acid composition with the HRP-B group does not place the described clone within this isoenzyme group. Our conclusion is that the described cDNA clone encodes a neutral horseradish peroxidase which belongs to a new, not previously described, horseradish peroxidase group.
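
    Potential N-glycosylation sites of the general form Asn-X-Thr/Ser can be located in a deduced protein sequence with a short pattern search. The sketch below follows the general pattern exactly as quoted in the abstract; the peptide and the function name are hypothetical and unrelated to the actual peroxidase sequence.

```python
import re


def n_glyc_sites(protein):
    """1-based positions of Asn in Asn-X-Thr/Ser motifs.

    A lookahead is used so that overlapping motifs are all reported.
    """
    return [m.start() + 1 for m in re.finditer(r"N(?=.[ST])", protein.upper())]


# Hypothetical peptide used only to exercise the pattern.
peptide = "MNATNNSSAQNPT"
print(n_glyc_sites(peptide))
```

    Many prediction tools additionally exclude proline at the X position; that refinement is not part of the general pattern quoted above and is left out here.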

  16. Real sequence effects on the search dynamics of transcription factors on DNA

    DEFF Research Database (Denmark)

    Bauer, Maximilian; Rasmussen, Emil S.; Lomholt, Michael A.

    2015-01-01

    Recent experiments show that transcription factors (TFs) indeed use the facilitated diffusion mechanism to locate their target sequences on DNA in living bacteria cells: TFs alternate between sliding motion along DNA and relocation events through the cytoplasm. From simulations and theoretical...... analysis we study the TF-sliding motion for a large section of the DNA-sequence of a common E. coli strain, based on the two-state TF-model with a fast-sliding search state and a recognition state enabling target detection. For the probability to detect the target before dissociating from DNA the TF...... on the underlying nucleotide sequence is varied. A moderate dependence maximises the capability to distinguish between the main operator and similar sequences. Moreover, these auxiliary operators serve as starting points for DNA looping with the main operator, yielding a spectrum of target detection times spanning...

  17. Detection of fetal-specific DNA after enrichment for trophoblasts using the monoclonal antibody LK26 in model systems but failure to demonstrate fetal DNA in maternal peripheral blood

    DEFF Research Database (Denmark)

    Hviid, T V; Sørensen, S; Morling, N

    1999-01-01

    Trophoblast cells can be detected in maternal blood during normal human pregnancy and DNA from these cells may be used for non-invasive prenatal diagnosis of inherited diseases. The possibility of enriching trophoblast cells from maternal blood samples using a monoclonal antibody (LK26) against...... a folate-binding protein, which recognizes trophoblast in normal tissues, in conjunction with immunomagnetic cell sorting was investigated. Verification of the presence of fetal DNA in the sorted samples was done by detection of fetal/paternal-specific short tandem repeat (STR) alleles using polymerase...... on peripheral maternal blood samples. However, it was not possible to detect fetal DNA sequences in these samples, most probably due to the extremely low number of trophoblast cells. Positive identification and retrieval of trophoblast cells in suspension or trophoblast nuclear material prepared on microscope...

  18. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    Science.gov (United States)

    Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca

    2015-01-01

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.

  19. Phylogenetic footprinting of non-coding RNA: hammerhead ribozyme sequences in a satellite DNA family of Dolichopoda cave crickets (Orthoptera, Rhaphidophoridae)

    Directory of Open Access Journals (Sweden)

    Venanzetti Federica

    2010-01-01

    Background: The great variety in sequence, length, complexity, and abundance of satellite DNA has made it difficult to ascribe any function to this genome component. Recent studies have shown that satellite DNA can be transcribed and be involved in regulation of chromatin structure and gene expression. Some satellite DNAs, such as the pDo500 sequence family in Dolichopoda cave crickets, have a catalytic hammerhead (HH) ribozyme structure and activity embedded within each repeat. Results: We assessed the phylogenetic footprints of the HH ribozyme within the pDo500 sequences from 38 different populations representing 12 species of Dolichopoda. The HH region was significantly more conserved than the non-hammerhead (NHH) region of the pDo500 repeat. In addition, stems were more conserved than loops. In stems, several compensatory mutations were detected that maintain base pairing. The core region of the HH ribozyme was affected by very few nucleotide substitutions and the cleavage position was altered only once among 198 sequences. RNA folding of the HH sequences revealed that a potentially active HH ribozyme can be found in most of the Dolichopoda populations and species. Conclusions: The phylogenetic footprints suggest that the HH region of the pDo500 sequence family is selected for function in Dolichopoda cave crickets. However, the functional role of HH ribozymes in eukaryotic organisms is unclear. The possible functions have been related to trans cleavage of an RNA target by a ribonucleoprotein and regulation of gene expression. Whether the HH ribozyme in Dolichopoda is involved in similar functions remains to be investigated. Future studies need to demonstrate how the observed nucleotide changes and evolutionary constraint have affected the catalytic efficiency of the hammerhead.

  20. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    Science.gov (United States)

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have become known in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3%, and an MCC of 0.324. The other SVM model, which uses both DNA and protein sequences, achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5%, and an MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data alone.
To the best of