WorldWideScience

Sample records for dna sequence pattern

  1. Similarity Estimation Between DNA Sequences Based on Local Pattern Histograms of Binary Images

    Institute of Scientific and Technical Information of China (English)

    Yusei Kobori; Satoshi Mizuta

    2016-01-01

    Graphical representation of DNA sequences is one of the most popular techniques for alignment-free sequence comparison. Here, we propose a new method for the feature extraction of DNA sequences represented by binary images, which estimates the similarity between DNA sequences using the frequency histograms of local bitmap patterns of the images. Our method runs in time linear in the length of the DNA sequences, which keeps it practical even when long sequences, such as whole genome sequences, are compared. We tested five distance measures for the estimation of sequence similarities and found that the histogram intersection and the Manhattan distance are the most appropriate ones for phylogenetic analyses.
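    The abstract does not say how the binary image is built, so the sketch below assumes a hypothetical 2-bit-per-base encoding (one 2-row column per nucleotide) and treats every 2 x `width` window as a local bitmap pattern. It illustrates the linear-time histogram construction and the two distance measures the authors single out. With this simple encoding the patterns reduce to k-mer counts, which the published image construction may not.

```python
from collections import Counter

# Hypothetical encoding (not from the paper): each base becomes one column
# of a 2-row binary image.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def pattern_histogram(seq, width=3):
    """Normalized histogram of local 2 x `width` bitmap patterns.

    Each window of `width` image columns is packed into one integer, so the
    scan costs O(len(seq) * width), i.e. linear for a fixed window width.
    """
    cols = [BITS[b] for b in seq.upper()]
    counts = Counter()
    for i in range(len(cols) - width + 1):
        code = 0
        for top, bottom in cols[i:i + width]:
            code = (code << 2) | (top << 1) | bottom
        counts[code] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def histogram_intersection(p, q):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in set(p) | set(q))


def manhattan_distance(p, q):
    """L1 distance between normalized histograms; 0.0 means identical."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


p = pattern_histogram("ACGTACGTAC")
q = pattern_histogram("ACGTTCGTAC")
print(histogram_intersection(p, q), manhattan_distance(p, q))
```

    For normalized histograms the two measures carry the same information: the Manhattan distance equals 2 * (1 - intersection).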

  2. A pattern matching approach for the estimation of alignment between any two given DNA sequences.

    Science.gov (United States)

    Basu, K; Sriraam, N; Richard, R J A

    2007-08-01

    For a given DNA sequence, pairwise alignment schemes are commonly used to determine its similarity to the DNA sequences available in databanks. The quality of the alignment determines the inferred amino acids and their corresponding proteins. To evaluate the proteomic identity of a given DNA sequence, this paper proposes a pattern matching approach. A block-based semi-global alignment scheme is introduced to determine the similarity between the known and the given DNA sequences. The two sequences are divided into blocks of equal length and aligned block by block, which reduces the computational complexity. The efficiency of the alignment scheme is evaluated using a parameter called the percentage of similarity (POS). The DNA versions of four essential amino acids that are important for proteomic function are chosen as patterns, and matching is performed on the known and given DNA sequences to determine the similarity between them. The ratio of amino acid counts between the two sequences is estimated and compared with the POS value. The experimental results show that the higher the POS value and the pattern-matching score, the higher the similarity between the two DNA sequences. The optimal block is also identified based on the POS value and the amino acid counts.
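    The sketch below shows the two ingredients described above: block-wise similarity and codon-pattern counting. It is a simplification, not the authors' scheme. A gapless per-block identity stands in for their semi-global alignment, and the four amino acids in `CODON_PATTERNS` are hypothetical choices, since the abstract does not name them.

```python
# Hypothetical codon patterns: the abstract does not name the four amino acids.
CODON_PATTERNS = {
    "Met": {"ATG"},
    "Trp": {"TGG"},
    "Cys": {"TGT", "TGC"},
    "Lys": {"AAA", "AAG"},
}


def split_blocks(seq, block_len):
    """Cut a sequence into consecutive equal-length blocks (last may be shorter)."""
    return [seq[i:i + block_len] for i in range(0, len(seq), block_len)]


def block_pos(known, given, block_len):
    """Per-block percentage of identical positions (gapless stand-in for POS)."""
    scores = []
    for a, b in zip(split_blocks(known, block_len), split_blocks(given, block_len)):
        n = min(len(a), len(b))
        matches = sum(x == y for x, y in zip(a, b))
        scores.append(100.0 * matches / n if n else 0.0)
    return scores


def codon_counts(seq):
    """Count reading-frame-0 codons that match each amino acid's pattern set."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return {aa: sum(c in pats for c in codons) for aa, pats in CODON_PATTERNS.items()}


def count_ratios(known, given):
    """Ratio of pattern counts (given / known); None where the known count is zero."""
    k, g = codon_counts(known), codon_counts(given)
    return {aa: (g[aa] / k[aa] if k[aa] else None) for aa in CODON_PATTERNS}


print(block_pos("ATGTGGAAATGT", "ATGTGGAAGTGC", 6))
print(count_ratios("ATGTGGAAATGT", "ATGTGGAAGTGC"))
```

    Running `block_pos` with several `block_len` values and comparing the resulting scores is one simple way to look for a good block size.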

  3. DNA Sequencing

    Science.gov (United States)

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three dideoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them; and determining the position of the dideoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby determining the DNA sequence.

  4. mapDamage: testing for damage patterns in ancient DNA sequences.

    Science.gov (United States)

    Ginolhac, Aurelien; Rasmussen, Morten; Gilbert, M Thomas P; Willerslev, Eske; Orlando, Ludovic

    2011-08-01

    Ancient DNA extracts consist of a mixture of contaminant DNA molecules, most often originating from environmental microbes, and endogenous fragments exhibiting substantial levels of DNA damage. The latter introduce specific nucleotide misincorporations and DNA fragmentation signatures in sequencing reads that can be used to argue for sequence validity. mapDamage is a Perl script that computes nucleotide misincorporation and fragmentation patterns using next-generation sequencing reads mapped against a reference genome. The outputs of the Perl script are then automatically processed by an embedded R script to detect the patterns typical of genuine ancient DNA sequences. The Perl script mapDamage is freely available with documentation and example files at http://geogenetics.ku.dk/all_literature/mapdamage/. The script requires prior installation of the SAMtools suite and the R environment, and has been validated on both GNU/Linux and Mac OS X operating systems.
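    mapDamage itself is a Perl script backed by SAMtools and R. The Python sketch below is not its code. It illustrates the kind of misincorporation profile the abstract describes: for each distance from a read's 5' end, it reports how often a reference C is read as T. An excess of C-to-T changes near read 5' ends is a commonly reported ancient DNA damage signature. The sketch assumes that the reads are already aligned base-for-base to the reference with no indels and that both strings are given 5' to 3' on the read strand.

```python
def ct_misincorporation_profile(pairs, max_pos=10):
    """Fraction of reference C sites read as T, by distance from the read 5' end.

    `pairs` holds (read, reference) strings aligned base-for-base (no indels),
    both written 5' -> 3' on the read strand.
    """
    c_sites = [0] * max_pos
    c_to_t = [0] * max_pos
    for read, ref in pairs:
        for i, (r, f) in enumerate(zip(read[:max_pos], ref[:max_pos])):
            if f == "C":
                c_sites[i] += 1
                if r == "T":
                    c_to_t[i] += 1
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_sites)]


reads = [("TACG", "CACG"), ("CACG", "CACG"), ("TACG", "CACG"), ("CATG", "CACG")]
print(ct_misincorporation_profile(reads, max_pos=4))
```

    A real analysis would also track G-to-A changes from the 3' end and fragmentation patterns around read starts, both of which this sketch leaves out.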

  5. Genomic MRI - a Public Resource for Studying Sequence Patterns within Genomic DNA

    Science.gov (United States)

    Prakash, Ashwin; Bechtel, Jason; Fedorov, Alexei

    2011-01-01

    Non-coding genomic regions in complex eukaryotes, including intergenic areas, introns, and untranslated segments of exons, are profoundly non-random in their nucleotide composition and consist of a complex mosaic of sequence patterns. These patterns include so-called Mid-Range Inhomogeneity (MRI) regions -- sequences 30-10000 nucleotides in length that are enriched in a particular base or combination of bases (e.g. (G+T)-rich, purine-rich, etc.). MRI regions are associated with unusual (non-B-form) DNA structures that are often involved in regulation of gene expression, recombination, and other genetic processes (Fedorova & Fedorov 2010). The existence of a strong fixation bias within MRI regions against mutations that tend to reduce their sequence inhomogeneity further supports the functionality and importance of these genomic sequences (Prakash et al. 2009). Here we present a freely available Internet resource -- the Genomic MRI program package -- designed for computational analysis of genomic sequences in order to find and characterize various MRI patterns within them (Bechtel et al. 2008). This package also allows generation of randomized sequences with various properties and levels of correspondence to the natural input DNA sequences. The main goal of this resource is to facilitate examination of vast regions of non-coding DNA that are still scarcely investigated and await thorough exploration and recognition. PMID:21610667
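    One simple way to flag candidate MRI regions is a sliding-window composition scan. The sketch below is a generic illustration and not the Genomic MRI package's algorithm. The 30-nt window echoes the lower length bound in the abstract, while the (G+T) base set and the 80% threshold are hypothetical parameters. Windows that pass are merged into half-open intervals, and a running count keeps the scan linear in sequence length.

```python
def enriched_regions(seq, bases="GT", window=30, min_fraction=0.8):
    """Merge all windows whose content of `bases` reaches `min_fraction`.

    Returns half-open (start, end) intervals in 0-based coordinates.
    """
    seq = seq.upper()
    hits = [1 if b in bases else 0 for b in seq]
    regions = []
    count = sum(hits[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one: add the base entering the window, drop the one leaving.
            count += hits[start + window - 1] - hits[start - 1]
        if count >= min_fraction * window:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # overlaps previous hit: extend it
            else:
                regions.append((start, end))
    return regions


seq = "A" * 40 + "GT" * 20 + "A" * 40
print(enriched_regions(seq))
```

    The reported intervals include some flanking bases (here a 40-nt G+T block at 40-80 is reported as 34-86), because a window only needs to reach the threshold, not be fully enriched. Trimming the region edges would be a natural refinement.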

  6. Reduced representation bisulphite sequencing of the cattle genome reveals DNA methylation patterns

    Science.gov (United States)

    Using reduced representation bisulphite sequencing (RRBS), we obtained the first single-base-resolution maps of bovine DNA methylation in ten somatic tissues. In total, we observed 1,868,049 cytosines in the CG-enriched regions. Similar to the methylation patterns in other species, the CG context wa...

  7. Nanoscale programmable sequence-specific patterning of DNA scaffolds using RecA protein

    Science.gov (United States)

    Sharma, R.; Davies, A. G.; Wälti, C.

    2012-09-01

    Molecular self-assembly inherent to many biological molecules, in conjunction with suitable molecular scaffolds to facilitate programmable positioning of nanoscale objects, offers a promising approach for the integration of functional nanoscale complexes into macroscopic host devices. Here, we report the use of the protein RecA as a means of highly efficient programmable patterning of double-stranded (ds)DNA molecules with molecular-scale precision at specific locations along the DNA strand. RecA proteins form nucleoprotein filaments with single-stranded (ss)DNA molecules whose sequence is chosen to be homologous to the desired binding region on the dsDNA scaffold. We show that the patterning yield can exceed 85%, and we demonstrate that multiple locations on the same dsDNA scaffold can be patterned concurrently, with a separation of less than 4 nm between the assembled nucleoprotein filaments. This is an important prerequisite for employing this programmable and flexible DNA scaffold patterning technique in molecular- and nanoscale assembly applications.

  8. Design pattern mining using distributed learning automata and DNA sequence alignment.

    Directory of Open Access Journals (Sweden)

    Mansour Esmaeilpour

    CONTEXT: Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object-oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. OBJECTIVE: This paper describes a new method for pattern mining, which isolates design patterns and the relationships between them, and a related tool, DLA-DNA, for all implemented patterns and all projects used for evaluation. DLA-DNA, which is based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, achieves acceptable precision and recall compared with the other evaluated tools. METHOD: The proposed method mines structural design patterns in object-oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model detects design patterns, and the strengths of their relationships, better than other available tools, namely Pinot, PTIDEJ and DPJF. RESULTS: The results demonstrate that, whether the source code is built in a standard or non-standard way with respect to the design patterns, the results of the proposed method are close to those of DPJF and better than those of Pinot and PTIDEJ. The proposed model was tested on several source code bases and compared with other related models and available tools; on average, its precision and recall are higher by 20% and 9.6% than Pinot, by 27% and 31% than PTIDEJ, and by 3.3% and 2% than DPJF, respectively. CONCLUSION: The proposed method is organized in two steps: in the first step, elemental design patterns are identified; in the second step, they are composed to recognize actual design patterns.

  9. Design Pattern Mining Using Distributed Learning Automata and DNA Sequence Alignment

    Science.gov (United States)

    Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

    2014-01-01

    Context: Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object-oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. Objective: This paper describes a new method for pattern mining, which isolates design patterns and the relationships between them, and a related tool, DLA-DNA, for all implemented patterns and all projects used for evaluation. DLA-DNA, which is based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, achieves acceptable precision and recall compared with the other evaluated tools. Method: The proposed method mines structural design patterns in object-oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model detects design patterns, and the strengths of their relationships, better than other available tools, namely Pinot, PTIDEJ and DPJF. Results: The results demonstrate that, whether the source code is built in a standard or non-standard way with respect to the design patterns, the results of the proposed method are close to those of DPJF and better than those of Pinot and PTIDEJ. The proposed model was tested on several source code bases and compared with other related models and available tools; on average, its precision and recall are higher by 20% and 9.6% than Pinot, by 27% and 31% than PTIDEJ, and by 3.3% and 2% than DPJF, respectively. Conclusion: The proposed method is organized in two steps: in the first step, elemental design patterns are identified; in the second step, they are composed to recognize actual design patterns. PMID:25243670

  10. Profiling the genome-wide DNA methylation pattern of porcine ovaries using reduced representation bisulfite sequencing.

    Science.gov (United States)

    Yuan, Xiao-Long; Gao, Ning; Xing, Yan; Zhang, Hai-Bin; Zhang, Ai-Ling; Liu, Jing; He, Jin-Long; Xu, Yuan; Lin, Wen-Mian; Chen, Zan-Mou; Zhang, Hao; Zhang, Zhe; Li, Jia-Qi

    2016-02-25

    Substantial evidence has shown that DNA methylation regulates the initiation of ovarian and sexual maturation. Here, we investigated the genome-wide profile of DNA methylation in porcine ovaries at single-base resolution using reduced representation bisulfite sequencing. The biological variation among the three ovarian replicates was minimal. We found that hypermethylation frequently occurred in regions with low gene abundance, whereas hypomethylation occurred in regions with high gene abundance. The DNA methylation around transcriptional start sites was negatively correlated with their own CpG content. Additionally, the methylation level in gene bodies was higher than that in their 5' and 3' flanking regions. The DNA methylation pattern of genes with low-CpG-content promoters differed markedly from that of genes with high-CpG-content promoters. The DNA methylation level of the porcine ovary was higher than that of the porcine intestine. Analyses of the genome-wide DNA methylation in porcine ovaries should advance the knowledge and understanding of the porcine ovarian methylome.

  11. Shotgun Bisulfite Sequencing of the Betula platyphylla Genome Reveals the Tree’s DNA Methylation Patterning

    Directory of Open Access Journals (Sweden)

    Chang Su

    2014-12-01

    DNA methylation plays a critical role in the regulation of gene expression. Most studies of DNA methylation have been performed in herbaceous plants, and little is known about methylation patterns in tree genomes. In the present study, we generated a map of methylated cytosines at single-base-pair resolution for Betula platyphylla (white birch) by bisulfite sequencing, combined with transcriptomics, to analyze DNA methylation and its effects on gene expression. We obtained a detailed view of the function, sequence composition, and distribution of DNA methylation in the genome of B. platyphylla. Of the 34,460 genes in the whole birch genome, 31,297 are methylated. We conservatively estimated that 14.29% of genomic cytosines in birch are methylcytosines. Among the methylation sites, the CHH context accounts for the largest proportion (48.86%). Combined transcriptome and methylation analysis showed that genes with moderate methylation levels had higher expression levels than genes with high or low methylation. In addition, methylated genes are highly enriched in the GO subcategories of binding activities, catalytic activities, cellular processes, response to stimulus and cell death, suggesting that methylation mediates these pathways in birch trees.
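    The CG, CHG and CHH contexts mentioned above follow the usual plant-methylation convention, where H stands for A, C or T. The sketch below shows how a methylated cytosine can be assigned to one of these contexts and how the per-context shares can be tallied. It is an illustration only, not the authors' pipeline. The list of methylated positions is a hypothetical input standing in for bisulfite methylation calls, and only the given strand is scanned. A full analysis would also classify cytosines on the reverse strand (G on the forward strand).

```python
from collections import Counter

H = set("ACT")  # IUPAC code H: any base except G


def cytosine_context(seq, i):
    """Return 'CG', 'CHG' or 'CHH' for the C at index i (same strand), else None."""
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if seq[i + 1] not in H or i + 2 >= len(seq):
        return None
    if seq[i + 2] == "G":
        return "CHG"
    return "CHH" if seq[i + 2] in H else None


def context_fractions(seq, methylated_positions):
    """Share of methylated cytosines falling in each sequence context."""
    counts = Counter(cytosine_context(seq, i) for i in methylated_positions)
    counts.pop(None, None)  # drop positions that are not classifiable cytosines
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.items()} if total else {}


print(context_fractions("CGACAGCAT", [0, 3, 6]))
```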

  12. A Novel Signal Processing Measure to Identify Exact and Inexact Tandem Repeat Patterns in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Ravi Gupta

    2007-03-01

    The identification and analysis of repetitive patterns are active areas of biological and computational research. Tandem repeats in telomeres play a role in cancer, and hypervariable trinucleotide tandem repeats are linked to over a dozen major neurodegenerative genetic disorders. In this paper, we present an algorithm to identify exact and inexact repeat patterns in DNA sequences based on the orthogonal exactly periodic subspace decomposition technique. Using the new measure, our algorithm resolves problems such as whether a repeat pattern has period P or one of its multiples (i.e., 2P, 3P, etc.), along with several other problems present in previous signal-processing-based algorithms. We present an efficient O(N Lw log Lw) algorithm for identifying repeats, where N is the length of the DNA sequence and Lw is the window length. The algorithm operates in two stages. In the first stage, each nucleotide is analyzed separately for periodicity; in the second stage, the periodicity information of all nucleotides is combined to identify the tandem repeats. Datasets containing exact and inexact repeats were used in the experiments, and the results show the effectiveness of the approach.
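    The periodic-subspace idea behind this approach can be illustrated in a few lines. The sketch below is a simplified stand-in and not the paper's orthogonal exactly periodic decomposition. It mirrors the two stages described above. First, each nucleotide is turned into a 0/1 indicator sequence and projected onto the space of period-P sequences (each sample is replaced by the mean of all samples sharing its phase). Second, the energy kept across the four nucleotides is combined. The `threshold` and `max_period` values are hypothetical. Returning the smallest qualifying period is a heuristic answer to the P-versus-2P ambiguity, which the paper resolves more rigorously.

```python
def periodic_energy_fraction(window, period):
    """Share of the four indicator sequences' energy kept by a period-`period` projection.

    Projecting onto period-P sequences replaces every sample by the mean of
    all samples sharing its phase (index mod P).
    """
    window = window.upper()
    total = captured = 0.0
    for base in "ACGT":  # stage 1: one indicator sequence per nucleotide
        x = [1.0 if b == base else 0.0 for b in window]
        total += sum(v * v for v in x)
        for phase in range(period):
            samples = x[phase::period]
            if samples:
                mean = sum(samples) / len(samples)
                captured += len(samples) * mean * mean
    # stage 2: energies from all four nucleotides are combined
    return captured / total if total else 0.0


def smallest_period(window, max_period=10, threshold=0.95):
    """Smallest period whose projection keeps at least `threshold` of the energy."""
    for p in range(1, max_period + 1):
        if periodic_energy_fraction(window, p) >= threshold:
            return p
    return None


print(smallest_period("ACG" * 10))
```

    A period of 6 also keeps all the energy of `"ACG" * 10`, which is exactly the multiple-period ambiguity the abstract mentions. Scanning periods in increasing order avoids reporting 6 in this exact case.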

  13. Patterns of rDNA and telomeric sequences diversification: contribution to repetitive DNA organization in Phyllostomidae bats.

    Science.gov (United States)

    Calixto, Merilane da Silva; de Andrade, Izaquiel Santos; Cabral-de-Mello, Diogo Cavalcanti; Santos, Neide; Martins, Cesar; Loreto, Vilma; de Souza, Maria José

    2014-02-01

    Chromosomal organization and the evolution of genome architecture can be investigated by physical mapping of the genes for 45S and 5S ribosomal DNAs (rDNAs) and by the analysis of telomeric sequences. We studied 12 species of bats belonging to four subfamilies of the family Phyllostomidae in order to correlate the distribution patterns of heterochromatin and of the multigene families for rDNA. The number of 45S gene clusters ranged from one to three pairs, located exclusively on autosomes except in Carollia perspicillata, which had a cluster on the X chromosome. For the 5S gene, all the species studied had only one site, located on an autosomal pair. In no species were the 45S and 5S genes colocalized. The fluorescence in situ hybridization (FISH) probe for telomeric sequences revealed fluorescence on all telomeres in all species except Carollia perspicillata. Non-telomeric sites in the pericentromeric regions of the chromosomes were observed in most species, ranging from one to 12 pairs. Most interstitial telomeric sequences coincided with heterochromatic regions. The results obtained in the present work indicate that different evolutionary mechanisms are acting on Phyllostomidae genome architecture, and that Robertsonian fusions occurred during the chromosomal evolution of bats without a loss of telomeric sequences. These data contribute to understanding the organization of multigene families and telomeric sequences in the bat genome, as well as the chromosomal evolutionary history of Phyllostomidae bats.

  14. Discovering approximate-associated sequence patterns for protein-DNA interactions

    KAUST Repository

    Chan, Tak Ming

    2010-12-30

    Motivation: The bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) are fundamental protein-DNA interactions in transcriptional regulation. Extensive efforts have been made to better understand these protein-DNA interactions. Recent mining of exact TF-TFBS-associated sequence patterns (rules) has shown great potential and achieved very promising results. However, exact rules cannot handle variations in real data, which limits the number of informative rules. In this article, we generalize the exact rules to approximate ones for both TFs and TFBSs, which is essential for capturing biological variation. Results: A progressive approach is proposed to handle the approximation while reducing the computational requirements. First, similar TFBSs are grouped from the available TF-TFBS data (TRANSFAC database). Second, approximate and highly conserved binding cores are discovered from the TF sequences corresponding to each TFBS group, using a customized algorithm developed for this specific objective. We discover the approximate TF-TFBS rules by associating the grouped TFBS consensuses and TF cores. The rules discovered are evaluated by matching (verifying with) the actual protein-DNA binding pairs from Protein Data Bank (PDB) 3D structures. The approximate results exhibit many more verified rules and up to 300% better verification ratios than the exact ones. The customized algorithm achieves over 73% better verification ratios than traditional methods. Approximate rules (64-79%) are shown to be statistically significant. Detailed variation analysis and conservation verification on NCBI records demonstrate that the approximate rules accurately reveal both flexible and specific protein-DNA interactions. The approximate TF-TFBS rules discovered show a strong generalization capability for exploring more informative binding rules. © The Author 2010. Published by Oxford University Press. All rights reserved.

  15. Ultra-deep sequencing of mouse mitochondrial DNA: mutational patterns and their origins.

    Directory of Open Access Journals (Sweden)

    Adam Ameur

    2011-03-01

    Somatic mutations of mtDNA are implicated in the aging process, but there is no universally accepted method for their accurate quantification. We have used ultra-deep sequencing to study the genome-wide mtDNA mutation load in the liver of normally and prematurely aging mice. Mice that are homozygous for an allele expressing a proofreading-deficient mtDNA polymerase (mtDNA mutator mice) have 10-times-higher point mutation loads than their wildtype siblings. In addition, the mtDNA mutator mice have increased levels of a truncated linear mtDNA molecule, resulting in decreased sequence coverage in the deleted region. In contrast, circular mtDNA molecules with large deletions occur at extremely low frequencies in mtDNA mutator mice and therefore cannot drive the premature aging phenotype. Sequence analysis shows that most of the mutation load in heterozygous mtDNA mutator mice and their wildtype siblings is inherited from their heterozygous mothers, consistent with germline transmission. We found no increase in levels of point mutations or deletions in wildtype C57Bl/6N mice with increasing age, which questions the causative role of these changes in aging. In addition, the frequency of transversion mutations did not increase with time in any of the studied genotypes, arguing against oxidative damage as a major cause of mtDNA mutations. Our results from studies of mice thus indicate that most somatic mtDNA mutations occur as replication errors during development and do not result from damage accumulation in adult life.
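    The argument against oxidative damage rests on separating transitions (purine to purine, or pyrimidine to pyrimidine) from transversions (purine to pyrimidine, or the reverse). The latter include the G-to-T changes commonly associated with oxidized guanine. The sketch below shows that classification and a transversion fraction that could be compared across ages or genotypes. The `(ref, alt)` input format is a hypothetical choice, and the sketch is not the authors' analysis code.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a substitution between A/C/G/T: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transversion_fraction(mutations):
    """Share of (ref, alt) substitutions that are transversions."""
    classes = [substitution_class(r, a) for r, a in mutations]
    return classes.count("transversion") / len(classes) if classes else 0.0


print(transversion_fraction([("C", "T"), ("G", "T"), ("A", "G"), ("A", "C")]))
```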

  16. Reduced representation bisulphite sequencing of the ten bovine somatic tissues reveals DNA methylation patterns

    Science.gov (United States)

    As a major component of epigenetics, DNA methylation has been shown to function widely in individual development and in various diseases. It has been well studied in model organisms and humans, but data for economically important animals are limited. Using reduced representation bisulphite sequencing (RRBS),...

  17. Foundations for a syntactic pattern recognition system for genomic DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Searles, D.B.

    1993-03-01

    The goal of the proposed work is the creation of a software system that will perform sophisticated pattern recognition and related functions at a level of abstraction and with expressive power beyond current general-purpose pattern-matching systems for biological sequences; and with a more uniform language, environment, and graphical user interface, and with greater flexibility, extensibility, embeddability, and ability to incorporate other algorithms, than current special-purpose analytic software.

  18. Exploiting mid-range DNA patterns for sequence classification: binary abstraction Markov models

    Science.gov (United States)

    Shepard, Samuel S.; McSweeny, Andrew; Serpen, Gursel; Fedorov, Alexei

    2012-01-01

    Messenger RNA sequences possess specific nucleotide patterns distinguishing them from non-coding genomic sequences. In this study, we explore the use of modified Markov models to analyze sequences up to 44 bp, far beyond the 8-bp limit of conventional Markov models, for exon/intron discrimination. To analyze nucleotide sequences of this length, their information content is first reduced by converting them into shorter binary patterns through the application of numerous abstraction schemes. After the conversion of genomic sequences to binary strings, homogeneous Markov models trained on the binary sequences are used to discriminate between exons and introns. We term this approach the Binary Abstraction Markov Model (BAMM). High-quality abstraction schemes for exon/intron discrimination are selected using optimization algorithms on supercomputers. The best MM classifiers are then combined into a single classifier using support vector machines. With this approach, over 95% classification accuracy is achieved without taking the reading frame into account. With further development, the BAMM approach can be applied to sequences lacking the genetic code, such as ncRNAs and 5′-untranslated regions. PMID:22344692
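    The core BAMM step described above is to abstract a sequence to a binary string and then score it with homogeneous Markov models trained on exons and on introns. It can be sketched with a single abstraction scheme. Purine/pyrimidine is used here as a hypothetical example of a scheme, whereas the paper searches many schemes and combines the best classifiers with SVMs. Here, add-one smoothing and a zero log-likelihood-ratio threshold (equal priors) are assumptions of the sketch.

```python
import math
from collections import defaultdict


def purine_abstraction(seq):
    """One hypothetical abstraction scheme: purines (A, G) -> '1', pyrimidines -> '0'."""
    return "".join("1" if b in "AG" else "0" for b in seq.upper())


class BinaryMarkovModel:
    """Homogeneous order-k Markov model over the binary alphabet {'0', '1'}."""

    def __init__(self, order):
        self.order = order
        self.counts = defaultdict(lambda: [0, 0])  # context -> [n('0'), n('1')]

    def train(self, binary_strings):
        k = self.order
        for s in binary_strings:
            for i in range(k, len(s)):
                self.counts[s[i - k:i]][int(s[i])] += 1
        return self

    def log_likelihood(self, s):
        k, total = self.order, 0.0
        for i in range(k, len(s)):
            n0, n1 = self.counts.get(s[i - k:i], (0, 0))
            n = (n1 if s[i] == "1" else n0) + 1  # add-one (Laplace) smoothing
            total += math.log(n / (n0 + n1 + 2))
        return total


def classify(seq, exon_model, intron_model, abstraction=purine_abstraction):
    """Label by the sign of the exon-vs-intron log-likelihood ratio."""
    b = abstraction(seq)
    score = exon_model.log_likelihood(b) - intron_model.log_likelihood(b)
    return "exon" if score > 0 else "intron"


exon_model = BinaryMarkovModel(2).train([purine_abstraction("ACACACACACAC")])
intron_model = BinaryMarkovModel(2).train([purine_abstraction("AAAAAAAACCCCCCCC")])
print(classify("ACACACAC", exon_model, intron_model))
```

    Because the alphabet is binary, an order-k model has only 2**k contexts, which is what lets this kind of model look much further back than a four-letter Markov model of the same size.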

  19. DNA sequences encoding erythropoietin

    Energy Technology Data Exchange (ETDEWEB)

    Lin, F.K.

    1987-10-27

    A purified and isolated DNA sequence is described consisting essentially of a DNA sequence encoding a polypeptide having an amino acid sequence sufficiently duplicative of that of erythropoietin to allow possession of the biological property of causing bone marrow cells to increase production of reticulocytes and red blood cells, and to increase hemoglobin synthesis or iron uptake.

  20. DNA sequence, products, and transcriptional pattern of the genes involved in production of the DNA replication inhibitor microcin B17.

    Science.gov (United States)

    Genilloud, O; Moreno, F; Kolter, R

    1989-02-01

    The 3.8-kilobase segment of plasmid DNA that contains the genes required for production of the DNA replication inhibitor microcin B17 was sequenced. The sequence contains four open reading frames which were shown to be translated in vivo by the construction of fusions to lacZ. The location of these open reading frames fits well with the location of the four microcin B17 production genes, mcbABCD, identified previously through genetic complementation. The products of the four genes have been identified, and the observed molecular weights of the proteins agree with those predicted from the nucleotide sequence. The transcription of these genes was studied by using fusions to lacZ and physical mapping of mRNA start sites. Three promoters were identified in this region. The major promoter for all the genes is a growth phase-regulated OmpR-dependent promoter located upstream of mcbA. A second promoter is located within mcbC and is responsible for a low-level basal expression of mcbD. A third promoter, located within mcbD, promotes transcription in the reverse direction starting within mcbD and extending through mcbC. The resulting mRNA appears to be an untranslated antisense transcript that could play a regulatory role in the expression of these genes.

  1. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  2. 18S-rDNA SEQUENCING, ENZYME PATTERNS AND MORPHOLOGICAL CHARACTERIZATION OF TRICHOPHYTON ISOLATES

    Directory of Open Access Journals (Sweden)

    Nascimento Adriana Mendes do

    2001-01-01

    Dermatophytes, which are capable of using the keratin of the host for nutrition, belong to one of the major groups of pathogenic fungi. Because dermatophytes are a closely related group, they share various common features, and the morphology of isolates of a given species can be atypical, making species identification and differentiation even more difficult. Many methods have been explored in attempts to distinguish dermatophytes, but the combined use of different approaches to investigate the intraspecific and interspecific variability of Trichophyton remains scarce. Some studies have shown that amplified fragments of the 18S small-subunit ribosomal DNA contain variable regions that can be used to discriminate between medically relevant yeast species, indicating that these regions could also be used to differentiate dermatophytes. In our study, sequence analysis of the 18S rDNA gene was combined with morphological and biochemical criteria to detect genetic differences between seven Trichophyton isolates and to estimate their phylogenetic relationships. The results show that the isolates investigated belong to the Trichophyton group, which potentially contains the Trichophyton rubrum cluster.

  3. A comparison of intraspecific patterns of DNA sequence variation in mitochondrial DNA, alpha-enolase, and MHC class II B loci in auklets (Charadriiformes: Alcidae).

    Science.gov (United States)

    Walsh, Hollie E; Friesen, Vicki L

    2003-12-01

    Patterns of DNA sequence variation can be used to learn about mechanisms of organismal evolution, but only if mechanisms of sequence evolution are well understood. Although theories of molecular evolution are well developed, few empirical studies have addressed patterns and mechanisms of sequence evolution in nuclear genes within species. In the present study, we compared DNA sequences among three loci with different evolutionary constraints to determine the influences of effective population size, balancing selection, and linkage on intraspecific patterns of sequence variation. Specifically, we assessed the degree and nature of polymorphism in a 307-base pair (bp) fragment of the mitochondrial cytochrome b gene, intron VIII of the gene for alpha-enolase (a presumably neutral nuclear gene), and an approximately 600-bp fragment of an MHC class II B gene, including 155 bp of the hypervariable peptide binding region (a nuclear locus thought to be under balancing selection) for least and crested auklets (Aethia pusilla and A. cristatella; Charadriiformes: Alcidae). Transspecies polymorphism was found in both alpha-enolase and the MHC but not cytochrome b and, given estimates of effective population size, probably represents retained ancestral variation. Biases in nucleotide composition suggested that mutational bias, tRNA availability, and the secondary structure of mRNA and/or DNA may influence base usage. Several lines of evidence indicated that balancing selection may be acting on the MHC II B exon 2. However, no evidence of balancing selection was observed in the intron and exon sequences immediately downstream of MHC II B exon 2.

  4. Automated DNA Sequencing System

    Energy Technology Data Exchange (ETDEWEB)

    Armstrong, G.A.; Ekkebus, C.P.; Hauser, L.J.; Kress, R.L.; Mural, R.J.

    1999-04-25

    Oak Ridge National Laboratory (ORNL) is developing a core DNA sequencing facility to support biological research endeavors at ORNL and to conduct basic sequencing automation research. This facility is novel because its development is based on existing standard biology laboratory equipment; thus, the development process is of interest to the many small laboratories trying to use automation to control costs and increase throughput. Before automation, biology laboratory personnel purified DNA, completed cycle sequencing, and prepared 96-well sample plates with commercially available hardware designed specifically for each step in the process. Following purification and thermal cycling, an automated sequencing machine was used for the sequencing. A technician handled all movement of the 96-well sample plates between machines. To automate the process, ORNL is adding a CRS Robotics A-465 arm, an ABI 377 sequencing machine, an automated centrifuge, an automated refrigerator, and possibly an automated SpeedVac. The entire system will be integrated with one central controller that will direct each machine and the robot. The goal of this system is to completely automate the sequencing procedure, from bacterial cell samples through ready-to-be-sequenced DNA and ultimately to completed sequence. The system will be flexible and will accommodate different chemistries than existing automated sequencing lines. The system will be expanded in the future to include colony picking and/or actual sequencing. This discrete event, DNA sequencing system will demonstrate that smaller sequencing labs can achieve cost-effective the laboratory grow.

  5. Multilocus DNA sequencing of the whiskey fungus reveals a continental-scale speciation pattern.

    Science.gov (United States)

    Scott, J A; Ewaze, J O; Summerbell, R C; Arocha-Rosete, Y; Maharaj, A; Guardiola, Y; Saleh, M; Wong, B; Bogale, M; O'Hara, M J; Untereiner, W A

    2017-01-01

    Baudoinia was described to accommodate a single species, B. compniacensis. Known as the 'whiskey fungus', this species is the predominant member of a ubiquitous microbial community known colloquially as 'warehouse staining' that develops on outdoor surfaces subject to periodic exposure to ethanolic vapours near distilleries and bakeries. Here we examine 19 strains recovered from environmental samples near industrial settings in North America, South America, the Caribbean, Europe and the Far East. Molecular phylogenetic analysis of a portion of the nucLSU rRNA gene confirms that Baudoinia is a monophyletic lineage within the Teratosphaeriaceae (Capnodiales). Multilocus phylogenetic analysis of nucITS rRNA (ITS1-5.8S-ITS2) and partial nucLSU rRNA, beta-tubulin (TUB) and elongation factor 1-alpha (TEF1) gene sequences further indicates that Baudoinia consists of five strongly supported, geographically patterned lineages representing four new species (viz. Baudoinia antilliensis, B. caledoniensis, B. orientalis and B. panamericana).

  6. Evolution of DNA sequencing

    National Research Council Canada - National Science Library

    Tipu, Hamid Nawaz; Shabbir, Ambreen

    2015-01-01

    Sanger and coworkers first introduced DNA sequencing in the 1970s. The method relied principally on termination of the growing nucleotide chain when a dideoxythymidine triphosphate (ddTTP) was inserted...
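
    The chain-termination idea can be illustrated with an idealized model. In practice, each of the four ddNTPs is used in a separate reaction, and a chain stops wherever a dideoxynucleotide is incorporated. The fragment lengths in each reaction therefore mark the positions of that base in the newly synthesized strand. The sketch below assumes termination occurs at every possible position and ignores the primer. It takes the synthesized (complementary) strand as input, and the function names are hypothetical.

```python
from __future__ import annotations


def termination_fragments(synthesized: str, dd_base: str) -> list[int]:
    """Lengths of the chains that stop when dd_base is incorporated.

    Idealized: the primer is ignored and every position where dd_base
    appears in the newly synthesized strand yields one terminated fragment.
    """
    return [i + 1 for i, base in enumerate(synthesized) if base == dd_base]


def read_gel(synthesized: str) -> str:
    """Rebuild the strand by ordering fragments from all four lanes by length.

    Assumes the strand contains only A, C, G and T.
    """
    base_at_length = {}
    for dd_base in "ACGT":
        for length in termination_fragments(synthesized, dd_base):
            base_at_length[length] = dd_base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


print(termination_fragments("GATTACA", "T"))  # [3, 4]  (the ddTTP reaction)
print(read_gel("GATTACA"))                    # GATTACA
```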

  7. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  8. Patterning nanocrystals using DNA

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Shara Carol

    2003-09-01

    One of the goals of nanotechnology is to enable programmed self-assembly of patterns made of various materials with nanometer-sized control. This dissertation describes the results of experiments templating arrangements of gold and semiconductor nanocrystals using 2'-deoxyribonucleic acid (DNA). Previously, simple DNA-templated linear arrangements of two- and three-nanocrystal structures have been made.[1] Here, we have sought to assemble larger and more complex nanostructures. Gold-DNA conjugates with 50 to 100 bases self-assembled into planned arrangements using strands of DNA containing complementary base sequences. We used two methods to increase the complexity of the arrangements: using branched synthetic doublers within the DNA covalent backbone to create discrete nanocrystal groupings, and incorporating the nanocrystals into a previously developed DNA lattice structure [2][3] that self-assembles from tiles made of DNA double-crossover molecules to create ordered nanoparticle arrays. In the first project, the introduction of a covalently branched synthetic doubler reagent into the backbone of DNA strands created a branched DNA "trimer." This DNA trimer templated various structures that contained groupings of three and four gold nanoparticles, giving promising but inconclusive transmission electron microscopy (TEM) results. Due to the presence of a variety of possible structures in the reaction mixtures, and due to the difficulty of isolating the desired structures, the TEM and gel electrophoresis results for larger structures having four particles, and for structures containing both 5 and 10 nm gold nanoparticles, were inconclusive. Better results may come from using optical detection methods, or from improved sample preparation. In the second project, we worked toward making two-dimensional ordered arrays of nanocrystals. We replicated and improved upon previous results for making DNA lattices, increasing the size of the lattices

  9. DNA sequencing by CE.

    Science.gov (United States)

    Karger, Barry L; Guttman, András

    2009-06-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA-sequencing methods have evolved from labor-intensive slab gel electrophoresis, through automated multi-CE systems using fluorophore labeling with multispectral imaging, to the "next-generation" technologies of cyclic-array, hybridization-based, nanopore and single-molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes were only possible with the advent of modern sequencing technologies, which resulted from step-by-step advances with contributions from academics, medical personnel and instrument companies. While next-generation sequencing is moving ahead at breakneck speed, multicapillary electrophoretic systems played an essential role in the sequencing of the human genome, the foundation of the field of genomics. In this perspective, we overview the role of CE in DNA sequencing, based in part on several of our articles in this journal.

  10. Inferring multiple refugia and phylogeographical patterns in Pinus massoniana based on nucleotide sequence variation and DNA fingerprinting.

    Directory of Open Access Journals (Sweden)

    Xue-Jun Ge

    Full Text Available BACKGROUND: Pinus massoniana, an ecologically and economically important conifer, is widespread across central and southern mainland China and Taiwan. In this study, we tested the central-marginal paradigm, which predicts that marginal populations tend to be less polymorphic in their genetic composition than central ones, and examined a founder effect in the island population. METHODOLOGY/PRINCIPAL FINDINGS: We examined the phylogeography and population structure of P. massoniana based on nucleotide sequences of the cpDNA atpB-rbcL intergenic spacer, intron regions of the AdhC2 locus, and microsatellite fingerprints. SAMOVA analysis of nucleotide sequences indicated that most genetic variation resided among geographical regions. High levels of genetic diversity in the marginal populations of the south region, a pattern seemingly contradicting the central-marginal paradigm, together with the fixation of private haplotypes in most populations, indicate that multiple refugia may have existed during the glacial maxima. STRUCTURE analyses of microsatellites revealed that the genetic structure of mainland populations was mediated by recent genetic exchange, mostly via pollen flow, and that the genetic composition of the east region was intermixed between the south and west regions, a pattern likely shaped by gene introgression and maintenance of ancestral polymorphisms. As expected, the small island population in Taiwan was genetically differentiated from mainland populations. CONCLUSIONS/SIGNIFICANCE: The marginal populations in the south region possessed divergent gene pools, suggesting that past glaciations may have had little impact on these low-latitude populations. Interestingly, estimates of ancestral population sizes reflect a recent expansion on the mainland from a much smaller population, a pattern that appears to agree with the pollen record.

  11. Human cellular protein patterns and their link to genome DNA sequence data: usefulness of two-dimensional gel electrophoresis and microsequencing

    DEFF Research Database (Denmark)

    Celis, J E; Rasmussen, H H; Leffers, H

    1991-01-01

    Analysis of cellular protein patterns by computer-aided 2-dimensional gel electrophoresis together with recent advances in protein sequence analysis have made possible the establishment of comprehensive 2-dimensional gel protein databases that may link protein and DNA information and that offer a...

  12. Information Theory of DNA Sequencing

    CERN Document Server

    Motahari, Abolfazl; Tse, David

    2012-01-01

    DNA sequencing is the basic workhorse of modern-day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and provides a fundamental limit to the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process. Using this framework, we also study the impact of noise in the read process on the sequencing capacity.
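
    The shotgun read process described above, with randomly located reads of fixed length, can be simulated directly. The sketch below only illustrates that read process and measures how much of a sequence the reads cover. It does not compute the sequencing capacity defined in the paper. It assumes error-free, equal-length reads placed uniformly within a linear genome, and the function names are hypothetical. Full coverage is necessary, though not sufficient, for reconstruction.

```python
from __future__ import annotations

import random


def sample_read_starts(genome_length: int, read_length: int, num_reads: int, seed: int = 0) -> list[int]:
    """Draw uniformly random start positions for error-free reads of fixed length."""
    rng = random.Random(seed)
    return [rng.randrange(genome_length - read_length + 1) for _ in range(num_reads)]


def coverage_fraction(genome_length: int, read_length: int, starts: list[int]) -> float:
    """Fraction of genome positions covered by at least one read."""
    covered = [False] * genome_length
    for start in starts:
        for i in range(start, start + read_length):
            covered[i] = True
    return sum(covered) / genome_length


starts = sample_read_starts(genome_length=10_000, read_length=100, num_reads=500)
print(coverage_fraction(10_000, 100, starts))
```

    With these parameters, the average depth is 500 × 100 / 10,000 = 5. Most bases are covered at this depth, but a few gaps usually remain, especially near the ends, where fewer read placements overlap.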

  13. Foundations for a syntactic pattern recognition system for genomic DNA sequences. [Annual] report, 1 December 1991--31 March 1993

    Energy Technology Data Exchange (ETDEWEB)

    Searles, D.B.

    1993-03-01

    The goal of the proposed work is the creation of a software system that will perform sophisticated pattern recognition and related functions at a level of abstraction and with expressive power beyond current general-purpose pattern-matching systems for biological sequences; and with a more uniform language, environment, and graphical user interface, and with greater flexibility, extensibility, embeddability, and ability to incorporate other algorithms, than current special-purpose analytic software.

  14. Human cellular protein patterns and their link to genome DNA sequence data: usefulness of two-dimensional gel electrophoresis and microsequencing

    DEFF Research Database (Denmark)

    Celis, J E; Rasmussen, H H; Leffers, H

    1991-01-01

    Analysis of cellular protein patterns by computer-aided 2-dimensional gel electrophoresis together with recent advances in protein sequence analysis have made possible the establishment of comprehensive 2-dimensional gel protein databases that may link protein and DNA information and that offer a global approach to the study of the cell. Using the integrated approach offered by 2-dimensional gel protein databases it is now possible to reveal phenotype-specific protein (or proteins), to microsequence them, to search for homology with previously identified proteins, to clone the cDNAs, to assign partial protein sequence to genes for which the full DNA sequence and the chromosome location are known, and to study the regulatory properties and function of groups of proteins that are coordinately expressed in a given biological process. Human 2-dimensional gel protein databases are becoming...

  15. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  16. Graphene nanodevices for DNA sequencing

    Science.gov (United States)

    Heerema, Stephanie J.; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology.

  17. Finding Sequential Patterns from Large Sequence Data

    CERN Document Server

    Esmaeili, Mahdi

    2010-01-01

    Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining finds sets of data items that occur together frequently in some sequences. Sequential pattern mining, which extracts frequent subsequences from a sequence database, has attracted a great deal of interest in recent data mining research because it is the basis of many applications, such as web user analysis, stock trend prediction, DNA sequence analysis, finding language or linguistic patterns in natural language texts, and using the history of symptoms to predict certain kinds of disease. Given the diversity of these applications, it may not be possible to apply a single sequential pattern model to all of these problems. Each application may require a unique model and solution. A number of research projects were established in recent years to develop meaningful sequential pattern...
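
    The core operation in sequential pattern mining is counting a pattern's support: the number of sequences that contain it as a subsequence, in order but not necessarily contiguously. The sketch below is a simplified level-wise miner in the Apriori/GSP style. Each sequence element is treated as a single item rather than an itemset. It is not the algorithm of any specific paper, and the function names are hypothetical. The miner extends each frequent pattern only by frequent single items. This is enough because every prefix of a frequent pattern, and every item in it, must itself be frequent.

```python
from __future__ import annotations


def is_subsequence(pattern, sequence) -> bool:
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to and including the match.
    return all(item in remaining for item in pattern)


def support(pattern, database) -> int:
    """Number of sequences in the database that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine_sequential_patterns(database, min_support: int, max_length: int = 5) -> dict[tuple, int]:
    """Level-wise search: grow frequent patterns one frequent item at a time."""
    items = sorted({item for seq in database for item in seq})
    frequent_items = [i for i in items if support((i,), database) >= min_support]
    patterns = {(i,): support((i,), database) for i in frequent_items}
    level = list(patterns)
    while level and len(level[0]) < max_length:
        next_level = []
        for prefix in level:
            for item in frequent_items:
                candidate = prefix + (item,)
                count = support(candidate, database)
                if count >= min_support:
                    patterns[candidate] = count
                    next_level.append(candidate)
        level = next_level
    return patterns


db = ["ACGT", "AGT", "CGA"]
for pattern, count in mine_sequential_patterns(db, min_support=2).items():
    print("".join(pattern), count)
```

    With a minimum support of 2, "AGT" is reported (it occurs in the first two sequences), while "CGT" is not (it occurs only in the first).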

  18. Chromosome-wide mapping of DNA methylation patterns in normal and malignant prostate cells reveals pervasive methylation of gene-associated and conserved intergenic sequences

    Science.gov (United States)

    2011-01-01

    Background DNA methylation has been linked to genome regulation and dysregulation in health and disease respectively, and methods for characterizing genomic DNA methylation patterns are rapidly emerging. We have developed/refined methods for enrichment of methylated genomic fragments using the methyl-binding domain of the human MBD2 protein (MBD2-MBD) followed by analysis with high-density tiling microarrays. This MBD-chip approach was used to characterize DNA methylation patterns across all non-repetitive sequences of human chromosomes 21 and 22 at high-resolution in normal and malignant prostate cells. Results Examining this data using computational methods that were designed specifically for DNA methylation tiling array data revealed widespread methylation of both gene promoter and non-promoter regions in cancer and normal cells. In addition to identifying several novel cancer hypermethylated 5' gene upstream regions that mediated epigenetic gene silencing, we also found several hypermethylated 3' gene downstream, intragenic and intergenic regions. The hypermethylated intragenic regions were highly enriched for overlap with intron-exon boundaries, suggesting a possible role in regulation of alternative transcriptional start sites, exon usage and/or splicing. The hypermethylated intergenic regions showed significant enrichment for conservation across vertebrate species. A sampling of these newly identified promoter (ADAMTS1 and SCARF2 genes) and non-promoter (downstream or within DSCR9, C21orf57 and HLCS genes) hypermethylated regions were effective in distinguishing malignant from normal prostate tissues and/or cell lines. Conclusions Comparison of chromosome-wide DNA methylation patterns in normal and malignant prostate cells revealed significant methylation of gene-proximal and conserved intergenic sequences. Such analyses can be easily extended for genome-wide methylation analysis in health and disease. PMID:21669002

  19. DNA pattern recognition using canonical correlation algorithm

    Indian Academy of Sciences (India)

    B K Sarkar; Chiranjib Chakraborty

    2015-10-01

    We performed canonical correlation analysis as an unsupervised statistical tool to describe related views of the same semantic object for identifying patterns. A pattern recognition technique based on canonical correlation analysis (CCA) was proposed for finding a required genetic code in the DNA sequence. Two related but different objects were considered: one was a particular pattern, and the other was the test DNA sequence. CCA found correlations between these two observations, the semantic pattern and the test sequence. We conclude that the relationship reaches its maximum value at the position where the pattern occurs. As a case study, the potential of CCA was demonstrated on sequences from HIV-1 preferred integration sites. The subsequences flanking the integration site on the left and right were considered as the two views, and statistically significant relationships were established between these two views, elucidating viral preference as an important factor in the correlation.
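
    The idea of scoring every position of a test sequence against a fixed pattern and taking the maximum can be illustrated with a simpler statistic. The sketch below is not canonical correlation analysis. It one-hot encodes the pattern and each equal-length window of the sequence, then computes the Pearson correlation between the two vectors. This correlation reaches 1.0 only where the pattern occurs exactly. The encoding and the function names are assumptions for illustration, and the input is assumed to contain only A, C, G and T.

```python
from __future__ import annotations

import math

BASES = "ACGT"


def one_hot(seq: str) -> list[float]:
    """Flatten a DNA string into four 0/1 indicators per base (order A, C, G, T)."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def scan(pattern: str, sequence: str) -> list[float]:
    """Correlation between the pattern and every equal-length window of the sequence."""
    p = one_hot(pattern)
    k = len(pattern)
    return [pearson(p, one_hot(sequence[i:i + k])) for i in range(len(sequence) - k + 1)]


def best_match(pattern: str, sequence: str) -> tuple[int, float]:
    """Window start with the highest correlation, and that correlation."""
    scores = scan(pattern, sequence)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


print(best_match("GATTC", "CCCGATTCAAA"))  # position 3, correlation 1.0 (up to rounding)
```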

  20. Duplication in DNA Sequences

    Science.gov (United States)

    Ito, Masami; Kari, Lila; Kincaid, Zachary; Seki, Shinnosuke

    The duplication and repeat-deletion operations are the basis of a formal language theoretic model of errors that can occur during DNA replication. During DNA replication, subsequences of a strand of DNA may be copied several times (resulting in duplications) or skipped (resulting in repeat-deletions). As formal language operations, iterated duplication and repeat-deletion of words and languages have been well studied in the literature. However, little is known about single-step duplications and repeat-deletions. In this paper, we investigate several properties of these operations, including closure properties of language families in the Chomsky hierarchy and equations involving these operations. We also make progress toward a characterization of regular languages that are generated by duplicating a regular language.
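
    The two single-step operations described above can be enumerated directly for a given word. The sketch below follows the usual convention that the repeated factor x is non-empty, which is an assumption because the abstract does not state it. It lists every word obtainable by one duplication (u x v -> u x x v) or one repeat-deletion (u x x v -> u x v). The function names are hypothetical.

```python
from __future__ import annotations


def duplications(word: str) -> set[str]:
    """Words reachable by one duplication step: u x v -> u x x v, with x non-empty."""
    results = set()
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            # u = word[:i], x = word[i:j], v = word[j:]
            results.add(word[:j] + word[i:j] + word[j:])
    return results


def repeat_deletions(word: str) -> set[str]:
    """Words reachable by one repeat-deletion step: u x x v -> u x v, with x non-empty."""
    results = set()
    for i in range(len(word)):
        for length in range(1, (len(word) - i) // 2 + 1):
            x = word[i:i + length]
            if word[i + length:i + 2 * length] == x:
                # Keep u x, drop the second copy of x, keep v.
                results.add(word[:i + length] + word[i + 2 * length:])
    return results


print(sorted(duplications("AC")))        # ['AAC', 'ACAC', 'ACC']
print(sorted(repeat_deletions("ACAC")))  # ['AC']
```

    Every word produced by a duplication can be mapped back to the original word by some repeat-deletion, which gives a quick sanity check on both functions.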

  1. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences

    Directory of Open Access Journals (Sweden)

    Bauer Margarete

    2004-10-01

    Full Text Available Abstract Background In the emerging field of environmental genomics, direct cloning and sequencing of genomic fragments from complex microbial communities has proven to be a valuable source of new enzymes, expanding the knowledge of basic biological processes. The central problem of this so-called metagenome approach is that the cloned fragments often lack suitable phylogenetic marker genes, making it difficult or impossible to identify clones that are likely to originate from the same genome. In such cases, the analysis of intrinsic DNA signatures like tetranucleotide frequencies can provide valuable hints on fragment affiliation. With this application in mind, the TETRA web-service and the TETRA stand-alone program have been developed, both of which automate the task of comparative tetranucleotide frequency analysis. Availability: http://www.megx.net/tetra Results TETRA provides a statistical analysis of tetranucleotide usage patterns in genomic fragments, either via a web-service or a stand-alone program. With respect to discriminatory power, such an analysis outperforms the assignment of genomic fragments based on (G+C) content, a widely used sequence-based measure for assessing fragment relatedness. While the web-service is restricted to the calculation of correlation coefficients between tetranucleotide usage patterns of submitted DNA sequences, the stand-alone program generates a much more detailed output, comprising all raw data and graphical plots. The stand-alone program is controlled via a graphical user interface and can batch-process a multitude of sequences. Furthermore, it comes with pre-computed tetranucleotide usage patterns for 166 prokaryote chromosomes, providing a useful reference dataset and source for data-mining. Conclusions Up to now, the analysis of skewed oligonucleotide distributions within DNA sequences has not been a commonly used tool within metagenomics. With the TETRA web-service and stand
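
    TETRA reports correlation coefficients between the tetranucleotide usage patterns of submitted sequences. The abstract does not describe any normalization of the counts (for example, against expected frequencies). The sketch below therefore simply correlates raw, overlapping tetranucleotide counts on one strand, and computes (G+C) content for comparison. This is a simplification, not TETRA's exact statistic, and the function names are hypothetical.

```python
from __future__ import annotations

import math
from itertools import product

# All 256 tetranucleotides in a fixed order, so count vectors line up.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_counts(seq: str) -> list[int]:
    """Overlapping tetranucleotide counts on one strand; words with non-ACGT symbols are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
    return [counts[w] for w in TETRAMERS]


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def tetra_correlation(a: str, b: str) -> float:
    """Pearson correlation of raw tetranucleotide usage between two fragments."""
    return pearson(tetranucleotide_counts(a), tetranucleotide_counts(b))


def gc_content(seq: str) -> float:
    """Fraction of G and C characters, the measure the abstract compares TETRA against."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


fragment_a = "ATGGCGTACGTTAGCCGATCGATCGGCTAGCTAGGCTA" * 10
fragment_b = "ATTATAAATGCATTAAATATTTAAGCATATTAAATTAT" * 10
print(tetra_correlation(fragment_a, fragment_b), gc_content(fragment_a), gc_content(fragment_b))
```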

  2. DNA Sequencing Sensors: An Overview

    Directory of Open Access Journals (Sweden)

    Jose Antonio Garrido-Cardenas

    2017-03-01

    Full Text Available The first sequencing of a complete genome was published forty years ago by Frederick Sanger, twice winner of the Nobel Prize in Chemistry. That was the small genome of a bacteriophage, but since then the DNA of many complex organisms has been sequenced. This was possible thanks to continuous advances in biochemistry and molecular genetics, but also in other areas such as nanotechnology and computing. Nowadays, sequencing sensors based on genetic material have little in common with those used by Sanger. The emergence of mass sequencing sensors, or new generation sequencing (NGS), meant a quantitative leap both in the volume of genetic material that could be sequenced in each run and in the time and cost per run. One can envisage that incoming technologies, already known as fourth-generation sequencing, will continue to lower costs by increasing DNA read lengths in each run. None of this would be possible without sensors and detection systems becoming smaller and more precise. This article provides a comprehensive overview of sensors for DNA sequencing developed within the last 40 years.

  3. Nucleosome DNA sequence structure of isochores

    Directory of Open Access Journals (Sweden)

    Trifonov Edward N

    2011-04-01

    Full Text Available Abstract Background Significant differences in G+C content between different isochore types suggest that the nucleosome positioning patterns in the DNA of the isochores should differ as well. Results Extraction of the patterns from the isochore DNA sequences by Shannon N-gram extension reveals that while the general motif YRRRRRYYYYYR is characteristic of all isochore types, the dominant positioning patterns of the isochores vary between TAAAAATTTTTA and CGGGGGCCCCCG due to the large differences in G+C composition. This is observed in human, mouse and chicken isochores, demonstrating that the variations of the positioning patterns are largely G+C dependent rather than species-specific. The species-specificity of nucleosome positioning patterns is revealed by dinucleotide periodicity analyses in isochore sequences. While human sequences show CG periodicity, chicken isochores display AG (CT) periodicity. Mouse isochores show only very weak CG periodicity. Conclusions The nucleosome positioning pattern revealed by Shannon N-gram extension is strongly dependent on G+C content and differs between isochores. Species-specificity of the pattern is subtle; it is reflected in the choice of preferentially periodical dinucleotides.
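
    Dinucleotide periodicity is commonly examined by tallying the distances between occurrences of a chosen dinucleotide and looking for peaks at a regular spacing. For nucleosome DNA, this spacing is roughly 10 to 11 bases. The sketch below computes such a distance histogram. It illustrates the general approach only: it is not the authors' procedure and not Shannon N-gram extension. The function name and the max_distance default are assumptions.

```python
from __future__ import annotations

from collections import Counter


def dinucleotide_distance_histogram(seq: str, dinucleotide: str, max_distance: int = 50) -> list[int]:
    """Counts of pairwise distances (1..max_distance) between occurrences of a dinucleotide.

    Index d - 1 of the returned list holds the number of pairs d bases apart.
    The dinucleotide is expected in upper case.
    """
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]
    histogram = Counter()
    for k, first in enumerate(positions):
        for second in positions[k + 1:]:
            distance = second - first
            if distance > max_distance:
                break  # positions are sorted, so later ones are even farther away
            histogram[distance] += 1
    return [histogram[d] for d in range(1, max_distance + 1)]


# Toy sequence with a CG step every 10 bases.
toy = ("CG" + "A" * 8) * 20
h = dinucleotide_distance_histogram(toy, "CG")
print(h[9], h[19], h[4])  # 19 18 0  (distances 10, 20 and 5)
```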

  4. Structural Complexity of DNA Sequence

    Directory of Open Access Journals (Sweden)

    Cheng-Yuan Liou

    2013-01-01

    Full Text Available In modern bioinformatics, finding an efficient way to locate sequence fragments with biological functions is an important issue. This paper presents a structural approach based on context-free grammars extracted from original DNA or protein sequences. This approach differs radically from statistical methods. Furthermore, it is compared with a topological-entropy-based method to assess where the two complexity measures agree and where they differ.
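
    The abstract compares its grammar-based measure with a topological-entropy-based method but does not define the latter. For orientation, the sketch below computes one commonly used finite-sequence topological entropy for DNA. It follows Koslicki's formulation, which is an assumption here, since the abstract does not confirm that this is the variant the paper used. The steps are:

    1. Choose the largest n with 4^n + n - 1 ≤ |w|.
    2. Count the distinct n-letter words in the first 4^n + n - 1 bases.
    3. Take the base-4 logarithm of that count and divide it by n.

    The result lies between 0 and 1. The function name is hypothetical.

```python
import math


def topological_entropy(seq: str) -> float:
    """Finite-sequence topological entropy of a DNA string (values from 0 to 1)."""
    seq = seq.upper()
    if len(seq) < 4:
        raise ValueError("need at least 4 bases")
    n = 1
    # Largest n such that 4**n + n - 1 <= len(seq).
    while 4 ** (n + 1) + n <= len(seq):
        n += 1
    prefix = seq[: 4 ** n + n - 1]
    distinct_words = {prefix[i:i + n] for i in range(len(prefix) - n + 1)}
    return math.log(len(distinct_words), 4) / n


print(topological_entropy("ACGT"))            # 1.0
print(topological_entropy("A" * 20))          # 0.0
print(topological_entropy("ACGT" * 4 + "A"))  # 0.5 (only 4 of the 16 possible 2-mers)
```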

  5. Assessment of adaptive evolution between wheat and rice as deduced from full-length common wheat cDNA sequence data and expression patterns

    Directory of Open Access Journals (Sweden)

    Hayashizaki Yoshihide

    2009-06-01

    Full Text Available Abstract Background Wheat is an allopolyploid plant that harbors a huge, complex genome. Therefore, accumulation of expressed sequence tags (ESTs) for wheat is becoming particularly important for functional genomics and molecular breeding. We prepared a comprehensive collection of ESTs from the various tissues that develop during the wheat life cycle and from tissues subjected to stress. We also examined their expression profiles in silico. As full-length cDNAs are indispensable to certify the collected ESTs and annotate the genes in the wheat genome, we performed a systematic survey and sequencing of the full-length cDNA clones. This sequence information is a valuable genetic resource for functional genomics and will enable comparative genomics in cereals. Results As part of the functional genomics and development of genomic wheat resources, we have generated a collection of full-length cDNAs from common wheat. By grouping the ESTs of recombinant clones randomly selected from the full-length cDNA library, we were able to sequence 6,162 independent clones with high accuracy. About 10% of the clones were wheat-unique genes, without any counterparts in the DNA database. Wheat clones that showed high homology to those of rice were selected in order to investigate their expression patterns in various tissues throughout the wheat life cycle and in response to abiotic-stress treatments. To assess the variability of genes that have evolved differently in wheat and rice, we calculated the substitution rate ratio (Ka/Ks) of the counterparts in wheat and rice. Genes that were preferentially expressed in certain tissues or treatments had higher Ka/Ks values than those in other tissues and treatments, which suggests that the more variable genes expressed in these tissues are under adaptive selection. Conclusion We have generated a high-quality full-length cDNA resource for common wheat, which is essential for continuation of the

  6. Finding Sequential Patterns from Large Sequence Data

    Directory of Open Access Journals (Sweden)

    Fazekas Gabor

    2010-01-01

    Full Text Available Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining finds sets of data items that occur together frequently in some sequences. Sequential pattern mining, which extracts frequent subsequences from a sequence database, has attracted a great deal of interest in recent data mining research because it is the basis of many applications, such as web user analysis, stock trend prediction, DNA sequence analysis, finding language or linguistic patterns in natural language texts, and using the history of symptoms to predict certain kinds of disease. Given the diversity of these applications, it may not be possible to apply a single sequential pattern model to all of these problems. Each application may require a unique model and solution. A number of research projects were established in recent years to develop meaningful sequential pattern models and efficient algorithms for mining these patterns. In this paper, we provide a brief theoretical overview of three types of sequential pattern models.

  7. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance, and even biology. There has been increasing interest in unravelling the mysteries of DNA; for example, distinguishing coding from noncoding sequences, classifying organisms and inferring their evolutionary relationships are key problems in bioinformatics. Although much research has taken into consideration the long-range correlations in DNA sequences, and other researchers have used the global fractal dimension in these works, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view) to DNA sequence analysis. We have also used the fractal dimension, correlation dimension, Hurst exponent and dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.
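
    One of the quantities mentioned above, the Hurst exponent, is often estimated from a DNA walk. The sketch below maps purines (A, G) to +1 and pyrimidines (C, T) to -1. This is a common convention assumed here, not one taken from the paper. It then estimates H by classical rescaled-range (R/S) analysis, taking the slope of log(mean R/S) against log(window size). Values near 0.5 suggest no long-range correlation, and values well above 0.5 suggest persistence. R/S is known to be biased upward at small window sizes. The function names are hypothetical.

```python
from __future__ import annotations

import math
import random


def purine_pyrimidine_steps(seq: str) -> list[int]:
    """DNA walk increments: +1 for purines (A, G), -1 for pyrimidines (C, T)."""
    return [1 if b in "AG" else -1 for b in seq.upper() if b in "ACGT"]


def rescaled_range(chunk: list[int]) -> float | None:
    """Range of mean-adjusted cumulative sums divided by the chunk's standard deviation."""
    n = len(chunk)
    mean = sum(chunk) / n
    cumulative, low, high = 0.0, 0.0, 0.0
    for x in chunk:
        cumulative += x - mean
        low, high = min(low, cumulative), max(high, cumulative)
    sd = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    return (high - low) / sd if sd > 0 else None  # constant chunks carry no information


def hurst_rs(steps: list[int], min_window: int = 8) -> float:
    """Least-squares slope of log(mean R/S) against log(window size), doubling window sizes."""
    xs, ys = [], []
    n = min_window
    while n <= len(steps) // 2:
        values = []
        for start in range(0, len(steps) - n + 1, n):
            rs = rescaled_range(steps[start:start + n])
            if rs is not None:
                values.append(rs)
        if values:
            xs.append(math.log(n))
            ys.append(math.log(sum(values) / len(values)))
        n *= 2
    if len(xs) < 2:
        raise ValueError("sequence too short for an R/S fit")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


rng = random.Random(1)
random_seq = "".join(rng.choice("ACGT") for _ in range(4096))
print(hurst_rs(purine_pyrimidine_steps(random_seq)))  # near 0.5 for an uncorrelated sequence
```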

  8. DNA Sequencing Using Capillary Electrophoresis

    Energy Technology Data Exchange (ETDEWEB)

    Dr. Barry Karger

    2011-05-09

    The overall goal of this program was to develop capillary electrophoresis as the tool for sequencing the human genome for the first time. Our program was part of the Human Genome Project. In this work, we were highly successful, and the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBase multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and success. We began by using capillary electrophoresis to separate double-stranded oligonucleotides in cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, cross-linked gel capillaries were difficult to prepare reproducibly and, even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g. sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of linear polyacrylamide, so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, after testing many linear polymers, we selected linear polyacrylamide as the best matrix because it was the most hydrophilic polymer available. Under our DOE program, we initially demonstrated the success of linear polyacrylamide in separating double-stranded DNA. We note that the method is still used today to assay the purity of double-stranded DNA fragments. Our focus, of course, was on the separation of single-stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other

  9. 2-Jump DNA Search Multiple Pattern Matching Algorithm

    OpenAIRE

    Raju Bhukya; D. V. L. N. Somayajulu

    2011-01-01

    Pattern matching in a DNA sequence, or searching for a pattern in a large database, is a major research area in computational biology. Extracting pattern matches from a large sequence is time-consuming, so we propose an approach that reduces search time while still retrieving the matched patterns accurately. As performance plays a major role in extracting patterns from a given DNA sequence or from a database independent of the size of the sequence....
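
    The abstract does not describe how the 2-jump algorithm chooses its jumps. The sketch below therefore shows a different, well-known skip-based exact matcher, Boyer-Moore-Horspool, run once per pattern for multiple-pattern search. It illustrates how shifting past positions that cannot match reduces search time. It is not the authors' algorithm, and the function names are hypothetical.

```python
from __future__ import annotations


def horspool_search(text: str, pattern: str) -> list[int]:
    """All (possibly overlapping) start positions of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift by the distance from the window's last character to its last occurrence in pattern[:-1].
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Characters absent from pattern[:-1] allow a full jump of m positions.
        pos += shift.get(text[pos + m - 1], m)
    return hits


def search_all(text: str, patterns: list[str]) -> dict[str, list[int]]:
    """Multiple-pattern search by running the single-pattern matcher once per pattern."""
    return {p: horspool_search(text, p) for p in patterns}


print(search_all("ACGTACGTAC", ["ACG", "GTA", "TTT"]))
# {'ACG': [0, 4], 'GTA': [2, 6], 'TTT': []}
```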

  10. Markov chain for estimating human mitochondrial DNA mutation pattern

    Science.gov (United States)

    Vantika, Sandy; Pasaribu, Udjianna S.

    2015-12-01

    A Markov chain was proposed to estimate the human mitochondrial DNA mutation pattern. One DNA sequence was taken at random from 100 sequences in GenBank. The nucleotide transition matrix and the mutation transition matrix were estimated from this sequence. We determined whether the states (mutation/normal) are recurrent or transient. The results showed that both states are recurrent.
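
    The two steps described above can be sketched as follows: estimating a transition matrix from one sequence, and classifying states as recurrent or transient. The estimator counts adjacent base pairs, which assumes a first-order chain. The classification uses the standard fact that, in a finite chain, a state is recurrent exactly when every state reachable from it can reach it back. The same is_recurrent check applies to any finite matrix in this dict form, such as a two-state mutation/normal chain. The function names are hypothetical.

```python
from __future__ import annotations

BASES = "ACGT"


def transition_matrix(seq: str) -> dict[str, dict[str, float]]:
    """Maximum-likelihood first-order transition probabilities between adjacent bases."""
    counts = {a: dict.fromkeys(BASES, 0) for a in BASES}
    seq = seq.upper()
    for a, b in zip(seq, seq[1:]):
        if a in counts and b in counts:
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        if total == 0:
            raise ValueError(f"no observed transitions out of state {a!r}")
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix


def reachable(matrix: dict[str, dict[str, float]], start: str) -> set[str]:
    """States reachable from start in one or more steps with positive probability."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        for nxt, p in matrix[state].items():
            if p > 0 and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_recurrent(matrix: dict[str, dict[str, float]], state: str) -> bool:
    """In a finite chain, a state is recurrent iff every state it reaches can reach it back."""
    return all(other == state or state in reachable(matrix, other)
               for other in reachable(matrix, state))


m = transition_matrix("AAAACCCCGGGGTTTT")
print({s: is_recurrent(m, s) for s in BASES})
# {'A': False, 'C': False, 'G': False, 'T': True}
```

    In the toy sequence, the chain can move from A to C to G to T but never back, so only T is recurrent.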

  11. DNA Sequencing Using capillary Electrophoresis

    Energy Technology Data Exchange (ETDEWEB)

    Dr. Barry Karger

    2011-05-09

    The overall goal of this program was to develop capillary electrophoresis as the tool for sequencing the human genome for the first time. Our program was part of the Human Genome Project. In this work, we were highly successful: the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome with the MegaBase multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and successes. We began by separating double-stranded oligonucleotides by capillary electrophoresis, using cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, such cross-linked gel capillaries were difficult to prepare reproducibly and, even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g., from sample injection) was imposed on the polymer matrix, and this relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of linear polyacrylamide, so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, after testing many linear polymers, we selected linear polyacrylamide as the best matrix because it was the most hydrophilic polymer available. Under our DOE program, we initially demonstrated that linear polyacrylamide successfully separates double-stranded DNA. We note that the method is still used today to assay the purity of double-stranded DNA fragments. Our focus, of course, was on the separation of single-stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases. Other

  12. New stopping criteria for segmenting DNA sequences

    CERN Document Server

    Li, W

    2001-01-01

    We propose a solution to the problem of the stopping criterion in segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on the Bayesian Information Criterion (BIC) in the model selection framework. When this stopping criterion was applied to a left telomere sequence of the yeast Saccharomyces cerevisiae and to the complete genome sequence of the bacterium Escherichia coli, borders of biologically meaningful units were identified (e.g., subtelomeric units, replication origin, and replication terminus), and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength, which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.
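    As a rough illustration of BIC-based stopping in recursive segmentation, the sketch below models each segment as an i.i.d. multinomial over bases, picks the split that maximizes the log-likelihood gain, and stops when twice the gain no longer exceeds the BIC penalty k·ln N. The choice k = 4 (three extra base frequencies plus the border position) and all function names are assumptions for illustration; the paper's exact criterion and its segmentation-strength measure may differ.

```python
# Hypothetical sketch of BIC-stopped binary segmentation; not the paper's exact formulation.
import math
from collections import Counter


def log_likelihood(counts: Counter, n: int) -> float:
    """Maximized i.i.d. multinomial log-likelihood: sum over bases of n_b * ln(n_b / n)."""
    return sum(c * math.log(c / n) for c in counts.values() if c > 0)


def best_split(seq: str):
    """Return (position, gain) of the split that maximizes the log-likelihood gain."""
    n = len(seq)
    total = Counter(seq)
    whole = log_likelihood(total, n)
    left = Counter()
    best_pos, best_gain = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # Counter subtraction keeps only positive counts
        gain = log_likelihood(left, i) + log_likelihood(right, n - i) - whole
        if gain > best_gain:
            best_pos, best_gain = i, gain
    return best_pos, best_gain


def segment(seq: str, offset: int = 0, extra_params: int = 4) -> list:
    """Recursive binary segmentation with a BIC stopping rule; returns absolute borders."""
    n = len(seq)
    if n < 2:
        return []
    pos, gain = best_split(seq)
    # BIC: keep the split only if 2 * (log-likelihood gain) > (extra parameters) * ln(n).
    if pos is None or 2 * gain <= extra_params * math.log(n):
        return []
    return (segment(seq[:pos], offset, extra_params)
            + [offset + pos]
            + segment(seq[pos:], offset + pos, extra_params))
```

    Here `segment("A" * 200 + "C" * 200)` returns `[200]`, while a homogeneous sequence such as `"ACGT" * 100` is left unsplit. The gain used above equals N times the length-weighted Jensen-Shannon divergence between the two parts (in nats).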

  13. Sequence Patterns of Identity Authentication Protocols

    Institute of Scientific and Technical Information of China (English)

    Tao Hongcai; He Dake

    2006-01-01

    From the viewpoint of protocol sequence, the sequence patterns of possible identity authentication protocols are analyzed for two cases: with and without a trusted third party (TTP). Ten feasible sequence patterns of authentication protocols with a TTP and 5 sequence patterns without a TTP are obtained. These sequence patterns meet the requirements for identity authentication and cover almost all current authentication protocols with and without a TTP. All of the sequence patterns obtained are classified as unilateral or bilateral authentication. Then, according to sequence symmetry, several good sequence patterns with a TTP are evaluated. The results can provide a reference for the design of new identity authentication protocols.

  14. Information Analysis of DNA Sequences

    CERN Document Server

    Mohammed, Riyazuddin

    2010-01-01

    The problem of differentiating the informational content of coding (exons) and non-coding (introns) regions of a DNA sequence is one of the central problems of genomics. Introns are estimated to make up nearly 95% of the DNA, and since they do not seem to participate in coding for amino acids, they have been termed "junk DNA." Although it is believed that the non-coding regions in genomes have no role in cell growth and evolution, a demonstration that these regions carry useful information would tend to falsify this belief. In this paper, we consider entropy as a measure of information, modifying the entropy expression to take into account the varying length of these sequences. Exons are usually much shorter than introns, so the entropy values must be normalized before they can be compared. A length correction strategy was employed using randomly generated nucleotide strings, built from the same alphabet and of the same size as the exons in question. Our analysis shows that intron...
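    The paper's exact length-corrected entropy is not given here, so the following is only a sketch of the general idea: compute the Shannon entropy of overlapping k-mer frequencies, then divide by the mean entropy of random strings of the same length over the same four-letter alphabet, so that short exons and long introns are compared on a common scale. The k-mer size, the number of random trials, and the function names are illustrative assumptions.

```python
# Hypothetical sketch of a length-normalized entropy; not the paper's exact correction.
import math
import random
from collections import Counter


def kmer_entropy(seq: str, k: int = 2) -> float:
    """Shannon entropy, in bits, of the overlapping k-mer frequencies of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def length_corrected_entropy(seq: str, k: int = 2, trials: int = 50,
                             alphabet: str = "ACGT", seed: int = 0) -> float:
    """Observed entropy divided by the mean entropy of random strings of the same length."""
    if len(seq) <= k:
        raise ValueError("sequence must be longer than k")
    rng = random.Random(seed)
    baseline = sum(
        kmer_entropy("".join(rng.choice(alphabet) for _ in range(len(seq))), k)
        for _ in range(trials)
    ) / trials
    return kmer_entropy(seq, k) / baseline
```

    A low-complexity string such as `"AT" * 50` scores well below 1, while a random sequence of the same length scores close to 1.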

  15. Novel circular single-stranded DNA viruses identified in marine invertebrates reveal high sequence diversity and consistent predicted intrinsic disorder patterns within putative structural proteins

    Directory of Open Access Journals (Sweden)

    Karyna eRosario

    2015-07-01

    Viral metagenomics has recently revealed the ubiquitous and diverse nature of single-stranded DNA (ssDNA) viruses that encode a conserved replication initiator protein (Rep) in the marine environment. Although eukaryotic circular Rep-encoding ssDNA (CRESS-DNA) viruses were originally thought to infect only plants and vertebrates, recent studies have identified these viruses in a number of invertebrates. To further explore CRESS-DNA viruses in the marine environment, this study surveyed CRESS-DNA viruses in various marine invertebrate species. A total of 27 novel CRESS-DNA genomes, with Reps that share less than 60.1% identity with previously reported viruses, were recovered from 21 invertebrate species, mainly crustaceans. Phylogenetic analysis based on the Rep revealed a novel clade of CRESS-DNA viruses that included approximately one third of the marine invertebrate-associated viruses identified here and whose members may represent a novel family. Investigation of putative capsid proteins (Cap) encoded within the eukaryotic CRESS-DNA viral genomes from this study and those in GenBank demonstrated conserved patterns of predicted intrinsically disordered regions (IDRs), which can be used to complement similarity-based searches to identify divergent structural proteins within novel genomes. Overall, this study expands our knowledge of CRESS-DNA viruses associated with invertebrates and explores a new tool to evaluate divergent structural proteins encoded by these viruses.

  16. New method to study DNA sequences: the languages of evolution.

    Science.gov (United States)

    Spinelli, Gino; Mayer-Foulkes, David

    2008-04-01

    Recently, several authors have reported statistical evidence for deterministic dynamics in the flux of genetic information, suggesting that evolution involves the emergence and maintenance of a fractal landscape in DNA chains. Here we examine the idea that motif repetition lies at the origin of these statistical properties of DNA. To analyze repetition patterns, we apply a modification of the BDS statistic, which was devised to analyze complex economic dynamics and is adapted here to DNA sequence analysis. This provides a new method to detect structured signals in genetic information. We compare naturally occurring DNA sequences along the evolutionary tree with randomly generated sequences and also with simulated sequences with repetition motifs. For easier understanding, we also define a new statistic that constitutes a specific fingerprint of a DNA sequence. The new methods are applied to exon and intron DNA sequences, revealing specific statistical differences. Moreover, by analyzing DNA sequences of different species from Bacteria to Man, we explore the evolution of these linguistic DNA features along the evolutionary tree. The results are consistent with the idea that the flux of DNA information need not be entirely random, but may be structured along the evolutionary tree. The implications for evolutionary theory are discussed.
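    The authors' modified BDS statistic is not reproduced here. As a sketch of the underlying idea only, assume that two m-base windows count as "close" only if they are identical (the natural choice for symbols); then the correlation integral C_m is the fraction of window pairs that match, and for an i.i.d. sequence C_m ≈ C_1^m. The raw difference C_m - C_1^m computed below omits the variance normalization of the full BDS test, and all names are hypothetical.

```python
# Hypothetical sketch of a BDS-style repetition measure for symbolic sequences.
import random
from collections import Counter


def correlation_integral(seq: str, m: int, n_points: int) -> float:
    """Fraction of pairs among the first n_points m-base windows that are identical."""
    windows = Counter(seq[i:i + m] for i in range(n_points))
    matching_pairs = sum(c * (c - 1) // 2 for c in windows.values())
    return matching_pairs / (n_points * (n_points - 1) // 2)


def bds_like_difference(seq: str, m: int = 3) -> float:
    """Raw C_m - C_1**m over a common set of n - m + 1 points (no variance normalization)."""
    n_points = len(seq) - m + 1
    return correlation_integral(seq, m, n_points) - correlation_integral(seq, 1, n_points) ** m


def random_sequence(n: int, seed: int = 0) -> str:
    """i.i.d. uniform bases, used as the null comparison."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))
```

    For the periodic string `"ACG" * 100` the difference is about 0.29, whereas for a sequence from `random_sequence` it stays near zero, which is the contrast between repeated motifs and an i.i.d. null that the statistic is meant to expose.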

  17. Periodic pattern detection in sparse boolean sequences

    Directory of Open Access Journals (Sweden)

    Hérisson Joan

    2010-09-01

    Background: The specific position of functionally related genes along the DNA has been shown to reflect the interplay between chromosome structure and genetic regulation. By investigating the statistical properties of the distances separating such genes, several studies have highlighted various periodic trends. In many cases, however, groups built up from co-functional or co-regulated genes are small and contain wrong information (data contamination), so that the statistics are poorly exploitable. In addition, gene positions are not expected to satisfy a perfectly ordered pattern along the DNA. Within this scope, we present an algorithm that aims to highlight periodic patterns in sparse boolean sequences, i.e. sequences of the type 010011011010... where the ratio of the number of 1's (denoting here the transcription start of a gene) to 0's is small. Results: The algorithm is particularly robust with respect to strong signal distortions such as the addition of 1's at arbitrary positions (contaminated data), the deletion of existing 1's in the sequence (missing data) and the presence of disorder in the position of the 1's (noise). This robustness property stems from an appropriate exploitation of the remarkable alignment properties of periodic points in solenoidal coordinates. Conclusions: The efficiency of the algorithm is demonstrated in situations where standard Fourier-based spectral methods are poorly adapted. We also show how the proposed framework makes it possible to identify the 1's that participate in the periodic trends, i.e. to allocate a positional score to genes, in the same spirit as the sequence score. The software is available for public use at http://www.issb.genopole.fr/MEGA/Softwares/iSSB_SolenoidalApplication.zip.
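    The solenoidal-coordinate algorithm itself is not described here in enough detail to reproduce. The sketch below shows only the basic intuition: map each 1 at position x to the angle 2πx/P, so that for the right period P the 1's pile up at one angle even when a few spurious 1's are added. Because this score is essentially a single-frequency Fourier measure, it shares the weaknesses the authors attribute to Fourier methods; the function names and the candidate range are assumptions.

```python
# Hypothetical phase-alignment score; a stand-in for intuition, not the solenoidal algorithm.
import cmath
import math


def phase_alignment(positions: list, period: float) -> float:
    """Mean resultant length of the angles 2*pi*x/period (1.0 means all positions align)."""
    if not positions:
        return 0.0
    total = sum(cmath.exp(2j * math.pi * x / period) for x in positions)
    return abs(total) / len(positions)


def best_period(positions: list, candidates) -> float:
    """Candidate period whose phase map aligns the positions of the 1's most tightly."""
    return max(candidates, key=lambda p: phase_alignment(positions, p))
```

    Every divisor of the true period also scores perfectly (1's that repeat every 12 positions also align when P = 6), so the candidate range must exclude divisors or the result must be checked for harmonics.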

  18. [DNA sequencing technology and its automation].

    Science.gov (United States)

    Kraev, A S

    1991-01-01

    Precise manipulations of genetic material, typical of modern experiments in molecular biology and new biotechnology, require the ability to determine DNA base sequences. Today, this capability makes it possible to exploit specific genetic knowledge for the dissection of complex cell processes and for the modulation of cell metabolism in transgenic organisms. The review focuses on the DNA sequencing technologies that are widespread in general laboratory practice. Given the availability of commercial reagents, they can safely be called industrial techniques. Modern DNA sequencing requires repeatedly breaking large genomic DNA down into smaller pieces, which are then amplified and sequenced, after which the initial long stretch is reconstructed via the overlap of the small pieces. The DNA sequencing process has several steps: a DNA fragment is obtained in sufficient quantity and purity; it is converted to a form suitable for a particular sequencing method; a sequencing reaction is performed and its products are fractionated; and finally the resulting data are interpreted (i.e. an autoradiograph is read into computer memory) and a long sequence is reconstructed via the overlap of short stretches. These steps are considered in separate parts, with emphasis on sequencing strategies with respect to their biological task. In the last part, possibilities for automating sequencing experiments are considered, followed by a discussion of domestic problems in DNA sequencing.
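    The reconstruction step (rebuilding a long stretch from overlapping short pieces) can be sketched with a simple greedy merge: repeatedly join the pair of reads with the longest suffix-prefix overlap. This is a teaching-level illustration, not a method from the review; the minimum overlap of 3 and the function names are assumptions, and greedy merging can misassemble repeats.

```python
# Hypothetical greedy overlap assembler; illustrative only.
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a matching a prefix of b (0 if under min_len)."""
    if len(b) < min_len:
        return 0
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of a suffix-prefix match
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Merge the pair with the longest overlap until none remain; return the contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]  # drop contained reads
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the remaining reads as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads
```

    For instance, four overlapping reads of `ACGTTGCATGCCTA` given in shuffled order (`["TTGCAT", "GCCTA", "ACGTTG", "CATGCC"]`) are merged back into that single string.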

  19. Parallel gigantism and complex colonization patterns in the Cape Verde scincid lizards Mabuya and Macroscincus (Reptilia: Scincidae) revealed by mitochondrial DNA sequences.

    OpenAIRE

    2001-01-01

    The scincid lizards of the Cape Verde islands comprise the extinct endemic giant Macroscincus coctei and at least five species of Mabuya, one of which, Mabuya vaillanti, also had populations with large body size. Phylogenetic analysis based on DNA sequences derived from the mitochondrial cytochrome b, cytochrome oxidase I and 12S rRNA genes (711, 498 and 378 base pairs (bp), respectively) corroborates morphological evidence that these species constitute a clade and that Macroscincus is unrela...

  20. Fibonacci Sequence and Supramolecular Structure of DNA.

    Science.gov (United States)

    Shabalkin, I P; Grigor'eva, E Yu; Gudkova, M V; Shabalkin, P I

    2016-05-01

    We proposed a new model of supramolecular DNA structure. Like our previously developed model of primary DNA structure [11-15], the 3D structure of the DNA molecule is assembled according to a mathematical rule known as the Fibonacci sequence. Unlike the primary DNA structure, the supramolecular 3D structure is assembled from complex moieties, including a regular tetrahedron and a regular octahedron consisting of monomers, which are elements of the primary DNA structure. The moieties of the supramolecular DNA structure, forming fragments of a regular spatial lattice, are bound via linker (joint) sequences of the DNA chain. The lattice perceives and transmits information signals over a considerable distance without acoustic aberrations. Linker sequences expand the conformational space between lattice segments, allowing the segments to slide relative to each other under the action of external forces. In this case, sliding is provided by stretching of the stacked linker sequences.

  1. Enhancement of the nucleosomal pattern in sequences of lower complexity

    DEFF Research Database (Denmark)

    Bolshoy, Alexander; Shapiro, Kevin; Trifonov, Edward N.;

    1997-01-01

    in those of higher linguistic complexity. The nucleosome DNA positioning pattern is one of the weakest (highly degenerate) sequence patterns. It has been extracted recently by specially designed multiple alignment procedures. We applied the most sensitive of these procedures to nearly equal subsets...

  2. Mitochondrial DNA sequence evolution in shorebird populations.

    NARCIS (Netherlands)

    Wenink, P.W.

    1994-01-01

    This thesis describes the global molecular population structure of two shorebird species, in particular of the dunlin, Calidris alpina, by means of comparative sequence analysis of the most variable part of the mitochondrial DNA (mtDNA) genome. There are several reasons why mtDNA is the molecule of

  3. Code domains in tandem repetitive DNA sequence structures.

    Science.gov (United States)

    Vogt, P

    1992-10-01

    Traditionally, many researchers in molecular biology attribute coding properties to a given DNA sequence if the sequence contains an open reading frame for translation into a sequence of amino acids. This protein-coding capability of DNA was discovered about 30 years ago. The underlying genetic code is highly conserved and present in every biological species studied so far. Today, it is obvious that DNA has a much larger coding potential for other important tasks. Apart from coding for specific RNA molecules such as rRNA, snRNA and tRNA, specific structural and sequence patterns of the DNA chain itself express distinct codes for the regulation and expression of its genetic activity. A chromatin code has been defined for phasing of the histone-octamer protein complex in the nucleosome. A translation frame code has been shown to exist that determines correct triplet counting at the ribosome during protein synthesis. A loop code seems to organize the single-stranded interaction of the nascent RNA chain with proteins during the splicing process, and a splicing code phases successive 5' and 3' splicing sites. Most of these DNA codes are not based exclusively on the primary DNA sequence itself, but also seem to include specific features of the corresponding higher-order structures. Based on the view that these various DNA codes are genetically instructive for specific molecular interactions or processes important in the nucleus during interphase and cell division, the coding capability of tandem repetitive DNA sequences has recently been reconsidered.

  4. Long range correlations in DNA sequences

    CERN Document Server

    Mohanty, A K

    2002-01-01

    The so-called long-range correlation properties of DNA sequences are studied in a model-independent way, using variance analyses of the density distribution of a single nucleotide or a group of nucleotides. This new method, which was suggested earlier, has been applied to extract slope parameters that characterize the correlation properties of several intron-containing and intronless DNA sequences. An important aspect of all DNA sequences is the property of complementarity, by virtue of which any two complementary distributions (e.g., GA is complementary to TC, or G is complementary to ATC) have identical fluctuations at all scales, although their distribution functions need not be identical. Owing to this complementarity, the famous DNA walk representation, whose statistical interpretation is still unresolved, is shown to be a special case of the present formalism, with a density distribution corresponding to a purine or a pyrimidine group. Another interesting aspect of most of the DNA sequences is that the factorial m...
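    The DNA walk mentioned above assigns +1 to each purine and -1 to each pyrimidine and follows the running sum. A common way to quantify correlations is the root-mean-square fluctuation F(l) of the walk over windows of length l, with F(l) ∝ l^α: α ≈ 0.5 indicates no long-range correlation, and larger values indicate long-range correlation. The sketch below implements that standard walk analysis under these assumptions; it is not the author's density-variance method, and the window sizes and function names are illustrative.

```python
# Hypothetical sketch of purine/pyrimidine DNA-walk fluctuation analysis.
import math
import random


def dna_walk(seq: str) -> list:
    """Running sum: +1 for a purine (A, G), -1 for a pyrimidine (C, T); other symbols skipped."""
    y, walk = 0, []
    for base in seq.upper():
        if base in "AG":
            y += 1
        elif base in "CT":
            y -= 1
        else:
            continue
        walk.append(y)
    return walk


def fluctuation(walk: list, window: int) -> float:
    """Root-mean-square fluctuation F(l) of the walk displacement over windows of length l."""
    diffs = [walk[i + window] - walk[i] for i in range(len(walk) - window)]
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))


def scaling_exponent(seq: str, windows=(4, 8, 16, 32, 64)) -> float:
    """Least-squares slope alpha of log F(l) against log l."""
    walk = dna_walk(seq)
    xs = [math.log(w) for w in windows]
    ys = [math.log(fluctuation(walk, w)) for w in windows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


def random_sequence(n: int, seed: int = 0) -> str:
    """i.i.d. uniform bases (an uncorrelated reference)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))
```

    A random sequence gives α close to 0.5, while a two-block sequence such as `"A" * 5000 + "C" * 5000` gives α close to 1.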

  5. Dynamics and Control of DNA Sequence Amplification

    CERN Document Server

    Marimuthu, Karthikeyan

    2014-01-01

    DNA amplification is the process of replicating a specified DNA sequence in vitro through time-dependent manipulation of its external environment. A theoretical framework for determining the optimal dynamic operating conditions of DNA amplification reactions, for any specified amplification objective, is presented based on first-principles biophysical modeling and control theory. Amplification of DNA is formulated as a problem in control theory, with optimal solutions that can differ considerably from strategies typically used in practice. Using the Polymerase Chain Reaction (PCR) as an example, sequence-dependent biophysical models for DNA amplification are cast as control systems, wherein the dynamics of the reaction are controlled by a manipulated input variable. Using these control systems, we demonstrate that there exists an optimal temperature cycling strategy for geometric amplification of any DNA sequence and formulate optimal control problems that can be used to derive the optimal tempe...
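    Geometric amplification refers to copy number growing exponentially with cycle count. Under the simplest textbook assumption of a constant per-cycle efficiency η (the fraction of templates copied in each cycle), N_n = N_0 (1 + η)^n. The sketch below uses that constant-efficiency model only; in the paper, efficiency depends on the sequence and on the temperature schedule being optimized, which this sketch does not capture. Function names are hypothetical.

```python
# Hypothetical helpers for the idealized constant-efficiency PCR model.
import math


def copies_after(n0: float, efficiency: float, cycles: int) -> float:
    """Constant-efficiency geometric growth: N_n = N_0 * (1 + efficiency) ** n."""
    return n0 * (1.0 + efficiency) ** cycles


def cycles_needed(n0: float, target: float, efficiency: float) -> int:
    """Smallest n with N_0 * (1 + efficiency) ** n >= target, for 0 < efficiency <= 1."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(math.log(target / n0) / math.log(1.0 + efficiency))
```

    With perfect efficiency, 10 cycles give 1024 copies per template, and reaching 10^6 copies from one template takes 20 cycles; at 90% efficiency it takes 22.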

  6. DNA display I. Sequence-encoded routing of DNA populations.

    Directory of Open Access Journals (Sweden)

    David R Halpin

    2004-07-01

    Recently reported technologies for DNA-directed organic synthesis and for DNA computing rely on routing DNA populations through complex networks. The reduction of these ideas to practice has been limited by a lack of practical experimental tools. Here we describe a modular design for DNA routing genes, and routing machinery made from oligonucleotides and commercially available chromatography resins. The routing machinery partitions nanomole quantities of DNA into physically distinct subpools based on sequence. Partitioning steps can be iterated indefinitely, with worst-case yields of 85% per step. These techniques facilitate DNA-programmed chemical synthesis, and thus enable a materials biology that could revolutionize drug discovery.

  7. Visible periodicity of strong nucleosome DNA sequences.

    Science.gov (United States)

    Salih, Bilal; Tripathi, Vijay; Trifonov, Edward N

    2015-01-01

    Fifteen years ago, Lowary and Widom assembled nucleosomes on synthetic random-sequence DNA molecules, selected the strongest nucleosomes and discovered that the TA dinucleotides in these strong nucleosome sequences often appear 10-11 bases from one another or at distances that are multiples of this period. We repeated this experiment computationally, on large ensembles of natural genomic sequences, by selecting the strongest nucleosomes, i.e. those with such distances between like-named dinucleotides: multiples of 10.4 bases, the structural and sequence period of nucleosome DNA. The analysis confirmed the periodicity of TA dinucleotides in the strong nucleosomes and also revealed other periodic sequence elements, notably the classical AA and TT dinucleotides. The matrices of DNA bendability and their simple linear forms (nucleosome positioning motifs) are calculated from the strong nucleosome DNA sequences. The motifs are in full accord with nucleosome positioning sequences derived earlier, confirming that the new technique does indeed detect strong nucleosomes. Species- and isochore-specific variations of the matrices and of the positioning motifs are demonstrated. The strong nucleosome DNA sequences manifest the strongest nucleosome positioning sequence signals observed so far, showing the dinucleotide periodicities in directly observable rather than hidden form.
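    The abstract describes selecting sequences in which like-named dinucleotides sit at multiples of 10.4 bases. As a hedged sketch of just the counting step, the code below collects all TA-to-TA distances in a sequence and reports the fraction lying within a tolerance of a positive multiple of 10.4. The 1-base tolerance, the 150-base distance cap, and the function names are assumptions, not the authors' selection criteria.

```python
# Hypothetical sketch: score dinucleotide spacings against the 10.4-base nucleosome period.
from collections import Counter


def dinucleotide_distance_histogram(seq: str, dinuc: str = "TA", max_dist: int = 150) -> Counter:
    """Counts of distances between every pair of occurrences of dinuc (up to max_dist apart)."""
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    hist = Counter()
    for idx, a in enumerate(positions):
        for b in positions[idx + 1:]:
            if b - a > max_dist:
                break  # positions are sorted, so later ones are even farther away
            hist[b - a] += 1
    return hist


def periodic_fraction(hist: Counter, period: float = 10.4, tol: float = 1.0) -> float:
    """Fraction of distances within tol bases of a positive multiple of period."""
    total = sum(hist.values())
    near = 0
    for d, count in hist.items():
        m = round(d / period)  # nearest multiple of the period
        if m >= 1 and abs(d - m * period) <= tol:
            near += count
    return near / total if total else 0.0
```

    For a toy sequence with TA placed every 10.4 bases (rounded to whole positions), the fraction is 1.0; with TA at positions 0, 5 and 16, only the distance 11 qualifies, giving 1/3.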

  8. Applications of mass spectrometry to DNA fingerprinting and DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Jacobson, K.B.; Buchanan, M.V.; Chen, C.H.; Doktycz, M.J.; McLuckey, S.A. (Oak Ridge National Lab., TN (United States)); Arlinghaus, H.F. (Atom Sciences, Inc., Oak Ridge, TN (United States))

    1993-01-01

    DNA fingerprinting and sequencing rely on polyacrylamide gel electrophoresis to determine the sizes of the DNA fragments. Innovative alternatives to polyacrylamide gel electrophoresis are under investigation for such fingerprinting and sequencing. One method uses stable isotopes of tin and other elements to label the DNA, whereas other procedures do not require labels. The detectors in each case are mass spectrometers that detect either the stable isotopes or the DNA fragments themselves. If successful, these methods will speed up the rate of DNA analysis by one or two orders of magnitude.

  9. Applications of mass spectrometry to DNA fingerprinting and DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Jacobson, K.B.; Buchanan, M.V.; Chen, C.H.; Doktycz, M.J.; McLuckey, S.A. [Oak Ridge National Lab., TN (United States)]; Arlinghaus, H.F. [Atom Sciences, Inc., Oak Ridge, TN (United States)]

    1993-06-01

    DNA fingerprinting and sequencing rely on polyacrylamide gel electrophoresis to determine the sizes of the DNA fragments. Innovative alternatives to polyacrylamide gel electrophoresis are under investigation for such fingerprinting and sequencing. One method uses stable isotopes of tin and other elements to label the DNA, whereas other procedures do not require labels. The detectors in each case are mass spectrometers that detect either the stable isotopes or the DNA fragments themselves. If successful, these methods will speed up the rate of DNA analysis by one or two orders of magnitude.

  10. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Background: Molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that can generate sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine, which reduces the risk of “fraying” of DNA strands. It is possible to limit cross-hybridizations of a defined length and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of a certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimal interactions with predefined strands and neighboring sequences. Results: The algorithm is implemented in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which have been described for multiplexed genotyping of single nucleotide polymorphisms. In this way, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions: We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm generates larger sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.
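    EGNAS itself is a C++ program whose interface is not described here. The Python sketch below only illustrates the kinds of constraints listed above: a GC-content window, G or C at both ends, no k-mer whose reverse complement also occurs in the same strand (a crude self-complementarity/hairpin check), and no k-mer (or its reverse complement) shared between chosen sequences. Unlike EGNAS, it selects sequences greedily rather than exhaustively, so it does not guarantee a maximum-size set; the thresholds and function names are assumptions.

```python
# Hypothetical constraint-based sequence design; greedy, not EGNAS's exhaustive algorithm.
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def kmers(s: str, k: int) -> set:
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def passes_single_strand_checks(s: str, gc_range=(0.4, 0.6), k: int = 4) -> bool:
    gc = (s.count("G") + s.count("C")) / len(s)
    if not gc_range[0] <= gc <= gc_range[1]:
        return False
    if s[0] not in "GC" or s[-1] not in "GC":
        return False  # G/C ends are meant to reduce "fraying"
    words = kmers(s, k)
    # A k-mer whose reverse complement also occurs in s could pair intramolecularly (hairpin).
    return not any(revcomp(w) in words for w in words)


def design_set(length: int = 8, k: int = 4, gc_range=(0.4, 0.6)) -> list:
    claimed = set()  # k-mers of chosen sequences plus their reverse complements
    chosen = []
    for letters in product("ACGT", repeat=length):
        s = "".join(letters)
        if not passes_single_strand_checks(s, gc_range, k):
            continue
        words = kmers(s, k)
        if words & claimed:
            continue  # would share (or pair with) a k-mer of an already chosen sequence
        chosen.append(s)
        claimed |= words | {revcomp(w) for w in words}
    return chosen
```

    `design_set()` enumerates all 4^8 octamers in lexicographic order and keeps each one that passes the single-strand checks without reusing a claimed k-mer.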

  11. Nanopore DNA sequencing using kinetic proofreading

    Science.gov (United States)

    Ling, Xinsheng

    We propose a method of DNA sequencing that combines the physical method of nanopore electrical measurements with Southern's sequencing-by-hybridization. The key new ingredient, essential to both lowering costs and increasing precision, is an asymmetric nanopore sandwich device capable of measuring the DNA hybridization probe twice, separated by a designed waiting time. Incorrect probes, which appear only once in nanopore ionic current traces, are discriminated from correct ones, which appear twice. This method of discrimination is similar to the principle of kinetic proofreading proposed by Hopfield and Ninio for gene transcription and translation processes. An error analysis of this nanopore kinetic proofreading (nKP) technique for DNA sequencing is carried out and compared with the most precise method, the 3' dideoxy termination method developed by Sanger.

  12. Extracting biological knowledge from DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    De La Vega, F.M. [CINVESTAV-IPN (Mexico)]; Thieffry, D. [Universite Libre de Bruxelles, Rhode-Saint-Genese (Belgium); Universidad Nacional Autonoma de Mexico, Morelos (Mexico)]; Collado-Vides, J. [Universidad Nacional Autonoma de Mexico, Morelos (Mexico)]

    1996-12-31

    This session describes the extraction of information from DNA sequences and the challenges computational biologists face in summarizing and deciphering the human genome. Techniques discussed include methods from statistics, information theory, artificial intelligence and linguistics. 1 ref.

  13. An approach to sequence DNA without tagging

    Science.gov (United States)

    Niu, Sanjun; Saraf, Ravi F.

    2002-10-01

    Microarray technology is playing an increasingly important role in biology and medicine, and its application to genomics for gene expression analysis has already reached the market with a variety of commercially available instruments. In these combinatorial analysis methods, known probe single-stranded DNA (ssDNA) 'primers' are attached in clusters, typically as 100 µm × 100 µm pixels. Each pixel of the array has a slightly different sequence. On exposure to 'unknown' target ssDNA, the pixels with the right complementary probe ssDNA sequence convert to double-stranded DNA (dsDNA) by a hybridization reaction. To transduce the conversion of a pixel to dsDNA into a detectable signal, the target ssDNA is labelled with a photoluminescent tag during the polymerase chain reaction (PCR) amplification process. The statistical distribution of the tags in the target ssDNA makes it considerably more difficult to implement these methods as a diagnostic tool in a pathology laboratory. Here, a method to sequence DNA without tagging the molecule is developed. The fabrication process is compatible with current microelectronics and (emerging) soft-material fabrication technologies, allowing the method to be integrated with micro-electromechanical systems (MEMS) and lab-on-a-chip devices. An estimated sensitivity of 10⁻¹² g on a 1 cm² device area is obtained.

  14. gargammel: a sequence simulator for ancient DNA.

    Science.gov (United States)

    Renaud, Gabriel; Hanghøj, Kristian; Willerslev, Eske; Orlando, Ludovic

    2016-10-29

    Ancient DNA has emerged as a remarkable tool to infer the history of extinct species and past populations. However, many of its characteristics, such as extensive fragmentation, damage and contamination, can influence downstream analyses. To help investigators measure in silico how these could affect their analyses, we have developed gargammel, a package that simulates ancient DNA fragments given a set of known reference genomes. Our package simulates the entire molecular process, from post-mortem DNA fragmentation and DNA damage to experimental sequencing errors, and reproduces the most common biases observed in ancient DNA datasets.
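    The sketch below illustrates two of the processes gargammel is said to simulate: fragmentation and post-mortem damage. It cuts random fragments from a reference and applies C→T substitutions whose probability is highest at the 5' end, plus G→A substitutions concentrated at the 3' end (the pattern typical of double-stranded libraries). The uniform length distribution, the exponential damage profile and its parameters, and all function names are hypothetical; this is not gargammel's model or interface, and sequencing errors and contamination are omitted.

```python
# Hypothetical ancient-DNA fragment simulator; parameters are illustrative, not gargammel's.
import math
import random


def damage_probability(distance_from_end: int, p_terminal: float = 0.3, decay: float = 0.5) -> float:
    """Assumed deamination profile: highest at the terminal base, decaying exponentially inward."""
    return p_terminal * math.exp(-decay * distance_from_end)


def simulate_fragment(reference: str, rng: random.Random, min_len: int = 30, max_len: int = 80) -> str:
    """Cut one fragment from reference and apply end-biased C->T / G->A substitutions."""
    length = rng.randint(min_len, max_len)  # uniform lengths: a simplification
    start = rng.randint(0, len(reference) - length)
    fragment = list(reference[start:start + length])
    for i, base in enumerate(fragment):
        if base == "C" and rng.random() < damage_probability(i):
            fragment[i] = "T"  # C->T, concentrated near the 5' end
        elif base == "G" and rng.random() < damage_probability(length - 1 - i):
            fragment[i] = "A"  # G->A, concentrated near the 3' end
    return "".join(fragment)
```

    Simulating many fragments from an all-C reference shows T at the first position in roughly 30% of fragments and almost never 20 bases in, which mimics the end-concentrated damage signal.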

  15. Inconsistencies in Neanderthal genomic DNA sequences.

    Directory of Open Access Journals (Sweden)

    Jeffrey D Wall

    2007-10-01

    Two recently published papers describe nuclear DNA sequences that were obtained from the same Neanderthal fossil. Our reanalyses of the data from these studies show that they are not consistent with each other and point to serious problems with the data quality in one of the studies, possibly due to modern human DNA contaminants and/or a high rate of sequencing errors.

  16. Kangaroo – A pattern-matching program for biological sequences

    Directory of Open Access Journals (Sweden)

    Betel Doron

    2002-07-01

    Background: Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription factor binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results: Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion: A low-level, simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats.
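    Kangaroo's own code is not shown here. As a small Python illustration of the same kind of regular-expression search, the first function finds mononucleotide runs (the repeats used in the colorectal cancer example) with a back-reference, and the second translates an IUPAC-coded motif into a regex and reports overlapping matches via a lookahead. The run-length threshold of 6 and the function names are assumptions.

```python
# Hypothetical regex helpers in the spirit of a pattern-matching search; not Kangaroo's code.
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def mononucleotide_repeats(seq: str, min_len: int = 6) -> list:
    """Return (start, end, run) for every run of one base at least min_len long."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))  # base followed by copies of itself
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(seq.upper())]


def find_motif(seq: str, iupac_motif: str) -> list:
    """Start positions of (possibly overlapping) matches of an IUPAC-coded motif."""
    regex = "".join(IUPAC[ch] for ch in iupac_motif.upper())
    return [m.start() for m in re.finditer("(?=%s)" % regex, seq.upper())]
```

    For example, `find_motif("GGTATAAATAGG", "TATAWAW")` returns `[2]`, and the zero-width lookahead lets `find_motif("AAAA", "AA")` report the overlapping hits `[0, 1, 2]`.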

  17. Geographic patterns of genetic variation in a broadly distributed marine vertebrate: new insights into loggerhead turtle stock structure from expanded mitochondrial DNA sequences.

    Directory of Open Access Journals (Sweden)

    Brian M Shamblin

    Previous genetic studies have demonstrated that natal homing shapes the stock structure of marine turtle nesting populations. However, widespread sharing of common haplotypes based on short segments of the mitochondrial control region often limits resolution of the demographic connectivity of populations. Recent studies employing longer control region sequences to resolve haplotype sharing have focused on regional assessments of genetic structure and phylogeography. Here we synthesize available control region sequences for loggerhead turtles from the Mediterranean Sea, Atlantic, and western Indian Ocean basins. These data represent six of the nine globally significant regional management units (RMUs) for the species and include novel sequence data from Brazil, Cape Verde, South Africa and Oman. Genetic tests of differentiation among 42 rookeries represented by short sequences (380 bp haplotypes from 3,486 samples) and 40 rookeries represented by long sequences (∼800 bp haplotypes from 3,434 samples) supported the distinction of the six RMUs analyzed as well as recognition of at least 18 demographically independent management units (MUs) with respect to female natal homing. A total of 59 haplotypes were resolved. These haplotypes belonged to two highly divergent global lineages, with haplogroup I represented primarily by CC-A1, CC-A4, and CC-A11 variants and haplogroup II represented by CC-A2 and derived variants. Geographic distribution patterns of haplogroup II haplotypes and the nested position of CC-A11.6 from Oman among the Atlantic haplotypes suggest recent colonization of the Indian Ocean from the Atlantic for both global lineages. The haplotypes we confirmed for western Indian Ocean RMUs allow reinterpretation of previous mixed stock analysis and further suggest that contemporary migratory connectivity between the Indian and Atlantic Oceans occurs on a broader scale than previously hypothesized. This study represents a valuable model for

  18. PREDICTION OF CHROMATIN STATES USING DNA SEQUENCE PROPERTIES

    KAUST Repository

    Bahabri, Rihab R.

    2013-06-01

    Activities of DNA are to a great extent controlled epigenetically through the internal structure of chromatin. This structure is dynamic and is influenced by different modifications of histone proteins. Various combinations of epigenetic modifications of histones point to different functional regions of the DNA, determining the so-called chromatin states. However, the extent to which chromatin states can be characterized by DNA sequence properties remains largely unknown. In this study we aim to explore whether DNA sequence patterns in the human genome can characterize different chromatin states. Using DNA sequence motifs, we built binary classifiers for each chromatin state to evaluate whether a given genomic sequence is a good candidate for belonging to a particular chromatin state. Of the four classification algorithms used for this purpose (C4.5, Naive Bayes, Random Forest, and SVM), the decision-tree-based classifiers (C4.5 and Random Forest) yielded the best results. Our results suggest that in general these models lack sufficient predictive power, although for four chromatin states (insulators, heterochromatin, and two types of copy number variation) we found that the presence of certain motifs in DNA sequences does imply an increased probability that such a sequence belongs to one of these chromatin states.
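    The motif-based feature construction described above can be sketched as one binary motif-presence vector per sequence, which is the kind of input any of the four classifiers could consume. This is a minimal Python sketch under stated assumptions: the motif list and the function name `motif_presence_features` are hypothetical placeholders, not the motifs or code used in the study, and both strands are scanned by also checking the reverse complement.

```python
# Hypothetical motif list for illustration only; the study's motif set is not given here.
MOTIFS = ["CCCTC", "TATAAA", "CACGTG"]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_presence_features(seq, motifs=MOTIFS):
    """Return a 0/1 vector: 1 if a motif occurs on either strand of seq."""
    seq = seq.upper()
    rev_comp = seq.translate(_COMPLEMENT)[::-1]  # reverse complement of seq
    return [int(m in seq or m in rev_comp) for m in motifs]


# Each row would be paired with a 0/1 label ("belongs to chromatin state X" or not)
# and passed to a binary classifier such as a decision tree.
X = [motif_presence_features(s) for s in ["ggTATAAAcc", "ttCACGTGaa", "ACGTACGT"]]
print(X)  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

    Presence/absence features keep the sketch simple; motif counts or position-weight-matrix scores are common alternatives when more resolution is needed.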

  19. Mitochondrial DNA sequence evolution in shorebird populations

    NARCIS (Netherlands)

    Wenink, P.W.

    1994-01-01

    This thesis describes the global molecular population structure of two shorebird species, in particular of the dunlin, Calidris alpina, by means of comparative sequence analysis of the most variable part of the mitochondrial DNA (mtDNA) genome. There are several reasons

  20. Nanogrid rolling circle DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Church, George M.; Porreca, Gregory J.; Shendure, Jay; Rosenbaum, Abraham Meir

    2017-04-18

    The present invention relates to methods for sequencing a polynucleotide immobilized on an array having a plurality of specific regions each having a defined diameter size, including synthesizing a concatemer of a polynucleotide by rolling circle amplification, wherein the concatemer has a cross-sectional diameter greater than the diameter of a specific region, immobilizing the concatemer to the specific region to make an immobilized concatemer, and sequencing the immobilized concatemer.

  1. Sequencing intractable DNA to close microbial genomes.

    Science.gov (United States)

    Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled "intractable", resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.
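    A first step toward locating regions like the GC-rich gap flanks described above is a sliding-window GC scan. The Python sketch below is illustrative only: the window size, step, and 0.7 threshold are arbitrary assumptions, the function names are hypothetical, and GC content by itself does not predict whether a region folds into a secondary structure. It is not the authors' gap-closure procedure.

```python
def gc_windows(seq, window=50, step=10):
    """Yield (start, GC fraction) for each full-length window along seq."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        yield start, (win.count("G") + win.count("C")) / window


def gc_rich_regions(seq, window=50, step=10, threshold=0.7):
    """Return start positions of windows whose GC fraction meets the threshold."""
    return [start for start, gc in gc_windows(seq, window, step) if gc >= threshold]


# Toy sequence: 100 bp of AT repeats followed by 100 bp of GC repeats.
print(gc_rich_regions("AT" * 50 + "GC" * 50))  # [90, 100, 110, 120, 130, 140, 150]
```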

  2. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled "intractable", resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  3. Complete mitochondrial DNA sequences of the threadfin cichlid (Petrochromis trewavasae) and the blunthead cichlid (Tropheus moorii) and patterns of mitochondrial genome evolution in cichlid fishes.

    Directory of Open Access Journals (Sweden)

    Christoph Fischer

    Full Text Available The cichlid fishes of the East African Great Lakes represent a model especially suited to study adaptive radiation and speciation. With several African cichlid genome projects in progress, a promising set of closely related genomes is emerging, which is expected to serve as a valuable database to solve questions on genotype-phenotype relations. The mitochondrial (mt) genomes presented here are the first results of the assembly and annotation process for two closely related but eco-morphologically highly distinct Lake Tanganyika cichlids, Petrochromis trewavasae and Tropheus moorii. The genomic sequences comprise 16,588 bp (P. trewavasae) and 16,590 bp (T. moorii), and exhibit the typical mitochondrial structure, with 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, and a non-coding control region. Analyses confirmed that the two species are very closely related, with an overall sequence similarity of 96%. We analyzed the newly generated sequences in the phylogenetic context of 21 published labroid fish mitochondrial genomes. Consistent with other vertebrates, the D-loop region was found to evolve faster than protein-coding genes, which in turn are followed by the rRNAs; the tRNAs vary greatly in the rate of sequence evolution, but on average evolve the slowest. Within the group of coding genes, ND6 evolves most rapidly. Codon usage is similar among the examined cichlid tribes and labroid families, although a slight shift in usage patterns down the gene tree could be observed. Despite having a clearly different nucleotide composition, ND6 showed a similar codon usage. The C-terminal ends of Cox1 exhibit variations, where the varying number of amino acids is related to the structure of the obtained phylogenetic tree. This variation may be of functional relevance for Cox1 synthesis.
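    Codon usage comparisons like the ones summarized above start from codon counts per coding sequence. A minimal Python sketch, assuming each gene's coding sequence is supplied 5' to 3' in its own reading frame (strand orientation and the mitochondrial genetic code are handled elsewhere); the function names are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in an in-frame coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def relative_frequencies(counts):
    """Convert raw codon counts into frequencies that sum to 1."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


usage = codon_usage("ATGGCTGCCTAA")
print(relative_frequencies(usage))  # each of ATG, GCT, GCC, TAA -> 0.25
```

    Frequency tables from two species (or two genes, such as ND6 versus the other protein-coding genes) can then be compared directly; per-amino-acid normalizations such as relative synonymous codon usage are a common refinement.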

  4. Ancient DNA sequence revealed by error-correcting codes.

    Science.gov (United States)

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  5. Nanopore-CMOS Interfaces for DNA Sequencing.

    Science.gov (United States)

    Magierowski, Sebastian; Huang, Yiyun; Wang, Chengjie; Ghafar-Zadeh, Ebrahim

    2016-08-06

    DNA sequencers based on nanopore sensors present an opportunity for a significant break from the template-based incumbents of the last forty years. Key advantages ushered in by nanopore technology include a simplified chemistry and the ability to interface to CMOS technology. The latter opportunity offers substantial promise for improvement in sequencing speed, size and cost. This paper reviews existing and emerging means of interfacing nanopores to CMOS technology with an emphasis on massively-arrayed structures. It presents this in the context of incumbent DNA sequencing techniques, reviews and quantifies nanopore characteristics and models, and presents CMOS circuit methods for the amplification of low-current nanopore signals in such interfaces.

  6. Osmylated DNA, a novel concept for sequencing DNA using nanopores

    Science.gov (United States)

    Kanavarioti, Anastassia

    2015-03-01

    Sanger sequencing has led the advances in molecular biology, but faster and cheaper next-generation technologies are urgently needed. A newer approach exploits nanopores, natural or solid-state, set in an electrical field, and obtains base sequence information from current variations caused by the passage of a ssDNA molecule through the pore. A hurdle in this approach is that the four bases are chemically similar to each other, which leads to small differences in current obstruction. ‘Base calling’ becomes even more challenging because most nanopores sense a short sequence rather than individual bases. Sequencing DNA via nanopores might be more manageable if there were only two bases, chemically very different from each other; a sequence of 1s and 0s comes to mind. Osmylated DNA comes close to such a sequence of 1s and 0s. Osmylation is the addition of osmium tetroxide bipyridine across the C5-C6 double bond of the pyrimidines. Osmylation adds almost 400% mass to the reactive base and creates a sterically and electronically notably different molecule, labeled 1, compared to the unreactive purines, labeled 0. If osmylated DNA were successfully sequenced, the result would be a sequence of osmylated pyrimidines (1) and purines (0), not of the actual nucleobases. To solve this problem we studied the osmylation reaction with short oligos and with M13mp18, a long ssDNA, developed a UV-vis assay to measure the extent of osmylation, and designed two protocols. Protocol A uses mild conditions and yields osmylated thymidines (1), while leaving the other three bases (0) practically intact. Protocol B uses harsher conditions and effectively osmylates both pyrimidines, but not the purines. Applying these two protocols to the complementary strand of the target polynucleotide as well yields a total of four osmylated strands that collectively could define the actual base sequence of the target DNA.
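    The logic of recovering bases from binary osmylation reads can be sketched in Python under idealized assumptions that the abstract does not claim: Protocol A marks only T, Protocol B marks exactly C and T, and every read is error-free with single-base resolution. In this idealized model Protocol A on the complementary strand marks positions where the target has A, so three reads already separate all four bases; the fourth strand (Protocol B on the complement) is redundant here and would serve as a consistency check. All function names are hypothetical.

```python
COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMP)[::-1]


def protocol_a(strand):
    """Idealized Protocol A: only thymidines are osmylated (1); other bases are 0."""
    return [1 if b == "T" else 0 for b in strand]


def protocol_b(strand):
    """Idealized Protocol B: both pyrimidines (C, T) are osmylated (1); purines are 0."""
    return [1 if b in "CT" else 0 for b in strand]


def decode(a_target, b_target, a_complement):
    """Recover the target sequence from three binary reads.

    a_complement is read 5'->3' along the complementary strand, so it is reversed
    to line it up with the target; a 1 there means the target base is A.
    """
    bases = []
    for is_t, is_pyr, is_a in zip(a_target, b_target, a_complement[::-1]):
        if is_t:
            bases.append("T")
        elif is_pyr:
            bases.append("C")  # pyrimidine that is not T
        elif is_a:
            bases.append("A")
        else:
            bases.append("G")  # purine that is not A
    return "".join(bases)


target = "GATTACAGC"
comp = reverse_complement(target)
print(decode(protocol_a(target), protocol_b(target), protocol_a(comp)))  # GATTACAGC
```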

  7. Electrochemical measurement for analysis of DNA sequence

    Energy Technology Data Exchange (ETDEWEB)

    Cho, S.B.; Hong, J.S.; Pak, J.H. [Korea University, Seoul (Korea)]; Kim, Y.M. [National Institute of Health, Seoul (Korea)]

    2002-02-01

    One of the important roles of a DNA chip is the capability of detecting genetic diseases and mutations by analyzing DNA sequence. For successful electrochemical genotyping, several aspects should be considered, including the chemical treatment of the electrode surface, DNA immobilization on the electrode, hybridization, the choice of an intercalator that binds selectively to double-stranded DNA, and equipment for detecting and analyzing the output signal. Au was used as the electrode material, 2-mercaptoethanol was used for linking DNA to the Au electrode, and methylene blue was used as an indicator that binds selectively to double-stranded DNA. From the analysis of the reductive current of this indicator bound to double-stranded DNA on an electrode, a normal double-stranded DNA could be distinguished from a single-stranded DNA in just a few seconds. It was also found that the peak reduction current of the indicator is proportional to the concentration of the target DNA hybridized with the probe DNA. Therefore, a simple and cheap DNA sensor for genotyping can be realized using electrochemical measurement. (author). 20 refs., 8 figs., 1 tab.
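    Because the peak reduction current is reported to be proportional to target DNA concentration, a calibration line can be fitted to standards and inverted to estimate an unknown. A minimal Python sketch using ordinary least squares; the calibration numbers below are hypothetical, arbitrary-unit values, not data from the study, and the linear relation should only be trusted within the concentration range the standards cover.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_current(current, slope, intercept):
    """Invert the calibration line to estimate target DNA concentration."""
    return (current - intercept) / slope


# Hypothetical standards: concentration (arbitrary units) vs peak current (arbitrary units).
conc = [1.0, 2.0, 4.0, 8.0]
peak = [0.5, 1.0, 2.0, 4.0]
slope, intercept = fit_line(conc, peak)
print(concentration_from_current(3.0, slope, intercept))  # 6.0
```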

  8. Dynamics and control of DNA sequence amplification

    Energy Technology Data Exchange (ETDEWEB)

    Marimuthu, Karthikeyan [Department of Chemical Engineering and Center for Advanced Process Decision-Making, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213 (United States); Chakrabarti, Raj, E-mail: raj@pmc-group.com, E-mail: rajc@andrew.cmu.edu [Department of Chemical Engineering and Center for Advanced Process Decision-Making, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213 (United States); Division of Fundamental Research, PMC Advanced Technology, Mount Laurel, New Jersey 08054 (United States)

    2014-10-28

    DNA amplification is the process of replication of a specified DNA sequence in vitro through time-dependent manipulation of its external environment. A theoretical framework for determination of the optimal dynamic operating conditions of DNA amplification reactions, for any specified amplification objective, is presented based on first-principles biophysical modeling and control theory. Amplification of DNA is formulated as a problem in control theory with optimal solutions that can differ considerably from strategies typically used in practice. Using the Polymerase Chain Reaction as an example, sequence-dependent biophysical models for DNA amplification are cast as control systems, wherein the dynamics of the reaction are controlled by a manipulated input variable. Using these control systems, we demonstrate that there exists an optimal temperature cycling strategy for geometric amplification of any DNA sequence and formulate optimal control problems that can be used to derive the optimal temperature profile. Strategies for the optimal synthesis of the DNA amplification control trajectory are proposed. Analogous methods can be used to formulate control problems for more advanced amplification objectives corresponding to the design of new types of DNA amplification reactions.
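    As a point of reference for "geometric amplification", the textbook constant-efficiency model multiplies the copy number by (1 + E) each cycle. The Python sketch below uses only that simplified model; it is an assumption for illustration and deliberately ignores what the paper optimizes, namely how the per-cycle efficiency depends on the sequence and on the temperature profile. Function names are hypothetical.

```python
import math


def copies_after(n0, efficiency, cycles):
    """Copy number after `cycles` rounds if each cycle multiplies copies by (1 + efficiency)."""
    return n0 * (1 + efficiency) ** cycles


def cycles_needed(n0, target, efficiency):
    """Smallest whole number of cycles for which copies_after reaches `target`."""
    return math.ceil(math.log(target / n0) / math.log(1 + efficiency))


print(copies_after(1, 1.0, 10))       # 1024.0 (perfect doubling)
print(cycles_needed(1, 1e6, 1.0))     # 20
print(cycles_needed(1, 1e6, 0.9))     # 22
```

    Even this crude model shows why per-cycle efficiency matters: dropping from 100% to 90% efficiency adds two cycles to reach a millionfold amplification.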

  9. Female-specific DNA sequences in geese.

    Science.gov (United States)

    Huang, M C; Lin, W C; Horng, Y M; Rouvier, R; Huang, C W

    2003-07-01

    1. The OPAE random primers (Operon Technologies, Inc., CA) were used for random amplified polymorphic DNA (RAPD) fingerprinting in Chinese, White Roman and Landaise geese. One of these primers, OPAE-06, produced a 938-bp sex-specific fragment that was present in all females and absent in all males, but only in Chinese geese. 2. A novel female-specific DNA sequence in Chinese goose was cloned and sequenced. Two primers, CGSex-F and CGSex-R, were designed to amplify a 912-bp sex-specific polymerase chain reaction (PCR) fragment from genomic DNA of female geese. 3. It was shown that a simple and effective PCR-based sexing technique could be used in the three goose breeds studied. 4. Nucleotide sequencing of the sex-specific fragments in White Roman and Landaise geese was performed, and sequence differences were observed among these three breeds.

  10. Probabilistic models for semisupervised discriminative motif discovery in DNA sequences.

    Science.gov (United States)

    Kim, Jong Kyoung; Choi, Seungjin

    2011-01-01

    Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs) by searching only for patterns that differentiate two sets (positive and negative) of sequences. On the one hand, discriminative methods increase the sensitivity and specificity of motif discovery compared to generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model that enables us to make use of unlabeled sequences in the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained by a model intermediate between the purely generative and the purely discriminative ones, and that semisupervised learning improves performance when labeled sequences are limited.
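    The core idea of a discriminative search, favoring patterns that separate the positive set from the negative set, can be illustrated with a simple count-based k-mer score. This Python sketch is a swapped-in, much simpler technique than the paper's hybrid probabilistic model: it ranks each k-mer by the ratio of pseudocount-smoothed occurrence rates in positive versus negative sequences. Function names, k, and the pseudocount are hypothetical choices.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def discriminative_kmers(positives, negatives, k=4, pseudocount=1.0):
    """Rank k-mers seen in positives by smoothed occurrence-rate ratio, positive vs negative."""
    pos, neg = kmer_presence(positives, k), kmer_presence(negatives, k)
    scores = {}
    for kmer in pos:
        p = (pos[kmer] + pseudocount) / (len(positives) + 2 * pseudocount)
        q = (neg[kmer] + pseudocount) / (len(negatives) + 2 * pseudocount)
        scores[kmer] = p / q
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


positives = ["AACACGTGTT", "GGCACGTGCC", "TTCACGTGAA"]
negatives = ["AAAAAAAAAA", "GGGGCCCCTT", "TTTTAAAAGG"]
print(discriminative_kmers(positives, negatives, k=6)[0])  # ('CACGTG', ~4.0)
```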

  11. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Full Text Available Abstract Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
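    The baseline that coil is compared against, gzip on a flat file, is easy to measure. The Python sketch below computes a gzip compression ratio with the standard-library `gzip.compress`; it does not implement coil's edit-tree coding, and the FASTA-like input is a hypothetical toy example (real EST files compress far less well than this repetitive string).

```python
import gzip


def gzip_ratio(text, level=9):
    """Original size divided by gzip-compressed size (higher means better compression)."""
    raw = text.encode("ascii")
    return len(raw) / len(gzip.compress(raw, compresslevel=level))


# Hypothetical, highly repetitive FASTA-like flat file, for illustration only.
flat_file = ">est1\n" + "ACGT" * 250 + "\n"
print(round(gzip_ratio(flat_file), 1))
```

    For context, a plain-text flat file spends 8 bits on each nucleotide even though four symbols need only 2, which is part of why general-purpose compressors leave room for domain-specific tools.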

  12. DNA Sequencing in Cultural Heritage.

    Science.gov (United States)

    Vai, Stefania; Lari, Martina; Caramelli, David

    2016-02-01

    During the last three decades, DNA analysis of degraded samples has proven to be an important research tool in anthropology, archaeozoology, molecular evolution, and population genetics. Applications such as determining the species origin of prehistoric and historic objects, identifying famous individuals, and characterizing particular samples important for historical, archeological, or evolutionary reconstructions give paleogenetics an important role in the enhancement of cultural heritage as well. Rapid methodological improvements in recent years led to a revolution that has made it possible to recover even complete genomes from highly degraded samples, going back in time 400,000 years for samples from temperate regions and 700,000 years for permafrozen remains, and to analyze even more recent material that has been subjected to harsh biochemical treatments. Here we present a review of the different methodological approaches used so far for the molecular analysis of degraded samples and their application in some case studies.

  13. cDNA sequence quality data - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available Budding yeast cDNA sequencing project: cDNA sequence quality data. Description of data contents: Phred quality scores in PHD format, one file per cDNA, and co...
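    Phred quality scores of the kind stored in these files follow the standard definition Q = -10 log10(P), where P is the estimated probability that the base call is wrong. A minimal Python sketch of the conversion; parsing the PHD files themselves is not shown, and the function names are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Phred quality Q corresponds to error probability P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Inverse mapping: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read, given its per-base Phred scores."""
    return sum(phred_to_error_prob(q) for q in qualities)


print(phred_to_error_prob(20))          # about 0.01, i.e. 1 error in 100 calls
print(expected_errors([20, 20, 30]))    # about 0.021
```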

  14. Spectral sum rules and search for periodicities in DNA sequences

    Science.gov (United States)

    Chechetkin, V. R.

    2011-04-01

    Periodic patterns play important regulatory and structural roles in genomic DNA sequences. Commonly, the underlying periodicities should be understood in a broad statistical sense, since the corresponding periodic patterns have been strongly distorted by random point mutations and insertions/deletions during molecular evolution. The latent periodicities in DNA sequences can be efficiently displayed by the Fourier transform. The criteria of significance for observed periodicities are obtained by comparison with the counterpart characteristics of reference random sequences. We show that the restrictions imposed on the significance criteria by the rigorous spectral sum rules can be rationally described with the De Finetti distribution. This distribution provides a convenient intermediate asymptotic form between the Rayleigh distribution and exact combinatorial theory.
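    A common way to display latent periodicities with the Fourier transform is to map the sequence to four binary indicator sequences (one per base), transform each, and sum the squared magnitudes. The Python sketch below does this with a direct O(N^2) DFT, adequate for short toy inputs. One rigorous constraint of the sum-rule kind follows from Parseval's identity: for a length-N sequence containing only A, C, G, T, the total power over all frequencies equals N^2. This is offered as an illustration, not as the paper's specific sum rules or its De Finetti-based significance test; function names are hypothetical.

```python
import cmath


def indicator(seq, base):
    """Binary indicator sequence: 1 where seq has `base`, else 0."""
    return [1 if b == base else 0 for b in seq]


def power_spectrum(seq):
    """Total Fourier power S(k) = sum over A, C, G, T of |F_base(k)|^2, for k = 0..N-1."""
    n = len(seq)
    spectrum = [0.0] * n
    for base in "ACGT":
        u = indicator(seq, base)
        for k in range(n):
            f = sum(u[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            spectrum[k] += abs(f) ** 2
    return spectrum


seq = "ATG" * 30                      # N = 90, strict period 3
spec = power_spectrum(seq)
peak = max(range(1, 46), key=lambda k: spec[k])
print(peak)                           # 30, i.e. frequency N/3 for period 3
print(round(sum(spec)))               # 8100 = N**2 (Parseval sum rule)
```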

  15. Characterization of nucleotide misincorporation patterns in the iceman's mitochondrial DNA.

    Directory of Open Access Journals (Sweden)

    Cristina Olivieri

    Full Text Available BACKGROUND: The degradation of DNA represents one of the main issues in the genetic analysis of archeological specimens. In recent years, a particular kind of post-mortem DNA modification giving rise to nucleotide misincorporation ("miscoding lesions") has been the object of extensive investigations. METHODOLOGY/PRINCIPAL FINDINGS: To improve our knowledge regarding the nature and incidence of ancient DNA nucleotide misincorporations, we have utilized 6,859 (629,975 bp) mitochondrial (mt) DNA sequences obtained from the 5,350-5,100-year-old, freeze-desiccated human mummy popularly known as the Tyrolean Iceman or Otzi. To generate the sequences, we applied a mixed PCR/pyrosequencing procedure that yields a particularly high sequence coverage. As a control, we produced a further 8,982 (805,155 bp) mtDNA sequences from a contemporary specimen using the same system and starting from the same template copy number as the ancient sample. From the analysis of the nucleotide misincorporation rate in ancient, modern, and putative contaminant sequences, we observed that the rate of misincorporation is significantly lower in modern and putative contaminant sequence datasets than in ancient sequences. In contrast, type 2 transitions represent the vast majority (85%) of the observed nucleotide misincorporations in ancient sequences. CONCLUSIONS/SIGNIFICANCE: This study provides a further contribution to the knowledge of nucleotide misincorporation patterns in DNA sequences obtained from freeze-preserved archeological specimens. In the Iceman system, ancient sequences can be clearly distinguished from contaminants on the basis of nucleotide misincorporation rates. This observation confirms a previous identification of the ancient mummy sequences made on a purely phylogenetic basis. The present investigation provides further indication that the majority of ancient DNA damage is reflected by type 2 (cytosine
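    Tallying misincorporations by type, as in the analysis above, amounts to comparing each read with a reference and classifying every substitution. A minimal Python sketch, assuming gap-free reads already aligned base-for-base to the reference, and using the usual convention that type 1 transitions are A→G/T→C and type 2 transitions are C→T/G→A. Real data would also contain genuine polymorphisms unless the reference is the consensus for the same individual. Names are hypothetical.

```python
from collections import Counter

TYPE_1 = {("A", "G"), ("T", "C")}   # transitions A->G / T->C
TYPE_2 = {("C", "T"), ("G", "A")}   # transitions C->T / G->A


def classify_mismatches(reference, read):
    """Tally substitutions between a gap-free aligned read and its reference."""
    tally = Counter()
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base == read_base:
            continue
        pair = (ref_base, read_base)
        if pair in TYPE_1:
            tally["type1"] += 1
        elif pair in TYPE_2:
            tally["type2"] += 1
        else:
            tally["transversion"] += 1
    return tally


print(classify_mismatches("ACGTACGT", "GTGTATAC"))  # 3 type2, 2 type1
```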

  16. DNA sequencing by synthesis with degenerate primers

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    A degenerate primer-based sequencing-by-synthesis method (DP-SBS) was developed for high-throughput DNA sequencing, in which a set of degenerate primers is hybridized to arrayed DNA templates and extended by DNA polymerase on microarrays. In this method, a different set of degenerate primers containing a given number (n) of degenerate nucleotides at the 3' end was annealed to the templates to be sequenced, which were immobilized on the solid surface. The nucleotide at position n+1 of the template sequence was determined by detecting the incorporation of a fluorescently labeled nucleotide. The fluorescently labeled nucleotide was incorporated into the primer in a base-specific manner by the enzymatic primer extension reaction, and nine bases were read out accurately. The main advantage of DP-SBS is that the method uses only conventional biochemical reagents and avoids the complicated special chemical reagents needed for removing the labeled nucleotides and reactivating the primer for further extension. The present study found that the DP-SBS method is reliable, simple, and cost-effective for laboratory sequencing of large numbers of short DNA fragments.
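    The read-out scheme described above can be simulated in an idealized way: the primer set with n degenerate 3' nucleotides contains members matching the first n unknown template positions, so its single labeled extension reports the nucleotide complementary to position n+1. Running n = 0 through 8 yields nine bases. This Python sketch assumes perfect annealing and incorporation and models nothing about the microarray chemistry; the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def dp_sbs_read(unknown_region, max_n=8):
    """Idealized DP-SBS read.

    unknown_region lists template bases in the order the primer extends over them.
    The reaction with n degenerate nucleotides reports the base incorporated
    opposite template position n + 1 (index n here).
    """
    read = []
    for n in range(max_n + 1):               # n = 0, 1, ..., max_n degenerate nucleotides
        if n >= len(unknown_region):
            break
        read.append(COMPLEMENT[unknown_region[n]])
    return "".join(read)


print(dp_sbs_read("TACGGATCCA"))  # ATGCCTAGG (nine bases)
```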

  17. Glycome mapping on DNA sequencing equipment.

    Science.gov (United States)

    Laroy, Wouter; Contreras, Roland; Callewaert, Nico

    2006-01-01

    Here we provide a detailed protocol for the analysis of protein-linked glycans on DNA sequencing equipment. This protocol satisfies the glyco-analytical needs of many projects and can form the basis of 'glycomics' studies, in which robustness, high throughput, high sensitivity and reliable quantification are of paramount importance. The protocol routinely resolves isobaric glycan stereoisomers, which is much more difficult with mass spectrometry (MS). Earlier methods made use of polyacrylamide gel-based sequencers, but we have now adapted the technique to multicapillary DNA sequencers, which represent the state of the art today. In addition, we have integrated an option for HPLC-based fractionation of highly anionic 8-amino-1,3,6-pyrenetrisulfonic acid (APTS)-labeled glycans before rapid capillary electrophoretic profiling. This option facilitates either two-dimensional profiling of complex glycan mixtures and exoglycosidase sequencing, or MS analysis of particular compounds of interest rather than of the total pool of glycans in a sample.

  18. The complete DNA sequence of vaccinia virus.

    Science.gov (United States)

    Goebel, S J; Johnson, G P; Perkus, M E; Davis, S W; Winslow, J P; Paoletti, E

    1990-11-01

    The complete DNA sequence of the genome of vaccinia virus has been determined. The genome consisted of 191,636 bp with a base composition of 66.6% A + T. We have identified 198 "major" protein-coding regions and 65 overlapping "minor" regions, for a total of 263 potential genes. Genes encoded by the virus were located by examination of DNA sequence characteristics and compared with existing vaccinia virus mapping analyses, sequence data, and transcription data. These genes were found to be compactly organized along the genome with relatively few regions of noncoding sequences. Whereas several similarities to proteins of known function were discerned, the function of the majority of proteins encoded by these open reading frames is as yet undetermined.
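    Two of the sequence characteristics mentioned above, base composition (such as the reported 66.6% A + T) and candidate protein-coding regions, can be computed with a few lines of Python. This is a simplified sketch, not the authors' gene-identification procedure: it scans only the given strand, reports for each stop codon only the ORF starting at the first upstream ATG in that frame (nested and overlapping ORFs are not listed), and the default minimum length of 100 codons is an arbitrary assumption. Function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def at_content(seq):
    """Percentage of A + T bases in seq."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def find_orfs(seq, min_codons=100):
    """Return (start, end) of ATG-initiated ORFs in all three frames of one strand.

    end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
print(find_orfs(toy, min_codons=5))   # [(2, 17)]
print(round(at_content(toy), 1))
```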

  19. Output-Sensitive Pattern Extraction in Sequences

    DEFF Research Database (Denmark)

    Grossi, Roberto; Menconi, Giulia; Pisanti, Nadia

    2014-01-01

    Genomic Analysis, Plagiarism Detection, Data Mining, Intrusion Detection, Spam Fighting and Time Series Analysis are just some examples of applications where extraction of recurring patterns in sequences of objects is one of the main computational challenges. Several notions of patterns exist...

  20. The DNA sequence specificity of bleomycin cleavage in a systematically altered DNA sequence.

    Science.gov (United States)

    Gautam, Shweta D; Chen, Jon K; Murray, Vincent

    2017-08-01

    Bleomycin is an anti-tumour agent that is clinically used to treat several types of cancers. Bleomycin cleaves DNA at specific DNA sequences and recent genome-wide DNA sequencing specificity data indicated that the sequence 5'-RTGT*AY (where T* is the site of bleomycin cleavage, R is G/A and Y is T/C) is preferentially cleaved by bleomycin in human cells. Based on this DNA sequence, we constructed a plasmid clone to explore this bleomycin cleavage preference. By systematic variation of single nucleotides in the 5'-RTGT*AY sequence, we were able to investigate the effect of nucleotide changes on bleomycin cleavage efficiency. We observed that the preferred consensus DNA sequence for bleomycin cleavage in the plasmid clone was 5'-YYGT*AW (where W is A/T). The most highly cleaved sequence was 5'-TCGT*AT and, in fact, the seven most highly cleaved sequences conformed to the consensus sequence 5'-YYGT*AW. A comparison with genome-wide results was also performed and while the core sequence was similar in both environments, the surrounding nucleotides were different.
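    Consensus sequences written with IUPAC codes, such as 5'-YYGT*AW and 5'-RTGT*AY above, translate directly into regular expressions. The Python sketch below scans one strand for a consensus and reports the position of the T* nucleotide in each match, using a lookahead so overlapping matches are not missed. It only predicts sequence matches; it says nothing about measured cleavage intensity, and the function names are hypothetical.

```python
import re

# IUPAC codes needed for the consensus sequences above (R = A/G, Y = C/T, W = A/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]"}


def consensus_regex(consensus):
    """Translate an IUPAC consensus (cleavage marker '*' removed) into a regex."""
    return "".join(IUPAC[c] for c in consensus.replace("*", ""))


def predicted_cleavage_sites(seq, consensus="YYGT*AW"):
    """Return 0-based indices of the T* nucleotide for every consensus match."""
    offset = consensus.index("*") - 1        # index of T* within the motif
    pattern = re.compile("(?=" + consensus_regex(consensus) + ")")
    return [m.start() + offset for m in pattern.finditer(seq.upper())]


print(predicted_cleavage_sites("AATCGTATGG"))            # [5]
print(predicted_cleavage_sites("GTGTAC", "RTGT*AY"))     # [3]
```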

  1. DNA Sequence Alignment during Homologous Recombination.

    Science.gov (United States)

    Greene, Eric C

    2016-05-27

    Homologous recombination allows for the regulated exchange of genetic information between two different DNA molecules of identical or nearly identical sequence composition, and is a major pathway for the repair of double-stranded DNA breaks. A key facet of homologous recombination is the ability of recombination proteins to perfectly align the damaged DNA with homologous sequence located elsewhere in the genome. This reaction is referred to as the homology search and is akin to the target searches conducted by many different DNA-binding proteins. Here I briefly highlight early investigations into the homology search mechanism, and then describe more recent research. Based on these studies, I summarize a model that includes a combination of intersegmental transfer, short-distance one-dimensional sliding, and length-specific microhomology recognition to efficiently align DNA sequences during the homology search. I also suggest some future directions to help further our understanding of the homology search. Where appropriate, I direct the reader to other recent reviews describing various issues related to homologous recombination.

  2. Automated Template Quantification for DNA Sequencing Facilities

    Science.gov (United States)

    Ivanetich, Kathryn M.; Yan, Wilson; Wunderlich, Kathleen M.; Weston, Jennifer; Walkup, Ward G.; Simeon, Christian

    2005-01-01

    The quantification of plasmid DNA by the PicoGreen dye binding assay has been automated, and the effect of quantification of user-submitted templates on DNA sequence quality in a core laboratory has been assessed. The protocol pipets, mixes and reads standards, blanks and up to 88 unknowns, generates a standard curve, and calculates template concentrations. For pUC19 replicates at five concentrations, coefficients of variation were 0.1, and percent errors were from 1% to 7% (n = 198). Standard curves with pUC19 DNA were nonlinear over the 1 to 1733 ng/μL concentration range required to assay the majority (98.7%) of user-submitted templates. Over 35,000 templates have been quantified using the protocol. For 1350 user-submitted plasmids, 87% deviated by ≥ 20% from the requested concentration (500 ng/μL). Based on data from 418 sequencing reactions, quantification of user-submitted templates was shown to significantly improve DNA sequence quality. The protocol is applicable to all types of double-stranded DNA, is unaffected by primer (1 pmol/μL), and is user modifiable. The protocol takes 30 min, saves 1 h of technical time, and costs approximately $0.20 per unknown. PMID:16461949
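    The two summary statistics in this abstract, the coefficient of variation of replicates and the fraction of templates deviating by ≥ 20% from the requested 500 ng/μL, are straightforward to compute once concentrations are known. A minimal Python sketch with hypothetical template names and values; it does not reproduce the nonlinear PicoGreen standard-curve fit that produces the concentrations.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def percent_deviation(measured, requested=500.0):
    """Signed deviation of a measured concentration from the requested one, in percent."""
    return 100.0 * (measured - requested) / requested


def flag_templates(concentrations, requested=500.0, tolerance_pct=20.0):
    """Return names of templates deviating by >= tolerance_pct from the request."""
    return [name for name, c in concentrations.items()
            if abs(percent_deviation(c, requested)) >= tolerance_pct]


# Hypothetical measured concentrations in ng/uL.
measured = {"p1": 510.0, "p2": 380.0, "p3": 650.0}
print(flag_templates(measured))  # ['p2', 'p3']
```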

  3. DNA Sequence Alignment during Homologous Recombination*

    Science.gov (United States)

    Greene, Eric C.

    2016-01-01

    Homologous recombination allows for the regulated exchange of genetic information between two different DNA molecules of identical or nearly identical sequence composition, and is a major pathway for the repair of double-stranded DNA breaks. A key facet of homologous recombination is the ability of recombination proteins to perfectly align the damaged DNA with homologous sequence located elsewhere in the genome. This reaction is referred to as the homology search and is akin to the target searches conducted by many different DNA-binding proteins. Here I briefly highlight early investigations into the homology search mechanism, and then describe more recent research. Based on these studies, I summarize a model that includes a combination of intersegmental transfer, short-distance one-dimensional sliding, and length-specific microhomology recognition to efficiently align DNA sequences during the homology search. I also suggest some future directions to help further our understanding of the homology search. Where appropriate, I direct the reader to other recent reviews describing various issues related to homologous recombination. PMID:27129270

  4. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    Science.gov (United States)

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins, alone and as chimeric proteins with nucleases, are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding, which gives an increase in signal. Unlabeled DNA is used with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by the nuclease activity of the chimera, which gives a decrease in signal.

  5. The first determination of DNA sequence of a specific gene.

    Science.gov (United States)

    Inouye, Masayori

    2016-05-10

    How and when was the first DNA sequence of a gene determined? In 1977, F. Sanger came up with an innovative technology to sequence DNA using chain terminators, and determined the entire DNA sequence of the 5375-base genome of bacteriophage φX 174 (Sanger et al., 1977). While Sanger's achievement has been recognized as the first DNA sequencing of genes, we had determined the DNA sequence of a gene, albeit a partial sequence, 11 years before Sanger's (Okada et al., 1966).

  6. DNA sequencing by nanopores: advances and challenges

    Science.gov (United States)

    Agah, Shaghayegh; Zheng, Ming; Pasquali, Matteo; Kolomeisky, Anatoly B.

    2016-10-01

    Developing inexpensive and simple DNA sequencing methods capable of detecting entire genomes in short periods of time could revolutionize the world of medicine and technology. It would also lead to major advances in our understanding of fundamental biological processes. Nanopores have been shown to be capable of rapid, low-cost single-molecule sensing of various biological molecules. This has stimulated significant experimental efforts in developing DNA sequencing techniques that use biological and artificial nanopores. In this review, we discuss recent progress in the nanopore sequencing field, with a focus on the nature of nanopores and on sensing mechanisms during translocation. Current challenges and alternative methods are also discussed.

  7. Parallel gigantism and complex colonization patterns in the Cape Verde scincid lizards Mabuya and Macroscincus (Reptilia: Scincidae) revealed by mitochondrial DNA sequences.

    Science.gov (United States)

    Carranza, S; Arnold, E N; Mateo, J A; López-Jurado, L F

    2001-08-07

    The scincid lizards of the Cape Verde islands comprise the extinct endemic giant Macroscincus coctei and at least five species of Mabuya, one of which, Mabuya vaillanti, also had populations with large body size. Phylogenetic analysis based on DNA sequences derived from the mitochondrial cytochrome b, cytochrome oxidase I and 12S rRNA genes (711, 498 and 378 base pairs (bp), respectively) corroborates morphological evidence that these species constitute a clade and that Macroscincus is unrelated to very large skinks in other areas. The relationships are ((M. vaillanti and Mabuya delalandii) (Mabuya spinalis and Macroscincus coctei (Mabuya fogoensis nicolauensis (Mabuya fogoensis antaoensis and Mabuya stangeri)))). The Cape Verde archipelago was colonized from West Africa, probably in the Late Miocene or Early Pliocene period. The north-eastern islands were probably occupied first, after which the ancestor of M. vaillanti and M. delalandii may have originated on Boavista, the ancestor of the latter species arriving on Santiago or Fogo later. The M. fogoensis--M. stangeri clade colonized the islands of Branco, Razo, Santa Luzia and São Vicente from São Nicolau and reached Santo Antão after this. Colonization of these northeastern islands was slow, perhaps because the recipient islands had not developed earlier or because colonization cut across the path of the Canary Current and the Northeast Trade Winds, the main dispersing agents in the region. Rapid extension of range into the southwestern islands occurred later in M. spinalis and then in M. vaillanti and M. delalandii. The long apparent delay between the origin of these species and their southwestern dispersal may have been because there were earlier colonizations of the southern islands which excluded later ones until the earlier inhabitants were exterminated by volcanic or climatic events. The evolution of large size in Macroscincus occurred in the northwestern islands and was paralleled in the eastern and

  8. Mitochondrial DNA sequence variation in Greeks.

    Science.gov (United States)

    Kouvatsi, A; Karaiskou, N; Apostolidis, A; Kirmizidis, G

    2001-12-01

    Mitochondrial DNA (mtDNA) control region sequences were determined for both segments, HVR-I and HVR-II, in 54 unrelated Greeks from different regions of Greece. Fifty-two different mtDNA haplotypes were revealed, one of which was shared by three individuals. Very low heterogeneity was found among Greek regions. No cluster of lineages was specific to individuals from a particular region. The mean of the pairwise difference distribution was 7.599. The data were compared with those for other European or neighboring populations (British, French, Germans, Tuscans, Bulgarians, and Turks). The genetic trees that were constructed revealed homogeneity among Europeans. Median networks revealed that most of the Greek mtDNA haplotypes cluster into the five known haplogroups and that a number of haplotypes are shared among Greeks and other European and Near Eastern populations.
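    The mean pairwise difference is the average number of nucleotide differences over all pairs of sequences. The full set of pairwise counts forms the mismatch distribution. The minimal Python sketch below computes both. It assumes the sequences are already aligned to equal length and skips sites with a gap or ambiguous base in either sequence; this is a simplifying assumption, and published analyses may treat such sites differently. The function names and example haplotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations

VALID = set("ACGT")


def pairwise_differences(seqs):
    """Differences at unambiguous sites for every pair of aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [
        sum(1 for x, y in zip(a, b) if x in VALID and y in VALID and x != y)
        for a, b in combinations(seqs, 2)
    ]


def mean_pairwise_difference(seqs):
    """Average number of differences over all sequence pairs."""
    diffs = pairwise_differences(seqs)
    return sum(diffs) / len(diffs)


def mismatch_distribution(seqs):
    """Number of sequence pairs showing 0, 1, 2, ... differences."""
    return Counter(pairwise_differences(seqs))


# Hypothetical aligned fragments; '-' marks a gap that is skipped.
haplotypes = ["ACGTAC", "ACGAAC", "TCGAAC", "ACG-AC"]
print(mean_pairwise_difference(haplotypes), mismatch_distribution(haplotypes))
```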

  9. Insights into the Genetic Relationships and Breeding Patterns of the African Tea Germplasm Based on nSSR Markers and cpDNA Sequences.

    Science.gov (United States)

    Wambulwa, Moses C; Meegahakumbura, Muditha K; Kamunya, Samson; Muchugi, Alice; Möller, Michael; Liu, Jie; Xu, Jian-Chu; Ranjitkar, Sailesh; Li, De-Zhu; Gao, Lian-Ming

    2016-01-01

    Africa is one of the key centers of global tea production. Understanding the genetic diversity and relationships of cultivars of African tea is important to guide future targeted breeding of new crop cultivars, specialty tea processing, and germplasm conservation. Despite the economic importance of tea in Africa, no research work has been done so far on its genetic diversity at a continental scale. Twenty-three nSSRs and three plastid DNA regions were used to investigate the genetic diversity, relationships, and breeding patterns of tea accessions collected from eight countries of Africa. A total of 280 African tea accessions generated 297 alleles, with a mean of 12.91 alleles per locus and a genetic diversity (H_S) estimate of 0.652. A STRUCTURE analysis suggested two main genetic groups of African tea accessions, which corresponded well with the two tea types Camellia sinensis var. sinensis and C. sinensis var. assamica, respectively, as well as an admixed "mosaic" group whose individuals were defined as hybrids of F2 and BC generations, with a high proportion of C. sinensis var. assamica being maternal parents. Accessions known to be C. sinensis var. assamica further separated into two groups representing the two major tea breeding centers, corresponding to southern Africa (Tea Research Foundation of Central Africa, TRFCA) and East Africa (Tea Research Foundation of Kenya, TRFK). Tea accessions were shared among countries. African tea has relatively low genetic diversity. C. sinensis var. assamica is the main tea type under cultivation and contributes more to tea breeding improvement in Africa. International germplasm exchange and movement among countries within Africa were confirmed. The clustering into two main breeding centers, TRFCA and TRFK, suggested that some traits of C. sinensis var. assamica and their associated genes possibly underwent selection during geographic differentiation or local breeding preferences.
This study represents
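    The summary statistics above, alleles per locus and gene diversity, can be computed from genotype data. The minimal Python sketch below assumes each locus is given as a flat list of observed allele labels (two per diploid individual). It uses Nei's expected heterozygosity, 1 - sum(p_i^2), with the n/(n - 1) sample-size correction. Averaging that over loci within groups gives a simple H_S-style estimate, which may differ from the estimator in the software the study used. The function names and allele sizes are hypothetical. The final value checks the reported mean: 297 alleles over 23 loci is about 12.91.

```python
from collections import Counter


def gene_diversity(alleles):
    """Nei's unbiased gene diversity at one locus: n/(n-1) * (1 - sum p_i^2)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def mean_alleles_per_locus(loci):
    """loci: mapping of locus name -> list of allele labels across all samples."""
    return sum(len(set(a)) for a in loci.values()) / len(loci)


def mean_within_group_diversity(groups):
    """Average gene diversity over loci and groups (an H_S-style summary).

    groups: mapping of group name -> {locus name -> list of allele labels}.
    """
    values = [gene_diversity(alleles)
              for loci in groups.values() for alleles in loci.values()]
    return sum(values) / len(values)


# Hypothetical microsatellite allele sizes for two loci in one group.
example = {"groupA": {"locus1": [150, 150, 152, 152], "locus2": [200, 204, 204, 208]}}
print(mean_alleles_per_locus(example["groupA"]),
      mean_within_group_diversity(example),
      round(297 / 23, 2))
```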

  10. Local Renyi entropic profiles of DNA sequences

    Directory of Open Access Journals (Sweden)

    Vinga Susana

    2007-10-01

    Background: In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition explored there was based on the Rényi entropy of probability density function (pdf) estimates obtained with Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results: The new methodology enables two results. On the one hand, it shows that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables, and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. Conclusion: The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur at multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and the inference of motif structures.
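    The CGR maps each prefix of a sequence to a point in the unit square. A Parzen window with Gaussian kernels turns those points into a density whose quadratic Rényi entropy has a closed form: H2 = -log((1/N^2) * sum_i sum_j G(x_i - x_j; 2*sigma^2)). The Python sketch below illustrates this starting point of the work. It assumes the corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0) and an isotropic Gaussian kernel of width sigma. It shows only this global entropy, not the fractal kernel or the local entropic profiles introduced in the paper, and the function names are hypothetical.

```python
import math

# Corner assignment for the chaos game representation (an assumed convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr(seq):
    """Chaos game representation: move halfway toward the corner of each base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def renyi2_entropy(points, sigma):
    """Quadratic Renyi entropy of a Gaussian Parzen-window density.

    The integral of the squared density reduces to a double sum of Gaussians
    with variance 2*sigma^2 evaluated at pairwise point differences.
    """
    n = len(points)
    var = 2.0 * sigma ** 2
    norm = 1.0 / (2.0 * math.pi * var)  # 2-D isotropic Gaussian normaliser
    potential = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            potential += norm * math.exp(-d2 / (2.0 * var))
    return -math.log(potential / n ** 2)


# A homopolymer collapses toward one corner, so its entropy is lower.
print(renyi2_entropy(cgr("A" * 50), 0.1), renyi2_entropy(cgr("ACGT" * 12), 0.1))
```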

  11. Sequence-specific recognition of DNA nanostructures.

    Science.gov (United States)

    Rusling, David A; Fox, Keith R

    2014-05-15

    DNA is the most exploited biopolymer for the programmed self-assembly of objects and devices that exhibit nanoscale-sized features. One of the most useful properties of DNA nanostructures is their ability to be functionalized with additional non-nucleic acid components. The introduction of such a component is often achieved by attaching it to an oligonucleotide that is part of the nanostructure, or hybridizing it to single-stranded overhangs that extend beyond or above the nanostructure surface. However, restrictions in nanostructure design and/or the self-assembly process can limit the suitability of these procedures. An alternative strategy is to couple the component to a DNA recognition agent that is capable of binding to duplex sequences within the nanostructure. This offers the advantage that it requires little, if any, alteration to the nanostructure and can be achieved after structure assembly. In addition, since the molecular recognition of DNA can be controlled by varying pH and ionic conditions, such systems offer tunable properties that are distinct from simple Watson-Crick hybridization. Here, we describe methodology that has been used to exploit and characterize the sequence-specific recognition of DNA nanostructures, with the aim of generating functional assemblies for bionanotechnology and synthetic biology applications.

  12. Spectral sum rules and search for periodicities in DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Chechetkin, V.R. (Theoretical Department of Division for Perspective Investigations, Troitsk Institute of Innovation and Thermonuclear Investigations (TRINITI), Troitsk, 142190 Moscow Region, Russian Federation)

    2011-04-18

    Periodic patterns play important regulatory and structural roles in genomic DNA sequences. Commonly, the underlying periodicities should be understood in a broad statistical sense, since the corresponding periodic patterns have been strongly distorted by random point mutations and insertions/deletions during molecular evolution. The latent periodicities in DNA sequences can be efficiently displayed by the Fourier transform. The criteria of significance for observed periodicities are obtained by comparison with the corresponding characteristics of reference random sequences. We show that the restrictions imposed on the significance criteria by the rigorous spectral sum rules can be rationally described with the De Finetti distribution. This distribution provides a convenient intermediate asymptotic form between the Rayleigh distribution and exact combinatoric theory. Highlights: We study the significance criteria for latent periodicities in DNA sequences. The constraints imposed by sum rules can be described with the De Finetti distribution. It is intermediate between the Rayleigh distribution and exact combinatoric theory. The theory is applicable to the study of correlations between different periodicities. The approach can be generalized to arbitrary discrete Fourier transforms.
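    A common way to display latent periodicities is to split the sequence into four binary indicator sequences, one per base. Take the discrete Fourier transform of each and sum the squared moduli. Because each indicator is 0/1, Parseval's theorem gives a simple sum rule: the total power over all frequencies k and all four bases equals N^2. That is one of the kinds of constraints the abstract refers to. The Python sketch below uses a direct O(N^2) DFT for clarity. It does not implement the De Finetti-based significance criteria, and the function names are hypothetical.

```python
import cmath


def indicator_spectrum(seq, k):
    """Summed power of the four base-indicator sequences at frequency index k."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        f = sum(cmath.exp(-2j * cmath.pi * k * m / n)
                for m, b in enumerate(seq) if b == base)
        total += abs(f) ** 2
    return total


def full_spectrum(seq):
    """Power at every frequency index k = 0 .. N-1."""
    return [indicator_spectrum(seq, k) for k in range(len(seq))]


seq = "ATG" * 30                      # exact period 3, N = 90
spectrum = full_spectrum(seq)
print(round(spectrum[30]), round(sum(spectrum)))  # prints: 2700 8100
```

    For this strictly periodic example, power appears only at k = 0, N/3 and 2N/3. The total, 8100, equals 90^2 as the sum rule requires. In real sequences the peak is smeared by mutations, which is why significance must be judged against random reference sequences.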

  13. MEME: discovering and analyzing DNA and protein sequence motifs.

    Science.gov (United States)

    Bailey, Timothy L; Williams, Nadya; Misleh, Chris; Li, Wilfred W

    2006-07-01

    MEME (Multiple EM for Motif Elicitation) is one of the most widely used tools for searching for novel 'signals' in sets of biological sequences. Applications include the discovery of new transcription factor binding sites and protein domains. MEME works by searching for repeated, ungapped sequence patterns that occur in the DNA or protein sequences provided by the user. Users can perform MEME searches via the web server hosted by the National Biomedical Computation Resource (http://meme.nbcr.net) and several mirror sites. Through the same web server, users can also access the Motif Alignment and Search Tool to search sequence databases for matches to motifs encoded in several popular formats. By clicking on buttons in the MEME output, users can compare the motifs discovered in their input sequences with databases of known motifs, search sequence databases for matches to the motifs and display the motifs in various formats. This article describes the freely accessible web server and its architecture, and discusses ways to use MEME effectively to find new sequence patterns in biological sequences and analyze their significance.
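    MEME reports each motif it discovers as a position-specific matrix. The Motif Alignment and Search Tool mentioned above searches sequences for matches to such motifs. The Python sketch below illustrates only that scoring step, using a log-odds position weight matrix. It assumes aligned, equal-length sites, a uniform background and a pseudocount of 0.5. It does not implement MEME's expectation-maximization search, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Log-odds position weight matrix (bits) from aligned motif sites."""
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must have equal length")
    background = background or {b: 0.25 for b in BASES}
    total = len(sites) + 4 * pseudocount
    pwm = []
    for column in zip(*sites):
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def scan(seq, pwm):
    """Score every window of the motif's width; returns (start, score) pairs."""
    w = len(pwm)
    return [(i, sum(pwm[j][seq[i + j]] for j in range(w)))
            for i in range(len(seq) - w + 1)]


# Hypothetical sites for an ungapped motif of width 6.
pwm = build_pwm(["TATAAT", "TATAAT", "TATTAT", "TACAAT"])
best_start, best_score = max(scan("GGCTATAATGCG", pwm), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```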

  14. Output-Sensitive Pattern Extraction in Sequences

    DEFF Research Database (Denmark)

    Grossi, Roberto; Menconi, Giulia; Pisanti, Nadia

    2014-01-01

    Genomic Analysis, Plagiarism Detection, Data Mining, Intrusion Detection, Spam Fighting and Time Series Analysis are just some examples of applications where extraction of recurring patterns in sequences of objects is one of the main computational challenges. Several notions of patterns exist...... n. We address the problem of extracting maximal patterns with at most k don’t care symbols and at least q occurrences. Our contribution is to give the first algorithm that attains a stronger notion of output-sensitivity, borrowed from the analysis of data structures: the cost is proportional...... to the actual number of occurrences of each pattern, which is at most n and practically much smaller than n in real applications, thus avoiding the aforementioned cost of O(nc) per pattern....
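    In this problem, a pattern is a string over the sequence alphabet plus a don't-care symbol. It qualifies if it has at most k don't cares and at least q occurrences. The naive Python check below uses '.' as the don't-care symbol, an assumed notation. It costs O(n*m) per pattern of length m, which is the kind of per-pattern cost an output-sensitive algorithm avoids. It is not the authors' algorithm, it does not test maximality, and the function names are hypothetical.

```python
def occurrences(text, pattern, wildcard="."):
    """Start positions where `pattern` matches `text`; `wildcard` matches any symbol."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if all(p == wildcard or p == t for p, t in zip(pattern, text[i:i + m]))]


def qualifies(text, pattern, k, q, wildcard="."):
    """True if pattern has at most k don't cares and at least q occurrences."""
    return pattern.count(wildcard) <= k and len(occurrences(text, pattern, wildcard)) >= q


text = "ACGTACTTACGA"
print(occurrences(text, "AC.T"), qualifies(text, "AC.T", k=1, q=2))
```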

  15. DNA Methylation Patterns in the Hypothalamus of Female Pubertal Goats.

    Science.gov (United States)

    Yang, Chen; Ye, Jing; Li, Xiumei; Gao, Xiaoxiao; Zhang, Kaifa; Luo, Lei; Ding, Jianping; Zhang, Yunhai; Li, Yunsheng; Cao, Hongguo; Ling, Yinghui; Zhang, Xiaorong; Liu, Ya; Fang, Fugui

    2016-01-01

    Female pubertal development is tightly controlled by complex mechanisms, including neuroendocrine and epigenetic regulatory pathways. Specific gene expression patterns can be influenced by DNA methylation changes in the hypothalamus, which can in turn regulate the timing of puberty onset. To understand the relationship between DNA methylation changes and gene expression patterns in the hypothalamus of pubertal goats, whole-genome bisulfite sequencing and RNA-sequencing analyses were carried out. DNA methylation levels in the hypothalamus declined during puberty, and 268 differentially methylated regions (DMRs) were identified in the genome, with differential patterns in different gene regions. A total of 1049 genes with distinct expression patterns were identified. High levels of DNA methylation were detected in promoters, introns and 3'-untranslated regions (UTRs). Levels of methylation decreased gradually from promoters to 5'-UTRs and increased from 5'-UTRs to introns. Methylation density analysis demonstrated that variation in methylation level was consistent with methylation density in the promoter, exon, intron, 5'-UTRs and 3'-UTRs. Analyses of CpG island (CGI) sites showed that they were enriched in gene bodies, intergenic regions and introns, and that these CGI sites were hypermethylated. Our study demonstrated that DNA methylation changes may influence gene expression profiles in the hypothalamus of goats during the onset of puberty, which may provide new insights into the mechanisms involved in pubertal onset.

  16. Sequence periodic pattern of HERV LTRs: A matrix simulation algorithm

    Indian Academy of Sciences (India)

    Shihua Zhang; Jing Xu; Chaoling Wei

    2012-03-01

    The flanking regulatory long terminal repeats (LTRs) of human endogenous retroviruses (HERVs) are a typical class of DNA repeat that is widespread in the human genome. Many algorithms have been developed to detect the latent periodicity of a wide range of DNA repeats; however, no such attempt has been made for HERV LTRs. The present study investigated possible sequence periodic patterns in HERV LTRs and their regulatory mechanisms. We calculated the sequence periods of 5′, 3′ and combined LTRs in HERVs with our devised matrix simulation algorithm. Interestingly, 5′ and 3′ LTRs have the same period of 7, and combined LTRs have a period of 9. These results indicate that HERV LTRs have predominant periodic patterns. Based on the obtained sequence periodicity, we constructed periodic consensus sequences of 5′, 3′ and combined LTRs. For the 5′ and 3′ LTRs, which share a period of 7, we manually scanned the nucleotide bases at the corresponding positions of their periodic consensus sequences and found that at some positions, such as the 1st, 5th and 7th, the nucleotide base is unchanged. These conserved nucleotide positions represent critical binding sites of regulatory LTRs, and may be indicative of conserved regulatory mechanisms in LTR-participating regulatory networks.
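    Once a period p has been estimated, a periodic consensus can be formed by folding the sequence into rows of length p. The most frequent base in each column becomes the consensus base. The Python sketch below assumes the period is already known; it does not implement the authors' matrix simulation algorithm for finding it. It also reports the fraction of rows agreeing with each consensus base, one simple way to flag conserved positions. Column j corresponds to position j+1 in the text's 1-based numbering, and ties are broken by first occurrence. The function name and example sequence are hypothetical.

```python
from collections import Counter


def periodic_consensus(seq, period):
    """Consensus of `seq` folded into rows of length `period`, plus per-column agreement."""
    if not 1 <= period <= len(seq):
        raise ValueError("period must be between 1 and the sequence length")
    consensus, agreement = [], []
    for j in range(period):
        column = Counter(seq[j::period])
        base, count = column.most_common(1)[0]
        consensus.append(base)
        agreement.append(count / sum(column.values()))
    return "".join(consensus), agreement


# Hypothetical sequence with period 7 and one substitution in the second copy.
seq = "ACGTTGA" + "CCGTTGA" + "ACGTTGA"
cons, agree = periodic_consensus(seq, 7)
print(cons, [round(a, 2) for a in agree])
```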

  17. Next-generation sequencing offers new insights into DNA degradation

    DEFF Research Database (Denmark)

    Overballe-Petersen, Søren; Orlando, Ludovic Antoine Alexandre; Willerslev, Eske

    2012-01-01

    The processes underlying DNA degradation are central to various disciplines, including cancer research, forensics and archaeology. The sequencing of ancient DNA molecules on next-generation sequencing platforms provides direct measurements of cytosine deamination, depurination and fragmentation r...

  18. Random Coding Bounds for DNA Codes Based on Fibonacci Ensembles of DNA Sequences

    Science.gov (United States)

    2008-07-01

    Dates covered: 6 Jul 08 to 11 Jul 08. Subject terms: DNA Codes, Fibonacci Ensembles, DNA Computing, Code Optimization. ...coding bound on the rate of DNA codes is proved. To obtain the bound, we use some ensembles of DNA sequences which are generalizations of the Fibonacci sequences.

  19. An oligonucleotide hybridization approach to DNA sequencing.

    Science.gov (United States)

    Khrapko, K R; Lysov YuP; Khorlyn, A A; Shick, V V; Florentiev, V L; Mirzabekov, A D

    1989-10-09

    We have proposed a DNA sequencing method based on hybridization of a DNA fragment to be sequenced with the complete set of fixed-length oligonucleotides (e.g., 4^8 = 65,536 possible 8-mers) immobilized individually as dots of a 2-D matrix [(1989) Dokl. Akad. Nauk SSSR 303, 1508-1511]. It was shown that the list of hybridizing octanucleotides is sufficient for the computer-assisted reconstruction of the structures of 80% of random-sequence fragments up to 200 bases long, based on the analysis of octanucleotide overlapping. Here a refinement of the method and some experimental data are presented. We have performed hybridizations with oligonucleotides immobilized on a glass plate, and obtained their dissociation curves down to heptanucleotides. Other approaches, e.g., an additional hybridization of short oligonucleotides which continuously extend duplexes formed between the fragment and immobilized oligonucleotides, should considerably increase either the probability of unambiguous reconstruction or the length of reconstructed sequences, or decrease the size of immobilized oligonucleotides.
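    The reconstruction step can be illustrated with a technique that differs from the overlap analysis described here. In the de Bruijn graph (Eulerian path) formulation of sequencing by hybridization, each k-mer in the hybridization spectrum is an edge between its (k-1)-mer prefix and suffix. A sequence consistent with the spectrum is then an Eulerian path. The Python sketch below assumes an error-free spectrum in which each k-mer occurs once, and the function name is hypothetical. When a (k-1)-mer repeats, several Eulerian paths exist, which is one reason not every fragment can be reconstructed unambiguously.

```python
from collections import defaultdict


def reconstruct_from_spectrum(kmers):
    """Spell a sequence from its k-mer spectrum via an Eulerian path (Hierholzer)."""
    edges = defaultdict(list)
    balance = defaultdict(int)          # out-degree minus in-degree
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        edges[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1
    starts = [node for node, b in balance.items() if b == 1]
    start = starts[0] if starts else next(iter(edges))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if edges[node]:
            stack.append(edges[node].pop())   # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: add node to path
    path.reverse()
    if len(path) != len(kmers) + 1:
        raise ValueError("spectrum does not spell a single sequence")
    return path[0] + "".join(node[-1] for node in path[1:])


fragment = "ATGGCGTGCA"
spectrum = {fragment[i:i + 4] for i in range(len(fragment) - 3)}  # all 4-mers
print(reconstruct_from_spectrum(spectrum))
```

    With 4-mers, every 3-mer of this fragment is distinct, so the path is unique. With 3-mers, the repeated "TG" and "GC" nodes would allow a second valid reading.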

  20. HIV-1 transmission patterns in antiretroviral therapy-naive, HIV-infected North Americans based on phylogenetic analysis by population level and ultra-deep DNA sequencing.

    Directory of Open Access Journals (Sweden)

    Lisa L Ross

    Factors that contribute to the transmission of human immunodeficiency virus type 1 (HIV-1), especially drug-resistant HIV-1 variants, remain a significant public health concern. Viral sequences obtained in the screening phase from antiretroviral-naïve HIV-infected patients seeking enrollment in EPZ108859, a large open-label study in the USA, Canada and Puerto Rico (ClinicalTrials.gov NCT00440947), were examined by in-depth phylogenetic analyses for insights into the roles of drug resistance and epidemiological factors that could impact disease dissemination. Viral transmission clusters (VTCs) were initially predicted from a phylogenetic analysis of population level HIV-1 pol sequences obtained from 690 antiretroviral-naïve subjects in 2007. Subsequently, the predicted VTCs were tested for robustness by ultra deep sequencing (UDS) using pyrosequencing technology and further phylogenetic analyses. The demographic characteristics of clustered and non-clustered subjects were then compared. Of the 690 subjects, 69 were assigned to 1 of 30 VTCs, each containing 2 to 5 subjects. Subjects in VTCs were significantly more likely to be white (72% vs. 60%; p = 0.04). VTCs had fewer reverse transcriptase and major PI resistance mutations (9% vs. 24%; p = 0.002) than non-clustered sequences. Both men-who-have-sex-with-men (MSM) (68% vs. 48%; p = 0.001) and Canadians (29% vs. 14%; p = 0.03) were significantly more frequent in VTCs than in non-clustered sequences. Of the 515 subjects who initiated antiretroviral therapy, 33 experienced confirmed virologic failure through 144 weeks, and only 3 of these 33 were from VTCs. Fewer VTC subjects (compared with those with non-clustering virus) had HIV-1 with resistance-associated mutations or experienced virologic failure during the course of the study. Our analysis shows specific geographical and drug resistance trends that correlate well with transmission clusters defined by HIV sequences of similarity
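    The study defined clusters by phylogenetic analysis confirmed with ultra-deep sequencing. A simpler, distance-based approach is shown here only to illustrate the idea of a transmission cluster. It links any two sequences whose pairwise p-distance is at or below a threshold and takes connected components (single linkage). The Python sketch below assumes aligned sequences, ignores sites with gaps or ambiguity codes, and uses a hypothetical threshold. It is not the method used in this study, and the function names and sample sequences are hypothetical.

```python
from itertools import combinations

VALID = set("ACGT")


def p_distance(a, b):
    """Proportion of differing sites among sites unambiguous in both sequences."""
    sites = [(x, y) for x, y in zip(a, b) if x in VALID and y in VALID]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def distance_clusters(seqs, threshold):
    """Connected components of the graph linking pairs with p-distance <= threshold.

    seqs: mapping of sample id -> aligned sequence. Singletons are dropped,
    mirroring the clustered vs non-clustered split.
    """
    parent = {sid: sid for sid in seqs}

    def find(sid):
        while parent[sid] != sid:
            parent[sid] = parent[parent[sid]]   # path halving
            sid = parent[sid]
        return sid

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for sid in seqs:
        groups.setdefault(find(sid), []).append(sid)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


samples = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "TTTTACGGGC", "s4": "ACGTACGTCC"}
print(distance_clusters(samples, threshold=0.15))
```

    Single linkage joins s2 and s4 through s1 even though s2 and s4 differ at 20% of sites. This chaining effect is one reason cluster definitions are usually checked against a phylogeny.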

  1. ERIC and REP-PCR banding patterns and sequence analysis of the internal transcribed spacer of rDNA of Stemphylium solani isolates from cotton.

    Science.gov (United States)

    Mehta, Yeshwant R; Mehta, Angela; Rosato, Yoko B

    2002-05-01

    The genetic diversity of the Stemphylium solani isolates from cotton was assessed by Enterobacterial Repetitive Intergenic Consensus (ERIC) and Repetitive Extragenic Palindromes (REP)-PCR fingerprinting. Twenty-eight monosporic isolates of S. solani from cotton were used, along with five isolates from tomato and one isolate of Alternaria macrospora from cotton for comparison. The dendrogram obtained revealed clear differences between the cotton and tomato isolates, as well as between the tomato isolates from different geographic regions. The genetic relationships among S. solani isolates were also analyzed by sequencing the internal transcribed spacer (ITS) region of four isolates representing the three ERIC and REP groups. The tomato isolate from the State of São Paulo showed an ITS sequence distinct from those of the cotton isolates and the tomato isolate from the State of Goiás, giving evidence that it belongs to a different genotype of S. solani. This is the first report of the entire sequence of the ITS1-5.8S-ITS2 regions of S. solani.

  2. Modified Genetic Algorithm for DNA Sequence Assembly by Shotgun and Hybridization Sequencing Techniques

    Directory of Open Access Journals (Sweden)

    Prof. Narayan Kumar Sahu

    2012-09-01

    Since the advent of rapid DNA sequencing methods in 1976, scientists have faced the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines the techniques of both shotgun and SBH methods in a novel way. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly [1].
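    The pairwise-overlap idea behind conventional shotgun assemblers can be illustrated with a greedy merge. The algorithm repeatedly joins the two reads with the longest suffix-prefix overlap. The Python sketch below is that conventional baseline, not the modified genetic algorithm proposed in the paper. It assumes error-free reads and a hypothetical minimum overlap of 3 bases, and the function names are hypothetical. Greedy merging can mis-assemble repeats.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a matching a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Greedily merge error-free reads into contigs by maximal pairwise overlap."""
    reads = list(dict.fromkeys(reads))                          # drop exact duplicates
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]  # drop contained reads
    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b
        if best_len == 0:
            break                                               # no overlaps left: several contigs
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])
    return reads


print(greedy_assemble(["ACGTTGCA", "TTGCATGA", "CATGACCT"]))
```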

  3. Laser mass spectrometry for DNA sequencing, disease diagnosis, and fingerprinting

    Energy Technology Data Exchange (ETDEWEB)

    Winston Chen, C.H.; Taranenko, N.I.; Zhu, Y.F.; Chung, C.N.; Allman, S.L.

    1997-03-01

    Since laser mass spectrometry has the potential to achieve very fast DNA analysis, the authors recently applied it to DNA sequencing, DNA typing for fingerprinting, and DNA screening for disease diagnosis. Two different approaches for sequencing DNA have been successfully demonstrated. One is to sequence DNA with DNA ladders produced by Sanger's enzymatic method. The other is direct sequencing without DNA ladders. Quick DNA typing for identification purposes is critical for forensic applications. The preliminary results indicate that laser mass spectrometry could be used for rapid DNA fingerprinting at a much lower cost than gel electrophoresis. Population screening for certain genetic diseases can be a very efficient step toward reducing medical costs through prevention. Since laser mass spectrometry can provide very fast DNA analysis, the authors applied it to disease diagnosis. Clinical samples with both base deletions and point mutations have been tested with complete success.

  4. Sequence dependent hole evolution in DNA.

    Science.gov (United States)

    Lakhno, V D

    2004-06-01

    The paper examines the dynamical behavior of a radical cation (G•+) generated in double-stranded DNA for different oligonucleotide sequences. Resonance hole tunneling through an oligonucleotide sequence is studied by numerical integration of self-consistent quantum-mechanical equations. The hole motion is treated quantum mechanically, and the nucleotide base oscillations are treated classically. The results demonstrate a strong dependence of charge transfer on the type of nucleotide sequence. The rates of hole transfer are calculated for different nucleotide sequences and compared with experimental data on transfer from G•+ to a GGG unit.

  5. The influence of DNA sequence on epigenome-induced pathologies

    Directory of Open Access Journals (Sweden)

    Meagher Richard B

    2012-07-01

    Clear cause-and-effect relationships are commonly established between genotype and the inherited risk of acquiring human and plant diseases and aberrant phenotypes. By contrast, few such cause-and-effect relationships are established linking a chromatin structure (that is, the epitype) with the transgenerational risk of acquiring a disease or abnormal phenotype. It is not entirely clear how epitypes are inherited from parent to offspring as populations evolve, even though epigenetics is proposed to be fundamental to evolution and to the likelihood of acquiring many diseases. This article explores the hypothesis that, for transgenerationally inherited chromatin structures, “genotype predisposes epitype”, and that epitype functions as a modifier of gene expression within the classical central dogma of molecular biology. Evidence for the causal contribution of genotype to inherited epitypes and epigenetic risk comes primarily from two different kinds of studies discussed herein. The first and direct method of research proceeds by examining the transgenerational inheritance of epitype and the penetrance of phenotype among genetically related individuals. The second approach identifies epitypes that are duplicated (as DNA sequences are duplicated) and evolutionarily conserved among repeated patterns in the DNA sequence. The body of this article summarizes particularly robust examples of these studies from humans, mice, Arabidopsis, and other organisms. The bulk of the data from both areas of research support the hypothesis that genotypes predispose the likelihood of displaying various epitypes, but for only a few classes of epitype. This analysis suggests that renewed efforts are needed in identifying polymorphic DNA sequences that determine variable nucleosome positioning and DNA methylation as the primary cause of inherited epigenome-induced pathologies. By contrast, there is very little evidence that DNA sequence directly

  6. Human cellular protein patterns and their link to genome DNA mapping and sequencing data: towards an integrated approach to the study of gene expression

    DEFF Research Database (Denmark)

    Celis, J E; Rasmussen, H H; Leffers, H

    1993-01-01

    two-dimensional gel protein databases will provide an integrated picture of the expression levels and properties of the thousands of protein components of organelles, pathways, and cytoskeletal systems, both under physiological and abnormal conditions, and are expected to lead to the identification...... mapping and sequence information and that offer an integrated approach to the study of gene expression. With the integrated approach offered by two-dimensional gel protein databases it is now possible to reveal phenotype-specific protein(s), to microsequence them, to search for homology with previous...... of new regulatory networks. So far, about 20% (600 out of 2,980) of the total number of proteins recorded in the human keratinocyte protein database have been identified and we are actively gathering qualitative and quantitative biological data on all resolved proteins. Given the current improvements...

  7. Transverse Electronic Signature of DNA for Electronic Sequencing

    Science.gov (United States)

    Xu, Mingsheng; Endres, Robert G.; Arakawa, Yasuhiko

    In recent years, the proliferation of large-scale DNA sequencing projects for applications in clinical medicine and health care has driven the search for new methods that could reduce the time and cost. The commonly used Sanger sequencing method relies on the chemistry to read the bases in DNA and is far too slow and expensive for reading personal genetic codes. There were earlier attempts to sequence DNA by directly visualizing the nucleotide composition of the DNA molecules by scanning tunneling microscopy (STM). However, sequencing DNA based on directly imaging DNA's atomic structure has not yet been successful. In Chap. 9, Xu, Endres, and Arakawa report a potential physical alternative by detecting unique transverse electronic signatures of DNA bases using ultrahigh vacuum STM. Supported by the principles, calculations and statistical analyses, these authors argue that it would be possible to directly sequence DNA by the STM-based technology without any modification of the DNA.

  8. A new DNA sequence assembly program.

    Science.gov (United States)

    Bonfield, J K; Smith, K F; Staden, R

    1995-01-01

    We describe the Genome Assembly Program (GAP), a new program for DNA sequence assembly. The program is suitable for large and small projects, a variety of strategies and can handle data from a range of sequencing instruments. It retains the useful components of our previous work, but includes many novel ideas and methods. Many of these methods have been made possible by the program's completely new, and highly interactive, graphical user interface. The program provides many visual clues to the current state of a sequencing project and allows users to interact in intuitive and graphical ways with their data. The program has tools to display and manipulate the various types of data that help to solve and check difficult assemblies, particularly those in repetitive genomes. We have introduced the following new displays: the Contig Selector, the Contig Comparator, the Template Display, the Restriction Enzyme Map and the Stop Codon Map. We have also made it possible to have any number of Contig Editors and Contig Joining Editors running simultaneously even on the same contig. The program also includes a new 'Directed Assembly' algorithm and routines for automatically detecting unfinished segments of sequence, to which it suggests experimental solutions. PMID:8559656

  9. Understanding Long-Range Correlations in DNA sequences

    CERN Document Server

    Li, Wentian; Marr, Thomas G; Kaneko, Kunihiko

    1994-01-01

    Abstract: In this paper, we review the literature on statistical long-range correlation in DNA sequences. We examine the current evidence for these correlations, and conclude that a mixture of many length scales (including some relatively long ones) in DNA sequences is responsible for the observed 1/f-like spectral component. We note the complexity of the correlation structure in DNA sequences. The observed complexity often makes it hard, or impossible, to decompose the sequence into a few statistically stationary regions. We suggest that, based on the complexity of DNA sequences, a fruitful approach to understand long-range correlation is to model duplication, and other rearrangement processes, in DNA sequences. One model, called "expansion-modification system", contains only point duplication and point mutation. Though simplistic, this model is able to generate sequences with 1/f spectra. We emphasize the importance of DNA duplication in its contribution to the observed long-range correlation in DNA sequen...
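
    The expansion-modification system is described above only as "point duplication and point mutation". A minimal sketch of such a process follows, assuming a four-letter alphabet, a per-symbol mutation probability `p_mutate` (a hypothetical parameter name), random replacement by one of the other three bases, and duplication (X becomes XX) otherwise. It is an illustration of the idea, not the paper's exact model.

```python
import random

BASES = "ACGT"


def expansion_modification(seed: str, steps: int, p_mutate: float,
                           rng: random.Random) -> str:
    """Grow a sequence by applying, to every symbol at each step, either a
    point mutation (probability p_mutate) or a point duplication."""
    seq = seed
    for _ in range(steps):
        out = []
        for base in seq:
            if rng.random() < p_mutate:
                # Point mutation: replace the base with one of the other three.
                out.append(rng.choice([b for b in BASES if b != base]))
            else:
                # Point duplication: the base is copied in place (X -> XX).
                out.append(base + base)
        seq = "".join(out)
    return seq


grown = expansion_modification("A", steps=14, p_mutate=0.1, rng=random.Random(0))
print(len(grown), grown[:40])
```

    A periodogram of an indicator sequence derived from `grown` could then be inspected for a 1/f-like component; that spectral step is not shown here.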

  10. Sequences sufficient for programming imprinted germline DNA methylation defined.

    Directory of Open Access Journals (Sweden)

    Yoon Jung Park

    Epigenetic marks are fundamental to normal development, but little is known about signals that dictate their placement. Insights have been provided by studies of imprinted loci in mammals, where monoallelic expression is epigenetically controlled. Imprinted expression is regulated by DNA methylation programmed during gametogenesis in a sex-specific manner and maintained after fertilization. At Rasgrf1 in mouse, paternal-specific DNA methylation on a differential methylation domain (DMD) requires downstream tandem repeats. The DMD and repeats constitute a binary switch regulating paternal-specific expression. Here, we define sequences sufficient for imprinted methylation using two transgenic mouse lines: one carries the entire Rasgrf1 cluster (RC); the second carries only the DMD and repeats (DR) from Rasgrf1. The RC transgene recapitulated all aspects of imprinting seen at the endogenous locus. DR underwent proper DNA methylation establishment in sperm and erasure in oocytes, indicating the DMD and repeats are sufficient to program imprinted DNA methylation in germlines. Both transgenes produce a DMD-spanning pit-RNA, previously shown to be necessary for imprinted DNA methylation at the endogenous locus. We show that when pit-RNA expression is controlled by the repeats, it regulates DNA methylation in cis only and not in trans. Interestingly, pedigree history dictated whether established DR methylation patterns were maintained after fertilization. When DR was paternally transmitted followed by maternal transmission, the unmethylated state that was properly established in the female germlines could not be maintained. This provides a model for transgenerational epigenetic inheritance in mice.

  11. Improved taboo search algorithm for designing DNA sequences

    Institute of Scientific and Technical Information of China (English)

    Kai Zhang; Jin Xu; Xiutang Geng; Jianhua Xiao; Linqiang Pan

    2008-01-01

    The design of DNA sequences is one of the most practical and important research topics in DNA computing. We adopt the taboo search algorithm and improve the method for systematically designing equal-length DNA sequences that satisfy certain combinatorial and thermodynamic constraints. Using taboo search, our method avoids becoming trapped in local optima and can find a set of good DNA sequences satisfying the required constraints.
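
    The abstract does not spell out the improved algorithm, so the following is only a generic taboo (tabu) search sketch under stated assumptions. The objective `cost` is hypothetical: it penalizes deviation from 50% GC content and pairs of sequences whose Hamming distance is below `min_dist`, standing in for the combinatorial constraints; no thermodynamic constraint is modelled. A move is a single-base substitution; recently changed (sequence, position) slots are tabu for `tenure` iterations unless the move beats the best solution found so far.

```python
import random
from itertools import combinations

BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def cost(seqs, min_dist: int) -> float:
    """Hypothetical objective: total deviation from 50% GC content plus a
    penalty for each pair of sequences closer than min_dist (Hamming)."""
    gc = sum(abs(s.count("G") + s.count("C") - len(s) / 2) for s in seqs)
    close = sum(max(0, min_dist - hamming(a, b)) for a, b in combinations(seqs, 2))
    return gc + close


def tabu_design(n, length, min_dist, iters=200, tenure=7, seed=0):
    rng = random.Random(seed)
    current = ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n)]
    best, best_cost = list(current), cost(current, min_dist)
    tabu = {}  # (sequence index, position) -> last iteration at which it is tabu
    for it in range(iters):
        moves = []
        for i in range(n):
            for pos in range(length):
                for b in BASES:
                    if b == current[i][pos]:
                        continue
                    trial = list(current)
                    trial[i] = trial[i][:pos] + b + trial[i][pos + 1:]
                    c = cost(trial, min_dist)
                    # Skip tabu moves unless they beat the best solution (aspiration).
                    if tabu.get((i, pos), -1) < it or c < best_cost:
                        moves.append((c, i, pos, trial))
        if not moves:
            continue
        c, i, pos, current = min(moves, key=lambda m: m[0])
        # The best admissible move is taken even if it is worse than the
        # current solution; this is what lets the search leave local optima.
        tabu[(i, pos)] = it + tenure
        if c < best_cost:
            best, best_cost = list(current), c
        if best_cost == 0:
            break
    return best, best_cost


designed, score = tabu_design(4, 12, min_dist=6)
print(designed, score)
```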

  12. From DNA sequence to transcriptional behaviour: a quantitative approach.

    Science.gov (United States)

    Segal, Eran; Widom, Jonathan

    2009-07-01

    Complex transcriptional behaviours are encoded in the DNA sequences of gene regulatory regions. Advances in our understanding of these behaviours have been recently gained through quantitative models that describe how molecules such as transcription factors and nucleosomes interact with genomic sequences. An emerging view is that every regulatory sequence is associated with a unique binding affinity landscape for each molecule and, consequently, with a unique set of molecule-binding configurations and transcriptional outputs. We present a quantitative framework based on existing methods that unifies these ideas. This framework explains many experimental observations regarding the binding patterns of factors and nucleosomes and the dynamics of transcriptional activation. It can also be used to model more complex phenomena such as transcriptional noise and the evolution of transcriptional regulation.

  13. Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches

    Science.gov (United States)

    McCutchen-Maloney, Sandra L.

    2002-01-01

    Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.

  14. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697
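
    To make "identified as two distinct sequences" concrete for the simplest case (single-base heterozygous sites, no indels), the sketch below splits a base-call string written with two-base IUPAC ambiguity codes into two allele strings against a reference. It is not how MSR works internally: MSR reads .abi chromatograms and also handles indels and STRs, which this sketch does not. The function name is hypothetical, and assigning the reference base to allele 1 is an arbitrary phasing, not a true haplotype.

```python
# Two-base IUPAC ambiguity codes used for heterozygous calls.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def split_heterozygous(mixed: str, reference: str):
    """Split an IUPAC-coded base-call string into two allele strings.

    Assumes the calls are already aligned to `reference` with no indels.
    At each ambiguous site allele 1 takes the reference base when it is one
    of the two options; this is an arbitrary phasing, not a true haplotype.
    """
    allele1, allele2 = [], []
    for m, r in zip(mixed.upper(), reference.upper()):
        options = IUPAC[m]
        if len(options) == 1:
            allele1.append(options)
            allele2.append(options)
        elif r in options:
            allele1.append(r)
            allele2.append(options.replace(r, ""))
        else:
            allele1.append(options[0])
            allele2.append(options[1])
    return "".join(allele1), "".join(allele2)


print(split_heterozygous("ACRTYA", "ACGTCA"))  # ('ACGTCA', 'ACATTA')
```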

  15. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Directory of Open Access Journals (Sweden)

    Chun-Tien Chang

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  16. Mixed sequence reader: a program for analyzing DNA sequences with heterozygous base calling.

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  17. ProteDNA: a sequence-based predictor of sequence-specific DNA-binding residues in transcription factors

    OpenAIRE

    2009-01-01

    This article presents the design of a sequence-based predictor named ProteDNA for identifying the sequence-specific binding residues in a transcription factor (TF). Concerning protein–DNA interactions, there are two types of binding mechanisms involved, namely sequence-specific binding and nonspecific binding. Sequence-specific bindings occur between protein sidechains and nucleotide bases and correspond to sequence-specific recognition of genes. Therefore, sequence-specific bindings are esse...

  18. Protein patterning by a DNA origami framework.

    Science.gov (United States)

    Aslan, Hüsnü; Krissanaprasit, Abhichart; Besenbacher, Flemming; Gothelf, Kurt V; Dong, Mingdong

    2016-08-18

    A spatial arrangement of proteins provides structural and functional advantages in vast technological applications as well as fundamental research. Most protein patterning procedures employ complicated, time-consuming and very costly nanofabrication techniques. As an alternative route, we developed a fully biomolecular self-assembly method using DNA Origami Frames (DOF) as a template for both small and large scale protein patterning. We employed a triangular DOF (tDOF) to arrange the Bovine Serum Albumin (BSA) protein. Our in situ protein patterning strategy provides a novel, fully organic platform using a fast and low-cost surface approach with possible utilization in fundamental science and technological applications.

  19. DNA Sequence Optimization Based on Continuous Particle Swarm Optimization for Reliable DNA Computing and DNA Nanotechnology

    Directory of Open Access Journals (Sweden)

    N. K. Khalid

    2008-01-01

    Problem statement: In DNA-based computation and DNA nanotechnology, the design of good DNA sequences has turned out to be an essential problem and one of the most practical and important research topics. The DNA sequence design problem is a multi-objective problem that can be evaluated using four objective functions, namely H-measure, similarity, continuity and hairpin. Approach: There are several ways to solve a multi-objective problem; however, to evaluate the correctness of the PSO algorithm in DNA sequence design, the problem is converted into a single-objective problem. Particle Swarm Optimization (PSO) is proposed to minimize the objective, subject to two constraints: melting temperature and GC content. A model is developed to represent DNA sequence design based on PSO computation. Results: Based on the experiments performed, 20 particles are used in the optimization process; the average values and standard deviations over 100 runs are reported and compared with other existing methods. Conclusion: The results verify that PSO, using the proposed method and model, can suitably solve the DNA sequence design problem and performs comparatively better than other approaches.
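
    A minimal continuous-PSO sketch of the idea, under several assumptions not taken from the paper: each particle is a real vector whose coordinates are clipped to [0, 4) and truncated to an index into ACGT; the single objective is continuity only (H-measure, similarity, hairpin and melting temperature are omitted), and the GC-content constraint is handled as a penalty. The 20 particles follow the abstract; the inertia and acceleration coefficients `w`, `c1`, `c2` are typical illustrative values.

```python
import random

BASES = "ACGT"


def decode(x):
    """Assumed encoding: clip each coordinate to [0, 4) and truncate to an
    index into ACGT."""
    return "".join(BASES[min(3, max(0, int(v)))] for v in x)


def continuity(seq: str, threshold: int = 3) -> int:
    """Count bases that extend a run of identical bases beyond `threshold`."""
    penalty, run = 0, 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > threshold:
            penalty += 1
    return penalty


def fitness(seq: str) -> float:
    # Continuity objective plus a GC-content penalty standing in for the
    # constraint; H-measure, similarity, hairpin and Tm are omitted.
    return continuity(seq) + abs(seq.count("G") + seq.count("C") - len(seq) / 2)


def pso_design(length=20, particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(0, 4) for _ in range(length)] for _ in range(particles)]
    vel = [[0.0] * length for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(decode(p)) for p in pos]
    g = min(range(particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(length):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(decode(pos[i]))
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return decode(gbest), gbest_f


print(pso_design())
```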

  20. Urban DNA: Morphogenetic Analysis of Urban Pattern

    Directory of Open Access Journals (Sweden)

    H. Serdar Kaya

    2017-06-01

    Urban pattern is the result of a dynamic transformation process, which can follow two different trajectories: planned interventions generally produce clear geometrical patterns in large areas, whereas unplanned transformation needs more time and has relatively smaller and partial effects on the urban pattern but creates more complex urban patterns. The highly complex spatial structure of urban pattern, governed by local and global forces, should be analyzed via advanced methods that correspond to the complexity of the pattern. Analyses of the dynamic structure of the multidimensional urban system show the necessity of using advanced methods and several parameters together. The aim of this paper is to develop a new method to analyze and represent highly complex urban patterns by evaluating geometrical, topological, and mathematical parameters that capture essential characteristics of cities. Physical space is analyzed by ‘geometrical parameters’, ‘topological parameters’, ‘parameters related to use and perception’ and ‘parameters related to complexity’. The calculation results give two main kinds of information about urban structure: first, the values give information about the spatial characteristics and diversity of the urban pattern; second, the spatial distribution map of the changing urban pattern reflects the unique structure of settlements, which resembles the DNA of living creatures. In this paper, Istanbul was selected as the case study area because of its rich historical background and dynamic urban growth process, which has produced various types of settlements, including historical settlements, old villages, unplanned development, squatter areas and gated communities with different densities. As the proposed model shows the essential morphological characteristics of an urban pattern as a morphological DNA, its outputs have the potential to be used in different areas such as comparative analysis of geometrically different cities, analyzing irregularities in

  1. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT), the bionic wavelet transform (BWT), for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural features of the DNA sequence, is introduced into the WT. It adjusts the weight of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than the traditional WT in presenting greater energy distribution. This new BWT method should be useful for detecting latent structural features in future DNA sequence analysis.
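
    The first two steps described above, splitting a sequence into four indicator channels and weighting them into one numeric signal, can be sketched as follows, followed by one level of an ordinary orthonormal Haar wavelet transform. This is a plain WT, not the bionic WT: the weights here are fixed by hand (an assumption), whereas the BWT adapts them to the sequence's structural features. The energy check illustrates that an orthonormal transform preserves total energy and only redistributes it between approximation and detail coefficients.

```python
import math


def indicator_channels(seq: str) -> dict:
    """Split a DNA string into four binary indicator sequences, one per base."""
    return {b: [1.0 if s == b else 0.0 for s in seq.upper()] for b in "ACGT"}


def weighted_signal(seq: str, weights: dict) -> list:
    """Combine the four channels with fixed per-base weights. (The BWT instead
    adapts these weights to the sequence; that step is not reproduced here.)"""
    ch = indicator_channels(seq)
    return [sum(weights[b] * ch[b][i] for b in "ACGT") for i in range(len(seq))]


def haar_level(x: list):
    """One level of the orthonormal Haar wavelet transform; len(x) must be even."""
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


signal = weighted_signal("ACGTACGT", {"A": 1, "C": 2, "G": 3, "T": 4})
approx, detail = haar_level(signal)
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for v in approx + detail)
print(signal, math.isclose(energy_in, energy_out))
# [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0] True
```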

  2. Solid-Phase Purification of Synthetic DNA Sequences.

    Science.gov (United States)

    Grajkowski, Andrzej; Cieslak, Jacek; Beaucage, Serge L

    2016-08-05

    Although high-throughput methods for solid-phase synthesis of DNA sequences are currently available for synthetic biology applications and technologies for large-scale production of nucleic acid-based drugs have been exploited for various therapeutic indications, little has been done to develop high-throughput procedures for the purification of synthetic nucleic acid sequences. An efficient process for purification of phosphorothioate and native DNA sequences is described herein. This process consists of functionalizing commercial aminopropylated silica gel with aminooxyalkyl functions to enable capture of DNA sequences carrying a 5'-siloxyl ether linker with a "keto" function through an oximation reaction. Deoxyribonucleoside phosphoramidites functionalized with the 5'-siloxyl ether linker were prepared in yields of 75-83% and incorporated last into the solid-phase assembly of DNA sequences. Capture of nucleobase- and phosphate-deprotected DNA sequences released from the synthesis support is demonstrated to proceed nearly quantitatively. After shorter-than-full-length DNA sequences were washed from the capture support, the purified DNA sequences were released from this support upon treatment with tetra-n-butylammonium fluoride in dry DMSO. The purity of released DNA sequences exceeds 98%. The scalability and high-throughput features of the purification process are demonstrated without sacrificing purity of the DNA sequences.

  3. Affinity purification of sequence-specific DNA binding proteins.

    OpenAIRE

    1986-01-01

    We describe a method for affinity purification of sequence-specific DNA binding proteins that is fast and effective. Complementary chemically synthesized oligodeoxynucleotides that contain a recognition site for a sequence-specific DNA binding protein are annealed and ligated to give oligomers. This DNA is then covalently coupled to Sepharose CL-2B with cyanogen bromide to yield the affinity resin. A partially purified protein fraction is combined with competitor DNA and subsequently passed t...

  4. Modeling associated protein-DNA pattern discovery with unified scores.

    Science.gov (United States)

    Chan, Tak-Ming; Lo, Leung-Yau; Sze-To, Ho-Yin; Leung, Kwong-Sak; Xiao, Xinshu; Wong, Man-Hon

    2013-01-01

    Understanding protein-DNA interactions, specifically transcription factor (TF) and transcription factor binding site (TFBS) bindings, is crucial in deciphering gene regulation. The recent associated TF-TFBS pattern discovery combines one-sided motif discovery on both the TF and the TFBS sides. Using sequences only, it identifies the short protein-DNA binding cores available only in high-resolution 3D structures. The discovered patterns lead to promising subtype and disease analysis applications. While the related studies use either association rule mining or existing TFBS annotations, none has proposed any formal unified (both-sided) model to prioritize the top verifiable associated patterns. We propose the unified scores and develop an effective pipeline for associated TF-TFBS pattern discovery. Our stringent instance-level evaluations show that the patterns with the top unified scores match with the binding cores in 3D structures considerably better than the previous works, where up to 90 percent of the top 20 scored patterns are verified. We also introduce extended verification from literature surveys, where the high unified scores correspond to even higher verification percentage. The top scored patterns are confirmed to match the known WRKY binding cores with no available 3D structures and agree well with the top binding affinities of in vivo experiments.

  5. The Study of Correlation Structures of DNA Sequences A Critical Review

    CERN Document Server

    Li, W

    1997-01-01

    The study of correlation structure in the primary sequences of DNA is reviewed. The issues reviewed include: symmetries among 16 base-base correlation functions, accurate estimation of correlation measures, the relationship between 1/f and Lorentzian spectra, heterogeneity in DNA sequences, different modeling strategies of the correlation structure of DNA sequences, the difference of correlation structure between coding and non-coding regions (besides the period-3 pattern), and the source of the broad distribution of domain sizes. Although some of the results remain controversial, a body of work on this topic constitutes a good starting point for future studies.
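
    The 16 base-base correlation functions mentioned above can be written as Gamma_ab(d) = P_ab(d) - P_a P_b, where P_ab(d) is the frequency of base a at position i followed by base b at position i + d. The sketch below estimates them with simple plug-in frequencies (no finite-size correction, which is one of the estimation issues the review discusses). One easy dependency among the 16 functions is that they sum to zero for a sequence over A, C, G, T, which the test checks.

```python
from collections import Counter
from itertools import product


def correlation_functions(seq: str, d: int) -> dict:
    """Gamma_ab(d) = P_ab(d) - P_a * P_b for all 16 ordered base pairs (a, b),
    where P_ab(d) is the frequency of a at position i and b at position i + d."""
    seq = seq.upper()
    n = len(seq)
    single = Counter(seq)
    pairs = Counter(zip(seq, seq[d:]))  # (seq[i], seq[i + d]) for each valid i
    m = n - d
    return {(a, b): pairs[(a, b)] / m - (single[a] / n) * (single[b] / n)
            for a, b in product("ACGT", repeat=2)}


gamma = correlation_functions("AC" * 50, 2)
print(round(gamma[("A", "A")], 3), round(gamma[("A", "C")], 3))  # 0.25 -0.25
```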

  6. An intragenic distribution bias of DNA uptake sequences in Pasteurellaceae and Neisseriae

    NARCIS (Netherlands)

    Passel, van M.W.J.

    2008-01-01

    Most sequenced strains from Pasteurellaceae and Neisseriae contain hundreds to thousands of uptake sequence (US) motifs in their genome, which are associated with natural competence for DNA uptake. The mechanism of their recognition is still unclear, and I searched for intragenic location patterns o

  7. SWORDS: A statistical tool for analysing large DNA sequences

    Indian Academy of Sciences (India)

    Probal Chaudhuri; Sandip Das

    2002-02-01

    In this article, we present some simple yet effective statistical techniques for analysing and comparing large DNA sequences. These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software tool called SWORDS. Using sequences available in public-domain databases on the Internet, we demonstrate how SWORDS can be conveniently used by molecular biologists and geneticists to unmask biologically important features hidden in large sequences and assess their statistical significance.
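
    The abstract does not give SWORDS' exact statistics, so the following is only a sketch of the general idea: count overlapping DNA words (k-mers) and compare each observed count with the count expected under a simple null model. The null model here, independent bases drawn with the sequence's own base frequencies, is an assumption chosen for illustration; over-represented words show up as observed counts well above the expected value.

```python
from collections import Counter


def word_counts(seq: str, k: int) -> Counter:
    """Count overlapping DNA words (k-mers) of length k."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(seq: str, word: str) -> float:
    """Expected count of `word` if bases were independent and identically
    distributed with the sequence's own base frequencies (an assumed null model)."""
    seq = seq.upper()
    freq = Counter(seq)
    p = 1.0
    for b in word.upper():
        p *= freq[b] / len(seq)
    return (len(seq) - len(word) + 1) * p


seq = "GATCGATCGATCAAAAGATC"
counts = word_counts(seq, 4)
for word, observed in counts.most_common(3):
    print(word, observed, round(expected_count(seq, word), 2))
```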

  8. Yeast DNA sequences initiating gene expression in Escherichia coli.

    Science.gov (United States)

    Lewin, Astrid; Tran, Thi Tuyen; Jacob, Daniela; Mayer, Martin; Freytag, Barbara; Appel, Bernd

    2004-01-01

    DNA transfer between pro- and eukaryotes occurs either during natural horizontal gene transfer or as a result of the employment of gene technology. We analysed the capacity of DNA sequences from a eukaryotic donor organism (Saccharomyces cerevisiae) to serve as a promoter region in a prokaryotic recipient (Escherichia coli) by creating fusions between promoterless luxAB genes from Vibrio harveyi and random DNA sequences from S. cerevisiae and measuring the luminescence of transformed E. coli. Fifty-four out of 100 randomly analysed S. cerevisiae DNA sequences caused considerable gene expression in E. coli. Determination of transcription start sites within six selected yeast sequences in E. coli confirmed the existence of bacterial -10 and -35 consensus sequences at appropriate distances upstream from transcription initiation sites. Our results demonstrate that the probability of transcription of transferred eukaryotic DNA in bacteria is extremely high and does not require the insertion of the transferred DNA behind a promoter of the recipient genome.
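
    To illustrate what "-10 and -35 consensus sequences at appropriate distances" means in sequence terms, the sketch below scans for the canonical E. coli σ70 consensus hexamers TTGACA (-35 box) and TATAAT (-10 box), allowing up to `max_mm` mismatches per box. The default spacer window of 15 to 19 bp between the boxes is an assumption for illustration. The paper located promoters experimentally, by mapping transcription start sites, not with a scan like this.

```python
def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq: str, max_mm: int = 1, spacer=range(15, 20)):
    """Return (start of -35 box, spacer length, start of -10 box) for each
    TTGACA ... TATAAT pair with at most max_mm mismatches in each box."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        if mismatches(seq[i:i + 6], "TTGACA") > max_mm:
            continue
        for s in spacer:
            j = i + 6 + s
            box = seq[j:j + 6]
            if len(box) == 6 and mismatches(box, "TATAAT") <= max_mm:
                hits.append((i, s, j))
    return hits


example = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
print(find_promoters(example, max_mm=0))  # [(2, 17, 25)]
```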

  9. DNA Sequence Determination by Hybridization: A Strategy for Efficient Large-Scale Sequencing

    Science.gov (United States)

    Drmanac, R.; Drmanac, S.; Strezoska, Z.; Paunesku, T.; Labat, I.; Zeremski, M.; Snoddy, J.; Funkhouser, W. K.; Koop, B.; Hood, L.; Crkvenjakov, R.

    1993-06-01

    The concept of sequencing by hybridization (SBH) makes use of an array of all possible n-nucleotide oligomers (n-mers) to identify n-mers present in an unknown DNA sequence. Computational approaches can then be used to assemble the complete sequence. As a validation of this concept, the sequences of three DNA fragments, 343 base pairs in length, were determined with octamer oligonucleotides. Possible applications of SBH include physical mapping (ordering) of overlapping DNA clones, sequence checking, DNA fingerprinting comparisons of normal and disease-causing genes, and the identification of DNA fragments with particular sequence motifs in complementary DNA and genomic libraries. The SBH techniques may accelerate the mapping and sequencing phases of the human genome project.
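
    The step "computational approaches can then be used to assemble the complete sequence" is commonly formulated as finding an Eulerian path in a de Bruijn graph, whose nodes are (n-1)-mers and whose edges are the observed n-mers. The sketch below uses Hierholzer's algorithm and assumes an idealized spectrum: non-empty, error-free, and with every n-mer occurring exactly once. Real hybridization data record presence rather than multiplicity, so repeated n-mers and hybridization errors would need extra handling. This is a standard formulation, not necessarily the procedure used in the paper.

```python
from collections import defaultdict


def assemble_from_spectrum(kmers):
    """Rebuild a sequence from its k-mer spectrum via an Eulerian path in the
    de Bruijn graph (nodes are (k-1)-mers, each k-mer is one edge).

    Assumes a non-empty spectrum in which every k-mer occurs exactly once.
    When several Eulerian paths exist, any one of them is returned.
    """
    graph = defaultdict(list)
    indeg = defaultdict(int)
    for km in kmers:
        graph[km[:-1]].append(km[1:])
        indeg[km[1:]] += 1
    # An Eulerian path starts where out-degree exceeds in-degree by one.
    nodes = list(graph)
    start = next((u for u in nodes if len(graph[u]) - indeg[u] == 1), nodes[0])
    stack, path = [start], []
    while stack:  # Hierholzer's algorithm
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(kmers) + 1:
        raise ValueError("spectrum does not form a single connected path")
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble_from_spectrum(["ACG", "CGT", "GTT", "TTG", "TGC", "GCA"]))  # ACGTTGCA
```

    When the graph branches, several sequences share the same spectrum, which is one reason repeats limit sequencing by hybridization.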

  10. Generating Multiple Base-Resolution DNA Methylomes Using Reduced Representation Bisulfite Sequencing.

    Science.gov (United States)

    Chatterjee, Aniruddha; Rodger, Euan J; Stockwell, Peter A; Le Mée, Gwenn; Morison, Ian M

    2017-01-01

    Reduced representation bisulfite sequencing (RRBS) is an effective technique for profiling genome-wide DNA methylation patterns in eukaryotes. RRBS couples size selection, bisulfite conversion, and second-generation sequencing to enrich for CpG-dense regions of the genome. The progressive improvement of second-generation sequencing technologies and reduction in cost provided an opportunity to examine the DNA methylation patterns of multiple genomes. Here, we describe a protocol for sequencing multiple RRBS libraries in a single sequencing reaction to generate base-resolution methylomes. Furthermore, we provide a brief guideline for base-calling and data analysis of multiplexed RRBS libraries. These strategies will be useful to perform large-scale, genome-wide DNA methylation analysis.
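
    The core of the data-analysis step can be sketched as follows. After bisulfite conversion an unmethylated cytosine reads as T, while a methylated cytosine stays C, so the methylation level at a CpG is C / (C + T) over the reads covering it. The sketch is deliberately simplified: it assumes reads already aligned without gaps to the forward strand, supplied as (start, sequence) pairs (a hypothetical input format), and it ignores reverse-strand reads, base qualities and the MspI fragment structure of RRBS libraries.

```python
def methylation_levels(reference: str, reads) -> dict:
    """Per-CpG methylation fraction from bisulfite reads.

    `reads` is a list of (start, sequence) pairs already aligned, without gaps,
    to the forward strand of `reference` (a hypothetical input format). After
    bisulfite conversion an unmethylated C reads as T, while a methylated C
    stays C, so the level at each CpG is C / (C + T).
    """
    reference = reference.upper()
    cpgs = [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]
    meth = dict.fromkeys(cpgs, 0)
    total = dict.fromkeys(cpgs, 0)
    for start, read in reads:
        read = read.upper()
        for i in cpgs:
            k = i - start
            if 0 <= k < len(read) and read[k] in "CT":
                total[i] += 1
                if read[k] == "C":
                    meth[i] += 1
    return {i: meth[i] / total[i] for i in cpgs if total[i] > 0}


reads = [(0, "ACGTTCGA"), (0, "ATGTTTGA"), (4, "TCGA")]
print(methylation_levels("ACGTTCGA", reads))  # {1: 0.5, 5: 0.6666666666666666}
```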

  11. A novel constraint for thermodynamically designing DNA sequences.

    Directory of Open Access Journals (Sweden)

    Qiang Zhang

    Biotechnological and biomolecular advances have introduced novel uses for DNA such as DNA computing, storage, and encryption. For these applications, DNA sequence design requires maximal desired (and minimal undesired) hybridizations, which are the product of a single new DNA strand from 2 single DNA strands. Here, we propose a novel constraint to design DNA sequences based on thermodynamic properties. Existing constraints for DNA design are based on the Hamming distance, a constraint that does not address the thermodynamic properties of the DNA sequence. Using a unique, improved genetic algorithm, we designed DNA sequence sets which satisfy different distance constraints and employ a free energy gap based on a minimum free energy (MFE) to gauge DNA sequences based on set thermodynamic properties. When compared to the best constraints of the Hamming distance, our method yielded better thermodynamic qualities. We then used our improved genetic algorithm to obtain lower-bound DNA sequence sets. Here, we discuss the effects of novel constraint parameters on the free energy gap.

  12. A novel constraint for thermodynamically designing DNA sequences.

    Science.gov (United States)

    Zhang, Qiang; Wang, Bin; Wei, Xiaopeng; Zhou, Changjun

    2013-01-01

    Biotechnological and biomolecular advances have introduced novel uses for DNA such as DNA computing, storage, and encryption. For these applications, DNA sequence design requires maximal desired (and minimal undesired) hybridizations, which are the product of a single new DNA strand from 2 single DNA strands. Here, we propose a novel constraint to design DNA sequences based on thermodynamic properties. Existing constraints for DNA design are based on the Hamming distance, a constraint that does not address the thermodynamic properties of the DNA sequence. Using a unique, improved genetic algorithm, we designed DNA sequence sets which satisfy different distance constraints and employ a free energy gap based on a minimum free energy (MFE) to gauge DNA sequences based on set thermodynamic properties. When compared to the best constraints of the Hamming distance, our method yielded better thermodynamic qualities. We then used our improved genetic algorithm to obtain lower-bound DNA sequence sets. Here, we discuss the effects of novel constraint parameters on the free energy gap.

  13. Mining Class-Correlated Patterns for Sequence Labeling

    Science.gov (United States)

    Hopf, Thomas; Kramer, Stefan

    Sequence labeling is the task of assigning a label sequence to an observation sequence. Since many methods to solve this problem depend on the specification of predictive features, automated methods for their derivation are desirable. Unlike in other areas of pattern-based classification, however, no algorithm to directly mine class-correlated patterns for sequence labeling has been proposed so far. We introduce the novel task of mining class-correlated sequence patterns for sequence labeling and present a supervised pattern growth algorithm to find all patterns in a set of observation sequences, which correlate with the assignment of a fixed sequence label no less than a user-specified minimum correlation constraint. From the resulting set of patterns, features for a variety of classifiers can be obtained in a straightforward manner. The efficiency of the approach and the influence of important parameters are shown in experiments on several biological datasets.

  14. Movement Pattern Analysis Based on Sequence Signatures

    Directory of Open Access Journals (Sweden)

    Seyed Hossein Chavoshi

    2015-09-01

    Increased affordability and deployment of advanced tracking technologies have led researchers from various domains to analyze the resulting spatio-temporal movement data sets for the purpose of knowledge discovery. Two different approaches can be considered in the analysis of moving objects: quantitative analysis and qualitative analysis. This research focuses on the latter and uses the qualitative trajectory calculus (QTC), a type of calculus that represents qualitative data on moving point objects (MPOs), and establishes a framework to analyze the relative movement of multiple MPOs. A visualization technique called sequence signature (SESI) is used, which maps QTC patterns into a 2D indexed rasterized space in order to evaluate the similarity of relative movement patterns of multiple MPOs. The applicability of the proposed methodology is illustrated by means of two practical examples of interacting MPOs: cars on a highway and body parts of a samba dancer. The results show that the proposed method can be effectively used to analyze interactions of multiple MPOs in different domains.
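    To make the QTC idea concrete, here is a minimal Python sketch of the basic variant usually called QTC_B, assuming both objects are sampled at the same discrete time steps. For each step, the first symbol records whether object k moves towards (-), away from (+) or neither (0) relative to l's position at the start of the step, and the second symbol does the same for l relative to k. This discretized comparison is a simplification of the continuous-time definition, and the SESI rasterization used in the paper is not shown; function names are illustrative.

```python
import math

Point = tuple[float, float]


def _sign(x: float, eps: float) -> str:
    """Map a distance change to a QTC symbol, treating |x| <= eps as stable."""
    if x < -eps:
        return "-"
    if x > eps:
        return "+"
    return "0"


def qtc_b(k_track: list[Point], l_track: list[Point], eps: float = 1e-9) -> list[str]:
    """Discretized QTC_B symbols for two moving point objects.

    For each step t -> t+1: first character compares k's distance to l's
    position at time t before and after k moves; second character does the
    same for l with respect to k's position at time t."""
    symbols = []
    for t in range(len(k_track) - 1):
        k0, k1 = k_track[t], k_track[t + 1]
        l0, l1 = l_track[t], l_track[t + 1]
        dk = math.dist(k1, l0) - math.dist(k0, l0)
        dl = math.dist(l1, k0) - math.dist(l0, k0)
        symbols.append(_sign(dk, eps) + _sign(dl, eps))
    return symbols
```

    A car closing in on a stationary car yields "-0"; if the leading car also drives off, the step becomes "-+".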

  15. Structural analysis of DNA sequence: evidence for lateral gene transfer in Thermotoga maritima

    DEFF Research Database (Denmark)

    Worning, Peder; Jensen, Lars Juhl; Nelson, K. E.

    2000-01-01

    The recently published complete DNA sequence of the bacterium Thermotoga maritima provides evidence, based on protein sequence conservation, for lateral gene transfer between Archaea and Bacteria. We introduce a new method of periodicity analysis of DNA sequences, based on structural parameters, which brings independent evidence for the lateral gene transfer in the genome of T. maritima. The structural analysis relates the Archaea-like DNA sequences to the genome of Pyrococcus horikoshii. Analysis of 24 complete genomic DNA sequences shows different periodicity patterns for organisms of different origin. The typical genomic periodicity for Bacteria is 11 bp, whilst it is 10 bp for Archaea. Eukaryotes have more complex spectra, but the dominant period in the yeast Saccharomyces cerevisiae is 10.2 bp. These periodicities most likely reflect differences in chromatin structure.
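    The abstract does not say which structural parameters are used, so the following Python sketch swaps in a simpler, commonly used proxy: a 0/1 track marking AA or TT dinucleotides, scanned with a Fourier sum over candidate periods between 9 and 12 bp. The choice of dinucleotides, the period grid and all function names are assumptions for illustration, not the authors' method.

```python
import cmath


def dinucleotide_track(seq: str, dinucs: tuple[str, ...] = ("AA", "TT")) -> list[float]:
    """1.0 where one of the chosen dinucleotides starts, 0.0 elsewhere."""
    return [1.0 if seq[i:i + 2] in dinucs else 0.0 for i in range(len(seq) - 1)]


def power_at_period(x: list[float], period: float) -> float:
    """Squared magnitude of the mean-removed Fourier sum at frequency 1/period."""
    mean = sum(x) / len(x)
    s = sum((v - mean) * cmath.exp(-2j * cmath.pi * n / period) for n, v in enumerate(x))
    return abs(s) ** 2


def dominant_period(seq: str, lo: float = 9.0, hi: float = 12.0, step: float = 0.1) -> float:
    """Period (bp) in [lo, hi] with the largest power for the AA/TT track."""
    count = int(round((hi - lo) / step)) + 1
    periods = [round(lo + i * step, 2) for i in range(count)]
    track = dinucleotide_track(seq)
    return max(periods, key=lambda p: power_at_period(track, p))
```

    On a synthetic sequence with AA placed every 10 bp, dominant_period returns 10.0; real genomic tracks give much noisier spectra.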

  16. Intricate patterns of phylogenetic relationships in the olive family as inferred from multi-locus plastid and nuclear DNA sequence analyses: a close-up on Chionanthus and Noronhia (Oleaceae).

    Science.gov (United States)

    Hong-Wa, Cynthia; Besnard, Guillaume

    2013-05-01

    Noronhia represents the most successful radiation of the olive family (Oleaceae) in Madagascar with more than 40 named endemic species distributed in all ecoregions from sea level to high mountains. Its position within the subtribe Oleinae has, however, been largely unresolved and its evolutionary history has remained unexplored. In this study, we generated a dataset of plastid (trnL-F, trnT-L, trnS-G, trnK-matK) and nuclear (internal transcribed spacer [ITS]) DNA sequences to infer phylogenetic relationships within Oleinae and to examine evolutionary patterns within Noronhia. Our sample included most species of Noronhia and representatives of the ten other extant genera within the subtribe with an emphasis on Chionanthus. Bayesian inferences and maximum likelihood analyses of plastid and nuclear data indicated several instances of paraphyly and polyphyly within Oleinae, with some geographic signal. Both plastid and ITS data showed a polyphyletic Noronhia that included Indian Ocean species of Chionanthus. They also found close relationships between Noronhia and African Chionanthus. However, the plastid data showed little clear differentiation between Noronhia and the African Chionanthus whereas relationships suggested by the nuclear ITS data were more consistent with taxonomy and geography. We used molecular dating to discriminate between hybridization and lineage sorting/gene duplication as alternative explanations for these topological discordances and to infer the biogeographic history of Noronhia. Hybridization between African Chionanthus and Noronhia could not be ruled out. However, Noronhia has long been established in Madagascar after a likely Cenozoic dispersal from Africa, suggesting any hybridization between representatives of African and Malagasy taxa was ancient. 
In any case, the African and Indian Ocean Chionanthus and Noronhia together formed a strongly supported monophyletic clade distinct and distant from other Chionanthus, which calls for a revised

  17. Preparing DNA libraries for multiplexed paired-end deep sequencing for Illumina GA sequencers.

    Science.gov (United States)

    Son, Mike S; Taylor, Ronald K

    2011-02-01

    Whole-genome sequencing, also known as deep sequencing, is becoming a more affordable and efficient way to identify SNP mutations, deletions, and insertions in DNA sequences across several different strains. Two major obstacles preventing the widespread use of deep sequencers are the costs involved in services used to prepare DNA libraries for sequencing and the overall accuracy of the sequencing data. This unit describes the preparation of DNA libraries for multiplexed paired-end sequencing using the Illumina GA series sequencer. Self-preparation of DNA libraries can help reduce overall expenses, especially if optimization is required for the different samples, and use of the Illumina GA Sequencer can improve the quality of the data.

  18. Statistical assignment of DNA sequences using Bayesian phylogenetics

    DEFF Research Database (Denmark)

    Terkelsen, Kasper Munch; Boomsma, Wouter Krogh; Huelsenbeck, John P.;

    2008-01-01

    -analysis of previously published ancient DNA data and show that, with high statistical confidence, most of the published sequences are in fact of Neanderthal origin. However, there are several cases of chimeric sequences composed of both Neanderthal and modern human DNA....

  19. Levenshtein error-correcting barcodes for multiplexed DNA sequencing

    NARCIS (Netherlands)

    Buschmann, Tilo; Bystrykh, Leonid V.

    2013-01-01

    Background: High-throughput sequencing technologies are improving in quality, capacity and costs, providing versatile applications in DNA and RNA research. For small genomes or fraction of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called multi
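    The edit (Levenshtein) distance behind such barcodes, unlike the Hamming distance, also counts insertions and deletions, which are common sequencing errors. A minimal Python sketch, assuming barcodes are compared in isolation: it computes pairwise edit distances with the standard dynamic-programming recurrence and applies the usual bound that a code with minimum distance d can correct up to floor((d - 1) / 2) errors by nearest-barcode decoding. How a barcode embedded in a longer read should be compared is a further issue not handled here; function names are illustrative.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions), row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (or match at no cost)
            ))
        prev = curr
    return prev[-1]


def min_barcode_distance(barcodes: list[str]) -> int:
    """Smallest pairwise edit distance in the barcode set."""
    return min(levenshtein(x, y) for x, y in combinations(barcodes, 2))


def correctable_errors(barcodes: list[str]) -> int:
    """Edits that nearest-barcode decoding is guaranteed to correct."""
    return (min_barcode_distance(barcodes) - 1) // 2
```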

  20. New scoring schema for finding motifs in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Nowzari-Dalini Abbas

    2009-03-01

    Background: Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology, with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search for (or predict) known binding sites in a new DNA sequence. To this end, all subsequences of the given DNA sequence are scored with a scoring function, and the prediction is made by selecting the best score. Most available tools for known binding site prediction are designed under the assumption that binding site base positions are independent. Recently, Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions that can be used in known binding site prediction under the assumption of dependency or independency of binding site base positions. Results: We propose a new scoring function based on the dependency between all base positions in the binding site. This scoring function uses joint information content and mutual information as a measure of dependency between positions in transcription factor binding sites. Our method for modeling dependencies is simply an extension of position-independence methods. We evaluate our new scoring function on real data sets extracted from the JASPAR and TRANSFAC databases, and compare the results with two other well-known scoring functions. Conclusion: The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion to investigate the relationships between positions in the TFBS. Our scoring function is formulated by simple
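    The contrast between position-independent and dependency-aware scoring can be illustrated with the Python sketch below, which is not the authors' joint-information-content function. It computes (a) a standard PWM-style log-odds score that treats positions independently, with an assumed uniform background and pseudocount, and (b) the empirical mutual information between two columns of aligned binding sites, one of the dependency measures the abstract names. Function names and parameter values are illustrative.

```python
import math
from collections import Counter


def log_odds_score(site: str, sites: list[str], background: float = 0.25,
                   pseudo: float = 0.5) -> float:
    """Position-independent (PWM-style) log2-odds score of a candidate site
    against aligned training sites, with a uniform background and pseudocounts."""
    n = len(sites)
    score = 0.0
    for k, base in enumerate(site):
        count = sum(1 for s in sites if s[k] == base)
        p = (count + pseudo) / (n + 4 * pseudo)
        score += math.log2(p / background)
    return score


def mutual_information(sites: list[str], i: int, j: int) -> float:
    """Empirical mutual information (bits) between columns i and j (no pseudocounts)."""
    n = len(sites)
    p_i = Counter(s[i] for s in sites)
    p_j = Counter(s[j] for s in sites)
    p_ij = Counter((s[i], s[j]) for s in sites)
    mi = 0.0
    for (a, b), c in p_ij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi
```

    For training sites AC, AC, GT, GT, the second base always follows the first; the independent score cannot represent that coupling, while the mutual information between the two columns is exactly 1 bit.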

  1. Food Fish Identification from DNA Extraction through Sequence Analysis

    Science.gov (United States)

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed third- and fourth-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish samples, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  3. Mesoscopic Model for Free Energy Landscape Analysis of DNA sequences

    CERN Document Server

    Tapia-Rojo, R; Mazo, J J; Falo, F; 10.1103/PhysRevE.86.021908

    2012-01-01

    A mesoscopic model that allows us to identify and quantify the strength of binding sites in DNA sequences is proposed. The model is based on the Peyrard-Bishop-Dauxois model for the DNA chain, coupled to a Brownian particle that explores the sequence, interacting preferentially with open base pairs of the DNA chain. We apply the model to promoter sequences of different organisms. The free energy landscape obtained for these promoters shows a complex structure that is strongly connected to their biological behavior. The analysis method can quantify free energy differences between sites within genome sequences.
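    For orientation, the Peyrard-Bishop-Dauxois potential energy is a sum of a Morse on-site term per base pair, D_n (e^(-a_n y_n) - 1)^2, and an anharmonic stacking term between neighbors, (k/2)(1 + rho e^(-delta (y_n + y_(n-1))))(y_n - y_(n-1))^2, where y_n is the stretching of base pair n. The Python sketch below evaluates that energy for a given configuration. The numerical parameters are illustrative values of the kind found in the PBD literature and should be treated as assumptions; the Brownian particle that the paper couples to the chain is not modeled.

```python
import math

# Assumed, illustrative parameters (not taken from the abstract).
MORSE = {            # pair type -> (D in eV, a in 1/Angstrom)
    "AT": (0.05, 4.2),
    "GC": (0.075, 6.9),
}
K, RHO, DELTA = 0.025, 2.0, 0.35   # stacking: eV/A^2, dimensionless, 1/A


def pair_type(base: str) -> str:
    """Collapse a base on one strand to its base-pair type."""
    return "AT" if base in "AT" else "GC"


def pbd_potential(seq: str, y: list[float]) -> float:
    """PBD potential energy (eV): Morse on-site terms plus anharmonic
    nearest-neighbor stacking, for base-pair stretchings y (Angstrom)."""
    if len(seq) != len(y):
        raise ValueError("one stretching coordinate per base pair")
    energy = 0.0
    for n, (base, yn) in enumerate(zip(seq, y)):
        d, a = MORSE[pair_type(base)]
        energy += d * (math.exp(-a * yn) - 1.0) ** 2
        if n > 0:
            ym = y[n - 1]
            energy += 0.5 * K * (1.0 + RHO * math.exp(-DELTA * (yn + ym))) * (yn - ym) ** 2
    return energy
```

    A closed chain (all y_n = 0) has zero energy, and a fully opened single pair approaches its Morse depth D, which is why GC pairs cost more to open than AT pairs in this model.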

  4. Cloning and sequencing of mouse GABA transporter complementary DNA

    Institute of Scientific and Technical Information of China (English)

    Tam, Anthony C.W.; Li, Heguo; et al.

    1994-01-01

    A cDNA encoding the mouse GABA transporter has been isolated and sequenced. The results show that the mouse GABA transporter cDNA differs from that of the rat by 60 base pairs in the open reading frame region, but the deduced amino acid sequences of the two cDNAs are identical and both are composed of 599 amino acids. However, the amino acid sequence is different from the sequence deduced from a recently published mouse GABA transporter cDNA.
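    The observation that two cDNAs differing at the nucleotide level can encode identical proteins follows from the redundancy of the genetic code: many codons differ only at the third position yet specify the same amino acid. A minimal Python sketch using the standard genetic code (stop codons shown as '*'); the short ORFs in the example are hypothetical, not the mouse or rat sequences, and the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate an open reading frame codon by codon; a trailing partial codon is ignored."""
    orf = orf.upper()
    return "".join(CODON_TABLE[orf[p:p + 3]] for p in range(0, len(orf) - 2, 3))


def nucleotide_differences(x: str, y: str) -> int:
    """Mismatched positions between two aligned, equal-length ORFs."""
    return sum(a != b for a, b in zip(x, y))
```

    For instance, ATGGCC and ATGGCT differ at one position, yet both translate to MA because GCC and GCT both encode alanine.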

  5. Affordable Hands-On DNA Sequencing and Genotyping: An Exercise for Teaching DNA Analysis to Undergraduates

    Science.gov (United States)

    Shah, Kushani; Thomas, Shelby; Stein, Arnold

    2013-01-01

    In this report, we describe a 5-week laboratory exercise for undergraduate biology and biochemistry students in which students learn to sequence DNA and to genotype their DNA for selected single nucleotide polymorphisms (SNPs). Students use miniaturized DNA sequencing gels that require approximately 8 min to run. The students perform G, A, T, C…

  6. Single-Round Patterned DNA Library Microarray Aptamer Lead Identification

    Directory of Open Access Journals (Sweden)

    Jennifer A. Martin

    2015-01-01

    A method for identifying an aptamer in a single round was developed using custom DNA microarrays containing computationally derived patterned libraries incorporating no information on the sequences of previously reported thrombin binding aptamers. The DNA library was specifically designed to increase the probability of binding by enhancing structural complexity in a sequence-space confined environment, much like generating lead compounds in a combinatorial drug screening library. The sequence demonstrating the highest fluorescence intensity upon target addition was confirmed by surface plasmon resonance to bind the target molecule thrombin with specificity, and a novel imino proton NMR/2D NOESY combination was used to screen the structure for G-quartet formation. We propose that the lack of G-quartet structure in microarray-derived aptamers may highlight differences in binding mechanisms between surface-immobilized and solution-based strategies. This proof-of-principle study highlights the use of a computationally driven methodology, rather than a SELEX-based approach, to create a DNA library. This work is beneficial to the biosensor field, where it has proven challenging for aptamers selected by solution-based evolution to retain binding function when immobilized on a surface.

  7. Effects of Sequence on Transmission Properties of DNA Molecules

    Institute of Scientific and Technical Information of China (English)

    DONG Rui-Xin; YAN Xun-Ling; YANG Bing

    2008-01-01

    A double-helix model of charge transport in DNA molecules is given, and the transmission spectra of four DNA sequences are obtained. The calculated results show that the transmission characteristics of DNA are related not only to longitudinal transport but also to transverse transport in the molecule. Periodic sequences with the same composition have stronger conduction ability. As the base composition becomes more varied, the conduction ability decreases, but the weight of the θ direction in charge transfer rises.

  8. DNA Polymerases Drive DNA Sequencing-by-Synthesis Technologies: Both Past and Present

    Directory of Open Access Journals (Sweden)

    Cheng-Yao Chen

    2014-06-01

    Next-generation sequencing (NGS) technologies have revolutionized modern biological and biomedical research. The engines responsible for this innovation are DNA polymerases; they catalyze the biochemical reaction for deriving template sequence information. In fact, DNA polymerase has been a cornerstone of DNA sequencing from the very beginning. The proteolytic (Klenow) fragment of E. coli DNA polymerase I was originally utilized in Sanger's dideoxy chain-terminating DNA sequencing chemistry. From these humble beginnings followed an explosion of organism-specific genome sequence information accessible via public databases. Family A/B DNA polymerases from mesophilic/thermophilic bacteria/archaea were modified and tested in today's standard capillary electrophoresis (CE) and NGS sequencing platforms. These enzymes were selected for their efficient incorporation of bulky dye-terminator and reversible dye-terminator nucleotides, respectively. Third-generation, real-time single-molecule sequencing platforms require slightly different enzyme properties. Enterobacterial phage φ29 DNA polymerase copies long stretches of DNA and possesses a unique capability to efficiently incorporate terminal phosphate-labeled nucleoside polyphosphates. Furthermore, the φ29 enzyme has also been utilized in emerging DNA sequencing technologies, including nanopore- and protein-transistor-based sequencing. DNA polymerase is, and will continue to be, a crucial component of sequencing technologies.

  9. Intermittency as a universal characteristic of the complete chromosome DNA sequences of eukaryotes: From protozoa to human genomes

    Science.gov (United States)

    Rybalko, S.; Larionov, S.; Poptsova, M.; Loskutov, A.

    2011-10-01

    Large-scale dynamical properties of complete chromosome DNA sequences of eukaryotes are considered. Using the proposed deterministic models with intermittency and symbolic dynamics we describe a wide spectrum of large-scale patterns inherent in these sequences, such as segmental duplications, tandem repeats, and other complex sequence structures. It is shown that the recently discovered gene number balance on the strands is not of a random nature, and certain subsystems of a complete chromosome DNA sequence exhibit the properties of deterministic chaos.

  10. ProteDNA: a sequence-based predictor of sequence-specific DNA-binding residues in transcription factors.

    Science.gov (United States)

    Chu, Wen-Yi; Huang, Yu-Feng; Huang, Chun-Chin; Cheng, Yi-Sheng; Huang, Chien-Kang; Oyang, Yen-Jen

    2009-07-01

    This article presents the design of a sequence-based predictor named ProteDNA for identifying the sequence-specific binding residues in a transcription factor (TF). Concerning protein-DNA interactions, there are two types of binding mechanisms involved, namely sequence-specific binding and nonspecific binding. Sequence-specific bindings occur between protein sidechains and nucleotide bases and correspond to sequence-specific recognition of genes. Therefore, sequence-specific bindings are essential for correct gene regulation. In this respect, ProteDNA is distinctive since it has been designed to identify sequence-specific binding residues. In order to accommodate users with different application needs, ProteDNA has been designed to operate under two modes, namely, the high-precision mode and the balanced mode. According to the experiments reported in this article, under the high-precision mode, ProteDNA has been able to deliver precision of 82.3%, specificity of 99.3%, sensitivity of 49.8% and accuracy of 96.5%. Meanwhile, under the balanced mode, ProteDNA has been able to deliver precision of 60.8%, specificity of 97.6%, sensitivity of 60.7% and accuracy of 95.4%. ProteDNA is available at the following websites: http://protedna.csbb.ntu.edu.tw/, http://protedna.csie.ntu.edu.tw/, http://bio222.esoe.ntu.edu.tw/ProteDNA/.
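    The four figures reported above come from the usual confusion-matrix definitions. The Python sketch below computes them; the counts in the example are hypothetical, chosen only to show why accuracy and specificity can be high while sensitivity is moderate when binding residues are a small minority of all residues.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard per-residue metrics from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),             # predicted binders that are real
        "sensitivity": tp / (tp + fn),           # real binders that are found (recall)
        "specificity": tn / (tn + fp),           # non-binders correctly rejected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

    With 50 true positives, 10 false positives, 890 true negatives and 50 false negatives, half of the binding residues are missed (sensitivity 0.5), yet accuracy is 0.94 because non-binding residues dominate.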

  11. DNA Shape Dominates Sequence Affinity in Nucleosome Formation

    Science.gov (United States)

    Freeman, Gordon S.; Lequieu, Joshua P.; Hinckley, Daniel M.; Whitmer, Jonathan K.; de Pablo, Juan J.

    2014-10-01

    Nucleosomes provide the basic unit of compaction in eukaryotic genomes, and the mechanisms that dictate their position at specific locations along a DNA sequence are of central importance to genetics. In this Letter, we employ molecular models of DNA and proteins to elucidate various aspects of nucleosome positioning. In particular, we show how DNA's histone affinity is encoded in its sequence-dependent shape, including subtle deviations from the ideal straight B-DNA form and local variations of minor groove width. By relying on high-precision simulations of the free energy of nucleosome complexes, we also demonstrate that, depending on DNA's intrinsic curvature, histone binding can be dominated by bending interactions or electrostatic interactions. More generally, the results presented here explain how sequence, manifested as the shape of the DNA molecule, dominates molecular recognition in the problem of nucleosome positioning.

  12. Next Generation Sequencing of Ancient DNA: Requirements, Strategies and Perspectives

    Directory of Open Access Journals (Sweden)

    Michael Knapp

    2010-07-01

    The invention of next-generation sequencing has revolutionized almost all fields of genetics, but few have profited from it as much as the field of ancient DNA research. From its beginnings as an interesting but rather marginal discipline, ancient DNA research is now on its way into the centre of evolutionary biology. In less than a year from its invention, next-generation sequencing had increased the amount of DNA sequence data available from extinct organisms by several orders of magnitude. Ancient DNA research is now not only adding a temporal aspect to evolutionary studies and allowing for the observation of evolution in real time; it also provides important data to help understand the origins of our own species. Here we review progress that has been made in next-generation sequencing of ancient DNA over the past five years and evaluate sequencing strategies and future directions.

  13. A fast Boyer-Moore type pattern matching algorithm for highly similar sequences.

    Science.gov (United States)

    Ben Nsira, Nadia; Lecroq, Thierry; Elloumi, Mourad

    2015-01-01

    In the last decade, biology and medicine have undergone a fundamental change: next generation sequencing (NGS) technologies have made it possible to obtain genomic sequences very quickly and at low cost compared with the traditional Sanger method. These NGS technologies have thus made it possible to collect genomic sequences (genes, exomes or even full genomes) of individuals of the same species. These sequences are more than 99% identical to one another. There is thus a strong need for efficient algorithms for indexing and performing fast pattern matching in such specific sets of sequences. In this paper we propose a very efficient algorithm that solves the exact pattern matching problem in a set of highly similar DNA sequences where only the pattern can be pre-processed. This new algorithm extends variants of the Boyer-Moore exact string matching algorithm. Experimental results show that it gives the best performance in practice.
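    The paper's algorithm extends Boyer-Moore variants to exploit the similarity between sequences; that extension is not reproduced here. As background, the Python sketch below shows the classic Boyer-Moore-Horspool simplification in the setting the abstract describes, where only the pattern is pre-processed (once) and then scanned against every sequence in a collection. Function names are illustrative.

```python
def horspool_shifts(pattern: str) -> dict[str, int]:
    """Bad-character table: for each symbol in pattern[:-1], the distance from its
    last occurrence there to the final pattern position; absent symbols shift by m."""
    m = len(pattern)
    return {c: m - 1 - i for i, c in enumerate(pattern[:-1])}


def horspool_search(text: str, pattern: str, shifts: dict[str, int]) -> list[int]:
    """All (possibly overlapping) start positions of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    hits, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Shift by the table entry for the text symbol under the pattern's last position.
        pos += shifts.get(text[pos + m - 1], m)
    return hits


def search_collection(sequences: list[str], pattern: str) -> dict[int, list[int]]:
    """Pre-process the pattern once, then scan each sequence with the same table."""
    shifts = horspool_shifts(pattern)
    return {k: horspool_search(s, pattern, shifts) for k, s in enumerate(sequences)}
```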

  14. An Optimal Seed Based Compression Algorithm for DNA Sequences

    Directory of Open Access Journals (Sweden)

    Pamela Vinitha Eric

    2016-01-01

    This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures inherent in DNA sequences by creating an offline dictionary that contains all such repeats along with the details of mismatches. By allowing only promising mismatches, the method achieves a compression ratio on par with or better than existing lossless DNA sequence compression algorithms.

  15. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group...
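    Comparing DUS density between gene groups reduces to counting motif occurrences on both strands and normalizing by sequence length. A minimal Python sketch, assuming sequences are given 5' to 3' in upper-case A/C/G/T; the motif and the gene groups are user-supplied parameters, and the short motif in the example is an arbitrary placeholder rather than an actual DUS. Function names are illustrative.

```python
# Translation table for complementing A/C/G/T.
COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    return s.translate(COMP)[::-1]


def count_both_strands(seq: str, motif: str) -> int:
    """Overlapping matches of motif on the given strand or on the opposite strand
    (windows equal to its reverse complement); a palindromic site counts once."""
    rc = reverse_complement(motif)
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in (motif, rc))


def density_per_kb(seqs: list[str], motif: str) -> float:
    """Motif occurrences per 1,000 bp across a group of sequences (e.g. one gene class)."""
    total_len = sum(len(s) for s in seqs)
    hits = sum(count_both_strands(s, motif) for s in seqs)
    return 1000.0 * hits / total_len


def compare_groups(groups: dict[str, list[str]], motif: str) -> dict[str, float]:
    """Density per gene group, for side-by-side comparison."""
    return {name: density_per_kb(seqs, motif) for name, seqs in groups.items()}
```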

  16. DNA splice site sequences clustering method for conservativeness analysis

    Institute of Scientific and Technical Information of China (English)

    Quanwei Zhang; Qinke Peng; Tao Xu

    2009-01-01

    DNA sequences near splice sites are remarkably conserved, and many researchers have contributed to splice site prediction. To mine the underlying biological knowledge, we analyze the conservation of sequences adjacent to DNA splice sites by clustering. First, we propose a DNA splice site sequence clustering method based on DBSCAN, using four kinds of dissimilarity measures. Then, we analyze the conservation features of the clustering results and the experimental data set.
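    The abstract names DBSCAN but not its four dissimilarity measures, so the Python sketch below assumes Hamming distance between equal-length aligned splice-site windows as one plausible choice. It is a plain DBSCAN: points with at least min_pts neighbors within eps (itself included) are core points, clusters grow from core points, and points reachable from no core point are labeled -1 as noise. Function names and the parameter values in the example are illustrative.

```python
def hamming(x: str, y: str) -> int:
    """Mismatches between two equal-length aligned windows."""
    return sum(a != b for a, b in zip(x, y))


def dbscan(seqs: list[str], eps: int, min_pts: int, dist=hamming) -> list[int]:
    """Return one label per sequence: cluster ids 0, 1, ... or -1 for noise."""
    NOISE = -1
    labels = [None] * len(seqs)          # None = not yet visited

    def region(i: int) -> list[int]:
        return [j for j in range(len(seqs)) if dist(seqs[i], seqs[j]) <= eps]

    cluster = 0
    for i in range(len(seqs)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:    # not a core point; may become a border point later
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:       # border point reachable from a core point
                labels[j] = cluster
                continue
            if labels[j] is not None:    # already assigned
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbours)
        cluster += 1
    return labels
```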

  17. Thermodynamics of sequence-specific binding of PNA to DNA

    DEFF Research Database (Denmark)

    Ratilainen, T; Holmén, A; Tuite, E

    2000-01-01

    For further characterization of the hybridization properties of peptide nucleic acids (PNAs), the thermodynamics of hybridization of mixed sequence PNA-DNA duplexes have been studied. We have characterized the binding of PNA to DNA in terms of binding affinity (perfectly matched duplexes) and seq...

  18. PNA Directed Sequence Addressed Self-Assembly of DNA Nanostructures

    Science.gov (United States)

    Nielsen, Peter E.

    2008-10-01

    Peptide nucleic acids (PNA) can be designed to target duplex DNA with very high sequence specificity and efficiency via various binding modes. We have designed three-domain PNA clamps that bind stably to predefined decameric homopurine targets in large dsDNA molecules and, via a third PNA domain, sequence-specifically recognize another PNA oligomer. We describe how such three-domain PNAs have utility for assembling dsDNA grid and clover leaf structures and, in combination with SNAP-tag technology, protein-dsDNA structures.

  19. Current-voltage characteristics of double-strand DNA sequences

    Science.gov (United States)

    Bezerril, L. M.; Moreira, D. A.; Albuquerque, E. L.; Fulco, U. L.; de Oliveira, E. L.; de Sousa, J. S.

    2009-09-01

    We use a tight-binding formulation to investigate the transmissivity and the current-voltage (I-V) characteristics of sequences of double-strand DNA molecules. In order to reveal the relevance of the underlying correlations in the nucleotide distribution, we compare the results for the genomic DNA sequence with those of artificial sequences (the long-range correlated Fibonacci and Rudin-Shapiro ones) and a random sequence, which is a kind of prototype of a short-range correlated system. The random sequence is presented here with the same first-neighbor pair correlations as the human DNA sequence. We found that the long-range character of the correlations is important to the transmissivity spectra, although the I-V curves seem to be mostly influenced by the short-range correlations.
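    A tight-binding transmissivity calculation of this kind can be sketched with the Landauer-type formula T(E) = Γ_L Γ_R |G_1N(E)|^2. The Python sketch below simplifies the problem to a single chain (not a double strand) with one on-site energy per nucleotide and a uniform hopping t, attached to two identical semi-infinite 1D leads; the lead parameters, the single-chain simplification and the function names are all assumptions. The needed element of the tridiagonal Green's function is obtained with the Thomas algorithm.

```python
import math


def lead_self_energy(energy: float, eps_lead: float, t_lead: float, t_couple: float) -> complex:
    """Retarded self-energy of a semi-infinite 1D tight-binding lead (inside its band)."""
    x = energy - eps_lead
    if abs(x) >= 2 * abs(t_lead):
        raise ValueError("energy must lie inside the lead band")
    g_surface = (x - 1j * math.sqrt(4 * t_lead**2 - x**2)) / (2 * t_lead**2)
    return t_couple**2 * g_surface


def transmission(onsite: list[float], t: float, energy: float,
                 eps_lead: float = 0.0, t_lead: float = 1.0, t_couple: float = 1.0) -> float:
    """T(E) = Gamma_L * Gamma_R * |G_1N(E)|^2 for a chain with hopping t."""
    n = len(onsite)
    sigma = lead_self_energy(energy, eps_lead, t_lead, t_couple)
    gamma = -2.0 * sigma.imag             # identical leads, so Gamma_L = Gamma_R
    # Diagonal of (E - H - Sigma): left lead on site 1, right lead on site N.
    diag = [complex(energy - e) for e in onsite]
    diag[0] -= sigma
    diag[-1] -= sigma
    off = -t                              # off-diagonal entries of (E - H)
    # Thomas algorithm: solve (E - H - Sigma) x = e_N, so that x[0] = G_1N.
    rhs = [0j] * n
    rhs[-1] = 1.0 + 0j
    for i in range(1, n):
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]
    x = [0j] * n
    x[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (rhs[i] - off * x[i + 1]) / diag[i]
    return gamma * gamma * abs(x[0]) ** 2
```

    A uniform chain matched to the leads is just a perfect infinite chain, so T(E) = 1 inside the band, which makes a useful sanity check; a disordered list of on-site energies gives T < 1.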

  20. Current-voltage characteristics of double-strand DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bezerril, L.M.; Moreira, D.A. [Departamento de Fisica, Universidade Federal do Rio Grande do Norte, 59072-970, Natal-RN (Brazil); Albuquerque, E.L., E-mail: eudenilson@dfte.ufrn.b [Departamento de Fisica, Universidade Federal do Rio Grande do Norte, 59072-970, Natal-RN (Brazil); Fulco, U.L. [Departamento de Biofisica e Farmacologia, Universidade Federal do Rio Grande do Norte, 59072-970, Natal-RN (Brazil); Oliveira, E.L. de; Sousa, J.S. de [Departamento de Fisica, Universidade Federal do Ceara, 60455-760, Fortaleza-CE (Brazil)

    2009-09-07

    We use a tight-binding formulation to investigate the transmissivity and the current-voltage (I-V) characteristics of sequences of double-strand DNA molecules. In order to reveal the relevance of the underlying correlations in the nucleotide distribution, we compare the results for the genomic DNA sequence with those of artificial sequences (the long-range correlated Fibonacci and Rudin-Shapiro ones) and a random sequence, which is a kind of prototype of a short-range correlated system. The random sequence is presented here with the same first-neighbor pair correlations as the human DNA sequence. We found that the long-range character of the correlations is important to the transmissivity spectra, although the I-V curves seem to be mostly influenced by the short-range correlations.
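    The Fibonacci and Rudin-Shapiro sequences used as long-range correlated benchmarks are generated by substitution (inflation) rules. The Python sketch below uses the two-letter Fibonacci rule G→GC, C→G and the four-letter Rudin-Shapiro rule a→ab, b→ac, c→db, d→dc; mapping those abstract letters onto G, C, A and T is an assumption made here for illustration, and the papers' exact mapping may differ.

```python
# Two-letter Fibonacci inflation rule written on G/C.
FIBONACCI_RULES = {"G": "GC", "C": "G"}
# Four-letter Rudin-Shapiro rule a->ab, b->ac, c->db, d->dc with the assumed
# mapping a=G, b=C, c=A, d=T.
RUDIN_SHAPIRO_RULES = {"G": "GC", "C": "GA", "A": "TC", "T": "TA"}


def substitute(seed: str, rules: dict[str, str], generations: int) -> str:
    """Apply the substitution rule to every letter, `generations` times."""
    seq = seed
    for _ in range(generations):
        seq = "".join(rules[ch] for ch in seq)
    return seq
```

    Fibonacci generations grow in length as 1, 2, 3, 5, 8, ..., whereas each Rudin-Shapiro generation doubles the length.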

  1. Characteristics of alternating current hopping conductivity in DNA sequences

    Institute of Scientific and Technical Information of China (English)

    Ma Song-Shan; Xu Hui; Wang Huan-You; Guo Rui

    2009-01-01

    This paper presents a model to describe the alternating current (AC) conductivity of DNA sequences, in which DNA is considered as a one-dimensional (1D) disordered system and electrons are transported by hopping between localized states. It finds that the AC conductivity of DNA sequences increases as the frequency of the external electric field rises, taking the form σ_ac(ω) ∼ ω² ln²(1/ω). The AC conductivity of DNA sequences also increases with temperature, although this temperature dependence is weak. Meanwhile, at low temperatures the AC conductivity in the off-diagonally correlated case is much larger than in the uncorrelated case of the Anderson limit, which indicates that off-diagonal correlations in DNA sequences have a great effect on the AC conductivity, whereas at high temperatures the off-diagonal correlations no longer play a vital role in electric transport. In addition, the proportion of nucleotide pairs p also plays an important role in AC electron transport in DNA sequences. For p < 0.5, the conductivity of DNA sequences decreases as p increases, while for p > 0.5 it increases as p increases.

  2. Repetitive sequence analysis and karyotyping reveals centromere-associated DNA sequences in radish (Raphanus sativus L.).

    Science.gov (United States)

    He, Qunyan; Cai, Zexi; Hu, Tianhua; Liu, Huijun; Bao, Chonglai; Mao, Weihai; Jin, Weiwei

    2015-04-18

    Radish (Raphanus sativus L., 2n = 2x = 18) is a major root vegetable crop, especially in eastern Asia. Radish root contains various nutrients that play an important role in strengthening immunity. Repetitive elements are primary components of the genomic sequence and the most important factors in genome size variation in higher eukaryotes. To date, studies of repetitive elements in radish are still limited. To better understand the genome structure of radish, we undertook a study to evaluate the proportion of repetitive elements and their distribution in radish. We conducted genome-wide characterization of repetitive elements in radish with low-coverage genome sequencing followed by similarity-based cluster analysis. Results showed that about 31% of the genome was composed of repetitive sequences. Satellite repeats were the dominant elements of the genome. The distribution pattern of three satellite repeat sequences (CL1, CL25, and CL43) on radish chromosomes was characterized using fluorescence in situ hybridization (FISH). CL1 was predominantly located at the centromeric region of all chromosomes, CL25 was located at the subtelomeric region, and CL43 was a telomeric satellite. FISH signals of two satellite repeats, CL1 and CL25, together with 5S rDNA and 45S rDNA, provide useful cytogenetic markers to identify each individual somatic metaphase chromosome. The centromere-specific histone H3 (CENH3) has been used as a marker to identify centromere DNA sequences. One putative CENH3 (RsCENH3) was characterized and cloned from radish. Its deduced amino acid sequence shares high similarity with those of the CENH3s in Brassica species. An antibody against B. rapa CENH3 specifically stained radish centromeres. Immunostaining and chromatin immunoprecipitation (ChIP) tests with the anti-BrCENH3 antibody demonstrated that both the centromere-specific retrotransposon (CR-Radish) and the satellite repeat (CL1) are directly associated with RsCENH3 in radish. Proportions

  3. Patterns of DNA barcode variation in Canadian marine molluscs.

    Directory of Open Access Journals (Sweden)

    Kara K S Layton

    BACKGROUND: Molluscs are the most diverse marine phylum, and this high diversity has resulted in considerable taxonomic problems. Because the number of species in Canadian oceans remains uncertain, there is a need to incorporate molecular methods into species identifications. A 648 base pair segment of the cytochrome c oxidase subunit I (COI) gene has proven useful for the identification and discovery of species in many animal lineages. While the utility of DNA barcoding in molluscs has been demonstrated in other studies, this is the first effort to construct a DNA barcode registry for marine molluscs across such a large geographic area. METHODOLOGY/PRINCIPAL FINDINGS: This study examines patterns of DNA barcode variation in 227 species of Canadian marine molluscs. Intraspecific sequence divergences ranged from 0-26.4%, and a barcode gap existed for most taxa. Eleven cases of relatively deep (>2%) intraspecific divergence were detected, suggesting the possible presence of overlooked species. Structural variation was detected in COI, with indels found in 37 species, mostly bivalves. Some indels were present in divergent lineages, primarily in the region of the first external loop, suggesting certain areas are hotspots for change. Lastly, mean GC content varied substantially among orders (24.5%-46.5%) and showed a significant positive correlation with nearest neighbour distances. CONCLUSIONS/SIGNIFICANCE: DNA barcoding is an effective tool for the identification of Canadian marine molluscs and for revealing possible cases of overlooked species. Some species with deep intraspecific divergence showed a biogeographic partition between lineages on the Atlantic, Arctic and Pacific coasts, suggesting the role of Pleistocene glaciations in the subdivision of their populations. Indels were prevalent in the barcode region of the COI gene in bivalves and gastropods. This study highlights the efficacy of DNA barcoding for providing insights into sequence variation
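    The two quantities the abstract relies on, pairwise sequence divergence and GC content, can be computed with a minimal Python sketch like the one below. It uses uncorrected p-distance on already-aligned sequences and skips gaps and ambiguous bases; many barcoding studies instead use a Kimura 2-parameter correction, and the abstract does not state which model was used, so the threshold check only mirrors the ">2%" criterion under that assumption. Function names are illustrative.

```python
def p_distance(x: str, y: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring
    positions where either sequence has a gap or an ambiguous base."""
    valid = [(a, b) for a, b in zip(x.upper(), y.upper()) if a in "ACGT" and b in "ACGT"]
    if not valid:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in valid) / len(valid)


def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases)


def flags_deep_divergence(x: str, y: str, threshold: float = 0.02) -> bool:
    """True if uncorrected divergence exceeds the threshold (assumed 2%)."""
    return p_distance(x, y) > threshold
```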

  4. DNA Polymer Brush Patterning through Photocontrollable Surface-Initiated DNA Hybridization Chain Reaction.

    Science.gov (United States)

    Huang, Fujian; Zhou, Xiang; Yao, Dongbao; Xiao, Shiyan; Liang, Haojun

    2015-11-18

    The fabrication of DNA polymer brushes with spatial resolution onto a solid surface is a crucial step for biochip research and related applications, cell-free gene expression study, and even artificial cell fabrication. Here, for the first time, a DNA polymer brush patterning method is reported based on the photoactivation of an ortho-nitrobenzyl linker-embedded DNA hairpin structure and a subsequent surface-initiated DNA hybridization chain reaction (HCR). Inert DNA hairpins are exposed to ultraviolet light irradiation to generate DNA duplexes with two active sticky ends (toeholds) in a programmable manner. These activated DNA duplexes can initiate DNA HCR to generate multifunctional patterned DNA polymer brushes with complex geometrical shapes. Different multifunctional DNA polymer brush patterns can be fabricated on certain areas of the same solid surface using this method. Moreover, the patterned DNA brush surface can be used to capture target molecules in a desired manner.

  5. Spectroscopic investigation on the telomeric DNA base sequence repeat

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Telomeres are protein-DNA complexes at the ends of linear chromosomes that protect chromosomal integrity and maintain cellular replicative capacity. From single-celled organisms to higher animals and plants, the structures and functions of telomeres are highly conserved. In cells of humans and vertebrate animals, the telomeric DNA base sequence is (TTAGGG)n. In the present work, we obtained absorption and fluorescence spectra from seven synthesized oligonucleotides simulating the telomeric DNA system and calculated their relative fluorescence quantum yields, from which not only are telomeric DNA characteristics predicted but also possibly the shortened telomeric sequences during cell division are im… relative fluorescence quantum yield and remarkable excitation energy internal conversion, which tallies with the telomeric sequence of (TTAGGG)n. This result shows that telomeric DNA has a strong non-radiative (internal conversion) capability.

  6. ATRF Houses the Latest DNA Sequencing Technologies | Poster

    Science.gov (United States)

    By Ashley DeVine, Staff Writer. By the end of October, the Advanced Technology Research Facility (ATRF) will be one of the few facilities in the world to house all of the latest DNA sequencing technologies.

  7. Repetitive DNA Sequences in Wheat and Its Relatives

    Institute of Scientific and Technical Information of China (English)

    ZHANG Xue-yong; LI Da-yong

    2001-01-01

    Repetitive DNA sequences form a large portion of eukaryotic genomes. Using wheat (Triticum) as a model, this review covers the classification, features and functions of repetitive DNA sequences in the grass tribe Triticeae, as well as the role of these sequences in genome differentiation and in the control and regulation of homologous chromosome synapsis and pairing. Transposable elements, an important portion of dispersed repeats, may play an essential role in gene mutation in the host. Dynamic models for changes in the copy number and sequences of repetitive families are also presented, following the models of Charlesworth et al. Applications of repetitive DNA sequences in the study of evolution, chromosome fingerprinting, and marker-assisted gene transfer and breeding are described, taking wheat as an example.

  8. Which Are More Random: Coding or Noncoding DNA Sequences?

    Institute of Scientific and Technical Information of China (English)

    WU Fang; ZHENG Wei-Mou

    2002-01-01

    Evidence seems to show that coding DNA is more random than noncoding DNA, but other conflicting evidence also exists. Based on the third-base degeneracy of codons, we regard the third position of codons as a 'noisy' position. By deleting one fixed position of non-overlapping triplets in a given sequence, three masked sequences may be deduced from the sequence. We have investigated the block-to-site mutual information functions of coding and noncoding sequences in yeast without and with the masking. Characteristics that distinguish coding from noncoding DNA have been found. It is observed that the strong correlations in the coding regions may be blocked by the third base of codons, and the proper masking can extract the correlations. Distribution of dimeric tandem repeats of unmasked sequences is also compared with that of masked sequences.
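
The masking step described above can be sketched as follows. This illustrates the deletion of one fixed position per triplet, not the authors' code: the reading frame is assumed to start at the first base, an incomplete trailing triplet is dropped, and the mutual-information analysis itself is not reproduced.

```python
def masked_sequences(seq: str) -> list[str]:
    """Delete one fixed position (0, 1 or 2) from every non-overlapping
    triplet, giving three masked sequences.

    Assumes the triplet frame starts at the first base; an incomplete
    trailing triplet is dropped.
    """
    usable = len(seq) - len(seq) % 3
    return [
        "".join(seq[i] for i in range(usable) if i % 3 != masked)
        for masked in range(3)
    ]


first, second, third = masked_sequences("ATGCCGTAA")
print(third)  # ATCCTA (third codon positions removed)
```

Masking position 2 (the third codon position) is the case the abstract singles out as the "noisy" position; the other two outputs serve as comparisons.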

  9. Effects of sequence on DNA wrapping around histones

    Science.gov (United States)

    Ortiz, Vanessa

    2011-03-01

    A central question in biophysics is whether the sequence of a DNA strand affects its mechanical properties. In epigenetics, these are thought to influence nucleosome positioning and gene expression. Theoretical and experimental attempts to answer this question have been hindered by an inability to directly resolve DNA structure and dynamics at the base-pair level. In our previous studies we used a detailed model of DNA to measure the effects of sequence on the stability of naked DNA under bending. Sequence was shown to influence DNA's ability to form kinks, which arise when certain motifs slide past others to form non-native contacts. Here, we have now included histone-DNA interactions to see if the results obtained for naked DNA are transferable to the problem of nucleosome positioning. Different DNA sequences interacting with the histone protein complex are studied, and their equilibrium and mechanical properties are compared among themselves and with the naked case. NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359).

  10. PNA Directed Sequence Addressed Self-Assembly of DNA Nanostructures

    DEFF Research Database (Denmark)

    Nielsen, Peter E.

    2008-01-01

    sequence specifically recognize another PNA oligomer. We describe how such three-domain PNAs have utility for assembling dsDNA grid and cloverleaf structures and, in combination with SNAP-tag technology, protein-dsDNA structures. (c) 2008 American Institute of Physics.

  12. Sequence dependence of electron-induced DNA strand breakage revealed by DNA nanoarrays

    DEFF Research Database (Denmark)

    Keller, Adrian; Rackwitz, Jenny; Cauët, Emilie

    2014-01-01

    The electronic structure of DNA is determined by its nucleotide sequence, which is for instance exploited in molecular electronics. Here we demonstrate that also the DNA strand breakage induced by low-energy electrons (18 eV) depends on the nucleotide sequence. To determine the absolute cross...

  13. Biometric Authentication Using ElGamal Cryptosystem And DNA Sequence

    Directory of Open Access Journals (Sweden)

    V.SAMUEL SUSAN

    2010-06-01

    Full Text Available Biometrics are automated methods of identifying a person or verifying the identity of a person based on a physiological or behavioral characteristic. Physiological characteristics include hand or finger images, facial characteristics and iris recognition. Behavioral characteristics include dynamic signature verification, speaker verification and keystroke dynamics. DNA is a unique feature among individuals. DNA provides a high security level, long-term stability and user acceptance, and is intrusive. Combining the ElGamal cryptosystem and DNA sequences, a novel biometric authentication scheme is proposed.

  14. Protein sequence for clustering DNA based on Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Gamal. F. Elhadi

    2012-01-01

    Full Text Available DNA is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. Clustering is a process that groups a set of objects into clusters so that the similarity among objects in the same cluster is high, while the similarity among objects in different clusters is low. In this paper, we propose an approach for clustering DNA sequences using the Self-Organizing Map (SOM) algorithm and protein sequences. The main objective is to analyze biological data and to group DNA sequences into clusters more easily and efficiently. We use the proposed approach to analyze both large and small sets of input DNA sequences. The results show that the similarity of the sequences does not depend on the number of input sequences. Our approach evaluates the degree of similarity among DNA sequences using a hierarchical representation (dendrogram). Representing a large amount of data as a hierarchical tree makes it possible to compare large sequences efficiently.

  15. Sequencing and Analysis of Neanderthal Genomic DNA

    OpenAIRE

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K; Rubin, Edward M.

    2006-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library a...

  16. DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.

    Science.gov (United States)

    Sucher, Nikolaus J; Hennell, James R; Carles, Maria C

    2012-01-01

    DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.

  17. Aptamer-Binding Directed DNA Origami Pattern for Logic Gates.

    Science.gov (United States)

    Yang, Jing; Jiang, Shuoxing; Liu, Xiangrong; Pan, Linqiang; Zhang, Cheng

    2016-12-14

    In this study, an aptamer-substrate strategy is introduced to control programmable DNA origami patterns. Combined with DNA aptamer-substrate binding and DNAzyme cutting, small DNA tiles were specifically controlled to fill into the predesigned DNA origami frame. Here, a set of DNA logic gates (OR, YES, and AND) are performed in response to the stimuli of adenosine triphosphate (ATP) and cocaine. The experimental results are confirmed by AFM imaging and time-dependent fluorescence changes, demonstrating that the geometric patterns are regulated in a controllable and programmable manner. Our approach provides a new platform for engineering programmable origami nanopatterns and constructing complex DNA nanodevices.

  18. Sequence dependence of electron-induced DNA strand breakage revealed by DNA nanoarrays

    DEFF Research Database (Denmark)

    Keller, Adrian; Rackwitz, Jenny; Cauët, Emilie;

    2014-01-01

    sections for electron-induced single-strand breaks in specific 13-mer oligonucleotides, we used atomic force microscopy analysis of DNA origami-based DNA nanoarrays. We investigated the DNA sequences 5'-TT(XYX)3TT with X = A, G, C and Y = T, BrU (5-bromouracil) and found absolute strand break cross sections...

  19. Algorithms for mapping high-throughput DNA sequences

    DEFF Research Database (Denmark)

    Frellsen, Jes; Menzel, Peter; Krogh, Anders

    2014-01-01

    High-throughput sequencing (HTS) technologies have revolutionized the field of molecular biology by enabling large-scale whole-genome sequencing as well as a broad range of experiments for studying the cell's inner workings directly at the DNA or RNA level. Given the dramatically increased rate...
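
Many read mappers begin with exact-match seeds. The toy sketch below is not an algorithm taken from this review: it indexes every k-mer of a reference in a hash table and lets each seed hit in a read vote for an implied start position. Real mappers add compressed indexes, mismatch-tolerant extension and alignment scoring; the function names, the value of k and the sequences are illustrative assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def candidate_starts(read: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Rank reference start positions by the number of supporting seed hits."""
    votes: dict[int, int] = defaultdict(int)
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], []):
            start = pos - j  # where the read would begin if this seed is correct
            if start >= 0:
                votes[start] += 1
    # Most-supported start first; ties broken by position.
    return sorted(votes, key=lambda s: (-votes[s], s))


reference = "AAAACGTTGCAAAA"
index = build_kmer_index(reference, k=3)
print(candidate_starts("CGTTGC", index, k=3))  # [4]
```

Voting on the implied start (reference position minus read offset) lets seeds on either side of a sequencing error still agree on one placement.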

  20. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian eWu

    2012-11-01

    Full Text Available Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrate that sequencing whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.
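
The coverage figures above express the share of reference positions covered by at least one assembled read or contig. One minimal way to compute that share from alignment intervals is to merge overlapping intervals and sum their lengths. The half-open [start, end) coordinates and the toy numbers are assumptions for illustration, not data from the study.

```python
def covered_fraction(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of reference positions covered by at least one
    half-open interval [start, end)."""
    covered = 0
    run_start = run_end = None
    for start, end in sorted(intervals):
        start, end = max(0, start), min(genome_length, end)  # clip to the reference
        if start >= end:
            continue
        if run_end is None or start > run_end:
            # Gap before this interval: close the previous merged run.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)  # overlapping or adjacent: extend the run
    if run_end is not None:
        covered += run_end - run_start
    return covered / genome_length


print(covered_fraction([(0, 50), (40, 100), (150, 200)], 200))  # 0.75
```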

  1. Zinc finger recombinases with adaptable DNA sequence specificity.

    Directory of Open Access Journals (Sweden)

    Chris Proudfoot

    Full Text Available Site-specific recombinases have become essential tools in genetics and molecular biology for the precise excision or integration of DNA sequences. However, their utility is currently limited to circumstances where the sites recognized by the recombinase enzyme have been introduced into the DNA being manipulated, or natural 'pseudosites' are already present. Many new applications would become feasible if recombinase activity could be targeted to chosen sequences in natural genomic DNA. Here we demonstrate efficient site-specific recombination at several sequences taken from a 1.9 kilobase-pair locus of biotechnological interest (in the bovine β-casein gene), mediated by zinc finger recombinases (ZFRs), chimaeric enzymes with linked zinc finger (DNA recognition) and recombinase (catalytic) domains. In the "Z-sites" tested here, 22 bp casein gene sequences are flanked by 9 bp motifs recognized by zinc finger domains. Asymmetric Z-sites were recombined by the concomitant action of two ZFRs with different zinc finger DNA-binding specificities, and could be recombined with a heterologous site in the presence of a third recombinase. Our results show that engineered ZFRs may be designed to promote site-specific recombination at many natural DNA sequences.

  2. cDNA cloning and sequencing of ostrich Growth hormone

    Directory of Open Access Journals (Sweden)

    Doosti Abbas

    2012-01-01

    Full Text Available In recent years, industrial breeding of ostrich (Struthio camelus) has been widely developed in Iran. Growth hormone (GH) is a peptide hormone that stimulates growth and cell reproduction in different animals. The aim of this study was to clone and sequence the ostrich growth hormone gene in E. coli, which was done for the first time in Iran. The cDNA that encodes ostrich growth hormone was isolated from total mRNA of the pituitary gland and amplified by RT-PCR using GH-specific PCR primers. The GH cDNA was then cloned by the T/A cloning technique and the construct was transformed into E. coli. Finally, the GH cDNA sequence was submitted to GenBank (accession number: JN559394). The results of the present study showed that GH cDNA was successfully cloned in E. coli. Sequencing confirmed that GH cDNA was cloned and that the length of ostrich GH cDNA was 672 bp; a BLAST search showed that the growth hormone cDNA sequence of the ostrich from Iran has 100% homology with other records existing in GenBank.

  3. Polyamide platinum anticancer complexes designed to target specific DNA sequences.

    Science.gov (United States)

    Jaramillo, David; Wheate, Nial J; Ralph, Stephen F; Howard, Warren A; Tor, Yitzhak; Aldrich-Wright, Janice R

    2006-07-24

    Two new platinum complexes, trans-chlorodiammine[N-(2-aminoethyl)-4-[4-(N-methylimidazole-2-carboxamido)-N-methylpyrrole-2-carboxamido]-N-methylpyrrole-2-carboxamide]platinum(II) chloride (DJ1953-2) and trans-chlorodiammine[N-(6-aminohexyl)-4-[4-(N-methylimidazole-2-carboxamido)-N-methylpyrrole-2-carboxamido]-N-methylpyrrole-2-carboxamide]platinum(II) chloride (DJ1953-6) have been synthesized as proof-of-concept molecules in the design of agents that can specifically target genes in DNA. Coordinate covalent binding to DNA was demonstrated with electrospray ionization mass spectrometry. Using circular dichroism, these complexes were found to show greater DNA binding affinity for the target sequence d(CATTGTCAGAC)(2) than for either d(GTCTGTCAATG)(2), which contains different flanking sequences, or d(CATTGAGAGAC)(2), which contains a double base pair mismatch sequence. DJ1953-2 unwinds the DNA helix by around 13 degrees, but neither metal complex significantly affects the DNA melting temperature. Unlike simple DNA minor groove binders, DJ1953-2 is able to inhibit RNA synthesis in vitro. The cytotoxicity of both metal complexes in the L1210 murine leukaemia cell line was also determined, with DJ1953-6 (34 microM) more active than DJ1953-2 (>50 microM). These results demonstrate the potential of polyamide platinum complexes and provide the structural basis for designer agents that are able to recognize biologically relevant sequences and prevent DNA transcription and replication.

  4. Selective binding of anti-DNA antibodies to native dsDNA fragments of differing sequence.

    Science.gov (United States)

    Uccellini, Melissa B; Busto, Patricia; Debatis, Michelle; Marshak-Rothstein, Ann; Viglianti, Gregory A

    2012-03-30

    Systemic autoimmune diseases are characterized by the development of autoantibodies directed against a limited subset of nuclear antigens, including DNA. DNA-specific B cells take up mammalian DNA through their B cell receptor, and this DNA is subsequently transported to an endosomal compartment where it can potentially engage TLR9. We have previously shown that ssDNA-specific B cells preferentially bind to particular DNA sequences, and have shown antibody specificity for short synthetic oligodeoxynucleotides (ODNs). Since CpG-rich DNA, the ligand for TLR9, is found in low abundance in mammalian DNA, we sought to determine whether antibodies derived from DNA-reactive B cells show a binding preference for CpG-rich native dsDNA and thereby select immunostimulatory DNA for delivery to TLR9. We examined a panel of anti-DNA antibodies for binding to CpG-rich and CpG-poor DNA fragments. We show that a number of anti-DNA antibodies do show a preference for binding to certain native dsDNA fragments of differing sequence, but this does not correlate directly with the presence of CpG dinucleotides. An antibody with a preference for binding to a fragment containing optimal CpG motifs was able to promote B cell proliferation to this fragment at 10-fold lower antibody concentrations than an antibody that did not selectively bind to this fragment, indicating that antibody binding preference can influence autoreactive B cell responses.
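
One common way to label a fragment CpG-rich or CpG-poor is the observed/expected CpG ratio, (number of CG dinucleotides × length) / (number of C × number of G). The abstract does not state which measure the authors used to classify their fragments, so treat this as an illustrative assumption; the example sequence is invented.

```python
def cpg_count(seq: str) -> int:
    """Number of CG dinucleotides read 5'->3' on the given strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG * length / (C * G).

    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return cpg_count(seq) * len(seq) / (c * g)


print(cpg_count("ACGTCGAT"), cpg_obs_exp("ACGTCGAT"))  # 2 4.0
```

Because mammalian DNA is depleted of CpG, bulk genomic fragments typically score well below 1 on this ratio, which is why CpG-rich fragments stand out.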

  5. Nanopore-based Fourth-generation DNA Sequencing Technology

    Institute of Scientific and Technical Information of China (English)

    Yanxiao Feng; Yuechuan Zhang; Cuifeng Ying; Deqiang Wang; Chunlei Du

    2015-01-01

    Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis opens a new door to molecular biology investigation at the single-molecule scale. In this article, we have reviewed academic achievements in nanopore technology from the past as well as the latest advances, including both biological and solid-state nanopores, and discussed their recent and potential applications.

  6. Directed PCR-free engineering of highly repetitive DNA sequences

    Directory of Open Access Journals (Sweden)

    Preissler Steffen

    2011-09-01

    Full Text Available Background: Highly repetitive nucleotide sequences are commonly found in nature, e.g. in telomeres, microsatellite DNA and the polyadenine (poly(A)) tails of eukaryotic messenger RNA, as well as in several inherited human disorders linked to trinucleotide repeat expansions in the genome. Therefore, studying repetitive sequences is of biological, biotechnological and medical relevance. However, cloning such repetitive DNA sequences is challenging because specific PCR-based amplification is hampered by the lack of unique primer binding sites, resulting in unspecific products. Results: For the PCR-free generation of repetitive DNA sequences we used antiparallel oligonucleotides flanked by restriction sites of Type IIS endonucleases. The arrangement of recognition sites allowed for stepwise and seamless elongation of repetitive sequences. This facilitated the assembly of repetitive DNA segments and open reading frames encoding polypeptides with periodic amino acid sequences of any desired length. By this strategy we cloned a series of polyglutamine-encoding sequences as well as highly repetitive polyadenine tracts. Such repetitive sequences can be used for diverse biotechnological applications. As an example, the polyglutamine sequences were expressed as His6-SUMO fusion proteins in Escherichia coli cells to study their aggregation behavior in vitro. The His6-SUMO moiety enabled affinity purification of the polyglutamine proteins, increased their solubility, and allowed controlled induction of the aggregation process. We successfully purified the fusion proteins and provide an example of their applicability in filter retardation assays. Conclusion: Our seamless cloning strategy is PCR-free and allows the directed and efficient generation of highly repetitive DNA sequences of defined lengths by simple standard cloning procedures.

  7. Mitochondrial DNA sequence analysis of two mouse hepatocarcinoma cell lines

    Institute of Scientific and Technical Information of China (English)

    Ji-Gang Dai; Xia Lei; Jia-Xin Min; Guo-Qiang Zhang; Hong Wei

    2005-01-01

    AIM: To study the genetic differences in mitochondrial DNA (mtDNA) between two hepatocarcinoma cell lines (Hca-F and Hca-P) with diverse metastatic characteristics, and the relationship between mtDNA changes in cancer cells and their oncogenic phenotype. METHODS: Mitochondrial DNA D-loop, tRNAMet+Glu+Ile and ND3 gene fragments from the hepatocarcinoma cell lines, 1100, 1126 and 534 bp in length respectively, were analysed by PCR amplification and restriction fragment length polymorphism techniques. The D-loop 3' end sequence of the hepatocarcinoma cell lines was determined by sequencing. RESULTS: No amplification fragment length polymorphism or restriction fragment length polymorphism was observed in the tRNAMet+Glu+Ile, ND3 and D-loop regions of the mitochondrial DNA of the hepatocarcinoma cells. Sequence differences between Hca-F and Hca-P were found in the mtDNA D-loop. CONCLUSION: Deletion mutations of mitochondrial DNA restriction fragments may not play a significant role in carcinogenesis. The genetic difference in the mtDNA D-loop between Hca-F and Hca-P, which may reflect environmental and genetic influences during tumor progression, could be linked to their tumorigenic phenotypes.

  8. Palindromic sequence artifacts generated during next generation sequencing library preparation from historic and ancient DNA.

    Directory of Open Access Journals (Sweden)

    Bastiaan Star

    Full Text Available Degradation-specific processes and variation in laboratory protocols can bias the DNA sequence composition of samples of ancient or historic origin. Here, we identify a novel artifact in sequences from historic samples of Atlantic cod (Gadus morhua), which forms interrupted palindromes consisting of reverse complementary sequence at the 5' and 3' ends of sequencing reads. The palindromic sequences themselves have specific properties: the bases at the 5' end align well to the reference genome, whereas extensive misalignments exist among the bases at the terminal 3' end. The terminal 3' bases are artificial extensions likely caused by the occurrence of hairpin loops in single-stranded DNA (ssDNA), which can be ligated and amplified in particular library creation protocols. We propose that such hairpin loops allow the inclusion of erroneous nucleotides, specifically at the 3' end of DNA strands, with the 5' end of the same strand providing the template. We also find these palindromes in previously published ancient DNA (aDNA) datasets, albeit at varying and substantially lower frequencies. This artifact can negatively affect the yield of endogenous DNA in these types of samples and introduces sequence bias.
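
A simple screen for such reads checks whether the 3' end of a read is the reverse complement of its 5' end, leaving an unmatched middle where the hairpin loop would sit. The sketch below returns the length of the longest such terminal match. It is not the detection procedure used in the study; the 6-base minimum is an arbitrary assumption meant to suppress chance matches, and the example read is invented.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_palindrome_length(read: str, min_len: int = 6) -> int:
    """Length of the longest 5' prefix whose reverse complement ends the read.

    Returns 0 when no match of at least min_len bases exists. Only the two
    ends are compared, so the unmatched middle (the putative loop) is free.
    """
    read = read.upper()
    min_len = max(min_len, 1)
    for k in range(len(read) // 2, min_len - 1, -1):
        if read[-k:] == reverse_complement(read[:k]):
            return k
    return 0


print(terminal_palindrome_length("ACGTTGGGGAAACAACGT"))  # 6
```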

  9. PCR primers for metazoan mitochondrial 12S ribosomal DNA sequences.

    Directory of Open Access Journals (Sweden)

    Ryuji J Machida

    Full Text Available BACKGROUND: Assessment of the biodiversity of communities of small organisms is most readily done using PCR-based analysis of environmental samples consisting of mixtures of individuals. Known as metagenetics, this approach has transformed understanding of microbial communities and is beginning to be applied to metazoans as well. Unlike microbial studies, where analysis of the 16S ribosomal DNA sequence is standard, the best gene for metazoan metagenetics is less clear. In this study we designed a set of PCR primers for the mitochondrial 12S ribosomal DNA sequence based on 64 complete mitochondrial genomes and then tested their efficacy. METHODOLOGY/PRINCIPAL FINDINGS: A total of 64 complete mitochondrial genome sequences representing all metazoan classes available in GenBank were downloaded using the NCBI Taxonomy Browser. Alignment of sequences was performed for the excised mitochondrial 12S ribosomal DNA sequences, and conserved regions were identified for all 64 mitochondrial genomes. These regions were used to design a primer pair that flanks a more variable region in the gene. Then all of the complete metazoan mitochondrial genomes available in NCBI's Organelle Genome Resources database were used to determine the percentage of taxa that would likely be amplified using these primers. Results suggest that these primers will amplify target sequences for many metazoans. CONCLUSIONS/SIGNIFICANCE: Newly designed 12S ribosomal DNA primers have considerable potential for metazoan metagenetic analysis because of their ability to amplify sequences from many metazoans.
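
The in silico check described above can be approximated by asking, for each genome, whether the forward primer and the reverse complement of the reverse primer both occur with at most a few mismatches. This sketch ignores degenerate (IUPAC) bases, amplicon length, primer order and 3'-end weighting. The primers and genomes are made-up toy sequences, not the published 12S primers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def min_mismatches(primer: str, template: str) -> int:
    """Fewest mismatches of the primer against any ungapped template window."""
    best = len(primer)
    for i in range(len(template) - len(primer) + 1):
        window = template[i:i + len(primer)]
        best = min(best, sum(a != b for a, b in zip(primer, window)))
    return best


def fraction_amplified(forward: str, reverse: str, genomes: list[str],
                       max_mismatches: int = 1) -> float:
    """Share of genomes in which both primer sites fall within the mismatch limit."""
    # The reverse primer's site reads as its reverse complement on the given strand.
    reverse_site = reverse_complement(reverse)
    hits = sum(
        min_mismatches(forward, g) <= max_mismatches
        and min_mismatches(reverse_site, g) <= max_mismatches
        for g in genomes
    )
    return hits / len(genomes)


toy_genomes = ["ACGTAAAACCAA", "ACGAAAAATTTT"]
print(fraction_amplified("ACGT", "TTGG", toy_genomes, max_mismatches=0))  # 0.5
print(fraction_amplified("ACGT", "TTGG", toy_genomes, max_mismatches=1))  # 1.0
```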

  10. Recognizing a Single Base in an Individual DNA Strand: A Step Toward Nanopore DNA Sequencing

    Science.gov (United States)

    Ashkenasy, N.; Sánchez-Quesada, J.; Ghadiri, M. R.; Bayley, H.

    2007-01-01

    Functional supramolecular chemistry at the single-molecule level. Single strands of DNA can be captured inside α-hemolysin transmembrane pore protein to form single-species α-HL·DNA pseudorotaxanes. This process can be used to identify a single adenine nucleotide at a specific location on a strand of DNA by the characteristic reductions in the α-HL ion conductance. This study suggests that α-HL-mediated single-molecule DNA sequencing might be fundamentally feasible. PMID:15666419

  11. Analysis of sequence variation in Gnathostoma spinigerum mitochondrial DNA by single-strand conformation polymorphism analysis and DNA sequence.

    Science.gov (United States)

    Ngarmamonpirat, Charinthon; Waikagul, Jitra; Petmitr, Songsak; Dekumyoy, Paron; Rojekittikhun, Wichit; Anantapruti, Malinee T

    2005-03-01

    Morphological variations were observed in the advanced third-stage larvae of Gnathostoma spinigerum collected from the swamp eel (Fluta alba), the second intermediate host. Larvae of the typical type and three atypical types were chosen for partial cytochrome c oxidase subunit I (COI) gene sequence analysis. A 450 bp polymerase chain reaction product of the COI gene was amplified from mitochondrial DNA. The variations were analyzed by single-strand conformation polymorphism analysis and DNA sequencing. The nucleotide variations of the COI gene in the four types of larvae indicated the presence of intraspecific variation of mitochondrial DNA in the G. spinigerum population.

  12. Repetitive sequences in Eurasian lynx (Lynx lynx L.) mitochondrial DNA control region.

    Science.gov (United States)

    Sindičić, Magda; Gomerčić, Tomislav; Galov, Ana; Polanc, Primož; Huber, Duro; Slavica, Alen

    2012-06-01

    Mitochondrial DNA (mtDNA) control region (CR) of numerous species is known to include up to five different repetitive sequences (RS1-RS5) that are found at various locations, involving motifs of different length and extensive length heteroplasmy. Two repetitive sequences (RS2 and RS3) on opposite sides of mtDNA central conserved region have been described in domestic cat (Felis catus) and some other felid species. However, the presence of repetitive sequence RS3 has not been detected in Eurasian lynx (Lynx lynx) yet. We analyzed mtDNA CR of 35 Eurasian lynx (L. lynx L.) samples to characterize repetitive sequences and to compare them with those found in other felid species. We confirmed the presence of 80 base pairs (bp) repetitive sequence (RS2) at the 5' end of the Eurasian lynx mtDNA CR L strand and for the first time we described RS3 repetitive sequence at its 3' end, consisting of an array of tandem repeats five to ten bp long. We found that felid species share similar RS3 repetitive pattern and fundamental repeat motif TACAC.
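
Counting tandem copies of the shared motif TACAC reduces to a small string scan. The sketch counts the longest run of exact, back-to-back copies. The example sequence is invented, and real control-region arrays, with repeat units of varying length and point differences between copies, would need approximate matching.

```python
def max_tandem_copies(seq: str, motif: str) -> int:
    """Longest run of exact, back-to-back copies of motif anywhere in seq."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    best = 0
    for start in range(len(seq) - width + 1):
        copies, pos = 0, start
        # Step forward one motif length at a time while the next window matches.
        while seq[pos:pos + width] == motif:
            copies += 1
            pos += width
        best = max(best, copies)
    return best


print(max_tandem_copies("GGTACACTACACTACACGG", "TACAC"))  # 3
```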

  13. Applications of recursive segmentation to the analysis of DNA sequences.

    Science.gov (United States)

    Li, Wentian; Bernaola-Galván, Pedro; Haghighi, Fatameh; Grosse, Ivo

    2002-07-01

    Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong (G + C)/weak (A + T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions.
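
A minimal version of recursive segmentation on the binary strong (G + C)/weak (A + T) sequence picks, at each step, the cut that maximizes the Jensen-Shannon divergence between the two sides, D = H(S) - (n_L/n)H(L) - (n_R/n)H(R), where H is the Shannon entropy of the S/W composition, and recurses while D exceeds a threshold. The published procedure decides whether to cut with a statistical significance test; the fixed threshold and minimum segment length used here are simplifying assumptions.

```python
import math


def entropy(counts: tuple[int, int]) -> float:
    """Shannon entropy (bits) of a strong/weak count pair."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def best_split(seq: str, min_len: int) -> tuple[float, int]:
    """Cut maximizing the Jensen-Shannon divergence of S/W composition."""
    n = len(seq)
    strong_prefix = [0]  # strong_prefix[i] = number of G/C bases in seq[:i]
    for base in seq:
        strong_prefix.append(strong_prefix[-1] + (base in "GCgc"))
    total = (strong_prefix[n], n - strong_prefix[n])
    h_total = entropy(total)
    best_d, best_i = -1.0, min_len
    for i in range(min_len, n - min_len + 1):
        left = (strong_prefix[i], i - strong_prefix[i])
        right = (total[0] - left[0], total[1] - left[1])
        d = h_total - i / n * entropy(left) - (n - i) / n * entropy(right)
        if d > best_d:
            best_d, best_i = d, i
    return best_d, best_i


def segment(seq: str, threshold: float = 0.1, min_len: int = 4,
            offset: int = 0) -> list[tuple[int, int]]:
    """Recursively split seq; returns half-open (start, end) domains."""
    if len(seq) < 2 * min_len:
        return [(offset, offset + len(seq))]
    d, i = best_split(seq, min_len)
    if d < threshold:
        return [(offset, offset + len(seq))]
    return (segment(seq[:i], threshold, min_len, offset)
            + segment(seq[i:], threshold, min_len, offset + i))


print(segment("A" * 10 + "G" * 10))  # [(0, 10), (10, 20)]
```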

  14. Chaos game representation (CGR)-walk model for DNA sequences

    Institute of Scientific and Technical Information of China (English)

    Gao Jie; Xu Zhen-Yuan

    2009-01-01

    Chaos game representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to determine the coordinates of their positions in a continuous space. This distribution of positions has two properties: it is unique, and the source sequence can be recovered from the coordinates, so the distance between positions can serve as a measure of similarity between the corresponding sequences. A CGR-walk model based on CGR coordinates is proposed for DNA sequences. The CGR coordinates are converted into a time series, and a long-memory ARFIMA (p, d, q) model, where ARFIMA stands for autoregressive fractionally integrated moving average, is introduced into DNA sequence analysis. This model is applied to simulate real CGR-walk sequence data from ten genomic sequences. Remarkably long-range correlations are uncovered in the data, and the results from these models agree reasonably well with those from the ARFIMA (p, d, q) model.
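
    The iterative CGR map can be illustrated with a short Python sketch, together with the claim that a coordinate determines its source sequence. The corner assignment (A, C, G, T at the corners of the unit square) is an assumption, since several conventions are in use. Exact rational arithmetic via `fractions.Fraction` is used so that decoding is lossless. The conversion to a time series and the ARFIMA fitting are not shown.

```python
from fractions import Fraction

# Assumed corner convention; other published conventions permute the bases.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
HALF = Fraction(1, 2)


def cgr(seq):
    """Return the CGR point after each base, starting from the centre (1/2, 1/2)."""
    x, y = HALF, HALF
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2   # move halfway towards the base's corner
        points.append((x, y))
    return points


def cgr_decode(point, length):
    """Recover the last `length` bases that produced `point` by inverting the map."""
    corner_to_base = {v: k for k, v in CORNERS.items()}
    x, y = point
    bases = []
    for _ in range(length):
        cx, cy = int(x > HALF), int(y > HALF)   # the quadrant identifies the last base
        bases.append(corner_to_base[(cx, cy)])
        x, y = 2 * x - cx, 2 * y - cy           # undo the halving step
    return "".join(reversed(bases))
```

    With floating-point coordinates, only the most recent bases (roughly 50 per coordinate for 64-bit floats) can be recovered reliably. Exact fractions avoid that limit.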

  15. Improved Algorithm for Analysis of DNA Sequences Using Multiresolution Transformation

    Directory of Open Access Journals (Sweden)

    T. M. Inbamalar

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information associated with genetic materials such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. Fast and precise identification of the protein coding regions in a DNA sequence is one of the most important tasks in such analysis. Existing digital signal processing (DSP) methods give less accurate, computationally complex solutions with considerable background noise. Hence, improved accuracy, lower computational complexity, and reduced background noise are essential for identifying the protein coding regions in DNA sequences. In this paper, a new DSP-based method is introduced to detect the protein coding regions in DNA sequences. The DNA sequences are first converted into numeric sequences using the electron ion interaction potential (EIIP) representation. A discrete wavelet transform is then applied, the absolute value of the energy is computed, and an appropriate threshold is applied. The method is tested on databases available from the National Centre for Biotechnology Information (NCBI) site. A comparative analysis confirms the efficiency of the proposed system.

  16. Improved algorithm for analysis of DNA sequences using multiresolution transformation.

    Science.gov (United States)

    Inbamalar, T M; Sivakumar, R

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information associated with genetic materials such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. Fast and precise identification of the protein coding regions in a DNA sequence is one of the most important tasks in such analysis. Existing digital signal processing (DSP) methods give less accurate, computationally complex solutions with considerable background noise. Hence, improved accuracy, lower computational complexity, and reduced background noise are essential for identifying the protein coding regions in DNA sequences. In this paper, a new DSP-based method is introduced to detect the protein coding regions in DNA sequences. The DNA sequences are first converted into numeric sequences using the electron ion interaction potential (EIIP) representation. A discrete wavelet transform is then applied, the absolute value of the energy is computed, and an appropriate threshold is applied. The method is tested on databases available from the National Centre for Biotechnology Information (NCBI) site. A comparative analysis confirms the efficiency of the proposed system.
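
    The pipeline described above can be sketched in Python: EIIP mapping, a discrete wavelet transform, coefficient energy, and a threshold. The EIIP values are the ones commonly used in the genomic signal processing literature. The Haar wavelet, the number of decomposition levels, the window size, and the mean-energy threshold rule are all assumptions chosen for illustration, not the parameters of the paper. This toy version is not tuned to detect coding regions.

```python
import math

# EIIP values commonly used for nucleotide-to-number mapping.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def to_eiip(dna):
    """Convert a DNA string to its EIIP numeric sequence."""
    return [EIIP[base] for base in dna.upper()]


def haar_level(signal):
    """One level of the orthonormal Haar DWT; an odd trailing sample is dropped."""
    s = 1 / math.sqrt(2)
    idx = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in idx]
    detail = [(signal[i] - signal[i + 1]) * s for i in idx]
    return approx, detail


def flag_windows(dna, levels=2, window=8, factor=1.0):
    """Flag windows of approximation coefficients whose energy exceeds factor * mean."""
    coeffs = to_eiip(dna)
    for _ in range(levels):
        coeffs, _ = haar_level(coeffs)          # keep the approximation branch
    energies = [sum(c * c for c in coeffs[i:i + window])
                for i in range(0, len(coeffs), window)]
    mean_energy = sum(energies) / len(energies)
    return [e > factor * mean_energy for e in energies]
```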

  17. How effective is graphene nanopore geometry on DNA sequencing?

    CERN Document Server

    Satarifard, Vahid; Ejtehadi, Mohammad Reza

    2015-01-01

    In this paper we investigate the effects of graphene nanopore geometry on the pulling of homopolymer ssDNA through a nanopore using steered molecular dynamics (SMD) simulations. Different graphene nanopores are examined, including axially symmetric and asymmetric monolayer graphene nanopores as well as five-layer graphene polyhedral crystals (GPC). The pulling force profile, the motion of the ssDNA, the work done in irreversible DNA pulling, and the orientations of DNA bases near the nanopore are assessed. Simulation results demonstrate a strong effect of pore shape and geometrical symmetry on the free energy barrier, base orientations, and dynamics of DNA translocation through the graphene nanopore. Our study suggests that a monolayer graphene nanopore with symmetric circular geometry, combined with a high pulling velocity, can be used for DNA sequencing.

  18. Qualitatively predicting acetylation and methylation areas in DNA sequences.

    Science.gov (United States)

    Pham, Tho Hoan; Tran, Dang Hung; Ho, Tu Bao; Satou, Kenji; Valiente, Gabriel

    2005-01-01

    Eukaryotic genomes are packaged by the wrapping of DNA around histone octamers to form nucleosomes. Nucleosome occupancy, acetylation, and methylation, which have a major impact on all nuclear processes involving DNA, have been recently mapped across the yeast genome using chromatin immunoprecipitation and DNA microarrays. However, this experimental protocol is laborious and expensive. Moreover, experimental methods often produce noisy results. In this paper, we introduce a computational approach to the qualitative prediction of nucleosome occupancy, acetylation, and methylation areas in DNA sequences. Our method uses support vector machines to discriminate between DNA areas with high and low relative occupancy, acetylation, or methylation, and rank k-gram features based on their support for these DNA modifications. Experimental results on the yeast genome reveal genetic area preferences of nucleosome occupancy, acetylation, and methylation that are consistent with previous studies. Supplementary files are available from http://www.jaist.ac.jp/~tran/nucleosome/.
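
    The k-gram features that such a classifier consumes can be computed as shown below. This Python sketch makes assumptions about the representation: overlapping k-grams, relative frequencies, and a fixed lexicographic order over the 4^k possible k-grams. The support vector machine itself and the feature-ranking step are not shown. Any SVM implementation could be trained on the resulting vectors.

```python
from itertools import product


def kgram_vector(dna, k):
    """Relative frequencies of all 4**k overlapping k-grams, in lexicographic ACGT order."""
    kgrams = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {g: i for i, g in enumerate(kgrams)}
    counts = [0] * len(kgrams)
    dna = dna.upper()
    for i in range(len(dna) - k + 1):
        g = dna[i:i + k]
        if g in index:                 # skip k-grams containing N or other symbols
            counts[index[g]] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```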

  19. Ribosomal DNA copy number loss and sequence variation in cancer.

    Science.gov (United States)

    Xu, Baoshan; Li, Hua; Perry, John M; Singh, Vijay Pratap; Unruh, Jay; Yu, Zulin; Zakari, Musinu; McDowell, William; Li, Linheng; Gerton, Jennifer L

    2017-06-01

    Ribosomal DNA is one of the most variable regions in the human genome with respect to copy number. Despite the importance of rDNA for cellular function, we know virtually nothing about what governs its copy number, stability, and sequence in the mammalian genome due to challenges associated with mapping and analysis. We applied computational and droplet digital PCR approaches to measure rDNA copy number in normal and cancer states in human and mouse genomes. We find that copy number and sequence can change in cancer genomes. Counterintuitively, human cancer genomes show a loss of copies, accompanied by global copy number co-variation. The sequence can also be more variable in the cancer genome. Cancer genomes with lower copies have mutational evidence of mTOR hyperactivity. The PTEN phosphatase is a tumor suppressor that is critical for genome stability and a negative regulator of the mTOR kinase pathway. Surprisingly, but consistent with the human cancer genomes, hematopoietic cancer stem cells from a Pten-/- mouse model for leukemia have lower rDNA copy number than normal tissue, despite increased proliferation, rRNA production, and protein synthesis. Loss of copies occurs early and is associated with hypersensitivity to DNA damage. Therefore, copy loss is a recurrent feature in cancers associated with mTOR activation. Ribosomal DNA copy number may be a simple and useful indicator of whether a cancer will be sensitive to DNA damaging treatments.

  20. DNA methylation and transcription in HERV (K, W, E) and LINE sequences remain unchanged upon foreign DNA insertions.

    Science.gov (United States)

    Weber, Stefanie; Jung, Susan; Doerfler, Walter

    2016-02-01

    DNA methylation and transcriptional profiles were determined in the regulatory sequences of the human endogenous retroviral (HERV-K, -W, -E) and LINE-1.2 elements and were compared between non-transgenomic and plasmid-transgenomic cells. DNA methylation profiles in the HERV (K, W, E) and LINE sequences were determined by bisulfite genomic sequencing. The transcription of these genome segments was assessed by quantitative real-time PCR. In HERV-K, HERV-W and LINE-1.2 the levels of DNA methylation ranged between 75 and 98%, while in HERV-E they were around 60%. Nevertheless, the HERV and LINE-1.2 sequences were actively transcribed. No differences were found in comparisons of HERV and LINE-1.2 CpG methylation and transcription patterns between non-transgenomic and plasmid-transgenomic HCT116 cells. The insertion of a 5.6 kbp plasmid into the HCT116 genome had no effect on the HERV and LINE-1.2 methylation and transcription profiles, although other parts of the HCT116 genome had shown marked changes. These repetitive sequences are transcribed, probably because the large number of HERV and LINE-1.2 elements harbor copies with non- or hypo-methylated long terminal repeat sequences.
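
    The sketch below shows how a methylation level such as 75 to 98% can be read off bisulfite sequencing data. Bisulfite converts unmethylated cytosines to uracil, which is read as T, while methylated cytosines remain C. The level at a CpG is therefore the fraction of reads showing C. This Python sketch assumes gapless, forward-strand reads already aligned to the reference starting at position 0. Real pipelines handle alignment, both strands, and non-CpG context separately.

```python
def cpg_methylation(reference, reads):
    """Percent methylation at each CpG position of `reference`.

    Assumes each read is gapless and aligned to the reference from position 0.
    """
    levels = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] != "CG":
            continue
        covering = [r[i] for r in reads if len(r) > i]
        methylated = covering.count("C")      # protected from bisulfite conversion
        unmethylated = covering.count("T")    # converted C -> U, read as T
        if methylated + unmethylated:
            levels[i] = 100.0 * methylated / (methylated + unmethylated)
    return levels
```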

  1. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Science.gov (United States)

    2011-01-01

    Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/. PMID:21385349

  2. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Directory of Open Access Journals (Sweden)

    Baldwin Stephen A

    2011-03-01

    Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  3. VoSeq: a voucher and DNA sequence web application.

    Science.gov (United States)

    Peña, Carlos; Malm, Tobias

    2012-01-01

    The number of published molecular phylogenetic studies is ever growing, due in part to the advent of new techniques that allow cheap and quick DNA sequencing. Hence, the demand for relational databases with which to manage and annotate the accumulating DNA sequences, genes, voucher specimens and associated biological data is increasing. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically constructed with the aim of being an organizational tool for researchers working in phylogenetic inference. We here report new software that facilitates easy management of voucher and sequence data, consisting of a relational database back-end and a graphical user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready to be used in common phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publishing. It also has built-in BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL) and GBIF (by generating data dumps that can be processed with GBIF's Integrated Publishing Toolkit).

  4. Label-free DNA sequencing using Millikan detection.

    Science.gov (United States)

    Dettloff, Roger; Leiske, Danielle; Chow, Andrea; Farinas, Javier

    2015-10-15

    A label-free method for DNA sequencing based on the principle of the Millikan oil drop experiment was developed. This sequencing-by-synthesis approach sensed increases in bead charge as nucleotides were added by a polymerase to DNA templates attached to beads. The balance between an electrical force, which was dependent on the number of nucleotide charges on a bead, and opposing hydrodynamic drag and restoring tether forces resulted in a bead velocity that was a function of the number of nucleotides attached to the bead. The velocity of beads tethered via a polymer to a microfluidic channel and subjected to an oscillating electric field was measured using dark-field microscopy and used to determine how many nucleotides were incorporated during each sequencing-by-synthesis cycle. Increases in bead velocity of approximately 1% were reliably detected during DNA polymerization, allowing for sequencing of short DNA templates. The method could lead to a low-cost, high-throughput sequencing platform that could enable routine sequencing in medical applications.

  5. Hiding message into DNA sequence through DNA coding and chaotic maps.

    Science.gov (United States)

    Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

    2014-09-01

    The paper proposes an improved reversible substitution method to hide data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance robustness and enlarge the hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than competing methods with respect to robustness and capacity.
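
    Two of the building blocks named above can be sketched in Python: DNA coding of a binary message and a piecewise linear chaotic map (PWLCM). The 2-bit coding rule below is one of the eight rules that map complementary bit pairs to complementary bases. The way chaotic values are turned into relative hiding offsets (`hiding_gaps`) is a hypothetical illustration. The paper's encryption and embedding steps are not reproduced.

```python
# Assumed coding rule: complementary bit pairs (00/11, 01/10) map to complementary bases.
RULE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def dna_encode(message: bytes) -> str:
    """Encode each byte as four bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in message)
    return "".join(RULE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_decode(dna: str) -> bytes:
    """Invert dna_encode."""
    inverse = {v: k for k, v in RULE.items()}
    bits = "".join(inverse[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def pwlcm(x, p):
    """One iteration of the piecewise linear chaotic map on [0, 1], with 0 < p < 0.5."""
    if x >= 0.5:
        x = 1.0 - x                  # the map is symmetric about 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def hiding_gaps(x0, p, count, max_gap):
    """Hypothetical: derive `count` relative offsets in 1..max_gap from the chaotic orbit."""
    gaps, x = [], x0
    for _ in range(count):
        x = pwlcm(x, p)
        gaps.append(1 + int(x * max_gap) % max_gap)
    return gaps
```

    The initial value `x0` and the control parameter `p` act as the secret key. The same pair regenerates the same offsets for extraction.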

  6. DNA Methylation Pattern as Important Epigenetic Criterion in Cancer

    Directory of Open Access Journals (Sweden)

    Mehrdad Ghavifekr Fakhr

    2013-01-01

    Epigenetic modifications can affect long-term gene expression without any change in the nucleotide sequence of the DNA. Epigenetic processes take part in cell differentiation, chromatin structure, and gene activity from the embryonic period onward. However, disorders in the epigenetic patterns of genes can affect mechanisms such as cell division, apoptosis, and response to environmental stimuli, which may lead to various diseases and cancers. Because epigenetic changes can revert to their natural state, they could be used as important targets in the treatment of cancer and similar malignancies. The aim of this review is to assess epigenetic changes in normal and cancerous cells, their causative factors, and epigenetic therapies and treatments.

  7. Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing

    Science.gov (United States)

    Just, Rebecca S.; Irwin, Jodi A.; Parson, Walther

    2015-01-01

    Long an important and useful tool in forensic genetic investigations, mitochondrial DNA (mtDNA) typing continues to mature. Research in the last few years has demonstrated both that data from the entire molecule will have practical benefits in forensic DNA casework, and that massively parallel sequencing (MPS) methods will make full mitochondrial genome (mtGenome) sequencing of forensic specimens feasible and cost-effective. A spate of recent studies has employed these new technologies to assess intraindividual mtDNA variation. However, in several instances, contamination and other sources of mixed mtDNA data have been erroneously identified as heteroplasmy. Well vetted mtGenome datasets based on both Sanger and MPS sequences have found authentic point heteroplasmy in approximately 25% of individuals when minor component detection thresholds are in the range of 10–20%, along with positional distribution patterns in the coding region that differ from patterns of point heteroplasmy in the well-studied control region. A few recent studies that examined very low-level heteroplasmy are concordant with these observations when the data are examined at a common level of resolution. In this review we provide an overview of considerations related to the use of MPS technologies to detect mtDNA heteroplasmy. In addition, we examine published reports on point heteroplasmy to characterize features of the data that will assist in the evaluation of future mtGenome data developed by any typing method. PMID:26009256
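
    A minor-component detection threshold of the kind mentioned above can be expressed as a simple rule on per-position base counts. This Python sketch is illustrative only. The 10% threshold sits at the low end of the range quoted above, and the minimum depth is an assumption. As the review stresses, a site passing this test may still reflect contamination or a mixture rather than authentic heteroplasmy.

```python
def call_point_heteroplasmy(counts, threshold=0.10, min_depth=100):
    """Return (major, minor, minor_fraction) for a candidate heteroplasmic site, else None.

    counts maps each observed base to its read count at one mtGenome position.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return None                          # too little coverage to call
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return None
    (major, _), (minor, minor_count) = ranked[0], ranked[1]
    fraction = minor_count / depth
    return (major, minor, fraction) if fraction >= threshold else None
```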

  8. Nonlinear Aspects of Coding and Noncoding DNA Sequences

    Science.gov (United States)

    Stanley, H. Eugene

    2001-03-01

    One of the most remarkable features of human DNA is that 97 percent is not coding for proteins. Studying this noncoding DNA is important both for practical reasons (to distinguish it from the coding DNA as the human genome is sequenced), and for scientific reasons (why is the noncoding DNA present at all, if it appears to have little if any purpose?). In this talk we discuss new methods of analyzing coding and noncoding DNA in parallel, with a view to uncovering different statistical properties of the two kinds of DNA. We also speculate on possible roles of noncoding DNA. The work reported here was carried out primarily by P. Bernaola-Galvan, S. V. Buldyrev, P. Carpena, N. Dokholyan, A. L. Goldberger, I. Grosse, S. Havlin, H. Herzel, J. L. Oliver, C.-K. Peng, M. Simons, H. E. Stanley, R. H. R. Stanley, and G. M. Viswanathan. [1] For a brief overview in language that physicists can understand, see H. E. Stanley, S. V. Buldyrev, A. L. Goldberger, S. Havlin, C.-K. Peng, and M. Simons, "Scaling Features of Noncoding DNA" [Proc. XII Max Born Symposium, Wroclaw], Physica A 273, 1-18 (1999). [2] I. Grosse, H. Herzel, S. V. Buldyrev, and H. E. Stanley, "Species Independence of Mutual Information in Coding and Noncoding DNA," Phys. Rev. E 61, 5624-5629 (2000). [3] P. Bernaola-Galvan, I. Grosse, P. Carpena, J. L. Oliver, and H. E. Stanley, "Identification of DNA Coding Regions Using an Entropic Segmentation Method," Phys. Rev. Lett. 84, 1342-1345 (2000). [4] N. Dokholyan, S. V. Buldyrev, S. Havlin, and H. E. Stanley, "Distributions of Dimeric Tandem Repeats in Non-coding and Coding DNA Sequences," J. Theor. Biol. 202, 273-282 (2000). [5] R. H. R. Stanley, N. V. Dokholyan, S. V. Buldyrev, S. Havlin, and H. E. Stanley, "Clumping of Identical Oligonucleotides in Coding and Noncoding DNA Sequences," J. Biomol. Structure and Design 17, 79-87 (1999). [6] N. Dokholyan, S. V. Buldyrev, S. Havlin, and H. E. Stanley, "Distribution of Base Pair Repeats in Coding and Noncoding DNA
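
    Reference [2] above compares the mutual information function of coding and noncoding DNA. A minimal Python sketch of that quantity is shown below, I(k) = sum over x, y of P_k(x, y) log2[P_k(x, y) / (P(x) P(y))] for symbols k positions apart. The estimator is an assumption: it uses plain frequency counts, with marginals taken from the same pairs, and applies no finite-size correction.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Mutual information (bits) between symbols at positions i and i + k."""
    n = len(seq) - k
    pairs = Counter(zip(seq[:n], seq[k:]))
    left = Counter(seq[:n])      # marginal of the first symbol of each pair
    right = Counter(seq[k:])     # marginal of the second symbol of each pair
    return sum(
        (c / n) * math.log2((c / n) / ((left[x] / n) * (right[y] / n)))
        for (x, y), c in pairs.items()
    )
```

    Evaluating `mutual_information(seq, k)` for k = 1, 2, 3, ... on coding and noncoding sequences gives the two functions to compare.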

  9. Mitochondrial DNA sequence of Onychostoma rara.

    Science.gov (United States)

    Zeng, Chun-Fang; Li, Xiao-Ling; Li, Chuan-Wu; Huang, Xiang-Rong; Wan, Yi-Wen

    2015-01-01

    The complete mitochondrial genome sequence of Onychostoma rara was determined to be 16,590 bp in length and contains 13 protein-coding genes (PCGs), 22 tRNA genes, the large (rrnL) and small (rrnS) rRNA genes, and the non-coding control region. Its total A + T content is 55.65%. We also analyzed the structure of the control region, in which six conserved sequence blocks (CSB-1, CSB-2, CSB-3, CSB-D, CSB-E and CSB-F) and a 2-bp tandem repeat were detected.
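
    The two simple sequence statistics reported above, total A + T content and a 2-bp tandem repeat, can be computed with a short Python sketch. The minimum number of copies that counts as a tandem repeat (four here) is an assumption. The regular expression excludes homopolymer runs.

```python
import re


def at_content(seq):
    """Percentage of A + T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("A") + seq.count("T")) / acgt


def dinucleotide_repeats(seq, min_copies=4):
    """Yield (start, unit, copies) for tandem runs of a 2-bp unit such as TATATATA."""
    # Group 1 is the two-base unit; the lookahead rejects homopolymer units like "AA".
    pattern = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_copies - 1))
    for match in pattern.finditer(seq.upper()):
        yield match.start(), match.group(1), len(match.group(0)) // 2
```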

  10. Dialects of the DNA uptake sequence in Neisseriaceae.

    Directory of Open Access Journals (Sweden)

    Stephan A Frye

    2013-04-01

    In all sexual organisms, adaptations exist that secure the safe reassortment of homologous alleles and prevent the intrusion of potentially hazardous alien DNA. Some bacteria engage in a simple form of sex known as transformation. In the human pathogen Neisseria meningitidis and in related bacterial species, transformation by exogenous DNA is regulated by the presence of a specific DNA Uptake Sequence (DUS), which is present in thousands of copies in the respective genomes. DUS affects transformation by limiting DNA uptake and recombination in favour of homologous DNA. The specific mechanisms of DUS-dependent genetic transformation have remained elusive. Bioinformatic analyses of family Neisseriaceae genomes reveal eight distinct variants of DUS. These variants are here termed DUS dialects, and their effect on interspecies communication is demonstrated. Each of the DUS dialects is remarkably conserved within each species and is distributed consistently with a robust Neisseriaceae phylogeny based on core genome sequences. The impact of individual single nucleotide transversions in DUS on meningococcal transformation and on DNA binding and uptake is analysed. The results show that a DUS core 5'-CTG-3' is required for transformation and that transversions in this core reduce DNA uptake by more than two orders of magnitude, whereas DNA binding is less affected. Distinct DUS dialects are efficient barriers to interspecies recombination in N. meningitidis, N. elongata, Kingella denitrificans, and Eikenella corrodens, despite the presence of the core sequence. The degree of similarity between the DUS dialect of the recipient species and the donor DNA directly correlates with the level of transformation and DNA binding and uptake. Finally, DUS-dependent transformation is documented in the genera Eikenella and Kingella for the first time. The results presented here advance our understanding of the function and evolution of DUS and genetic
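
    Locating DUS copies in a genome amounts to a motif search on both strands. This Python sketch uses as its example the 10-bp meningococcal DUS 5'-GCCGTCTGAA-3', which contains the 5'-CTG-3' core mentioned above. Treat the motif and the mismatch tolerance as assumptions, to be replaced by the dialect of interest.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(genome, motif, max_mismatches=0):
    """Return sorted (position, strand) hits of motif on both strands of genome.

    Positions are 0-based on the given (+) strand; a palindromic motif is reported twice.
    """
    genome, motif = genome.upper(), motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        for i in range(len(genome) - len(pattern) + 1):
            window = genome[i:i + len(pattern)]
            mismatches = sum(a != b for a, b in zip(window, pattern))
            if mismatches <= max_mismatches:
                hits.append((i, strand))
    return sorted(hits)
```

    Setting `max_mismatches=1` also reports single-substitution variants, such as a transversion in the core. The paper shows such variants are far less effective for uptake.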

  11. DNA sequence alignment by microhomology sampling during homologous recombination.

    Science.gov (United States)

    Qi, Zhi; Redding, Sy; Lee, Ja Yil; Gibb, Bryan; Kwon, YoungHo; Niu, Hengyao; Gaines, William A; Sung, Patrick; Greene, Eric C

    2015-02-26

    Homologous recombination (HR) mediates the exchange of genetic information between sister or homologous chromatids. During HR, members of the RecA/Rad51 family of recombinases must somehow search through vast quantities of DNA sequence to align and pair single-strand DNA (ssDNA) with a homologous double-strand DNA (dsDNA) template. Here, we use single-molecule imaging to visualize Rad51 as it aligns and pairs homologous DNA sequences in real time. We show that Rad51 uses a length-based recognition mechanism while interrogating dsDNA, enabling robust kinetic selection of 8-nucleotide (nt) tracts of microhomology, which kinetically confines the search to sites with a high probability of being a homologous target. Successful pairing with a ninth nucleotide coincides with an additional reduction in binding free energy, and subsequent strand exchange occurs in precise 3-nt steps, reflecting the base triplet organization of the presynaptic complex. These findings provide crucial new insights into the physical and evolutionary underpinnings of DNA recombination.

  12. Physical localisation of repetitive DNA sequences in Alstroemeria: karyotyping of two species with species-specific and ribosomal DNA.

    Science.gov (United States)

    Kamstra, S A; Kuipers, A G; De Jeu, M J; Ramanna, M S; Jacobsen, E

    1997-10-01

    Fluorescence in situ hybridization (FISH) was used to localise two species-specific repetitive DNA sequences, A001-I and D32-13, and two highly conserved 25S and 5S rDNA sequences on the metaphase chromosomes of two species of Alstroemeria. The Chilean species, Alstroemeria aurea (2n = 16), has abundant constitutive heterochromatin, whereas the Brazilian species, Alstroemeria inodora, has hardly any heterochromatin. The A. aurea specific A001-I probe hybridized specifically to the C-band regions on all chromosomes. The FISH patterns on A. inodora chromosomes using species-specific probe D32-13 resembled the C-banding pattern and the A001-I pattern on A. aurea chromosomes. There were notable differences in number and distribution of rDNA sites between the two species. The 25S rDNA probe revealed 16 sites in A. aurea that closely colocalised with A001-I sites and 12 in A. inodora that were predominantly detected in the centromeric regions. FISH karyotypes of the two Alstroemeria species were constructed accordingly, enabling full identification of all individual chromosomes. These FISH karyotypes will be useful for monitoring the chromosomes of both Alstroemeria species in hybrids and backcross derivatives.

  13. Rapid DNA sequencing by horizontal ultrathin gel electrophoresis.

    Science.gov (United States)

    Brumley, R L; Smith, L M

    1991-01-01

    A horizontal polyacrylamide gel electrophoresis apparatus has been developed that decreases the time required to separate the DNA fragments produced in enzymatic sequencing reactions. The configuration of this apparatus and the use of circulating coolant directly under the glass plates result in heat exchange that is approximately nine times more efficient than the passive thermal transfer methods commonly used. Bubble-free gels as thin as 25 microns can be routinely cast on this device. Applying electric fields of up to 250 volts/cm to these ultrathin gels permits the rapid separation of multiple DNA sequencing reactions in parallel. When used in conjunction with 32P-based autoradiography, the DNA bands appear substantially sharper than those obtained in conventional electrophoresis. This increased sharpness permits shorter autoradiographic exposure times and longer sequence reads. PMID:1870968

  14. Noninvasive prenatal paternity testing (NIPAT) through maternal plasma DNA sequencing

    DEFF Research Database (Denmark)

    Jiang, Haojun; Xie, Yifan; Li, Xuchao

    2016-01-01

    Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) have already been used to perform noninvasive prenatal paternity testing from maternal plasma DNA. The most frequently used technologies were, respectively, PCR followed by capillary electrophoresis and SNP typing arrays. Here, we...... developed a noninvasive prenatal paternity testing (NIPAT) method based on SNP typing with maternal plasma DNA sequencing. We evaluated the influencing factors (minor allele frequency (MAF), the total number of SNPs, fetal fraction and effective sequencing depth) and designed three different selective SNP panels...... paternity test using STR multiplex system. Our study demonstrated that maternal plasma DNA sequencing-based technology is feasible and accurate in determining paternity, and it may provide an alternative for forensic applications in the future....

  15. Accelerating Computation of DNA Sequence Alignment in Distributed Environment

    Science.gov (United States)

    Guo, Tao; Li, Guiyang; Deaton, Russel

    Sequence similarity and alignment are among the most important operations in computational biology. However, analyzing large sets of DNA sequences can be impractical on a regular PC. Using multiple threads with the JavaParty mechanism, this project successfully extended the capabilities of regular Java to a distributed environment for simulation of DNA computation. With the aid of JavaParty and a multithreaded design, the results of this study demonstrated that the modified Java program could perform parallel computing without using RMI or socket communication. In this paper, an efficient method for modeling and comparing DNA sequences with dynamic programming and JavaParty is proposed. Results of this method in a distributed environment are also discussed.
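
    The JavaParty implementation is not reproduced here. The Python sketch below illustrates the same idea: dynamic-programming alignment, with the independent pairwise comparisons distributed over workers. It computes Needleman-Wunsch global alignment scores with a linear gap penalty and maps the pairs over a `concurrent.futures` executor. The scoring values are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score using two rolling DP rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def _score_pair(pair):
    """Score one ((i, seq_i), (j, seq_j)) pair; module-level so process pools can pickle it."""
    (i, a), (j, b) = pair
    return (i, j), nw_score(a, b)


def pairwise_scores(seqs, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Score every pair of sequences; returns {(i, j): score} for i < j."""
    pairs = list(combinations(enumerate(seqs), 2))
    with executor_cls(max_workers=max_workers) as pool:
        return dict(pool.map(_score_pair, pairs))
```

    Threads keep the example simple, but pure-Python scoring is CPU-bound. Passing `executor_cls=ProcessPoolExecutor` is therefore the more realistic choice on a multicore machine. Spreading work across machines, as JavaParty does, needs a distributed framework instead.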

  16. Facilitated diffusion on mobile DNA: configurational traps and sequence heterogeneity

    CERN Document Server

    Brackley, C A; Marenduzzo, D (doi:10.1103/PhysRevLett.109.168103)

    2012-01-01

    We present Brownian dynamics simulations of the facilitated diffusion of a protein, modelled as a sphere with a binding site on its surface, along DNA, modelled as a semi-flexible polymer. We consider both the effect of DNA organisation in 3D, and of sequence heterogeneity. We find that in a network of DNA loops, as are thought to be present in bacterial DNA, the search process is very sensitive to the spatial location of the target within such loops. Therefore, specific genes might be repressed or promoted by changing the local topology of the genome. On the other hand, sequence heterogeneity creates traps which normally slow down facilitated diffusion. When suitably positioned, though, these traps can, surprisingly, render the search process much more efficient.

  17. Applying Small-Scale DNA Signatures as an Aid in Assembling Soybean Chromosome Sequences

    Directory of Open Access Journals (Sweden)

    Myron Peto

    2010-01-01

    Previous work has established a genomic signature based on relative counts of the 16 possible dinucleotides. Until now, it has been generally accepted that the dinucleotide signature is characteristic of a genome and is relatively homogeneous across a genome. However, we found some local regions of the soybean genome with a signature differing widely from that of the rest of the genome. Those regions were mostly centromeric and pericentromeric, and enriched for repetitive sequences. We found that DNA binding energy also presented large-scale patterns across soybean chromosomes. These two patterns were helpful during assembly and quality control of soybean whole genome shotgun scaffold sequences into chromosome pseudomolecules.
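
    The abstract does not define the signature's formula; a common choice, assumed here, is Karlin's relative abundance ρ_XY = f_XY / (f_X f_Y), where f_X, f_Y are mononucleotide and f_XY dinucleotide frequencies. Scanning windows against the whole-sequence signature is one hypothetical way to flag regions, such as the centromeric ones described, whose signature departs from the rest of the genome.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 pairs


def dinucleotide_signature(seq: str) -> dict[str, float]:
    """Relative abundance rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1)
                 if seq[i] in BASES and seq[i + 1] in BASES)
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("sequence has no valid dinucleotides")
    sig = {}
    for xy in DINUCLEOTIDES:
        fx, fy = mono[xy[0]] / n_mono, mono[xy[1]] / n_mono
        sig[xy] = (di[xy] / n_di) / (fx * fy) if fx and fy else 0.0
    return sig


def signature_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Mean absolute difference over the 16 relative abundances."""
    return sum(abs(a[k] - b[k]) for k in DINUCLEOTIDES) / len(DINUCLEOTIDES)


def window_deviation(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Distance of each window's signature from the whole-sequence signature."""
    genome_sig = dinucleotide_signature(seq)
    return [(start, signature_distance(
                dinucleotide_signature(seq[start:start + window]), genome_sig))
            for start in range(0, len(seq) - window + 1, step)]
```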

  18. Vector sequences - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Number of data entries: 7.

  19. Comment on "Linguistic features of noncoding DNA sequences"

    CERN Document Server

    Israeloff, N E; Kagalenko, M; Chan, K

    1995-01-01

    In a recent Physical Review Letter, Mantegna et al. report that certain statistical signatures of natural language can be found in non-coding DNA sequences. In this comment we show that random noise with power-law correlations similar to 1/f noise exhibits the same "linguistic" signatures as those found in non-coding DNA. We conclude that these signatures cannot distinguish languages from noise.
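
    The comment does not spell out the analysis. This sketch assumes the "linguistic" signature is a Zipf rank-frequency fit over overlapping n-tuple "words", one of the statistics Mantegna et al. applied. Because the function accepts any symbol string, the same exponent can be computed for a DNA sequence and for a noise series mapped to symbols (for instance "1" above zero, "0" otherwise), which is the kind of comparison the comment makes.

```python
import math
from collections import Counter


def zipf_exponent(seq: str, n: int = 6) -> float:
    """Fit f(r) ~ r**(-zeta) to the rank-frequency plot of overlapping n-tuples."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Ordinary least-squares slope of log(frequency) against log(rank).
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```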

  20. Sequence dependence of transcription factor-mediated DNA looping.

    Science.gov (United States)

    Johnson, Stephanie; Lindén, Martin; Phillips, Rob

    2012-09-01

    DNA is subject to large deformations in a wide range of biological processes. Two key examples illustrate how such deformations influence the readout of the genetic information: the sequestering of eukaryotic genes by nucleosomes and DNA looping in transcriptional regulation in both prokaryotes and eukaryotes. These kinds of regulatory problems are now becoming amenable to systematic quantitative dissection with a powerful dialogue between theory and experiment. Here, we use a single-molecule experiment in conjunction with a statistical mechanical model to test quantitative predictions for the behavior of DNA looping at short length scales and to determine how DNA sequence affects looping at these lengths. We calculate and measure how such looping depends upon four key biological parameters: the strength of the transcription factor binding sites, the concentration of the transcription factor, and the length and sequence of the DNA loop. Our studies lead to the surprising insight that sequences that are thought to be especially favorable for nucleosome formation because of high flexibility lead to no systematically detectable effect of sequence on looping, and begin to provide a picture of the distinctions between the short length scale mechanics of nucleosome formation and looping.
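
    The abstract names four parameters but not the model's equations. The states-and-weights sketch below is a generic, simplified thermodynamic model, not the authors' exact formulation: binding-site strength enters through dissociation constants, transcription-factor concentration through `c`, and loop length and sequence only through a J-factor, the effective concentration of one site as seen from the other. Parameter names are hypothetical.

```python
def looping_probability(c: float, k1: float, k2: float, j_factor: float) -> float:
    """Equilibrium probability of the looped state for one bivalent
    transcription factor and two binding sites (simplified model).

    c        -- free transcription-factor concentration
    k1, k2   -- dissociation constants of the two sites (site strength)
    j_factor -- effective concentration of one site near the other; loop
                length and sequence (flexibility) enter only here
    All four quantities must share the same concentration units.
    """
    weights = {
        "unbound": 1.0,
        "site1": c / k1,
        "site2": c / k2,
        "both_unlooped": (c / k1) * (c / k2),   # two separate factors bound
        "looped": (c / k1) * (j_factor / k2),   # one factor bridging both sites
    }
    return weights["looped"] / sum(weights.values())
```

    In this model the looping probability first rises and then falls with `c`, because at high concentration two separate factors occupy both sites and block the looped state.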

  1. Label-Free DNA Sequencing Using Millikan Detection

    OpenAIRE

    Dettloff, Roger; Leiske, Danielle; Chow, Andrea; Farinas, Javier

    2015-01-01

    A label-free method for DNA sequencing based on the principle of the Millikan oil drop experiment was developed. This sequencing-by-synthesis approach sensed increases in bead charge as nucleotides were added by a polymerase to DNA templates attached to beads. The balance between an electrical force, which was dependent on the number of nucleotide charges on a bead, and opposing hydrodynamic drag and restoring tether forces resulted in a bead velocity that was a function of the number of nucl...
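
    The force balance described above can be written out directly. The sketch assumes Stokes drag on a spherical bead and treats the tether as a given constant force rather than a displacement-dependent spring; `charge_per_nt` is an assumed effective charge per incorporated nucleotide, since counterion screening makes the real value smaller than one elementary charge.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def bead_velocity(n_nucleotides: int, field: float, radius: float,
                  viscosity: float, charge_per_nt: float = 1.0,
                  tether_force: float = 0.0) -> float:
    """Steady-state bead speed (m/s) from a force balance.

    electrical force  = n_nucleotides * charge_per_nt * e * field
    hydrodynamic drag = 6 * pi * viscosity * radius * v   (Stokes' law)
    tether force      = restoring force opposing the motion (N)
    Setting electrical force = drag + tether force and solving for v.
    SI units throughout: field in V/m, radius in m, viscosity in Pa*s.
    """
    electrical = n_nucleotides * charge_per_nt * ELEMENTARY_CHARGE * field
    drag_coefficient = 6 * math.pi * viscosity * radius
    return (electrical - tether_force) / drag_coefficient
```

    Under these assumptions each incorporated nucleotide changes the velocity by `charge_per_nt * e * field / (6 * pi * viscosity * radius)`, so counting velocity steps counts incorporated bases.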

  2. Anaplasma phagocytophilum in Danish sheep: confirmation by DNA sequencing

    Directory of Open Access Journals (Sweden)

    Thamsborg Stig M

    2009-12-01

    Background: The presence of Anaplasma phagocytophilum, an Ixodes ricinus-transmitted bacterium, was investigated in two flocks of Danish grazing lambs. Direct PCR detection was performed on DNA extracted from blood and serum with subsequent confirmation by DNA sequencing. Methods: Thirty-one samples obtained from clinically normal lambs in 2000 from Fussingø, Jutland and 12 samples from ten lambs and two ewes from a clinical outbreak at Feddet, Zealand in 2006 were included in the study. Some of the animals from Feddet had shown clinical signs of polyarthritis and general unthriftiness prior to sampling. DNA extraction was optimized from blood and serum, and detection was achieved by a 16S rRNA targeted PCR with verification of the product by DNA sequencing. Results: Five DNA extracts were found positive by PCR, including two samples from 2000 and three from 2006. For both series of samples the product was verified as A. phagocytophilum by DNA sequencing. Conclusions: A. phagocytophilum was detected by molecular methods for the first time in Danish grazing lambs during the two seasons investigated (2000 and 2006).

  3. Sequence-selective DNA recognition with peptide-bisbenzamidine conjugates.

    Science.gov (United States)

    Sánchez, Mateo I; Vázquez, Olalla; Vázquez, M Eugenio; Mascareñas, José L

    2013-07-22

    Transcription factors (TFs) are specialized proteins that play a key role in the regulation of genetic expression. Their mechanism of action involves the interaction with specific DNA sequences, which usually takes place through specialized domains of the protein. However, achieving efficient binding usually requires the presence of the full protein. This is the case for the bZIP and zinc finger TF families, which cannot interact with their target sites when the DNA binding fragments are presented as isolated monomers. Herein it is demonstrated that the DNA binding of these monomeric peptides can be restored when they are conjugated to aza-bisbenzamidines, which are readily accessible molecules that interact with A/T-rich sites by insertion into their minor groove. Importantly, the fluorogenic properties of the aza-benzamidine unit provide details of the DNA interaction that elude electrophoretic mobility shift assays (EMSA). The hybrids based on the GCN4 bZIP protein preferentially bind to composite sequences containing tandem bisbenzamidine-GCN4 binding sites (TCAT⋅AAATT). Fluorescence reverse titrations show an interesting multiphasic profile consistent with the formation of competitive nonspecific complexes at low DNA/peptide ratios. On the other hand, the conjugate with the DNA binding domain of the zinc finger protein GAGA binds with high affinity (KD≈12 nM) and specificity to a composite AATTT⋅GAGA sequence containing both the bisbenzamidine and the TF consensus binding sites.

  4. Preparation of next-generation sequencing libraries from damaged DNA.

    Science.gov (United States)

    Briggs, Adrian W; Heyn, Patricia

    2012-01-01

    Next-generation sequencing (NGS) has revolutionized ancient DNA research, especially when combined with high-throughput target enrichment methods. However, attaining high sequencing depth and accuracy from samples often remains problematic due to the damaged state of ancient DNA, in particular the extremely low copy number of ancient DNA and the abundance of uracil residues derived from cytosine deamination that lead to miscoding errors. It is therefore critical to use a highly efficient procedure for conversion of a raw DNA extract into an adaptor-ligated sequencing library, and equally important to reduce errors from uracil residues. We present a protocol for NGS library preparation that allows highly efficient conversion of DNA fragments into an adaptor-ligated form. The protocol incorporates an option to remove the vast majority of uracil miscoding lesions as part of the library preparation process. The procedure requires only two spin column purification steps and no gel purification or bead handling. Starting from an aliquot of DNA extract, a finished, highly amplified library can be generated in 5 h, or under 3 h if uracil removal is not required.

  5. Perspectives of DNA microarray and next-generation DNA sequencing technologies

    Institute of Scientific and Technical Information of China (English)

    TENG XiaoKun; XIAO HuaSheng

    2009-01-01

    DNA microarray and next-generation DNA sequencing technologies are important tools for high-throughput genome research, revealing both the structural and functional characteristics of genomes. In the past decade, DNA microarray technologies have been widely applied in studies of functional genomics, systems biology and pharmacogenomics. The next-generation DNA sequencing method was first introduced by the 454 company in 2003, immediately followed by the establishment of the Solexa and SOLiD techniques by other biotech companies. Although the technology emerged only recently, it has improved rapidly and impressively, and its applications have extended to almost all fields of genomics research, challenging the existing DNA microarray technology as a rival. This paper briefly reviews the working principles of these two technologies as well as their applications and perspectives in genome research.

  6. ICRPfinder: a fast pattern design algorithm for coding sequences and its application in finding potential restriction enzyme recognition sites

    Directory of Open Access Journals (Sweden)

    Stafford Phillip

    2009-09-01

    Background: Restriction enzymes can produce easily definable segments from DNA sequences by using a variety of cut patterns. There are, however, no software tools that can aid in gene building, that is, modifying wild-type DNA sequences to express the same wild-type amino acid sequences but with enhanced codons, specific cut sites, unique post-translational modifications, and other engineered-in components for recombinant applications. A fast DNA pattern design algorithm, ICRPfinder, is provided in this paper and applied to find or create potential recognition sites in target coding sequences. Results: ICRPfinder is applied to find or create restriction enzyme recognition sites by introducing silent mutations. The algorithm is shown to be capable of mapping existing cut sites, but, importantly, it can also generate specified new unique cut sites within a specified region that are guaranteed not to be present elsewhere in the DNA sequence. Conclusion: ICRPfinder is a powerful tool for finding or creating specific DNA patterns in a given target coding sequence. ICRPfinder finds or creates patterns, which can include restriction enzyme recognition sites, without changing the translated protein sequence. ICRPfinder is a browser-based JavaScript application and it can run on any platform, in online or offline mode.
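
    ICRPfinder's own algorithm is not described in the abstract. The brute-force Python sketch below only illustrates the underlying task: for each start position, it tries to make the codons overlapping a recognition site carry the site's bases using synonymous codons alone, then keeps the result only if the site occurs exactly once. It assumes the standard genetic code and an exact (non-degenerate) site, and it checks one strand only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
SYNONYMS: dict[str, list[str]] = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def count_occurrences(seq: str, site: str) -> int:
    """Count overlapping occurrences of `site` in `seq`."""
    return sum(seq.startswith(site, j) for j in range(len(seq) - len(site) + 1))


def silent_site_options(cds: str, site: str) -> list[tuple[int, str, int]]:
    """Return (start, modified_cds, n_changes) for every position where `site`
    can be created with synonymous codons only, keeping the site unique."""
    cds, site = cds.upper(), site.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    options = []
    for start in range(len(cds) - len(site) + 1):
        end = start + len(site)
        new = list(cds)
        feasible = True
        for k in range(start // 3, (end - 1) // 3 + 1):   # codons touched by the site
            codon = cds[3 * k:3 * k + 3]
            # Bases this codon must carry for the site to appear at `start`.
            required = {p: site[3 * k + p - start]
                        for p in range(3) if start <= 3 * k + p < end}
            candidates = [c for c in SYNONYMS[CODON_TABLE[codon]]
                          if all(c[p] == b for p, b in required.items())]
            if not candidates:
                feasible = False
                break
            # Prefer the synonymous codon with the fewest base changes.
            new[3 * k:3 * k + 3] = min(
                candidates, key=lambda c: sum(x != y for x, y in zip(c, codon)))
        modified = "".join(new)
        if feasible and count_occurrences(modified, site) == 1:
            options.append((start, modified, sum(x != y for x, y in zip(modified, cds))))
    return options
```

    For instance, in `ATGGAATTTAAA` (Met-Glu-Phe-Lys), the EcoRI site `GAATTC` can be created at position 3 by the single silent change TTT to TTC.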

  7. Real-time DNA sequencing from single polymerase molecules.

    Science.gov (United States)

    Eid, John; Fehr, Adrian; Gray, Jeremy; Luong, Khai; Lyle, John; Otto, Geoff; Peluso, Paul; Rank, David; Baybayan, Primo; Bettman, Brad; Bibillo, Arkadiusz; Bjornson, Keith; Chaudhuri, Bidhan; Christians, Frederick; Cicero, Ronald; Clark, Sonya; Dalal, Ravindra; Dewinter, Alex; Dixon, John; Foquet, Mathieu; Gaertner, Alfred; Hardenbol, Paul; Heiner, Cheryl; Hester, Kevin; Holden, David; Kearns, Gregory; Kong, Xiangxu; Kuse, Ronald; Lacroix, Yves; Lin, Steven; Lundquist, Paul; Ma, Congcong; Marks, Patrick; Maxham, Mark; Murphy, Devon; Park, Insil; Pham, Thang; Phillips, Michael; Roy, Joy; Sebra, Robert; Shen, Gene; Sorenson, Jon; Tomaney, Austin; Travers, Kevin; Trulson, Mark; Vieceli, John; Wegener, Jeffrey; Wu, Dawn; Yang, Alicia; Zaccarin, Denis; Zhao, Peter; Zhong, Frank; Korlach, Jonas; Turner, Stephen

    2009-01-02

    We present single-molecule, real-time sequencing data obtained from a DNA polymerase performing uninterrupted template-directed synthesis using four distinguishable fluorescently labeled deoxyribonucleoside triphosphates (dNTPs). We detected the temporal order of their enzymatic incorporation into a growing DNA strand with zero-mode waveguide nanostructure arrays, which provide optical observation volume confinement and enable parallel, simultaneous detection of thousands of single-molecule sequencing reactions. Conjugation of fluorophores to the terminal phosphate moiety of the dNTPs allows continuous observation of DNA synthesis over thousands of bases without steric hindrance. The data report directly on polymerase dynamics, revealing distinct polymerization states and pause sites corresponding to DNA secondary structure. Sequence data were aligned with the known reference sequence to assay biophysical parameters of polymerization for each template position. Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3%, with no systematic error beyond fluorophore-dependent error rates.
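
    The consensus step can be illustrated with a simple column-wise majority vote. This is a simplified stand-in, not the procedure used in the paper: it assumes reads are already placed on the reference without insertions or deletions, which real single-molecule reads do not satisfy.

```python
from collections import Counter


def majority_consensus(reads: list[tuple[int, str]], length: int) -> str:
    """Column-wise majority vote over reads placed at known reference offsets.

    `reads` holds (offset, sequence) pairs assumed to be already aligned to the
    reference without insertions or deletions. Ties and uncovered columns
    become 'N'.
    """
    columns = [Counter() for _ in range(length)]
    for offset, seq in reads:
        for i, base in enumerate(seq):
            if 0 <= offset + i < length:
                columns[offset + i][base] += 1
    consensus = []
    for col in columns:
        top = col.most_common(2)
        tied = len(top) == 2 and top[0][1] == top[1][1]
        consensus.append("N" if not top or tied else top[0][0])
    return "".join(consensus)


def accuracy(consensus: str, reference: str) -> float:
    """Fraction of reference positions reproduced exactly by the consensus."""
    return sum(a == b for a, b in zip(consensus, reference)) / len(reference)
```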

  8. Identification of Bacterial Species in Kuwaiti Waters Through DNA Sequencing

    Science.gov (United States)

    Chen, K.

    2017-01-01

    With the objective of identifying the bacterial diversity associated with the ecosystems of various Kuwaiti seas, bacteria were cultured and isolated from three water samples. Because fecal coliforms were difficult to culture and isolate on the selective agar plates, bacterial isolates from marine agar plates were selected for molecular identification. 16S rRNA genes were successfully amplified from the genomes of the selected isolates using universal eubacterial 16S rRNA primers. The resulting amplification products were subjected to automated DNA sequencing. The partial 16S rDNA sequences obtained were compared directly with sequences in the NCBI database using BLAST, as well as with the sequences available from the Ribosomal Database Project (RDP).

  9. DNA qualification workflow for next generation sequencing of histopathological samples.

    Directory of Open Access Journals (Sweden)

    Michele Simbolo

    Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instrumentations to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard

  10. DNA qualification workflow for next generation sequencing of histopathological samples.

    Science.gov (United States)

    Simbolo, Michele; Gottardi, Marisa; Corbo, Vincenzo; Fassan, Matteo; Mafficini, Andrea; Malpeli, Giorgio; Lawlor, Rita T; Scarpa, Aldo

    2013-01-01

    Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instrumentations to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard workflow for

  11. Ligation bias in illumina next-generation DNA libraries: implications for sequencing ancient genomes.

    Directory of Open Access Journals (Sweden)

    Andaine Seguin-Orlando

    Ancient DNA extracts consist of a mixture of endogenous molecules and contaminant DNA templates, often originating from environmental microbes. These two populations of templates exhibit different chemical characteristics, with the former showing depurination and cytosine deamination by-products, resulting from post-mortem DNA damage. Such chemical modifications can interfere with the molecular tools used for building second-generation DNA libraries, and limit our ability to fully characterize the true complexity of ancient DNA extracts. In this study, we first use fresh DNA extracts to demonstrate that library preparation based on adapter ligation at AT-overhangs is biased against DNA templates starting with thymine residues, in contrast to blunt-end adapter ligation. We observe the same bias on fresh DNA extracts sheared on Bioruptor, Covaris and nebulizers. This contradicts previous reports suggesting that this bias could originate from the methods used for shearing DNA. This also suggests that AT-overhang adapter ligation efficiency is affected in a sequence-dependent manner and results in an uneven representation of different genomic contexts. We then show how this bias could affect the base composition of ancient DNA libraries prepared following AT-overhang ligation, mainly by limiting the ability to ligate DNA templates starting with thymines and therefore deaminated cytosines. This results in particular nucleotide misincorporation damage patterns, deviating from the signature generally expected for authenticating ancient sequence data. Consequently, we show that models adequate for estimating post-mortem DNA damage levels must be robust to the molecular tools used for building ancient DNA libraries.

  12. Ligation bias in illumina next-generation DNA libraries: implications for sequencing ancient genomes.

    Science.gov (United States)

    Seguin-Orlando, Andaine; Schubert, Mikkel; Clary, Joel; Stagegaard, Julia; Alberdi, Maria T; Prado, José Luis; Prieto, Alfredo; Willerslev, Eske; Orlando, Ludovic

    2013-01-01

    Ancient DNA extracts consist of a mixture of endogenous molecules and contaminant DNA templates, often originating from environmental microbes. These two populations of templates exhibit different chemical characteristics, with the former showing depurination and cytosine deamination by-products, resulting from post-mortem DNA damage. Such chemical modifications can interfere with the molecular tools used for building second-generation DNA libraries, and limit our ability to fully characterize the true complexity of ancient DNA extracts. In this study, we first use fresh DNA extracts to demonstrate that library preparation based on adapter ligation at AT-overhangs is biased against DNA templates starting with thymine residues, in contrast to blunt-end adapter ligation. We observe the same bias on fresh DNA extracts sheared on Bioruptor, Covaris and nebulizers. This contradicts previous reports suggesting that this bias could originate from the methods used for shearing DNA. This also suggests that AT-overhang adapter ligation efficiency is affected in a sequence-dependent manner and results in an uneven representation of different genomic contexts. We then show how this bias could affect the base composition of ancient DNA libraries prepared following AT-overhang ligation, mainly by limiting the ability to ligate DNA templates starting with thymines and therefore deaminated cytosines. This results in particular nucleotide misincorporation damage patterns, deviating from the signature generally expected for authenticating ancient sequence data. Consequently, we show that models adequate for estimating post-mortem DNA damage levels must be robust to the molecular tools used for building ancient DNA libraries.
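
    The two diagnostics discussed in this abstract, base composition at the first template position and the C-to-T mismatch rate near the 5' end, can be tabulated as in the sketch below. It is an illustration rather than the authors' pipeline and assumes gap-free read/reference pairs sharing a 5' coordinate. Under the bias described, a library built by AT-overhang ligation would show a depleted "T" fraction in `first_base_composition`, and because deaminated cytosines read as T, a correspondingly damped position-1 value in `c_to_t_profile`.

```python
from collections import Counter


def first_base_composition(reads: list[str]) -> dict[str, float]:
    """Fraction of reads whose first (5') base is A, C, G or T."""
    counts = Counter(r[0].upper() for r in reads if r)
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("no reads start with an unambiguous base")
    return {b: counts[b] / total for b in "ACGT"}


def c_to_t_profile(pairs: list[tuple[str, str]], max_pos: int = 10) -> list[float]:
    """C->T mismatch rate at each of the first `max_pos` positions from the 5' end.

    `pairs` holds (read, reference) strings assumed to be aligned gap-free and
    starting at the same 5' coordinate.
    """
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for read, ref in pairs:
        for i in range(min(max_pos, len(read), len(ref))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_total)]
```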

  13. Enhancing Gibbs sampling method for motif finding in DNA with initial graph representation of sequences.

    Science.gov (United States)

    Stepančič, Ziva

    2014-10-01

    Finding short patterns with residue variation in a set of sequences is still an open problem in genetics, since motif-finding techniques on DNA and protein sequences are inconclusive on real data sets and their performance varies across species. Hence, developing new algorithms and improving established methods are vital to further understanding of genome properties and the mechanisms of protein development. In this work, we present an approach to finding functional motifs in DNA sequences based on the Gibbs sampling method. Starting points in the search space are partly determined via a graphical representation of the input sequences, as opposed to the completely random initial points used in standard Gibbs sampling. Our algorithm is evaluated on synthetic as well as real data sets using several statistics, such as sensitivity, positive predictive value, specificity, performance, and correlation coefficient. Additionally, our algorithm is compared with the basic standard Gibbs sampling algorithm to show improvement in accuracy, repeatability, and performance.
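
    The sketch below is a standard site-sampling Gibbs motif finder with a uniform background, for a motif of assumed width `w`. The paper's graph-based initialization is not reproduced; the optional `init` argument is a hypothetical hook where such starting positions would be passed instead of uniformly random ones.

```python
import math
import random
from typing import Optional, Sequence

BASES = "ACGT"


def build_profile(seqs, positions, w, skip=-1, pseudo=1.0):
    """Position frequency matrix of the current motif instances, leaving out
    sequence `skip`, with pseudocounts."""
    counts = [{b: pseudo for b in BASES} for _ in range(w)]
    for idx, (s, p) in enumerate(zip(seqs, positions)):
        if idx != skip:
            for j in range(w):
                counts[j][s[p + j]] += 1
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]


def motif_score(seqs, positions, w):
    """Log-likelihood ratio of all instances against a uniform background."""
    q = build_profile(seqs, positions, w)
    return sum(math.log(q[j][s[p + j]] / 0.25)
               for s, p in zip(seqs, positions) for j in range(w))


def gibbs_motif_search(seqs: Sequence[str], w: int, iterations: int = 500,
                       init: Optional[Sequence[int]] = None, seed: int = 0):
    """Site-sampling Gibbs motif finder returning (best_positions, best_score)."""
    seqs = [s.upper() for s in seqs]
    if any(not set(s) <= set(BASES) or len(s) < w for s in seqs):
        raise ValueError("sequences must be ACGT-only and at least w long")
    rng = random.Random(seed)
    positions = (list(init) if init is not None
                 else [rng.randrange(len(s) - w + 1) for s in seqs])
    best, best_score = positions[:], motif_score(seqs, positions, w)
    for _ in range(iterations):
        i = rng.randrange(len(seqs))                     # sequence to resample
        q = build_profile(seqs, positions, w, skip=i)    # profile from the others
        s = seqs[i]
        weights = [math.prod(q[j][s[p + j]] / 0.25 for j in range(w))
                   for p in range(len(s) - w + 1)]
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
        current = motif_score(seqs, positions, w)
        if current > best_score:
            best, best_score = positions[:], current
    return best, best_score
```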

  14. The chemical structure of DNA sequence signals for RNA transcription

    Science.gov (United States)

    George, D. G.; Dayhoff, M. O.

    1982-01-01

    The proposed recognition sites for RNA transcription for E. coli RNA polymerase, bacteriophage T7 RNA polymerase, and eukaryotic RNA polymerase Pol II are evaluated in the light of the requirements for efficient recognition. It is shown that although there is good experimental evidence that specific nucleic acid sequence patterns are involved in transcriptional regulation in bacteria and bacterial viruses, among the sequences now available, only in the case of the promoters recognized by bacteriophage T7 polymerase does it seem likely that the pattern is sufficient. It is concluded that the eukaryotic pattern investigated is not restrictive enough to serve as a recognition site.
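
    Whether a pattern is restrictive enough can be made concrete by counting how many matches it would produce by chance. The sketch is an illustration, not necessarily the authors' calculation: it assumes independent bases with uniform frequencies and IUPAC degeneracy codes. For example, a degenerate 7-mer such as TATAWAW matches a random position on one strand with probability 1/4096, roughly once every 4 kb, whereas an exact 18-mer such as TAATACGACTCACTATAG (the widely used T7 promoter sequence) has probability 4^-18, about 1.5 × 10^-11.

```python
from typing import Optional

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_probability(pattern: str,
                      base_freqs: Optional[dict[str, float]] = None) -> float:
    """Probability that a random position matches `pattern` (IUPAC codes),
    treating bases as independent draws from `base_freqs` (uniform by default)."""
    freqs = base_freqs or {b: 0.25 for b in "ACGT"}
    p = 1.0
    for code in pattern.upper():
        p *= sum(freqs[b] for b in IUPAC[code])
    return p


def expected_chance_hits(pattern: str, genome_length: int, both_strands: bool = True,
                         base_freqs: Optional[dict[str, float]] = None) -> float:
    """Expected number of chance matches in random sequence of the given length.

    Doubling for the reverse strand assumes strand-symmetric base composition
    and a pattern that is not its own reverse complement.
    """
    windows = max(genome_length - len(pattern) + 1, 0)
    hits = windows * match_probability(pattern, base_freqs)
    return 2 * hits if both_strands else hits
```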

  15. DNA methyltransferase 1 and DNA methylation patterning contribute to germinal center B-cell differentiation

    DEFF Research Database (Denmark)

    Shaknovich, Rita; Cerchietti, Leandro; Tsikitas, Lucas;

    2011-01-01

    The phenotype of germinal center (GC) B cells includes the unique ability to tolerate rapid proliferation and the mutagenic actions of activation induced cytosine deaminase (AICDA). Given the importance of epigenetic patterning in determining cellular phenotypes, we examined DNA methylation and the role of DNA methyltransferases in the formation of GCs. DNA methylation profiling revealed a marked shift in DNA methylation patterning in GC B cells versus resting/naive B cells. This shift included significant differential methylation of 235 genes, with concordant inverse changes in gene expression...... the GC B cells of Dnmt1 hypomorphic animals showed evidence of increased DNA damage, suggesting dual roles for DNMT1 in DNA methylation and double strand DNA break repair.

  16. Identification of tissue-specific cell death using methylation patterns of circulating DNA.

    Science.gov (United States)

    Lehmann-Werman, Roni; Neiman, Daniel; Zemmour, Hai; Moss, Joshua; Magenheim, Judith; Vaknin-Dembinsky, Adi; Rubertsson, Sten; Nellgård, Bengt; Blennow, Kaj; Zetterberg, Henrik; Spalding, Kirsty; Haller, Michael J; Wasserfall, Clive H; Schatz, Desmond A; Greenbaum, Carla J; Dorrell, Craig; Grompe, Markus; Zick, Aviad; Hubert, Ayala; Maoz, Myriam; Fendrich, Volker; Bartsch, Detlef K; Golan, Talia; Ben Sasson, Shmuel A; Zamir, Gideon; Razin, Aharon; Cedar, Howard; Shapiro, A M James; Glaser, Benjamin; Shemer, Ruth; Dor, Yuval

    2016-03-29

    Minimally invasive detection of cell death could prove an invaluable resource in many physiologic and pathologic situations. Cell-free circulating DNA (cfDNA) released from dying cells is emerging as a diagnostic tool for monitoring cancer dynamics and graft failure. However, existing methods rely on differences in DNA sequences in source tissues, so that cell death cannot be identified in tissues with a normal genome. We developed a method of detecting tissue-specific cell death in humans based on tissue-specific methylation patterns in cfDNA. We interrogated tissue-specific methylome databases to identify cell type-specific DNA methylation signatures and developed a method to detect these signatures in mixed DNA samples. We isolated cfDNA from plasma or serum of donors, treated the cfDNA with bisulfite, PCR-amplified the cfDNA, and sequenced it to quantify cfDNA carrying the methylation markers of the cell type of interest. Pancreatic β-cell DNA was identified in the circulation of patients with recently diagnosed type-1 diabetes and islet-graft recipients; oligodendrocyte DNA was identified in patients with relapsing multiple sclerosis; neuronal/glial DNA was identified in patients after traumatic brain injury or cardiac arrest; and exocrine pancreas DNA was identified in patients with pancreatic cancer or pancreatitis. This proof-of-concept study demonstrates that the tissue origins of cfDNA and thus the rate of death of specific cell types can be determined in humans. The approach can be adapted to identify cfDNA derived from any cell type in the body, offering a minimally invasive window for diagnosing and monitoring a broad spectrum of human pathologies as well as providing a better understanding of normal tissue dynamics.
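
    The quantification step, counting molecules that carry a cell-type-specific methylation pattern, can be sketched as follows. After bisulfite treatment an unmethylated cytosine reads as T while a methylated one stays C. The marker CpG offsets are hypothetical inputs, reads are assumed to come from the converted top strand and to start at the marker region, and a molecule is counted only if every CpG matches the target state; the paper's exact classification rule may differ.

```python
from typing import Optional


def methylation_calls(read: str, cpg_offsets: list[int]) -> list[Optional[bool]]:
    """Per-CpG call from a bisulfite read: True = methylated (C retained),
    False = unmethylated (C converted to T), None = not informative."""
    calls: list[Optional[bool]] = []
    for off in cpg_offsets:
        base = read[off] if off < len(read) else None
        calls.append(True if base == "C" else False if base == "T" else None)
    return calls


def marker_fraction(reads: list[str], cpg_offsets: list[int],
                    target_methylated: bool) -> float:
    """Fraction of informative molecules whose CpGs all match the target state."""
    informative = matching = 0
    for read in reads:
        calls = methylation_calls(read, cpg_offsets)
        if None in calls:
            continue          # skip molecules not covering every marker CpG
        informative += 1
        if all(call == target_methylated for call in calls):
            matching += 1
    return matching / informative if informative else 0.0
```

    Multiplying this fraction by the total amount of cfDNA measured in a sample would give a rough estimate of DNA derived from the cell type of interest, under the same assumptions.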

  17. High-throughput DNA sequencing: a genomic data manufacturing process.

    Science.gov (United States)

    Huang, G M

    1999-01-01

    Progress trends in automated DNA sequencing operations are reviewed. Technological developments in sequencing instruments, enzymatic chemistry and robotic stations have resulted in an ever-increasing capacity for sequence data production. This progress places higher demands on laboratory information management and data quality assessment. High-throughput laboratories face the challenge of organizational management, as well as technology management. Engineering principles of process control should be adopted in this biological data manufacturing procedure. While various systems attempt to provide solutions to automate different parts of, or even the entire process, new technical advances will continue to change the paradigm and provide new challenges.

  18. RNA-DNA sequence differences spell genetic code ambiguities

    DEFF Research Database (Denmark)

    Bentin, Thomas; Nielsen, Michael L

    2013-01-01

    A recent paper in Science by Li et al. 2011(1) reports widespread sequence differences in the human transcriptome between RNAs and their encoding genes, termed RNA-DNA differences (RDDs). The findings could add a new layer of complexity to gene expression, but the study has been criticized. ...

  19. Functionalized nanopore-embedded electrodes for rapid DNA sequencing

    CERN Document Server

    He, Haiying; Pandey, Ravindra; Rocha, Alexandre Reily; Sanvito, Stefano; Grigoriev, Anton; Ahuja, Rajeev; Karna, Shashi P

    2007-01-01

    The determination of a patient's DNA sequence can, in principle, reveal an increased risk of falling ill with particular diseases [1,2] and help to design "personalized medicine" [3]. Moreover, statistical studies and comparison of genomes [4] of a large number of individuals are crucial for the analysis of mutations [5] and hereditary diseases, paving the way to preventive medicine [6]. DNA sequencing is, however, currently still a very time-consuming and expensive task [4], consisting of pre-processing steps, the actual sequencing using the Sanger method, and post-processing in the form of data analysis [7]. Here we propose a new approach that relies on functionalized nanopore-embedded electrodes to achieve an unambiguous distinction of the four nucleic acid bases in the DNA sequencing process. This represents a significant improvement over previously studied designs [8,9], which cannot reliably distinguish all four bases of DNA. The transport properties of the setup investigated by us, employing state-o...

  20. POSA : Perl objects for DNA sequencing data analysis

    NARCIS (Netherlands)

    Aerts, JA; Jungerius, BJ; Groenen, MA

    2004-01-01

    Background: Capillary DNA sequencing machines allow the generation of vast amounts of data with little hands-on time. With this expansion of data generation, there is a growing need for automated data processing. Most available software solutions, however, still require user intervention or provide

  1. POSA: perl objects for DNA sequencing data analysis

    NARCIS (Netherlands)

    Aerts, J.A.; Jungerius, B.J.; Groenen, M.A.M.

    2004-01-01

    Background - Capillary DNA sequencing machines allow the generation of vast amounts of data with little hands-on time. With this expansion of data generation, there is a growing need for automated data processing. Most available software solutions, however, still require user intervention or provide

  2. DNA sequence handling programs in BASIC for home computers.

    OpenAIRE

    Biro, P A

    1984-01-01

    This paper describes a DNA sequence handling program written entirely in BASIC and designed to be run on an Atari home computer. Many of the features common to more sophisticated programs have been included. The advantages of this program are its convenience, its transportability and its potential for user modification. The disadvantages are its lack of sophistication and speed.

  3. Decoding long nanopore sequencing reads of natural DNA.

    Science.gov (United States)

    Laszlo, Andrew H; Derrington, Ian M; Ross, Brian C; Brinkerhoff, Henry; Adey, Andrew; Nova, Ian C; Craig, Jonathan M; Langford, Kyle W; Samson, Jenny Mae; Daza, Riza; Doering, Kenji; Shendure, Jay; Gundlach, Jens H

    2014-08-01

    Nanopore sequencing of DNA is a single-molecule technique that may achieve long reads, low cost and high speed with minimal sample preparation and instrumentation. Here, we build on recent progress with respect to nanopore resolution and DNA control to interpret the procession of ion current levels observed during the translocation of DNA through the pore MspA. As approximately four nucleotides affect the ion current of each level, we measured the ion current corresponding to all 256 four-nucleotide combinations (quadromers). This quadromer map is highly predictive of ion current levels of previously unmeasured sequences derived from the bacteriophage phi X 174 genome. Furthermore, we show nanopore sequencing reads of phi X 174 up to 4,500 bases in length, which can be unambiguously aligned to the phi X 174 reference genome, and demonstrate proof-of-concept utility with respect to hybrid genome assembly and polymorphism detection. This work provides a foundation for nanopore sequencing of long, natural DNA strands.
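
    Because roughly four nucleotides shape each ion-current level, a complete quadromer map needs 4^4 = 256 entries, and a read is interpreted as a procession of overlapping quadromers. The Python sketch below enumerates the quadromers and slides a four-base window along a sequence, assuming one current level per single-nucleotide step. The `level_map` values are placeholders standing in for measured MspA currents, and all function names are illustrative.

```python
from itertools import product

BASES = "ACGT"


def all_quadromers():
    """Enumerate every four-nucleotide combination (4**4 = 256)."""
    return ["".join(p) for p in product(BASES, repeat=4)]


def quadromer_procession(seq, k=4):
    """Overlapping k-mers seen as the strand advances one nucleotide at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def predicted_levels(seq, level_map):
    """Look up a current level for each quadromer (level_map is hypothetical)."""
    return [level_map[q] for q in quadromer_procession(seq)]


quads = all_quadromers()
print(len(quads))                        # 256
print(quadromer_procession("GATTACA"))   # ['GATT', 'ATTA', 'TTAC', 'TACA']

# Placeholder levels (index-based), NOT measured MspA currents.
toy_map = {q: float(i) for i, q in enumerate(quads)}
print(predicted_levels("GATTACA", toy_map))
```

    With a real map, the predicted procession would be compared against the observed levels to call or align the read.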

  5. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    Science.gov (United States)

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have become known in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One reason is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3% and an MCC of 0.324. The other SVM model, which uses both DNA and protein sequences, achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and an MCC of 0.329. In both cross-validation and independent testing, the second SVM model, which used both DNA and protein sequence data, performed better than the first model, which used DNA sequence data only. 
To the best of
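
    The three figures of merit reported in this abstract can all be computed from a binary confusion matrix (protein-binding vs. non-binding nucleotides). A minimal Python sketch using the standard positive-class definitions follows. The abstract does not say whether its F-measure is computed for the positive class or averaged over both classes, and the counts in the usage line are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F-measure (F1, positive class) and MCC from a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 by convention if undefined
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return accuracy, f1, mcc


# Hypothetical counts, not taken from the study.
acc, f1, mcc = binary_metrics(tp=40, fp=10, tn=35, fn=15)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  MCC={mcc:.3f}")
```

    Unlike accuracy, MCC uses all four cells of the matrix, which is why it sits near 0.3 here even when accuracy is close to 67%.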

  6. A simple method encoding linear single strain DNA sequence with natural numbers

    Institute of Scientific and Technical Information of China (English)

    LI Jiye; XU Yuan; ZHANG Wang

    2008-01-01

    A simple method for representing a linear single-stranded DNA (LssDNA) sequence with natural numbers is introduced in this paper. The method maps the nucleotides of an LssDNA sequence to the numerals 1, 2, 3 and 4. After calculation, the sequence can be coded as a natural number, which can also be decoded back into the DNA sequence. Thus, an LssDNA sequence can be expressed as a natural number and as a dot on a coordinate axis. In the future, a new LssDNA sequence database termed "DotBank" could be realized, in which each LssDNA sequence is represented as a dot.
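
    The abstract does not spell out the calculation. One construction with the stated properties (digits 1 to 4, a unique natural number per sequence, lossless decoding) is bijective base-4 numeration, sketched below in Python. The assignment A=1, C=2, G=3, T=4 is an assumption, as is the choice of bijective numeration itself; the authors' scheme may differ.

```python
# Assumed base-to-numeral mapping; the abstract only says the numerals 1-4 are used.
DIGIT = {"A": 1, "C": 2, "G": 3, "T": 4}
BASE_OF = {d: b for b, d in DIGIT.items()}


def encode(seq):
    """Read seq as a bijective base-4 numeral (digits 1..4, first base most significant)."""
    n = 0
    for base in seq.upper():
        n = n * 4 + DIGIT[base]
    return n


def decode(n):
    """Invert encode() by peeling off digits 1..4 from the least significant end."""
    bases = []
    while n > 0:
        d = n % 4 or 4   # no zero digit in bijective numeration: remainder 0 means digit 4
        bases.append(BASE_OF[d])
        n = (n - d) // 4
    return "".join(reversed(bases))


print(encode("ACGT"))   # 112
print(decode(112))      # ACGT
```

    Avoiding a zero digit is what keeps sequences such as A and AA distinct: with A=0 in ordinary base 4, leading A's would vanish from the number.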

  7. Sequence heterogeneity accelerates protein search for targets on DNA

    Energy Technology Data Exchange (ETDEWEB)

    Shvets, Alexey A.; Kolomeisky, Anatoly B., E-mail: tolya@rice.edu [Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005 (United States)

    2015-12-28

    The process of protein search for specific binding sites on DNA is fundamentally important since it marks the beginning of all major biological processes. We present a theoretical investigation that probes the role of DNA sequence symmetry, heterogeneity, and chemical composition in the protein search dynamics. Using a discrete-state stochastic approach with a first-passage events analysis, which takes into account the most relevant physical-chemical processes, a full analytical description of the search dynamics is obtained. It is found that, contrary to existing views, the protein search is generally faster on DNA with more heterogeneous sequences. In addition, the search dynamics might be affected by the chemical composition near the target site. The physical origins of these phenomena are discussed. Our results suggest that biological processes might be effectively regulated by modifying chemical composition, symmetry, and heterogeneity of a genome.

  8. Solid-State Nanopore-Based DNA Sequencing Technology

    Directory of Open Access Journals (Sweden)

    Zewen Liu

    2016-01-01

    Solid-state nanopore-based DNA sequencing is becoming increasingly attractive for its promising future in the field of gene detection. The challenges that need to be addressed are diverse: effective methods to detect base-specific signatures, control of the nanopore’s size and surface properties, and modulation of the translocation velocity and behavior of the DNA molecules. Among these challenges, the fabrication of high-quality nanopores with modern micro/nanofabrication technologies is a crucial one. In this paper, typical technologies applied in the field of solid-state nanopore-based DNA sequencing are reviewed.

  9. Electronic density of states in sequence dependent DNA molecules

    Science.gov (United States)

    de Oliveira, B. P. W.; Albuquerque, E. L.; Vasconcelos, M. S.

    2006-09-01

    We report in this work a numerical study of the electronic density of states (DOS) in π-stacked arrays of DNA single-strand segments made up of the nucleotides guanine G, adenine A, cytosine C and thymine T, forming Rudin-Shapiro (RS) and Fibonacci (FB) polyGC quasiperiodic sequences. Both structures are constructed starting from a G nucleotide as seed and following their respective inflation rules. Our theoretical method uses Dyson's equation together with a transfer-matrix treatment, within an electronic tight-binding Hamiltonian model, suitable to describe the DNA segments modelled by the quasiperiodic chains. We compared the DOS spectra found for the quasiperiodic structures to those obtained for a sequence of natural DNA, part of human chromosome 22 (Ch22), and found remarkable agreement for the RS structure. The electronic spectrum shows several peaks, corresponding to localized states, as well as a striking self-similar aspect.
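
    The abstract names the inflation rules without stating them. For the Fibonacci polyGC chain, a common choice is the substitution G → GC, C → G applied to every letter in each generation; that rule is assumed in the Python sketch below. The Rudin-Shapiro chain would need its own rule, which is not reproduced here. Starting from a single G, the chain lengths grow as Fibonacci numbers.

```python
# Assumed Fibonacci inflation rule for a polyGC chain.
FIB_RULE = {"G": "GC", "C": "G"}


def inflate(seed, rule, generations):
    """Apply a substitution (inflation) rule to every letter, `generations` times."""
    seq = seed
    for _ in range(generations):
        seq = "".join(rule[ch] for ch in seq)
    return seq


for n in range(6):
    chain = inflate("G", FIB_RULE, n)
    print(n, len(chain), chain)
# lengths 1, 2, 3, 5, 8, 13 (Fibonacci numbers)
```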

  10. A blind testing design for authenticating ancient DNA sequences.

    Science.gov (United States)

    Yang, H; Golenberg, E M; Shoshani, J

    1997-04-01

    Reproducibility is a serious concern among researchers of ancient DNA. We designed a blind testing procedure to evaluate laboratory accuracy and authenticity of ancient DNA obtained from closely related extant and extinct species. Soft tissue and bones of fossil and contemporary museum proboscideans were collected and identified based on morphology by one researcher, and other researchers carried out DNA testing on the samples, which were assigned anonymous numbers. DNA extracted using three principal isolation methods served as template in PCR amplifications of a segment of the cytochrome b gene (mitochondrial genome), and the PCR product was directly sequenced and analyzed. The results show that such a blind testing design performed in one laboratory, when coupled with phylogenetic analysis, can nonarbitrarily test the consistency and reliability of ancient DNA results. Such reproducible results obtained from the blind testing can increase confidence in the authenticity of ancient sequences obtained from postmortem specimens and avoid bias in phylogenetic analysis. A blind testing design may be applicable as an alternative to confirm ancient DNA results in one laboratory when independent testing by two laboratories is not available.

  11. POSA: Perl Objects for DNA Sequencing Data Analysis

    Directory of Open Access Journals (Sweden)

    Jungerius Bart J

    2004-08-01

    Background: Capillary DNA sequencing machines allow the generation of vast amounts of data with little hands-on time. With this expansion of data generation, there is a growing need for automated data processing. Most available software solutions, however, still require user intervention or provide modules that need advanced informatics skills to allow implementation in pipelines. Results: Here we present POSA, a pair of new Perl objects that describe DNA sequence traces and Phrap contig assemblies in detail. Methods included in POSA cover basecalling with quality scores (by Phred), contig assembly (by Phrap), generation of primer3 input and automated SNP annotation (by PolyPhred). Although easily implemented by users with only limited programming experience, these objects considerably reduce hands-on analysis time compared to using the Staden package for extracting sequence information from raw sequencing files and for SNP discovery. Conclusions: The POSA objects allow a flexible and easy design, implementation and usage of Perl-based pipelines to handle and analyze DNA sequencing data, while requiring only minor programming skills.
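
    Phred expresses each base call's quality as Q = -10·log10(P), where P is the estimated probability that the call is wrong, so Q20 corresponds to a 1% error probability and Q30 to 0.1%. POSA itself is written in Perl and wraps Phred; the short Python sketch below only illustrates the conversion and the expected number of errors in a read (the sum of per-base error probabilities). The function names are illustrative.

```python
import math


def phred_from_error(p):
    """Phred quality Q = -10 * log10(P_error)."""
    return -10 * math.log10(p)


def error_from_phred(q):
    """Inverse conversion: P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read: sum of per-base error probabilities."""
    return sum(error_from_phred(q) for q in qualities)


print(round(phred_from_error(0.001), 6))        # 30.0
print(error_from_phred(20))                      # 0.01
print(round(expected_errors([20, 30, 10]), 6))   # 0.111
```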

  12. DNA watermarks in non-coding regulatory sequences

    Directory of Open Access Journals (Sweden)

    Pyka Martin

    2009-07-01

    Background: DNA watermarks can be applied to identify the unauthorized use of genetically modified organisms. It has been shown that coding regions can be used to encrypt information into living organisms by using the DNA-Crypt algorithm. Yet, if the sequence of interest is a non-coding DNA sequence, either the function of a resulting functional RNA molecule or a regulatory sequence, such as a promoter, could be affected. For our studies we used the small cytoplasmic RNA 1 (scR1) in yeast and the lac promoter region of Escherichia coli. Findings: The lac promoter was deactivated by the integrated watermark. In addition, the RNA molecules displayed altered configurations after introducing a watermark, but surprisingly were functionally intact, which was verified by analyzing the growth characteristics of both wild-type and watermarked scR1-transformed yeast cells. In a third approach we introduced a second, overlapping watermark into the lac promoter, which did not affect the promoter activity. Conclusion: Even though the watermarked RNA and one of the watermarked promoters did not show any significant differences compared to the wild-type RNA and wild-type promoter region, respectively, it cannot be generalized that other RNA molecules or regulatory sequences behave accordingly. Therefore, we do not recommend integrating watermark sequences into regulatory regions.

  13. DNA sequence analysis of newly formed telomeres in yeast.

    Science.gov (United States)

    Wang, S S; Pluta, A F; Zakian, V A

    1989-01-01

    A plasmid can be maintained in linear form in baker's yeast if it bears telomeric sequences at each end. Linear plasmids bearing cloned telomeric C4A4 repeats at one end (test end) and a natural DNA terminus with approximately 300 bp of C4A2 repeats at the other (control) end were introduced by transformation into yeast. Test-end termini of 28 to 112 bp supported telomere formation. During telomere formation, C4A2 repeats were often transferred to test-end termini. To determine in greater detail the fate of test-end sequences on these plasmids after propagation in yeast, test-end telomeres were subcloned into E. coli and sequenced. DNA sequencing established a number of points about the molecular events involved in telomere formation in yeast. The results suggest that there are at least two mechanisms for telomere formation in yeast. One is mediated by a recombination event that requires neither a long stretch of homology nor the RAD52 gene product. The other is the addition of C1-3A repeats to the termini of linear DNA molecules. The telomeric sequence required to support C1-3A addition need not be at the very end of a molecule for telomere formation.
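
    In this notation, C1-3A denotes telomeric units of one to three C residues followed by an A, repeated irregularly. As an illustration (not the authors' analysis), a regular expression can locate such tracts in a sequenced terminus; the example sequence in this Python sketch is made up.

```python
import re

# One telomeric unit = one to three C residues followed by a single A; a tract repeats units.
C1_3A_TRACT = re.compile(r"(?:C{1,3}A)+")


def longest_c1_3a_tract(seq):
    """Return the longest uninterrupted C(1-3)A tract in seq, or '' if there is none."""
    tracts = C1_3A_TRACT.findall(seq.upper())
    return max(tracts, key=len, default="")


print(longest_c1_3a_tract("TTCCACCCACACCATGG"))   # CCACCCACACCA
```

    Because each unit allows at most three C residues, C4A2 and C4A4 units match only partially (as the substring CCCA), so the pattern gives a rough way to tell added C1-3A sequence apart from the cloned and transferred repeats.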

  14. A Nano-Biosensor for DNA Sequence Detection Using Absorption Spectra of SWNT-DNA Composite

    Directory of Open Access Journals (Sweden)

    J. Bansal

    2011-01-01

    A biosensor based on a single-walled carbon nanotube (SWNT)-poly(GT)n ssDNA hybrid has been developed for medical diagnostics. The absorption spectrum of this assay is determined with a Shimadzu UV-VIS-NIR spectrophotometer. Two distinct bands, each containing three peaks corresponding to the first and second van Hove singularities in the density of states of the nanotubes, were observed in the absorption spectrum. When a single-stranded DNA (ssDNA) with a sequence complementary to the probe DNA is added to the ssDNA-SWNT conjugates, hybridization takes place, which causes a red shift of the absorption spectrum of the nanotubes. On the other hand, when the DNA is noncomplementary, no shift in the absorption spectrum occurs since hybridization between the DNA and probe does not take place. The red shift of the spectrum is considered to be due to a change in the dielectric environment around the nanotubes.

  15. Environmental DNA sequencing primers for eutardigrades and bdelloid rotifers

    Directory of Open Access Journals (Sweden)

    Martin Andrew P

    2009-12-01

    Background: The time it takes to isolate individuals from environmental samples and then extract DNA from each individual is one of the problems with generating molecular data from meiofauna such as eutardigrades and bdelloid rotifers. The lack of consistent morphological information and the extreme abundance of these classes make morphological identification of rare, or even common cryptic taxa a large and unwieldy task. This limits the ability to perform large-scale surveys of the diversity of these organisms. Here we demonstrate a culture-independent molecular survey approach that enables the generation of large amounts of eutardigrade and bdelloid rotifer sequence data directly from soil. Our PCR primers, specific to the 18S small-subunit rRNA gene, were developed for both eutardigrades and bdelloid rotifers. Results: The developed primers successfully amplified DNA of their target organisms from various soil DNA extracts. This was confirmed by both BLAST similarity searches and phylogenetic analyses. Tardigrades showed much better phylogenetic resolution than bdelloids. Both groups of organisms exhibited varying levels of endemism. Conclusion: The development of clade-specific primers for characterizing eutardigrades and bdelloid rotifers from environmental samples should greatly increase our ability to characterize the composition of these taxa in environmental samples. Environmental sequencing as shown here differs from other molecular survey methods in that there is no need to pre-isolate the organisms of interest from soil in order to amplify their DNA. The DNA sequences obtained from methods that do not require culturing can be identified post hoc and placed phylogenetically as additional closely related sequences are obtained from morphologically identified conspecifics. 
Our non-cultured environmental sequence based approach will be able to provide a rapid and large-scale screening of the presence, absence and diversity of

  16. Environmental DNA sequencing primers for eutardigrades and bdelloid rotifers

    Science.gov (United States)

    2009-01-01

    Background: The time it takes to isolate individuals from environmental samples and then extract DNA from each individual is one of the problems with generating molecular data from meiofauna such as eutardigrades and bdelloid rotifers. The lack of consistent morphological information and the extreme abundance of these classes make morphological identification of rare, or even common cryptic taxa a large and unwieldy task. This limits the ability to perform large-scale surveys of the diversity of these organisms. Here we demonstrate a culture-independent molecular survey approach that enables the generation of large amounts of eutardigrade and bdelloid rotifer sequence data directly from soil. Our PCR primers, specific to the 18S small-subunit rRNA gene, were developed for both eutardigrades and bdelloid rotifers. Results: The developed primers successfully amplified DNA of their target organisms from various soil DNA extracts. This was confirmed by both BLAST similarity searches and phylogenetic analyses. Tardigrades showed much better phylogenetic resolution than bdelloids. Both groups of organisms exhibited varying levels of endemism. Conclusion: The development of clade-specific primers for characterizing eutardigrades and bdelloid rotifers from environmental samples should greatly increase our ability to characterize the composition of these taxa in environmental samples. Environmental sequencing as shown here differs from other molecular survey methods in that there is no need to pre-isolate the organisms of interest from soil in order to amplify their DNA. The DNA sequences obtained from methods that do not require culturing can be identified post hoc and placed phylogenetically as additional closely related sequences are obtained from morphologically identified conspecifics. 
Our non-cultured environmental sequence based approach will be able to provide a rapid and large-scale screening of the presence, absence and diversity of Bdelloidea and Eutardigrada in

  17. VoSeq: a voucher and DNA sequence web application.

    Directory of Open Access Journals (Sweden)

    Carlos Peña

    There is an ever-growing number of published molecular phylogenetic studies, due in part to the advent of new techniques that allow cheap and quick DNA sequencing. Hence, the demand for relational databases with which to manage and annotate the accumulating DNA sequences, genes, voucher specimens and associated biological data is increasing. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically constructed with the aim of being an organizational tool for researchers working in phylogenetic inference. We here report new software that facilitates easy management of voucher and sequence data, consisting of a relational database as back-end for a graphical user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready to be used in commonly used phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publishing. It also has built-in BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL) and GBIF (by generating data dumps that can be processed with GBIF's Integrated Publishing Toolkit).
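
    VoSeq's exporters are not described in detail here. As a minimal sketch of what a dataset "ready to be used" in RAxML can look like, the Python function below writes an alignment in relaxed sequential PHYLIP format: a header with the number of taxa and the alignment length, then one taxon name and sequence per line. The function name and its validation rules are assumptions, not VoSeq's code.

```python
def to_relaxed_phylip(alignment):
    """Format {taxon: aligned_sequence} as relaxed sequential PHYLIP text (hypothetical helper)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of the same aligned length")
    if any(not name or any(c.isspace() for c in name) for name in alignment):
        raise ValueError("taxon names must be non-empty and contain no whitespace")
    header = f"{len(alignment)} {lengths.pop()}"
    rows = [f"{name} {seq}" for name, seq in alignment.items()]
    return "\n".join([header] + rows) + "\n"


print(to_relaxed_phylip({"taxon_A": "ACGT-A", "taxon_B": "ACCTTA"}), end="")
```

    Rejecting unequal lengths early matters because phylogenetic programs read PHYLIP input as an aligned matrix.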

  18. Cloning, characterization, and properties of seven triplet repeat DNA sequences.

    Science.gov (United States)

    Ohshima, K; Kang, S; Larson, J E; Wells, R D

    1996-07-12

    Several neuromuscular and neurodegenerative diseases are caused by genetically unstable triplet repeat sequences (CTG.CAG, CGG.CCG, or AAG.CTT) in or near the responsible genes. We implemented novel cloning strategies with chemically synthesized oligonucleotides to clone seven of the triplet repeat sequences (GTA.TAC, GAT.ATC, GTT.AAC, CAC.GTG, AGG.CCT, TCG.CGA, and AAG.CTT), and the adjoining paper (Ohshima, K., Kang, S., Larson, J. E., and Wells, R. D.(1996) J. Biol. Chem. 271, 16784-16791) describes studies on TTA.TAA. This approach in conjunction with in vivo expansion studies in Escherichia coli enabled the preparation of at least 81 plasmids containing the repeat sequences with lengths of approximately 16 up to 158 triplets in both orientations with varying extents of polymorphisms. The inserts were characterized by DNA sequencing as well as DNA polymerase pausings, two-dimensional agarose gel electrophoresis, and chemical probe analyses to evaluate the capacity to adopt negative supercoil induced non-B DNA conformations. AAG.CTT and AGG.CCT form intramolecular triplexes, and the other five repeat sequences do not form any previously characterized non-B structures. However, long tracts of TCG.CGA showed strong inhibition of DNA synthesis at specific loci in the repeats as seen in the cases of CTG.CAG and CGG.CCG (Kang, S., Ohshima, K., Shimizu, M., Amirhaeri, S., and Wells, R. D.(1995) J. Biol. Chem. 270, 27014-27021). This work along with other studies (Wells, R. D.(1996) J. Biol. Chem. 271, 2875-2878) on CTG.CAG, CGG.CCG, and TTA.TAA makes available long inserts of all 10 triplet repeat sequences for a variety of physical, molecular biological, genetic, and medical investigations. A model to explain the reduction in mRNA abundance in Friedreich's ataxia based on intermolecular triplex formation is proposed.
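
    The X.Y notation pairs a repeat unit on one strand with its reverse complement on the other (e.g., CTG.CAG). Treating repeats as equivalent under a reading-frame shift (CTG = TGC = GCT) and under strand, and excluding mononucleotide runs, leaves exactly 10 classes: 64 triplets minus 4 homopolymers gives 60, three frames per class gives 20, and strand pairing gives 10 (no class is its own reverse complement, since a triplet cannot contain equal numbers of A and T and of C and G). This matches the "all 10 triplet repeat sequences" above. A Python sketch that checks the count:

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(unit):
    """All reading frames of a repeat unit, e.g. CTG -> {CTG, TGC, GCT}."""
    return {unit[i:] + unit[:i] for i in range(len(unit))}


def repeat_class(triplet):
    """Canonical label for a triplet repeat, equal for all frames and both strands."""
    return min(rotations(triplet) | rotations(reverse_complement(triplet)))


def all_triplet_repeat_classes():
    """Distinct triplet-repeat classes, excluding mononucleotide runs such as AAA."""
    triplets = ("".join(p) for p in product("ACGT", repeat=3))
    return {repeat_class(t) for t in triplets if len(set(t)) > 1}


print(reverse_complement("CTG"))            # CAG
print(len(all_triplet_repeat_classes()))    # 10
```

    Applying `repeat_class` to the ten units named in the abstract (CTG, CGG, AAG, GTA, GAT, GTT, CAC, AGG, TCG, TTA) yields ten different labels, one per class.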

  19. Scalable lithography from Natural DNA Patterns via polyacrylamide gel

    National Research Council Canada - National Science Library

    Qu, JieHao; Hou, XianLiang; Fan, WanChao; Xi, GuangHui; Diao, HongYan; Liu, XiangDon

    2015-01-01

    ...) that controllably and precisely shrinks and swells with water content. Aligned patterns of natural DNA molecules were prepared by evaporative self-assembly on a PMMA substrate, and were transferred to unsaturated polyester resin (UPR...

  20. Perspectives of DNA microarray and next-generation DNA sequencing technologies

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    DNA microarray and next-generation DNA sequencing technologies are important tools for high-throughput genome research, revealing both the structural and functional characteristics of genomes. In the past decade, DNA microarray technologies have been widely applied in studies of functional genomics, systems biology and pharmacogenomics. The next-generation DNA sequencing method was first introduced by the 454 Company in 2003, immediately followed by the establishment of the Solexa and SOLiD techniques by other biotech companies. Though it has not been long since this technology first emerged, its fast and impressive improvement has extended its application to almost all fields of genomics research, as a rival challenging the existing DNA microarray technology. This paper briefly reviews the working principles of these two technologies as well as their application and perspectives in genome research.

  1. Recent developments in sequence selective minor groove DNA effectors.

    Science.gov (United States)

    Reddy, B S; Sharma, S K; Lown, J W

    2001-04-01

    DNA is a well-characterized intracellular target, but its large size and sequential nature make it an elusive target for selective drug action. Binding of low molecular weight ligands to DNA causes a wide variety of potential biological responses. In this respect, the main consideration is given to recent developments in DNA sequence selective binding agents bearing conjugated effectors because of their potential application in the diagnosis and treatment of cancers as well as in molecular biology. Recent progress in the development of cross-linked lexitropsin oligopeptides and hairpins, which bind selectively to the minor groove of duplex DNA, is discussed. Bis-distamycins and related lexitropsins show inhibitory activity against HIV-1 and HIV-2 integrases at low nanomolar concentrations. Benzoyl nitrogen mustard analogs of lexitropsins are active against a variety of tumor models. Certain of the bis-benzimidazoles show altered DNA sequence preference and bind to DNA at 5'-CG and TG sequences rather than at the preferred AT sites of the parent drug. A comparison of bifunctional bizelesin with monoalkylating adozelesin shows that bizelesin appears to have increased sequence selectivity: monoalkylating compounds react at more than one site, but bizelesin reacts only at sites where there are two suitably positioned alkylation sites. Adozelesin, bizelesin and carzelesin are far more potent as cytotoxic agents than cisplatin or doxorubicin. A new class of 1,2,9,9a-tetrahydrocyclopropa[c]benz[e]indol-4-one (CBI) analogs, i.e., CBI-lexitropsin conjugates arising from the latter leads, is also discussed. A number of cyclopropylpyrroloindole (CPI) and CBI-lexitropsin conjugates related to CC-1065 alkylate at the N3 position of adenine in the minor groove of DNA in a sequence-specific manner, and also show cytotoxicities in the femtomolar range. The cross-linking efficiency of PBD dimers is much greater than that of other cross-linkers, including cisplatin and melphalan. 
A new

  2. [Characterization and modification of phage T7 DNA polymerase for use in DNA sequencing]: Progress report

    Energy Technology Data Exchange (ETDEWEB)

    1992-01-01

    This project focuses on the DNA polymerase and accessory proteins of phage T7 for use in DNA sequence analysis. T7 DNA polymerase (gene 5 protein) interacts with accessory proteins for the acquisition of properties such as processivity that are necessary for DNA replication. One goal is to understand these interactions in order to modify the proteins to increase their usefulness in DNA sequence analysis. Using a genetically modified gene 5 protein lacking 3' to 5' exonuclease activity, we have found that in the presence of manganese there is no discrimination against dideoxynucleotides, a property that enables novel approaches to DNA sequencing using automated technology. Pyrophosphorolysis can create problems in DNA sequence determination, a problem that can be eliminated by the addition of pyrophosphatase. Crystals of the gene 5 protein/thioredoxin complex have now been obtained and X-ray diffraction analysis will be undertaken once their quality has been improved. Amino acid changes in gene 5 protein have been identified that alter its interaction with thioredoxin. Characterization of these proteins should help determine how thioredoxin confers processivity on polymerization. We have characterized the T7 DNA-binding protein, the gene 2.5 protein, and shown that it interacts with gene 5 protein and gene 4 protein. The gene 2.5 protein mediates homologous base pairing and strand uptake. Gene 5.5 protein interacts with E. coli H1 protein and affects gene expression. Biochemical and genetic studies on the T7 56-kDa gene 4 protein, the helicase, are focused on its physical interaction with T7 DNA polymerase and the mechanism by which the hydrolysis of nucleoside triphosphates fuels its unidirectional translocation on DNA.

  3. [Characterization and modification of phage T7 DNA polymerase for use in DNA sequencing]: Progress report

    Energy Technology Data Exchange (ETDEWEB)

    1992-12-31

    This project focuses on the DNA polymerase and accessory proteins of phage T7 for use in DNA sequence analysis. T7 DNA polymerase (gene 5 protein) interacts with accessory proteins for the acquisition of properties such as processivity that are necessary for DNA replication. One goal is to understand these interactions in order to modify the proteins to increase their usefulness in DNA sequence analysis. Using a genetically modified gene 5 protein lacking 3' to 5' exonuclease activity, we have found that in the presence of manganese there is no discrimination against dideoxynucleotides, a property that enables novel approaches to DNA sequencing using automated technology. Pyrophosphorolysis can create problems in DNA sequence determination, a problem that can be eliminated by the addition of pyrophosphatase. Crystals of the gene 5 protein/thioredoxin complex have now been obtained and X-ray diffraction analysis will be undertaken once their quality has been improved. Amino acid changes in gene 5 protein have been identified that alter its interaction with thioredoxin. Characterization of these proteins should help determine how thioredoxin confers processivity on polymerization. We have characterized the T7 DNA-binding protein, the gene 2.5 protein, and shown that it interacts with gene 5 protein and gene 4 protein. The gene 2.5 protein mediates homologous base pairing and strand uptake. Gene 5.5 protein interacts with E. coli H1 protein and affects gene expression. Biochemical and genetic studies on the T7 56-kDa gene 4 protein, the helicase, are focused on its physical interaction with T7 DNA polymerase and the mechanism by which the hydrolysis of nucleoside triphosphates fuels its unidirectional translocation on DNA.

  4. Next generation sequencing of DNA-launched Chikungunya vaccine virus

    Energy Technology Data Exchange (ETDEWEB)

    Hidajat, Rachmat; Nickols, Brian [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States); Forrester, Naomi [Institute for Human Infections and Immunity, Sealy Center for Vaccine Development and Department of Pathology, University of Texas Medical Branch, GNL, 301 University Blvd., Galveston, TX 77555 (United States); Tretyakova, Irina [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States); Weaver, Scott [Institute for Human Infections and Immunity, Sealy Center for Vaccine Development and Department of Pathology, University of Texas Medical Branch, GNL, 301 University Blvd., Galveston, TX 77555 (United States); Pushko, Peter, E-mail: ppushko@medigen-usa.com [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States)

    2016-03-15

    Chikungunya virus (CHIKV) represents a pandemic threat with no approved vaccine available. Recently, we described a novel vaccination strategy based on iDNA® infectious clone designed to launch a live-attenuated CHIKV vaccine from plasmid DNA in vitro or in vivo. As a proof of concept, we prepared iDNA plasmid pCHIKV-7 encoding the full-length cDNA of the 181/25 vaccine. The DNA-launched CHIKV-7 virus was prepared and compared to the 181/25 virus. Illumina HiSeq2000 sequencing revealed that with the exception of the 3′ untranslated region, CHIKV-7 viral RNA consistently showed a lower frequency of single-nucleotide polymorphisms than the 181/25 RNA including at the E2-12 and E2-82 residues previously identified as attenuating mutations. In the CHIKV-7, frequencies of reversions at E2-12 and E2-82 were 0.064% and 0.086%, while in the 181/25, frequencies were 0.179% and 0.133%, respectively. We conclude that the DNA-launched virus has a reduced probability of reversion mutations, thereby enhancing vaccine safety. - Highlights: • Chikungunya virus (CHIKV) is an emerging pandemic threat. • In vivo DNA-launched attenuated CHIKV is a novel vaccine technology. • DNA-launched virus was sequenced using HiSeq2000 and compared to the 181/25 virus. • DNA-launched virus has lower frequency of SNPs at E2-12 and E2-82 attenuation loci.
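
    The reported percentages translate into fold differences between the two viruses: about 2.8-fold at E2-12 (0.179/0.064) and about 1.5-fold at E2-82 (0.133/0.086). The Python sketch below reproduces that arithmetic and also shows the usual definition of a variant frequency at one position, variant-supporting reads divided by total depth. The abstract does not state how its frequencies were computed, so that definition is an assumption, and the read counts are hypothetical.

```python
# Reversion frequencies (%) as reported in the abstract.
REVERSION_PERCENT = {
    "E2-12": {"CHIKV-7": 0.064, "181/25": 0.179},
    "E2-82": {"CHIKV-7": 0.086, "181/25": 0.133},
}


def variant_frequency_percent(variant_reads, depth):
    """Percentage of reads at a position that support the variant (assumed definition)."""
    return 100.0 * variant_reads / depth


for site, freq in REVERSION_PERCENT.items():
    fold = freq["181/25"] / freq["CHIKV-7"]
    print(f"{site}: {fold:.1f}-fold lower in CHIKV-7")

# Hypothetical counts: 64 variant reads out of 100,000 correspond to 0.064%.
print(variant_frequency_percent(64, 100_000))
```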

  5. Heterogeneity of DNA Distribution Pattern in Renal Tumours

    Directory of Open Access Journals (Sweden)

    Hans Nenning

    1997-01-01

    The presence of intratumoural heterogeneity in DNA distribution patterns is widely accepted. However, most previous studies have not taken this fact into consideration. The value of DNA cytometry depends on its reproducibility, which could be compromised when heterogeneity is not accounted for. The aim of the present study is to evaluate intratumoural heterogeneity in renal cell cancer.

  6. Patterns of selective constraints in noncoding DNA of rice

    Directory of Open Access Journals (Sweden)

    Keightley Peter D

    2007-11-01

    Background: Several studies have investigated the relationships between selective constraints in introns and their length, GC content and location within genes. To date, however, no such investigation has been done in plants. Studies of selective constraints in noncoding DNA have generally involved interspecific comparisons, under the assumption that the same selective pressures act in each lineage. Such comparisons are limited to noncoding sequences that are not so strongly diverged that reliable sequence alignments become impossible. Here, we investigate selective constraints in a recent segmental duplication in rice (O. sativa), which occurred about 7 million years ago and includes 605 paralogous intron pairs. Results: Our principal findings are: (1) intronic divergence is negatively correlated with intron length, a pattern that has previously been described in Drosophila and mammals; (2) there is a signature of strong purifying selection at splice control sites; (3) first introns are significantly longer and have a higher GC content than other introns; (4) the divergences of first and non-first introns are not significantly different from one another, a pattern that differs from Drosophila and mammals; and (5) short introns are more diverged than four-fold degenerate sites, suggesting that selection reduces divergence at four-fold sites. Conclusion: Our observation of stronger selective constraints in long introns suggests that functional elements subject to purifying selection may be concentrated within long introns. Our results are consistent with the presence of strong purifying selection at splicing control sites. Unlike in other species, selective constraints are not significantly stronger in first introns of rice.

  7. Delineating relative homogeneous G+C domains in DNA sequences.

    Science.gov (United States)

    Li, W

    2001-10-03

    The concept of homogeneity of G+C content is always relative and subjective. This point is emphasized and quantified in this paper using a simple example of one sequence segmented into two subsequences. Whether the sequence is homogeneous can be answered by whether the two-subsequence model describes the DNA sequence better than the one-sequence model. There are at least three equivalent ways of looking at the 1-to-2 segmentation: the Jensen-Shannon divergence measure, the log-likelihood ratio test, and model selection using the Bayesian information criterion. Once a criterion is chosen, a DNA sequence can be recursively segmented into multiple domains. We use one subjective criterion, called segmentation strength, based on the Bayesian information criterion. Whether or not a sequence is homogeneous, and how many domains it has, depends on this criterion. We compare six different genome sequences (yeast S. cerevisiae chromosomes III and IV, bacterium M. pneumoniae, the human major histocompatibility complex sequence, and the longest contigs in human chromosomes 21 and 22) by recursive segmentation at different strength criteria. The results of recursive segmentation confirm that yeast chromosome IV is more homogeneous than yeast chromosome III, that human chromosome 21 is more homogeneous than human chromosome 22, and that bacterial genomes may not be homogeneous due to short segments with distinct base compositions. Recursive segmentation also provides a quantitative criterion for identifying isochores in human sequences. Some features of our recursive segmentation, such as the ability to delineate domain borders accurately, are superior to those of the moving-window approach commonly used in such analyses.
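
    The 1-to-2 segmentation and its recursive use can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's code: bases are collapsed to S (G/C) and W (A/T), each cut maximises the Jensen-Shannon divergence D_JS, and a fixed, hypothetical `threshold` on D_JS stands in for the BIC-based segmentation strength described above.

```python
import math
from collections import Counter


def to_sw(seq):
    """Collapse DNA into a two-symbol alphabet: S (strong, G/C) and W (weak, A/T)."""
    return ["S" if base in "GCgc" else "W" for base in seq]


def entropy(counts, n):
    """Shannon entropy, in bits, of a composition given symbol counts summing to n."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c)


def best_split(symbols):
    """Return (position, D_JS) of the 1-to-2 cut that maximises Jensen-Shannon divergence.

    D_JS = H(whole) - (l1/L) * H(left) - (l2/L) * H(right).
    Returns (None, 0.0) when no cut increases the divergence (e.g. a uniform sequence).
    """
    n = len(symbols)
    total = Counter(symbols)
    h_whole = entropy(total, n)
    left = Counter()
    best_pos, best_d = None, 0.0
    for i in range(1, n):
        left[symbols[i - 1]] += 1      # grow the left part one symbol at a time
        right = total - left           # Counter subtraction drops zero counts
        d = (h_whole
             - (i / n) * entropy(left, i)
             - ((n - i) / n) * entropy(right, n - i))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(symbols, start=0, threshold=0.1, min_len=4):
    """Recursively split while the best cut's D_JS reaches a hypothetical threshold.

    Returns half-open (start, end) coordinates of the resulting domains.
    """
    whole = [(start, start + len(symbols))]
    if len(symbols) < 2 * min_len:
        return whole
    pos, d = best_split(symbols)
    if pos is None or d < threshold:
        return whole
    return (segment(symbols[:pos], start, threshold, min_len)
            + segment(symbols[pos:], start + pos, threshold, min_len))


print(segment(to_sw("AAAAAAAAGGGGGGGG")))  # [(0, 8), (8, 16)]
```

    Raising `threshold` plays the same role as a stricter segmentation strength: fewer, larger domains survive.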

  8. Automated parallel DNA sequencing on multiple channel microchips.

    Science.gov (United States)

    Liu, S; Ren, H; Gao, Q; Roach, D J; Loder, R T; Armstrong, T M; Mao, Q; Blaga, I; Barker, D L; Jovanovich, S B

    2000-05-09

    We report automated DNA sequencing in 16-channel microchips. A microchip prefilled with sieving matrix is aligned on a heating plate affixed to a movable platform. Samples are loaded into sample reservoirs by using an eight-tip pipetting device, and the chip is docked with an array of electrodes in the focal plane of a four-color scanning detection system. Under computer control, high voltage is applied to the appropriate reservoirs in a programmed sequence that injects and separates the DNA samples. An integrated four-color confocal fluorescent detector automatically scans all 16 channels. The system routinely yields more than 450 bases in 15 min in all 16 channels. In the best case using an automated base-calling program, 543 bases have been called at an accuracy of >99%. Separations, including automated chip loading and sample injection, normally are completed in less than 18 min. The advantages of DNA sequencing on capillary electrophoresis chips include uniform signal intensity and tolerance of high DNA template concentration. To understand the fundamentals of these unique features we developed a theoretical treatment of cross-channel chip injection that we call the differential concentration effect. We present experimental evidence consistent with the predictions of the theory.

  9. Impacts of degraded DNA on restriction enzyme associated DNA sequencing (RADSeq).

    Science.gov (United States)

    Graham, Carly F; Glenn, Travis C; McArthur, Andrew G; Boreham, Douglas R; Kieran, Troy; Lance, Stacey; Manzon, Richard G; Martino, Jessica A; Pierson, Todd; Rogers, Sean M; Wilson, Joanna Y; Somers, Christopher M

    2015-11-01

    Degraded DNA from suboptimal field sampling is common in molecular ecology. However, its impact on techniques that use restriction site associated next-generation DNA sequencing (RADSeq, GBS) is unknown. We experimentally examined the effects of in situ DNA degradation on data generation for a modified double-digest RADSeq approach (3RAD). We generated libraries using genomic DNA serially extracted from the muscle tissue of 8 individual lake whitefish (Coregonus clupeaformis) following 0-, 12-, 48- and 96-h incubation at room temperature post-euthanasia. This treatment of the tissue resulted in input DNA that ranged in quality from nearly intact to highly sheared. All samples were sequenced as a multiplexed pool on an Illumina MiSeq. Libraries created from low to moderately degraded DNA (12-48 h) performed well. In contrast, the number of RADtags per individual, the number of variable sites, and the percentage of identical RADtags retained were all dramatically reduced when libraries were made using highly degraded DNA (96-h group). This reduction in performance was largely due to a significant and unexpected loss of raw reads as a result of poor quality scores. Our findings remained consistent after changes in restriction enzymes, modified fold coverage values (2- to 16-fold), and additional read-length trimming. We conclude that starting DNA quality is an important consideration for RADSeq; however, the approach remains robust until genomic DNA is extensively degraded.

  10. Assessing the fidelity of ancient DNA sequences amplified from nuclear genes

    DEFF Research Database (Denmark)

    Binladen, Jonas; Wiuf, Carsten Henrik; Gilbert, M. Thomas P.

    2006-01-01

    To date, the field of ancient DNA has relied almost exclusively on mitochondrial DNA (mtDNA) sequences. However, a number of recent studies have reported the successful recovery of ancient nuclear DNA (nuDNA) sequences, thereby allowing the characterization of genetic loci directly involved in ph...

  11. Pattern matching in indeterminate and Arc-annotated sequences.

    Science.gov (United States)

    Aumi, Md Tanvir Islam; Moosa, Tanaeem M; Rahman, M Sohel

    2013-08-01

    In this paper, we present efficient algorithms for finding indeterminate Arc-Annotated patterns in indeterminate Arc-Annotated references. Our algorithms run in O(m + nm/w) time, where n and m are the lengths of the reference and pattern strings, respectively, and w is the target machine word size. We assume the alphabet size to be constant, because indeterminate Arc-Annotated sequences are used to model biological sequences. Clearly, for short patterns, our algorithms run in linear time, and efficient algorithms for matching short patterns to reference genomes have huge applications in practical settings. We have also applied our algorithms to scan for ncRNAs without pseudoknots. We scanned three whole human chromosomes, and scanning one whole chromosome for an ncRNA family took only 2.5-4 minutes. Some relevant patents are discussed in.
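
    The bit-parallel core of such matchers can be illustrated with the classic Shift-And algorithm extended to indeterminate positions (sets of bases, written here with IUPAC codes). This is a sketch only: it ignores arc annotations entirely, the helper names are hypothetical, and the authors' algorithms may differ in detail. Python integers are unbounded, so a single int stands in for the ceil(m/w) machine words of the O(m + nm/w) bound.

```python
# IUPAC nucleotide codes: each symbol stands for a set of bases (subset shown).
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "N": {"A", "C", "G", "T"},
}


def parse(s):
    """Turn an IUPAC string into a list of base sets (an indeterminate sequence)."""
    return [IUPAC[ch] for ch in s.upper()]


def shift_and(text, pattern):
    """Shift-And matching where text and pattern positions are sets of bases.

    A text position matches a pattern position when their base sets intersect.
    Returns the start offsets of all occurrences of the pattern in the text.
    """
    m = len(pattern)
    # masks[b]: bit j is set iff base b is allowed at pattern position j.
    masks = {b: 0 for b in "ACGT"}
    for j, allowed in enumerate(pattern):
        for b in allowed:
            masks[b] |= 1 << j
    goal = 1 << (m - 1)
    state = 0  # bit j set <=> pattern[0..j] matches the text ending at position i
    hits = []
    for i, allowed in enumerate(text):
        mask = 0
        for b in allowed:          # union of masks for every base this text symbol allows
            mask |= masks[b]
        state = ((state << 1) | 1) & mask
        if state & goal:
            hits.append(i - m + 1)
    return hits


print(shift_and(parse("ACGTAAGT"), parse("ANG")))  # [0, 4]
```

    Each text symbol costs a constant number of word operations once the masks are built, which is where the linear running time for patterns no longer than a machine word comes from.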

  12. Fast and low-cost structured light pattern sequence projection.

    Science.gov (United States)

    Wissmann, Patrick; Forster, Frank; Schmitt, Robert

    2011-11-21

    We present a high-speed and low-cost approach for structured light pattern sequence projection. Using a fast rotating binary spatial light modulator, our method is potentially capable of projection frequencies in the kHz domain, while enabling pattern rasterization as low as 2 μm pixel size and inherently linear grayscale reproduction quantized at 12 bits/pixel or better. Due to the circular arrangement of the projected fringe patterns, we extend the widely used ray-plane triangulation method to ray-cone triangulation and provide a detailed description of the optical calibration procedure. Using the proposed projection concept in conjunction with the recently published coded phase shift (CPS) pattern sequence, we demonstrate high accuracy 3-D measurement at 200 Hz projection frequency and 20 Hz 3-D reconstruction rate.

  13. Early Lyme disease with spirochetemia - diagnosed by DNA sequencing

    Directory of Open Access Journals (Sweden)

    Jones William

    2010-11-01

    Background: A sensitive and analytically specific nucleic acid amplification test (NAAT) is valuable in confirming the diagnosis of early Lyme disease at the stage of spirochetemia. Findings: Venous blood drawn from patients with clinical presentations of Lyme disease was tested with the standard 2-tier (screening plus Western blot) serology assay for Lyme disease, and also by a nested polymerase chain reaction (PCR) for B. burgdorferi sensu lato 16S ribosomal DNA. The PCR amplicon was sequenced for B. burgdorferi genomic DNA validation. A total of 130 patients visiting an emergency room (ER) or walk-in clinic (WALKIN), and 333 patients referred through private physicians' offices, were studied. While 5.4% of the ER/WALKIN patients showed DNA evidence of spirochetemia, none (0%) of the patients referred from private physicians' offices were DNA-positive. In contrast, while 8.4% of the patients referred from private physicians' offices were positive for the 2-tier Lyme serology assay, only 1.5% of the ER/WALKIN patients were positive for this antibody test. The 2-tier serology assay missed 85.7% of the cases of early Lyme disease with spirochetemia; the latter diagnosis was confirmed by DNA sequencing. Conclusion: Nested PCR followed by automated DNA sequencing is a valuable supplement to the standard 2-tier antibody assay in the diagnosis of early Lyme disease with spirochetemia. The best time to test for Lyme spirochetemia is early in the course of illness, when patients living in Lyme disease endemic areas develop unexplained symptoms or clinical manifestations consistent with Lyme disease.

  14. Finding discriminative and interpretable patterns in sequences of surgical activities.

    Science.gov (United States)

    Forestier, Germain; Petitjean, François; Senin, Pavel; Riffaud, Laurent; Henaux, Pierre-Louis; Jannin, Pierre

    2017-09-21

    Surgery is one of the riskiest and most important medical acts performed today. Understanding how surgeries are similar to or different from each other is of major interest for analyzing surgical behaviors. This article addresses the issue of identifying discriminative patterns of surgical practice from recordings of surgeries. These recordings are sequences of low-level surgical activities representing the actions performed by surgeons during surgeries. To discover patterns that are specific to a group of surgeries, we use the vector space model (VSM), originally an algebraic model for representing text documents. We split long sequences of surgical activities into subsequences of consecutive activities. We then compute the relative frequencies of these subsequences using the tf*idf framework, and we use cosine similarity to classify the sequences. This process makes it possible to discover which patterns discriminate one set of surgery recordings from another. Experiments were performed on 40 neurosurgeries of anterior cervical discectomy (ACD). The results demonstrate that our method accurately identifies patterns that can discriminate between (1) the locations where the surgery took place, (2) the levels of expertise of surgeons (i.e., expert vs. intermediate) and even (3) the individual surgeons who performed the intervention. We also show how the tf*idf weight vector can be used both to visualize the most interesting patterns and to highlight the most interesting parts of a given surgery. Identifying patterns that discriminate groups of surgeons is a very important step in improving the understanding of surgical processes. The proposed method finds discriminative and interpretable patterns in sequences of surgical activities. Our approach provides intuitive results, as it automatically identifies the set of patterns explaining the differences between the groups. Copyright © 2017 Elsevier B.V. All rights reserved.
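
    The subsequence and tf*idf steps can be sketched in Python. Assumptions, none taken from the paper: each recording is treated as one document (the authors weight patterns across groups of surgeries), subsequences are overlapping n-grams of a hypothetical length `k`, idf is log(N/df), and the activity labels in the usage lines are invented for illustration.

```python
import math
from collections import Counter


def ngrams(activities, k):
    """Split a sequence of activities into overlapping subsequences of length k."""
    return [tuple(activities[i:i + k]) for i in range(len(activities) - k + 1)]


def tfidf_vectors(recordings, k=2):
    """One sparse tf*idf vector (dict: n-gram -> weight) per recording."""
    counts = [Counter(ngrams(r, k)) for r in recordings]
    n_docs = len(recordings)
    # Document frequency: in how many recordings each n-gram appears.
    df = Counter(g for c in counts for g in c)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({g: (n / total) * math.log(n_docs / df[g])
                        for g, n in c.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical activity labels, for illustration only.
recordings = [
    ["incise", "retract", "drill", "irrigate"],
    ["incise", "retract", "drill", "suture"],
    ["suture", "close", "dress", "close"],
]
vectors = tfidf_vectors(recordings, k=2)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

    N-grams that occur in every recording get an idf of zero, so only the subsequences that separate recordings contribute to the similarity; the highest-weighted n-grams are the candidate discriminative patterns.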

  15. A CLIQUE algorithm using DNA computing techniques based on closed-circle DNA sequences.

    Science.gov (United States)

    Zhang, Hongyan; Liu, Xiyu

    2011-07-01

    DNA computing has been applied in broad fields such as graph theory, finite state problems, and combinatorial problems. DNA computing approaches are well suited to solving many combinatorial problems because of their vast parallelism and high-density storage. The CLIQUE algorithm is one of the grid-based clustering techniques for spatial data, and finding its dense cells is a combinatorial problem. We therefore use DNA computing with closed-circle DNA sequences to execute the CLIQUE algorithm on two-dimensional data. In our study, the clustering process becomes a parallel biochemical reaction, and the DNA sequences representing the marked cells can be combined to form closed-circle DNA sequences. This strategy is a new application of DNA computing. Although the strategy only handles two-dimensional data, it provides a new idea: treating the grid cells as vertices in a graph and transforming the search problem into a combinatorial problem.
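
    For comparison, the grid step that the DNA-computing approach encodes can be written as an ordinary in-silico sketch of CLIQUE's two-dimensional case: bin points into cells, keep the dense ones, and join adjacent dense cells into clusters. This is not a DNA computing procedure; the cell size and the point-count threshold `tau` are hypothetical parameters (CLIQUE itself expresses density as a fraction of all points).

```python
from collections import Counter


def dense_cells(points, cell_size, tau):
    """Bin 2-D points into square grid cells and keep cells holding at least tau points."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return {cell for cell, n in counts.items() if n >= tau}


def connect(cells):
    """Cluster dense cells: connected components under edge (4-neighbour) adjacency."""
    remaining = set(cells)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, stack = {seed}, [seed]
        while stack:                      # depth-first flood fill over adjacent cells
            cx, cy = stack.pop()
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    stack.append(nb)
        clusters.append(component)
    return clusters


points = [(0.1, 0.1), (0.2, 0.3), (0.5, 0.5), (1.2, 0.4), (1.3, 0.6), (1.1, 0.2), (5.0, 5.0)]
print(connect(dense_cells(points, cell_size=1.0, tau=2)))
```

    Treating dense cells as graph vertices and adjacency as edges, as the abstract suggests, turns cluster formation into a connected-components search.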

  16. Short sequence effect of ancient DNA on mammoth phylogenetic analyses

    Institute of Scientific and Technical Information of China (English)

    Guilian SHENG; Lianjuan WU; Xindong HOU; Junxia YUAN; Shenghong CHENG; Bojian ZHONG; Xulong LAI

    2009-01-01

    The evolution of Elephantidae has been intensively studied in the past few years, especially since 2006. Molecular approaches have contributed greatly to the hypothesis that the extinct woolly mammoth is more closely related to the Asian elephant than to the African elephant. In this study, partial ancient DNA sequences of the cytochrome b (cyt b) gene in the mitochondrial genome were successfully retrieved from Late Pleistocene Mammuthus primigenius bones collected in Heilongjiang Province in Northeast China. Both the partial and complete homologous cyt b gene sequences and the whole mitochondrial genome sequences extracted from GenBank were aligned and used as datasets for phylogenetic analyses. All of the phylogenetic trees, based on either the partial or the complete cyt b gene, reject the relationship inferred from the whole mitochondrial genome, showing that the length of the cyt b sequence affects mammoth phylogenetic analyses.

  17. Viral discovery and sequence recovery using DNA microarrays.

    Directory of Open Access Journals (Sweden)

    David Wang

    2003-11-01

    Because of the constant threat posed by emerging infectious diseases and the limitations of existing approaches used to identify new pathogens, there is a great demand for new technological methods for viral discovery. We describe herein a DNA microarray-based platform for novel virus identification and characterization. Central to this approach was a DNA microarray designed to detect a wide range of known viruses as well as novel members of existing viral families; this microarray contained the most highly conserved 70mer sequences from every fully sequenced reference viral genome in GenBank. During an outbreak of severe acute respiratory syndrome (SARS) in March 2003, hybridization to this microarray revealed the presence of a previously uncharacterized coronavirus in a viral isolate cultivated from a SARS patient. To further characterize this new virus, approximately 1 kb of the unknown virus genome was cloned by physically recovering viral sequences hybridized to individual array elements. Sequencing of these fragments confirmed that the virus was indeed a new member of the coronavirus family. This combination of array hybridization followed by direct viral sequence recovery should prove to be a general strategy for the rapid identification and characterization of novel viruses and emerging infectious diseases.

  18. Phylogenetic relationships of the Gomphales based on nuc-25S-rDNA, mit-12S-rDNA, and mit-atp6-DNA combined sequences

    Science.gov (United States)

    Admir J. Giachini; Kentaro Hosaka; Eduardo Nouhra; Joseph Spatafora; James M. Trappe

    2010-01-01

    Phylogenetic relationships among Geastrales, Gomphales, Hysterangiales, and Phallales were estimated via combined sequences: nuclear large subunit ribosomal DNA (nuc-25S-rDNA), mitochondrial small subunit ribosomal DNA (mit-12S-rDNA), and mitochondrial atp6 DNA (mit-atp6-DNA). Eighty-one taxa comprising 19 genera and 58 species...

  19. MtDNA mutation pattern in tumors and human evolution are shaped by similar selective constraints.

    Science.gov (United States)

    Zhidkov, Ilia; Livneh, Erez A; Rubin, Eitan; Mishmar, Dan

    2009-04-01

    Multiple human mutational landscapes of normal and cancer conditions are currently available. However, while the unique mutational patterns of tumors have been extensively studied, little attention has been paid to similarities between malignant and normal conditions. Here we compared the pattern of mutations in the mitochondrial genomes (mtDNAs) of cancer (98 sequences) and natural populations (2400 sequences). De novo mtDNA mutations in cancer preferentially colocalized with ancient variants in human phylogeny. A significant portion of the cancer mutations was organized in recurrent combinations (COMs), reaching a length of seven mutations, which also colocalized with ancient variants. Thus, by analyzing similarities rather than differences in patterns of mtDNA mutations in tumor and human evolution, we discovered evidence for similar selective constraints, suggesting a functional potential for these mutations.

  20. Sequential growth of long DNA strands with user-defined patterns for nanostructures and scaffolds

    Science.gov (United States)

    Hamblin, Graham D.; Rahbani, Janane F.; Sleiman, Hanadi F.

    2015-05-01

    DNA strands of well-defined sequence are valuable in synthetic biology and nanostructure assembly. Drawing inspiration from solid-phase synthesis, here we describe a DNA assembly method that uses time, or order of addition, as a parameter to define structural complexity. DNA building blocks are sequentially added with in-situ ligation, then enzymatic enrichment and isolation. This yields a monodisperse, single-stranded long product (for example, 1,000 bases) with user-defined length and sequence pattern. The building blocks can be repeated with different order of addition, giving different DNA patterns. We organize DNA nanostructures and quantum dots using these backbones. Generally, only a small portion of a DNA structure needs to be addressable, while the rest is purely structural. Scaffolds with specifically placed unique sites in a repeating motif greatly minimize the number of components used, while maintaining addressability. This combination of symmetry and site-specific asymmetry within a DNA strand is easily accomplished with our method.

  1. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    DEFF Research Database (Denmark)

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences...... over all chromosomes of H. vulgare and the wild barley species H. bulbosum, H. marinum and H. murinum. Southern blot hybridization revealed different levels of polymorphism among barley species and the RFLP data were used to generate a phylogenetic tree for the genus Hordeum. Our data are in a good...

  2. The implementation of bit-parallelism for DNA sequence alignment

    Science.gov (United States)

    Setyorini; Kuspriyanto; Widyantoro, D. H.; Pancoro, A.

    2017-05-01

    Dynamic programming (DP) remains the central algorithm of biological sequence alignment, and computing the matching scores is its most time-consuming step. Bit-parallelism is an approximate string matching technique that processes the DP matrix in word units (groups of cells) rather than cell by cell, computing the scores column-wise. By borrowing the word-level processing of computer systems, this technique promises to reduce the time needed to compute scores in the DP matrix. In this paper, we implement the bit-parallelism technique for DNA sequence alignment. Our bit-parallel implementation reduces score computation time, but its reconstruction process still needs improvement.
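
    The abstract does not say which bit-parallel scheme was implemented. As one well-known example, Myers' (1999) bit-vector algorithm packs a whole DP column into two bit vectors of vertical +1/-1 differences. The sketch below is that algorithm for approximate search (edit distance of the pattern against any substring of the text ending at each position), not the authors' implementation; like their score phase, it computes scores only and does not reconstruct the alignment.

```python
def myers_distances(pattern, text):
    """Myers' bit-vector algorithm for approximate matching (non-empty pattern).

    Returns, for every text position j, the smallest edit distance between
    `pattern` and any substring of `text` ending at j.
    """
    m = len(pattern)
    full = (1 << m) - 1          # keep every vector to m bits (Python ints are unbounded)
    top = 1 << (m - 1)           # bit for the last DP row
    peq = {}                     # peq[c]: bit i set iff pattern[i] == c
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    vp, vn, dist = full, 0, m    # first column is 0, 1, ..., m: all vertical deltas +1
    out = []
    for ch in text:
        eq = peq.get(ch, 0)
        d0 = ((((eq & vp) + vp) ^ vp) | eq | vn) & full   # diagonal zero-deltas
        hp = (vn | ~(d0 | vp)) & full                     # horizontal +1 deltas
        hn = vp & d0                                      # horizontal -1 deltas
        if hp & top:
            dist += 1
        elif hn & top:
            dist -= 1
        hp = (hp << 1) & full    # no carry-in: row 0 is all zeros (search, not global)
        hn = (hn << 1) & full
        vp = (hn | ~(d0 | hp)) & full
        vn = hp & d0
        out.append(dist)
    return out


def approx_find(pattern, text, k):
    """End positions where the pattern occurs with at most k edits."""
    return [j for j, d in enumerate(myers_distances(pattern, text)) if d <= k]


print(myers_distances("ACGT", "TTACGTTT"))  # [3, 3, 3, 2, 1, 0, 1, 2]
```

    Each text character costs a fixed number of word operations instead of m cell updates, which is the time saving the abstract describes; recovering the actual alignment would still require a separate traceback.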

  3. Evaluation of intra- and interspecific divergence of satellite DNA sequences by nucleotide frequency calculation and pairwise sequence comparison

    Directory of Open Access Journals (Sweden)

    Kato Mikio

    2003-01-01

    Satellite DNA sequences are known to be highly variable and to have been subjected to concerted evolution that homogenizes member sequences within species. We have analyzed the mode of evolution of satellite DNA sequences in four fishes from the genus Diplodus by calculating the nucleotide frequency of the sequence array and the phylogenetic distances between member sequences. Calculation of nucleotide frequency and pairwise sequence comparison enabled us to characterize the divergence among member sequences in this satellite DNA family. The results suggest that the evolutionary rate of satellite DNA in D. bellottii is about two-fold greater than the average of the other three fishes, and that the sequence homogenization event occurred in D. puntazzo more recently than in the others. The procedures described here are effective for characterizing the mode of evolution of satellite DNA.
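
    The two measurements can be sketched in Python, assuming the satellite repeat units are already aligned to a common length (gaps written as `-`) and using the uncorrected p-distance; the distance measure the authors actually used is not stated here, so treat it as an assumption.

```python
from itertools import combinations


def position_frequencies(aligned):
    """Per-column frequency of A, C, G, T across an aligned array of repeat units."""
    n = len(aligned)
    return [
        {base: column.count(base) / n for base in "ACGT"}
        for column in zip(*aligned)   # zip(*...) walks the alignment column by column
    ]


def p_distance(a, b):
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def mean_pairwise_distance(aligned):
    """Average p-distance over all pairs of member sequences."""
    pairs = list(combinations(aligned, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


aligned = ["ACGT", "ACGA", "ACTA"]
print(position_frequencies(aligned)[3])
print(mean_pairwise_distance(aligned))
```

    Under concerted evolution, a strongly homogenized array shows column frequencies close to 0 or 1 and a small mean pairwise distance; comparing these values between species is one way to read relative homogenization.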

  4. Effect of dephasing on DNA sequencing via transverse electronic transport

    Energy Technology Data Exchange (ETDEWEB)

    Zwolak, Michael [Los Alamos National Laboratory; Krems, Matt [NON LANL; Pershin, Yuriy V [NON LANL; Di Ventra, Massimiliano [NON LANL

    2009-01-01

    We study theoretically the effects of dephasing on DNA sequencing in a nanopore via transverse electronic transport. To do this, we couple classical molecular dynamics simulations with transport calculations using scattering theory. Previous studies, which did not include dephasing, have shown that by measuring the transverse current of a particular base multiple times, one can get distributions of currents for each base that are distinguishable. We introduce a dephasing parameter into transport calculations to simulate the effects of the ions and other fluctuations. These effects lower the overall magnitude of the current, but have little effect on the current distributions themselves. The results of this work further indicate that distinguishing DNA bases via transverse electronic transport has potential as a sequencing tool.

  5. Pattern Recognition of mtDNA with Associative Models

    Directory of Open Access Journals (Sweden)

    Acevedo María Elena

    2016-01-01

    In this paper we apply associative memories to the pattern recognition of mtDNA, which can be useful for identifying bodies and human remains. In particular, we used both morphological heteroassociative memories: max and min. We treat pattern recognition as a classification task. Our approach achieved correct recall of 100% of the learned patterns. We simulated corrupted mtDNA samples by adding two types of noise: additive and subtractive. The memory still recalled correctly when up to 55% of either type of noise was applied.
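
    A minimal sketch of the two memories, following the standard max/min construction of morphological associative memories (Ritter and colleagues). The one-hot encoding of bases and the one-hot class labels are assumptions for illustration, not the paper's encoding of mtDNA.

```python
def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector, four entries (A, C, G, T) per base."""
    return [int(base == b) for base in seq for b in "ACGT"]


def learn(X, Y, op):
    """Build a memory whose entry (i, j) is op over all training pairs of (y_i - x_j).

    op=max gives the max memory (recall with min); op=min gives the min memory
    (recall with max).
    """
    return [[op(y[i] - x[j] for x, y in zip(X, Y)) for j in range(len(X[0]))]
            for i in range(len(Y[0]))]


def recall(memory, x, op):
    """Morphological product: y_i = op over j of (memory[i][j] + x_j)."""
    return [op(m + xj for m, xj in zip(row, x)) for row in memory]


# Two toy "sequences" with one-hot class labels (hypothetical data).
X = [one_hot("AC"), one_hot("GT")]
Y = [[1, 0], [0, 1]]
M = learn(X, Y, max)   # max memory
W = learn(X, Y, min)   # min memory

print([recall(M, x, min) for x in X], [recall(W, x, max) for x in X])
print(recall(M, [1, 0, 1, 0, 0, 1, 0, 0], min))  # extra 1: additive noise
print(recall(W, [0, 0, 0, 0, 0, 1, 0, 0], max))  # missing 1: subtractive noise
```

    In this toy example the max memory recovers the first label despite an extra 1 and the min memory despite a missing 1, mirroring the known tolerance of each memory type to additive and subtractive noise respectively.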

  6. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure tha...

  7. DNA immunoprecipitation semiconductor sequencing (DIP-SC-seq) as a rapid method to generate genome wide epigenetic signatures.

    Science.gov (United States)

    Thomson, John P; Fawkes, Angie; Ottaviano, Raffaele; Hunter, Jennifer M; Shukla, Ruchi; Mjoseng, Heidi K; Clark, Richard; Coutts, Audrey; Murphy, Lee; Meehan, Richard R

    2015-05-14

    Modification of DNA resulting in 5-methylcytosine (5mC) or 5-hydroxymethylcytosine (5hmC) has been shown to influence the local chromatin environment and affect transcription. Although recent advances in next generation sequencing technology allow researchers to map epigenetic modifications across the genome, such experiments are often time-consuming and cost-prohibitive. Here we present a rapid and cost-effective method of generating genome-wide DNA modification maps utilising commercially available semiconductor-based technology (DNA immunoprecipitation semiconductor sequencing; "DIP-SC-seq") on the Ion Proton sequencer. Focussing on the 5hmC mark, we demonstrate, by directly comparing with alternative sequencing strategies, that this platform can successfully generate genome-wide 5hmC patterns from as little as 500 ng of genomic DNA in less than 4 days. Such a method can therefore facilitate the rapid generation of multiple genome-wide epigenetic datasets.

  8. Measuring cation dependent DNA polymerase fidelity landscapes by deep sequencing.

    Directory of Open Access Journals (Sweden)

    Bradley Michael Zamft

    High-throughput recording of signals embedded within inaccessible micro-environments is a technological challenge. The ideal recording device would be a nanoscale machine capable of quantitatively transducing a wide range of variables into a molecular recording medium suitable for long-term storage and facile readout in the form of digital data. We have recently proposed such a device, in which cation concentrations modulate the misincorporation rate of a DNA polymerase (DNAP) on a known template, allowing DNA sequences to encode information about the local cation concentration. In this work we quantify the cation sensitivity of DNAP misincorporation rates, making possible the indirect readout of cation concentration by DNA sequencing. Using multiplexed deep sequencing, we quantify the misincorporation properties of two DNA polymerases, Dpo4 and Klenow exo(−), obtaining the probability and base selectivity of misincorporation at all positions within the template. We find that Dpo4 acts as a DNA recording device for Mn²⁺ with a misincorporation rate gain of ∼2%/mM. This modulation of misincorporation rate is selective to the template base: the probability of misincorporation on template T by Dpo4 increases >50-fold over the range tested, while the other template bases are affected less strongly. Furthermore, cation concentrations act as scaling factors for misincorporation: on a given template base, Mn²⁺ and Mg²⁺ change the overall misincorporation rate but do not alter the relative frequencies of incoming misincorporated nucleotides. Characterization of the ion dependence of DNAP misincorporation serves as the first step towards repurposing it as a molecular recording device.

  9. An automated annotation tool for genomic DNA sequences using GeneScan and BLAST

    Indian Academy of Sciences (India)

    Andrew M. Lynn; Chakresh Kumar Jain; K. Kosalai; Pranjan Barman; Nupur Thakur; Harish Batra; Alok Bhattacharya

    2001-04-01

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated annotation of genome DNA sequences.

  10. Color image encryption scheme using CML and DNA sequence operations.

    Science.gov (United States)

    Wang, Xing-Yuan; Zhang, Hui-Li; Bao, Xue-Mei

    2016-06-01

    In this paper, an encryption algorithm for color images using a chaotic system and DNA (deoxyribonucleic acid) sequence operations is proposed. The three color components of the plain image are used to construct a matrix, and a confusion operation is then performed on this pixel matrix using the spatiotemporal chaotic system, i.e., a CML (coupled map lattice). DNA encoding and decoding rules are introduced in the permutation phase. An extended Hamming distance is proposed to generate new initial values for the CML iteration from the color plain image. The rows and columns of the DNA matrix are permuted, and the color cipher image is then obtained from this matrix. Theoretical analysis and experimental results demonstrate that the cryptosystem is secure and practical, and that it is suitable for encrypting color images of any size. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
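
    DNA sequence operations on pixels generally start from a 2-bit-to-base encoding rule. A minimal sketch of encoding, decoding, and a base-wise DNA XOR follows. The rule shown is one of the eight rules that respect Watson-Crick complementarity, and the key byte is a hypothetical stand-in for values the CML would generate; the paper's specific rule choices, permutation, extended Hamming distance, and CML steps are not reproduced.

```python
# One complementarity-preserving rule: 00/11 -> A/T and 01/10 -> C/G.
# Assumed for illustration; the paper's actual rule selection is not shown here.
RULE_1 = {"00": "A", "01": "C", "10": "G", "11": "T"}


def encode_pixel(value, rule=RULE_1):
    """Encode an 8-bit pixel value as four bases, two bits per base (MSB first)."""
    bits = format(value, "08b")
    return "".join(rule[bits[i:i + 2]] for i in range(0, 8, 2))


def decode_pixel(bases, rule=RULE_1):
    """Inverse of encode_pixel under the same rule."""
    inverse = {base: bits for bits, base in rule.items()}
    return int("".join(inverse[b] for b in bases), 2)


def dna_xor(a, b, rule=RULE_1):
    """Base-wise XOR: XOR the 2-bit codes of paired bases, then map back to a base."""
    inverse = {base: int(bits, 2) for bits, base in rule.items()}
    return "".join(rule[format(inverse[x] ^ inverse[y], "02b")] for x, y in zip(a, b))


plain = encode_pixel(200)        # 200 = 11001000 -> "TAGA"
key = encode_pixel(77)           # hypothetical keystream byte
cipher = dna_xor(plain, key)
print(plain, cipher, decode_pixel(dna_xor(cipher, key)))
```

    Because DNA XOR under a fixed rule is just bitwise XOR on the 2-bit codes, applying it twice with the same key restores the plaintext bases; security comes from how the keystream and rule choices are derived, which this sketch does not address.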

  11. Rapid sequencing of DNA based on single-molecule detection

    Science.gov (United States)

    Soper, Steven A.; Davis, Lloyd M.; Fairfield, Frederick R.; Hammond, Mark L.; Harger, Carol A.; Jett, James H.; Keller, Richard A.; Marrone, Babetta L.; Martin, John C.; Nutter, Harvey L.; Shera, E. Brooks; Simpson, Daniel J.

    1991-07-01

    Sequencing the human genome is a major undertaking considering the large number of nucleotides present in the genome and the slow methods currently available to perform the task. The authors have recently reported on a scheme to sequence DNA rapidly using a non-gel based technique. The concept is based upon the incorporation of fluorescently labeled nucleotides into a strand of DNA, isolation and manipulation of a labeled DNA fragment and the detection of single nucleotides using ultra-sensitive laser-induced fluorescence detection following their cleavage from the fragment. Detection of individual fluorophores in the liquid phase was accomplished with time-gated detection following pulsed-laser excitation. The photon bursts from individual rhodamine 6G (R6G) molecules travelling through a laser beam have been observed, as have bursts from single fluorescently modified nucleotides. Using two different biotinylated nucleotides as a model system for fluorescently labeled nucleotides, the authors have observed synthesis of the complementary copy of M13 bacteriophage. Work with fluorescently labeled nucleotides is underway. Individual molecules of DNA attached to a microbead have been observed and manipulated with an epifluorescence microscope.

  12. Human mitochondrial DNA complete amplification and sequencing: a new validated primer set that prevents nuclear DNA sequences of mitochondrial origin co-amplification.

    Science.gov (United States)

    Ramos, Amanda; Santos, Cristina; Alvarez, Luis; Nogués, Ramon; Aluja, Maria Pilar

    2009-05-01

    To date, there are no published primers to amplify the entire mitochondrial DNA (mtDNA) that completely prevent the amplification of nuclear DNA (nDNA) sequences of mitochondrial origin. The main goal of this work was to design, validate and describe a set of primers to specifically amplify and sequence the complete human mtDNA, allowing the correct interpretation of mtDNA heteroplasmy in healthy and pathological samples. Validation was performed using two different approaches: (i) Basic Local Alignment Search Tool (BLAST) searches and (ii) amplification using isolated nDNA obtained from sperm cells by differential lysis. During the validation process, two mtDNA regions with high similarity to nDNA proved to be the most problematic areas for primer design. One of these could represent an unpublished nuclear DNA sequence of mitochondrial origin. For two of the initially designed fragments, the amplification results revealed PCR artifacts that can be attributed to the poor quality of the DNA. After validation, nine overlapping primer pairs for mtDNA amplification and 22 additional internal primers for mtDNA sequencing were obtained. These primers could be a useful tool in future projects that deal with complete mtDNA sequencing and heteroplasmy detection, since they represent a set of primers that have been tested for the non-amplification of nDNA.

  13. Linguistic isolates in Portugal: insights from the mitochondrial DNA pattern.

    Science.gov (United States)

    Mairal, Quim; Santos, Cristina; Silva, Marina; Marques, Sofia L; Ramos, Amanda; Aluja, Maria Pilar; Amorim, Antonio; Prata, Maria João; Alvarez, Luis

    2013-12-01

    Miranda do Douro, located in northeastern Portugal, has notable characteristics not only from a geographic or naturalistic point of view but also from a cultural perspective. A remarkable one is the coexistence of two different languages: Portuguese and Mirandese, the latter an Astur-Leonese dialect. The persistence of the Astur-Leonese dialect in this population rests on the singularity of the region: its relative isolation made communication with other Portuguese regions difficult, while the same location facilitated social and commercial relationships with the adjacent Spanish territories where the Astur-Leonese language originated. The objective of this study was to characterize the population of Miranda through the analysis of maternal lineages, in order to evaluate whether its mitochondrial DNA diversity fitted the patterns previously reported for other populations of the Iberian Peninsula. To that end, the entire control region of mitochondrial DNA from 121 individuals was examined. Miranda showed a haplogroup composition usual for a Western European population, with as many as 63.6% of sequences belonging to macro-haplogroup R0. Lineages of African origin (L2a and L1b) were detected, but at frequencies commonly found in Portugal. Miranda also presented a few haplogroups typically found in Jewish populations but rarely observed in other Iberian populations. This finding can be explained by gene flow with crypto-Jewish communities that have long been known to be established in the region where Miranda is located. In Miranda, both genetic and nucleotide diversities presented low values (0.9292 ± 0.0180 and 0.01101 ± 0.00614, respectively) compared with populations from its micro-geographical framework, a sign of population isolation that certainly provided conditions for the survival of the Astur-Leonese dialect in the region. Copyright © 2013 Elsevier
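
    The two diversity statistics reported above can be computed as sketched below, assuming Nei's unbiased gene (haplotype) diversity, H = n/(n - 1) · (1 - Σ p_i²), and nucleotide diversity as the mean number of pairwise differences per site. The sketch assumes aligned, equal-length sequences without gaps or missing data, uses invented toy sequences, and does not compute the standard deviations quoted in the abstract.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site over all sequence pairs."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Toy control-region fragments (invented data, not from the study).
sample = ["AAAA", "AAAA", "AAAT", "AATT"]
h = haplotype_diversity(sample)    # 5/6
pi = nucleotide_diversity(sample)  # 7/24
```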

  14. Airway management in pierre robin sequence: patterns of practice.

    Science.gov (United States)

    Collins, Benjamin; Powitzky, Rosser; Robledo, Candace; Rose, Christopher; Glade, Robert

    2014-05-01

    Objectives: To report survey results from American Cleft Palate-Craniofacial Association members on practice patterns for managing airway obstruction in patients with Pierre Robin sequence. Design: A 10-question online survey was sent and the data were reviewed. Setting: Online survey of members of the American Cleft Palate-Craniofacial Association. Patients: The survey assessed management patterns for patients with Pierre Robin sequence whom a surgeon member of the American Cleft Palate-Craniofacial Association treated for airway obstruction. Interventions: The survey collected data on management strategies for airway obstruction in Pierre Robin sequence, including tracheostomy, tongue-lip adhesion, mandibular distraction, and treatments that fall into an "other" category. Results: A total of 87 American Cleft Palate-Craniofacial Association members completed the survey. Respondents' results were analyzed as a whole and by individual subspecialty: plastic surgery (n = 33), oromaxillofacial surgery (n = 21), and otolaryngology (n = 29). Although most of the surgeons were trained to manage airway obstruction in Pierre Robin sequence patients using tracheostomy (47%, n = 39) and tongue-lip adhesion (31%, n = 26), 48% reported a current preference for mandibular distraction (n = 40). Of surgeons who preferred to manage Pierre Robin sequence with tongue-lip adhesion (n = 23), 65% were trained to do so (n = 15). Surgeons preferring mandibular distraction (n = 40) and tracheostomy (n = 14) more often reported that they were trained to manage Pierre Robin sequence with tracheostomy. Conclusions: Currently there are various practice patterns for the management of airway obstruction in Pierre Robin sequence. Training habits and subspecialty category may influence a surgeon's preference in patients who fail conservative therapy. Treatment guidelines are lacking and may require significant collaboration among centers and subspecialties to develop a more standardized

  15. Insights into the Genetic Relationships and Breeding Patterns of the African Tea Germplasm (Camellia sinensis (L.) O. Kuntze) Based on nSSR Markers and cpDNA Sequences

    Directory of Open Access Journals (Sweden)

    Lianming Gao

    2016-08-01

    Africa is one of the key centres of global tea production. Understanding the genetic diversity and relationships of African tea cultivars is important for future targeted breeding of new crop cultivars, for specialty tea processing, and for guiding germplasm conservation efforts. Despite the economic importance of tea in Africa, no research has so far examined its genetic diversity at a continental scale. Twenty-three nSSRs and three plastid DNA regions were used to investigate the genetic diversity, relationships and breeding patterns of tea accessions collected from eight countries in Africa. A total of 280 African tea accessions generated 297 alleles, with a mean of 12.91 alleles per locus and a genetic diversity (HS) estimate of 0.652. A STRUCTURE analysis suggested two main genetic groups of African tea accessions, which corresponded well with the two tea types Camellia sinensis var. sinensis and C. sinensis var. assamica, respectively, as well as an admixed mosaic group whose individuals were identified as F2 and backcross (BC) hybrids, a high proportion of which had C. sinensis var. assamica as the maternal parent. Accessions known to be C. sinensis var. assamica further separated into two groups representing the two major tea breeding centres: southern Africa (Tea Research Foundation of Central Africa, TRFCA) and East Africa (Tea Research Foundation of Kenya, TRFK). Tea accessions were shared among countries. African tea has relatively low genetic diversity. C. sinensis var. assamica is the main tea type under cultivation and contributes more to tea breeding improvement in Africa. International germplasm exchange and movement among countries within Africa was confirmed. The clustering into two main breeding centres, TRFCA and TRFK, suggested that some traits of C. sinensis var. assamica and their associated genes possibly underwent selection during geographic differentiation or through local breeding preferences. This study

  16. A DNA sequence alignment algorithm using quality information and a fuzzy inference method

    Institute of Scientific and Technical Information of China (English)

    Kwangbaek Kim; Minhwan Kim; Youngwoon Woo

    2008-01-01

    DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods. In this paper, we propose a DNA sequence alignment method that uses quality information and a fuzzy inference method, developed from the characteristics of DNA fragments and a fuzzy logic system, in order to improve conventional DNA sequence alignment methods that use DNA sequence quality information. In conventional algorithms, DNA sequence alignment scores are calculated by the global sequence alignment algorithm proposed by Needleman and Wunsch, using the quality information of each DNA fragment. However, errors may occur when calculating alignment scores if the quality of the DNA fragment tips is low, because only the overall DNA sequence quality information is used. In our proposed method, an exact DNA sequence alignment can be achieved despite low-quality DNA fragment tips by improving the conventional quality-based algorithms. The mapping score parameters used to calculate DNA sequence alignment scores are dynamically adjusted by the fuzzy logic system, using the lengths of the DNA fragments and the frequencies of low-quality DNA bases in the fragments. Experiments on real genome data from the National Center for Biotechnology Information showed that the proposed method is more efficient than conventional algorithms.
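
    For reference, a quality-aware Needleman-Wunsch score can be sketched as below. This is not the authors' fuzzy inference system: as a simple hypothetical substitute, each substitution score is scaled by the probability that the base call in sequence `a` is correct, derived from its Phred quality q as 1 - 10^(-q/10), so a mismatch at a low-quality fragment tip costs little. The scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions, and only the first sequence carries qualities.

```python
def nw_score(a, b, qual_a, match=1.0, mismatch=-1.0, gap=-2.0):
    """Global (Needleman-Wunsch) alignment score with quality-weighted substitutions."""
    # Phred quality -> probability that the base call in `a` is correct.
    conf = [1 - 10 ** (-q / 10) for q in qual_a]
    n, m = len(a), len(b)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = (match if a[i - 1] == b[j - 1] else mismatch) * conf[i - 1]
            F[i][j] = max(F[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,      # gap in b
                          F[i][j - 1] + gap)      # gap in a
    return F[n][m]


# A mismatch at a low-quality tip base is penalised less than at a confident base.
low = nw_score("ACGT", "ACGA", [40, 40, 40, 2])
high = nw_score("ACGT", "ACGA", [40, 40, 40, 40])
```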

  17. DNA Amplification and Nucleotide Sequence Determination of a Region of Mitochondrial DNA in the Sea Snake, Laticauda Semifasciata

    OpenAIRE

    Eguchi, Tomoko; Eguchi, Yukinori; Oshiro, Minoru; Asato, Tsuyoshi; Takei, Hiroshi; Nakashima, Yasutsugu

    1993-01-01

    We determined the nucleotide sequence of a region of the 12S ribosomal RNA (rRNA) gene in the mitochondrial DNA (mtDNA) of the sea snake, Laticauda semifasciata, using the polymerase chain reaction (PCR). We synthesized oligonucleotide primers based on the nucleotide sequence of the human mtDNA 12S rRNA gene and found that the target sequence (386 bp) of the sea snake mtDNA could be amplified with these primers. The nucleotide sequence of the amplified region of the sea snake mtDNA was deter...

  18. Model identification for DNA sequence-structure relationships.

    Science.gov (United States)

    Hawley, Stephen Dwyer; Chiu, Anita; Chizeck, Howard Jay

    2006-11-01

    We investigate the use of algebraic state-space models for the sequence-dependent properties of DNA. By considering the DNA sequence as an input signal, rather than using an all-atom physical model, computational efficiency is achieved. A challenge in deriving this type of model is obtaining its structure and estimating its parameters. Here we present two candidate model structures for the sequence-dependent structural property Slide and a method of encoding the models so that a recursive least squares algorithm can be applied for parameter estimation. These models are based on the assumption that the value of Slide at a base-step is determined by the surrounding tetranucleotide sequence. The first model takes the four bases individually as inputs and has a median root mean square deviation of 0.90 Å. The second model takes the four bases pairwise and has a median root mean square deviation of 0.88 Å. These values indicate that the accuracy of these models is within the useful range for structure prediction. Performance is comparable to published predictions of a more physically derived model, at significantly less computational cost.
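
    The encoding-plus-recursive-least-squares idea can be sketched in plain Python. The sketch assumes a model in the spirit of the first candidate structure: each of the four bases of the tetranucleotide is one-hot encoded (16 inputs) and Slide is a linear function of those inputs. The training targets are synthetic values from invented weights, not measured Slide data, and the initialisation constant `delta` and forgetting factor `lam` are arbitrary choices.

```python
import random

BASES = "ACGT"


def encode_tetra(tetra):
    """One-hot encode each of the four bases separately (4 positions x 4 bases = 16 inputs)."""
    x = [0.0] * 16
    for pos, base in enumerate(tetra):
        x[4 * pos + BASES.index(base)] = 1.0
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class RLS:
    """Recursive least squares; lam is the forgetting factor (1.0 = no forgetting)."""

    def __init__(self, n, delta=100.0, lam=1.0):
        self.w = [0.0] * n
        self.P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
        self.lam = lam

    def predict(self, x):
        return dot(self.w, x)

    def update(self, x, y):
        Px = [dot(row, x) for row in self.P]
        denom = self.lam + dot(x, Px)
        k = [v / denom for v in Px]          # gain vector
        err = y - self.predict(x)            # a priori prediction error
        self.w = [wi + ki * err for wi, ki in zip(self.w, k)]
        n = len(x)
        self.P = [[(self.P[i][j] - k[i] * Px[j]) / self.lam for j in range(n)]
                  for i in range(n)]
        return err


rng = random.Random(0)
true_w = [rng.gauss(0.0, 1.0) for _ in range(16)]  # invented per-base contributions
model = RLS(16)
for _ in range(1000):
    tetra = "".join(rng.choice(BASES) for _ in range(4))
    x = encode_tetra(tetra)
    model.update(x, dot(true_w, x))  # noiseless synthetic target

all_tetras = [a + b + c + d for a in BASES for b in BASES for c in BASES for d in BASES]
max_err = max(abs(model.predict(encode_tetra(t)) - dot(true_w, encode_tetra(t)))
              for t in all_tetras)
```

    Because the four one-hot groups each sum to one, individual weights are not identifiable, but predictions for every tetranucleotide are; that is why the check compares predictions rather than weights.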

  19. DNA Sequencing via Quantum Mechanics and Machine Learning

    CERN Document Server

    Yuen, Henry; Zhang, Kevin J; Nomura, Ken-ichi; Kalia, Rajiv K; Nakano, Aiichiro; Vashishta, Priya

    2010-01-01

    Rapid sequencing of individual human genomes is a prerequisite for genomic medicine, where diseases will be prevented by preemptive cures. Quantum-mechanical tunneling through single-stranded DNA in a solid-state nanopore has been proposed for rapid DNA sequencing, but unfortunately the tunneling current alone cannot distinguish the four nucleotides due to large fluctuations in molecular conformation and solvent. Here, we propose a machine-learning approach applied to the tunneling current-voltage (I-V) characteristic for efficient discrimination between the four nucleotides. We first combine principal component analysis (PCA) and fuzzy c-means (FCM) clustering to learn the "fingerprints" of the electronic density-of-states (DOS) of the four nucleotides, which can be derived from the I-V data. We then apply the hidden Markov model and the Viterbi algorithm to sequence a time series of DOS data (i.e., to solve the sequencing problem). Numerical experiments show that the PCA-FCM approach can classify unlabeled DOS ...
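
    The final decoding step, in which a hidden Markov model and the Viterbi algorithm turn a time series of classified measurements into nucleotide calls, can be sketched as follows. The observations here are hypothetical discrete cluster labels (0-3) standing in for the PCA-FCM output, and all start, transition, and emission probabilities are invented for illustration, chosen so that consecutive measurements tend to come from the same nucleotide.

```python
import math

STATES = "ACGT"


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a sequence of observations (log space)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state r for being in state s at time t.
            prev = max(states, key=lambda r: V[t - 1][r] + math.log(trans_p[r][s]))
            V[t][s] = V[t - 1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return "".join(reversed(path))


start_p = {s: 0.25 for s in STATES}
# Sticky transitions: staying on the same nucleotide is favoured (invented numbers).
trans_p = {r: {s: 0.7 if r == s else 0.1 for s in STATES} for r in STATES}
# Cluster label i is most often produced by nucleotide STATES[i].
emit_p = {s: [0.7 if i == STATES.index(s) else 0.1 for i in range(4)] for s in STATES}
called = viterbi([0, 0, 1, 0, 0], STATES, start_p, trans_p, emit_p)
```

    With these numbers the isolated label 1 is treated as a misclassification and every measurement is assigned to A, illustrating how the transition model can override a single noisy classification.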

  20. DNA sequence chromatogram browsing using JAVA and CORBA.

    Science.gov (United States)

    Parsons, J D; Buehler, E; Hillier, L

    1999-03-01

    DNA sequence chromatograms (traces) are the primary data source for all large-scale genomic and expressed sequence tag (EST) sequencing projects. Access to the sequencing trace assists many later analyses, for example contig assembly and polymorphism detection, but obtaining and using traces is problematic. Traces are not collected and published centrally, they are much larger than the base calls derived from them, and viewing them requires the interactivity of a local graphical client with local data. To provide efficient global access to DNA traces, we developed a client/server system based on flexible Java components integrated into other applications, including an applet for use in a WWW browser and a stand-alone trace viewer. Client/server interaction is facilitated by CORBA middleware, which provides a well-defined interface, a naming service, and location independence. The software is packaged as a Jar file available from the following URL: http://www.ebi.ac.uk/jparsons. Links to working examples of the trace viewers can be found at http://corba.ebi.ac.uk/EST. All the Washington University mouse EST traces are available for browsing at the same URL.

  1. Artificial intelligence approach in analysis of DNA sequences.

    Science.gov (United States)

    Brézillon, P J; Zaraté, P; Saci, F

    1993-01-01

    We present an approach for designing a knowledge-based system, called Sequence Acquisition In Context (SAIC), that will be able to cooperate with a biologist in the analysis of DNA sequences. The main task of the system is the acquisition of the expert knowledge that the biologist uses for resolving ambiguities in gel autoradiograms, with the aim of re-using it later for resolving similar ambiguities. The various types of expert knowledge constitute what we call the contextual knowledge of the sequence analysis. Contextual knowledge deals with the unavoidable problems that are common in the study of living material (e.g., noisy data, difficulties of observation). Indeed, the analysis of DNA sequences from autoradiograms belongs to an emerging and promising area of investigation, namely reasoning with images. The SAIC project is developed in a theoretical framework that is shared with other applications. Not all tasks have the same importance in each application. We use this observation for designing an intelligent assistant system with three applications. In the SAIC project, we focus on knowledge acquisition, human-computer interaction and explanation. The project will benefit research in the two other applications. We also discuss our SAIC project in the context of large international projects that aim to re-use and share knowledge in a repository.

  2. Genome-wide DNA methylation patterns and transcription analysis in sheep muscle.

    Directory of Open Access Journals (Sweden)

    Christine Couldrey

    DNA methylation plays a central role in regulating many aspects of growth and development in mammals through the regulation of gene expression. The development of next-generation sequencing technologies has paved the way for genome-wide, high-resolution analysis of DNA methylation landscapes using a methodology known as reduced representation bisulfite sequencing (RRBS). While RRBS has proven effective for understanding DNA methylation landscapes in humans, mice, and rats, few studies to date have used this powerful method to investigate DNA methylation in agricultural animals. Here we describe the use of RRBS to investigate DNA methylation in sheep Longissimus dorsi muscles. RRBS analysis of ∼1% of the genome from Longissimus dorsi muscles provided data of suitably high precision and accuracy for DNA methylation analysis, at all levels of resolution from genome-wide to individual nucleotides. Combining RRBS data with mRNAseq data allowed the sheep Longissimus dorsi muscle methylome to be compared with methylomes from other species. While some species differences were identified, many similarities were observed between DNA methylation patterns in sheep and other more commonly studied species. The RRBS data presented here highlight the complexity of epigenetic regulation of genes. However, the similarities observed across species are promising, in that knowledge gained from epigenetic studies in humans and mice may be applied, with caution, to agricultural species. The ability to accurately measure DNA methylation in agricultural animals will contribute an additional layer of information to the genetic analyses currently being used to maximise production gains in these species.
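
    Single-nucleotide methylation calls from bisulfite data rest on the fact that bisulfite converts unmethylated cytosines to uracil (read as T) while methylated cytosines remain C. The sketch below computes the per-CpG methylation level, methylated / (methylated + unmethylated) read calls, assuming reads already aligned without indels to the forward strand of the reference; real pipelines use dedicated bisulfite aligners and handle both strands, and the toy reference and reads are invented.

```python
from collections import defaultdict


def cpg_positions(ref):
    """0-based positions of the C in every CpG dinucleotide of the reference."""
    return [i for i in range(len(ref) - 1) if ref[i] == "C" and ref[i + 1] == "G"]


def methylation_levels(ref, reads):
    """reads: (start, sequence) pairs aligned to ref's forward strand without indels."""
    cpgs = set(cpg_positions(ref))
    meth, unmeth = defaultdict(int), defaultdict(int)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos in cpgs:
                if base == "C":      # protected from conversion -> methylated
                    meth[pos] += 1
                elif base == "T":    # converted C -> unmethylated
                    unmeth[pos] += 1
    return {p: meth[p] / (meth[p] + unmeth[p])
            for p in sorted(cpgs) if meth[p] + unmeth[p] > 0}


ref = "ACGTTCGA"  # CpGs at positions 1 and 5
reads = [(0, "ACGTTTGA"), (0, "ATGTTCGA"), (1, "CGTTTG")]
levels = methylation_levels(ref, reads)  # {1: 2/3, 5: 1/3}
```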

  3. Bisulfite sequencing of chromatin immunoprecipitated DNA (BisChIP-seq) directly informs methylation status of histone-modified DNA

    NARCIS (Netherlands)

    Statham, A.L.; Robinson, M.D.; Song, J.Z.; Coolen, M.W.; Stirzaker, C.; Clark, S. J.

    2012-01-01

    The complex relationship between DNA methylation, chromatin modification, and underlying DNA sequence is often difficult to unravel with existing technologies. Here, we describe a novel technique based on high-throughput sequencing of bisulfite-treated chromatin immunoprecipitated DNA (BisChIP-seq),

  4. On-Demand Indexing for Referential Compression of DNA Sequences.

    Directory of Open Access Journals (Sweden)

    Fernando Alves

    The decreasing cost of genome sequencing is creating a demand for scalable storage and processing tools and techniques to deal with the large amounts of generated data. Referential compression is one of these techniques, in which the similarity between the DNA of organisms of the same or an evolutionarily close species is exploited to reduce the storage demands of genome sequences by up to 700 times. The general idea is to store in the compressed file only the differences between the sequence to be compressed and a well-known reference sequence. In this paper, we propose a method for improving the performance of referential compression by removing the most costly phase of the process, the complete indexing of the reference. Our approach, called On-Demand Indexing (ODI), compresses human chromosomes five to ten times faster than other state-of-the-art tools (on average), while achieving similar compression ratios.
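
    The general idea of referential compression, storing only the differences from a reference, can be illustrated with a greedy encoder that emits copy tokens (reference position, length) and literal bases. Note that this sketch builds a complete k-mer index of the reference up front, which is exactly the costly phase ODI is designed to avoid; it illustrates the conventional baseline, not the authors' on-demand method, and the token format and `k` are assumptions.

```python
def build_index(ref, k):
    """Map every k-mer of the reference to its start positions."""
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)
    return index


def compress(target, ref, k=4):
    """Greedy encoding into ('copy', ref_pos, length) and ('lit', base) tokens."""
    index = build_index(ref, k)
    out, i = [], 0
    while i < len(target):
        best_pos, best_len = -1, 0
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target) and pos + length < len(ref)
                   and target[i + length] == ref[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            out.append(("copy", best_pos, best_len))
            i += best_len
        else:
            out.append(("lit", target[i]))
            i += 1
    return out


def decompress(tokens, ref):
    parts = []
    for tok in tokens:
        parts.append(ref[tok[1]:tok[1] + tok[2]] if tok[0] == "copy" else tok[1])
    return "".join(parts)


ref = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCA"  # one substitution relative to ref
tokens = compress(target, ref)
```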

  5. Maternal Plasma DNA and RNA Sequencing for Prenatal Testing.

    Science.gov (United States)

    Tamminga, Saskia; van Maarle, Merel; Henneman, Lidewij; Oudejans, Cees B M; Cornel, Martina C; Sistermans, Erik A

    2016-01-01

    Cell-free DNA (cfDNA) testing has recently become indispensable in diagnostic testing and screening. In the prenatal setting, this type of testing is often called noninvasive prenatal testing (NIPT). With a number of techniques, using either next-generation sequencing or single nucleotide polymorphism-based approaches, fetal cfDNA in maternal plasma can be analyzed to screen for rhesus D genotype, common chromosomal aneuploidies, and increasingly for testing other conditions, including monogenic disorders. With regard to screening for common aneuploidies, challenges arise when implementing NIPT in current prenatal settings. Depending on the method used (targeted or nontargeted), chromosomal anomalies other than trisomy 21, 18, or 13 can be detected, either of fetal or maternal origin, also referred to as unsolicited or incidental findings. For various biological reasons, there is a small chance of having either a false-positive or false-negative NIPT result, or no result, also referred to as a "no-call." Both pre- and posttest counseling for NIPT should include discussing potential discrepancies. Since NIPT remains a screening test, a positive NIPT result should be confirmed by invasive diagnostic testing (either by chorionic villus biopsy or by amniocentesis). As the scope of NIPT is widening, professional guidelines need to discuss the ethics of what to offer and how to offer. In this review, we discuss the current biochemical, clinical, and ethical challenges of cfDNA testing in the prenatal setting and its future perspectives including novel applications that target RNA instead of DNA.

  6. Chimeric TALE recombinases with programmable DNA sequence specificity.

    Science.gov (United States)

    Mercer, Andrew C; Gaj, Thomas; Fuller, Roberta P; Barbas, Carlos F

    2012-11-01

    Site-specific recombinases are powerful tools for genome engineering. Hyperactivated variants of the resolvase/invertase family of serine recombinases function without accessory factors, and thus can be re-targeted to sequences of interest by replacing native DNA-binding domains (DBDs) with engineered zinc-finger proteins (ZFPs). However, imperfect modularity with particular domains, lack of high-affinity binding to all DNA triplets, and difficulty of construction have hindered the widespread adoption of ZFPs in unspecialized laboratories. The discovery of a novel type of DBD in transcription activator-like effector (TALE) proteins from Xanthomonas provides an alternative to ZFPs. Here we describe chimeric TALE recombinases (TALERs): engineered fusions between a hyperactivated catalytic domain from the DNA invertase Gin and an optimized TALE architecture. We use a library of incrementally truncated TALE variants to identify TALER fusions that modify DNA with efficiency and specificity comparable to zinc-finger recombinases in bacterial cells. We also show that TALERs recombine DNA in mammalian cells. The TALER architecture described herein provides a platform for insertion of customized TALE domains, thus significantly expanding the targeting capacity of engineered recombinases and their potential applications in biotechnology and medicine.
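
    The modular DNA recognition that makes TALE domains an alternative to ZFPs can be summarised with the widely used repeat-variable diresidue (RVD) code, in which each TALE repeat recognises one base. The sketch below designs an RVD array for a target site; it assumes the common pairings NI-A, HD-C, NN-G and NG-T and the usual requirement for a T immediately 5' of the site, and it does not model the truncated TALE architectures or the Gin catalytic domain described in the paper. The helper names are hypothetical.

```python
# Common RVD-to-base pairings (NN also tolerates A; other G-selective RVDs exist).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_rvds(site):
    """One RVD per base of the target site, in 5'->3' order."""
    return [RVD_FOR_BASE[base] for base in site.upper()]


def candidate_sites(seq, length):
    """Start positions of sites of the given length preceded by a 5' T (position N0)."""
    return [i for i in range(1, len(seq) - length + 1) if seq[i - 1] == "T"]


print(design_rvds("ACGT"))  # ['NI', 'HD', 'NN', 'NG']
```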

  7. Scalable lithography from Natural DNA Patterns via polyacrylamide gel

    Science.gov (United States)

    Qu, Jiehao; Hou, Xianliang; Fan, Wanchao; Xi, Guanghui; Diao, Hongyan; Liu, Xiangdon

    2015-12-01

    A facile strategy for fabricating scalable stamps has been developed using cross-linked polyacrylamide gel (PAMG) that controllably and precisely shrinks and swells with water content. Aligned patterns of natural DNA molecules were prepared by evaporative self-assembly on a PMMA substrate, and were transferred to unsaturated polyester resin (UPR) to form a negative replica. The negative was used to pattern the linear structures onto the surface of water-swollen PAMG, and the pattern sizes on the PAMG stamp were customized by adjusting the water content of the PAMG. As a result, consistent reproduction of DNA patterns could be achieved with feature sizes that can be controlled over the range of 40%-200% of the original pattern dimensions. This methodology is novel and may pave a new avenue for manufacturing stamp-based functional nanostructures in a simple and cost-effective manner on a large scale.

  8. Patterns of linkage disequilibrium in mitochondrial DNA of 16 ruminant populations.

    Science.gov (United States)

    Slate, J; Phua, S H

    2003-03-01

    Mitochondrial DNA (mtDNA) is a widely employed molecular tool in phylogeography, in the inference of human evolutionary history, in dating the domestication of livestock and in forensic science. In humans and other vertebrates the popularity of mtDNA can be partially attributed to an assumption of strict maternal inheritance, such that there is no recombination between mitochondrial lineages. The recent demonstration that linkage disequilibrium (LD) declines as a function of distance between polymorphic sites in hominid mitochondrial genomes has been interpreted as evidence of recombination between mtDNA haplotypes, and hence nonclonal inheritance. However, critics of mtDNA recombination have suggested that this association is an artefact of an inappropriate measure of LD or of sequencing error, and subsequent studies of other populations have failed to replicate the initial finding. Here we report the analysis of 16 ruminant populations and present evidence that LD significantly declines with distance in five of them. A meta-analysis of the data indicates a nonsignificant trend of LD declining with distance. Most of the earlier criticisms of patterns between LD and distance in hominid mtDNA are not applicable to this data set. Our results suggest that either ruminant mtDNA is not strictly clonal or that compensatory selection has influenced patterns of variation at closely linked sites within the mitochondrial control region. The potential impact of these processes should be considered when using mtDNA as a tool in vertebrate population genetic, phylogenetic and forensic studies.
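
    The relationship examined here, LD as a function of distance between polymorphic sites, can be sketched as below: r² is computed for every pair of biallelic sites from haplotype data and then correlated with the distance between the sites. This is an illustration on invented haplotypes; it omits the significance testing and meta-analysis used in the study and assumes every site is biallelic and polymorphic.

```python
import math
from itertools import combinations


def r_squared(site_i, site_j):
    """r^2 between two biallelic sites; each argument lists one allele per haplotype."""
    n = len(site_i)
    a, b = site_i[0], site_j[0]  # arbitrary reference allele at each site
    p_a = sum(x == a for x in site_i) / n
    p_b = sum(y == b for y in site_j) / n
    p_ab = sum(x == a and y == b for x, y in zip(site_i, site_j)) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ld_vs_distance(positions, sites):
    """(distance, r^2) for every site pair, plus their Pearson correlation."""
    pairs = [(abs(positions[j] - positions[i]), r_squared(sites[i], sites[j]))
             for i, j in combinations(range(len(sites)), 2)]
    return pairs, pearson([d for d, _ in pairs], [r for _, r in pairs])


positions = [0, 10, 100]
sites = ["AATT", "AATT", "CGCG"]  # alleles of 4 invented haplotypes at each site
pairs, corr = ld_vs_distance(positions, sites)  # corr < 0: LD decays with distance
```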

  9. Bacterial DNA Sequence Compression Models Using Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Armando J. Pinho

    2013-08-01

    It is widely accepted that advances in DNA sequencing techniques have contributed to an unprecedented growth of genomic data. This has increased interest in DNA compression, not only from the information theory and biology points of view but also from a practical perspective, since such sequences require storage resources. Several compression methods exist; in particular, those using finite-context models (FCMs) have received increasing attention, as they have been shown to compress DNA sequences effectively, with low bits per base as well as low encoding/decoding time per base. However, the run-time memory required to store high-order finite-context models may become impractical, since a context order as low as 16 requires a maximum of 17.2 × 10^9 memory entries. This paper presents a method to reduce this memory requirement through a novel application of artificial neural networks (ANNs) to build such probabilistic models compactly, and shows how to use them to estimate the probabilities. Such a system was implemented, and its performance was compared against state-of-the-art compressors, such as XM-DNA (expert model) and FCM-Mx (mixture of finite-context models), as well as with general-purpose compressors. Using a combination of an order-10 FCM and an ANN, encoding results similar to those of FCMs up to order 16 are obtained using only 17 megabytes of memory, whereas the latter, even when employing hash tables, use several hundred megabytes.
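
    The memory figure follows from the size of a full order-k count table: 4^k contexts times 4 symbol counts, so order 16 needs 4^17 ≈ 17.2 × 10^9 entries. The sketch below checks that arithmetic and implements a plain adaptive finite-context model that reports the average code length in bits per base; it uses a simple additive (`alpha`) estimator as an assumption and is not the ANN-based model proposed in the paper.

```python
import math
from collections import defaultdict


def fcm_table_entries(k, alphabet_size=4):
    """Counters in a full (non-hashed) order-k table: alphabet^k contexts x alphabet symbols."""
    return alphabet_size ** k * alphabet_size


def fcm_bits_per_base(seq, k, alpha=1.0):
    """Adaptive order-k finite-context model over ACGT; average code length in bits/base."""
    counts = defaultdict(lambda: defaultdict(int))
    bits = 0.0
    for i in range(k, len(seq)):
        ctx, sym = seq[i - k:i], seq[i]
        table = counts[ctx]
        total = sum(table.values())
        p = (table[sym] + alpha) / (total + 4 * alpha)  # estimate before updating
        bits -= math.log2(p)
        table[sym] += 1
    return bits / (len(seq) - k)


print(fcm_table_entries(16))               # 17179869184, about 17.2 x 10^9
print(fcm_bits_per_base("ACGT" * 250, 2))  # far below 2 bits/base for this repeat
```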

  10. Stability of capillary gels for automated sequencing of DNA.

    Science.gov (United States)

    Swerdlow, H; Dew-Jager, K E; Brady, K; Grey, R; Dovichi, N J; Gesteland, R

    1992-08-01

    Recent interest in capillary gel electrophoresis has been fueled by the Human Genome Project and other large-scale sequencing projects. Advances in gel polymerization techniques and detector design have enabled sequencing of DNA directly in capillaries. Efforts to exploit this technology have been hampered by problems with the reproducibility and stability of gels. Gel instability manifests itself during electrophoresis as a decrease in the current passing through the capillary under a constant voltage. Upon subsequent microscopic examination, bubbles are often visible at or near the injection (cathodic) end of the capillary gel. Gels have been prepared with the polyacrylamide matrix covalently attached to the silica walls of the capillary. These gels, although more stable, still suffer from problems with bubbles. The use of actual DNA sequencing samples also adversely affects gel stability. We examined the mechanisms underlying these disruptive processes by employing polyacrylamide gel-filled capillaries in which the gel was not attached to the capillary wall. Three sources of gel instability were identified. Bubbles occurring in the absence of sample introduction were attributed to electroosmotic force; replacing the denaturant urea with formamide was shown to reduce the frequency of these bubbles. The slow, steady decline in current through capillary sequencing gels interferes with the ability to detect other gel problems. This phenomenon was shown to be a result of ionic depletion at the gel-liquid interface. The decline was ameliorated by adding denaturant and acrylamide monomers to the buffer reservoirs. Sample-induced problems were shown to be due to the presence of template DNA; elimination of the template allowed sample loading to occur without complications.(ABSTRACT TRUNCATED AT 250 WORDS)

  11. Choosing the best heuristic for seeded alignment of DNA sequences

    Directory of Open Access Journals (Sweden)

    Buhler Jeremy

    2006-03-01

    Abstract Background Seeded alignment is an important component of algorithms for fast, large-scale DNA similarity search. A good seed matching heuristic can reduce the execution time of genomic-scale sequence comparison without degrading sensitivity. Recently, many types of seed have been proposed to improve on the performance of traditional contiguous seeds as used in, e.g., NCBI BLASTN. Choosing among these seed types, particularly those that use information besides the presence or absence of matching residue pairs, requires practical guidance based on a rigorous comparison, including assessment of sensitivity, specificity, and computational efficiency. This work performs such a comparison, focusing on alignments in DNA outside widely studied coding regions. Results We compare seeds of several types, including those allowing transition mutations rather than matches at fixed positions, those allowing transitions at arbitrary positions ("BLASTZ" seeds), and those using a more general scoring matrix. For each seed type, we use an extended version of our Mandala seed design software to choose seeds with optimized sensitivity for various levels of specificity. Our results show that, on a test set biased toward alignments of noncoding DNA, transition information significantly improves seed performance, while finer distinctions between different types of mismatches do not. BLASTZ seeds perform especially well. These results depend on properties of our test set that are not shared by EST-based test sets with a strong bias toward coding DNA. Conclusion Practical seed design requires careful attention to the properties of the alignments being sought. For noncoding DNA sequences, seeds that use transition information, especially BLASTZ-style seeds, are particularly useful. The Mandala seed design software can be found at http://www.cse.wustl.edu/~yanni/mandala/.
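
    Seed matching of this kind can be illustrated with a small hash-based sketch. It uses a hypothetical pattern notation: '1' requires an exact match, '0' is a don't-care position, and 'T' accepts either a match or a transition (A<->G, C<->T). Because two bases are equal or differ by a transition exactly when both are purines or both are pyrimidines, a 'T' position can be hashed by its purine/pyrimidine class. This is not Mandala's or BLASTZ's code, it does not model BLASTZ's transition-anywhere seeds, and it assumes sequences contain only A, C, G and T.

```python
def seed_key(seq, start, pattern):
    """Hash key of the window seq[start:start+len(pattern)] under a seed pattern."""
    key = []
    for offset, code in enumerate(pattern):
        base = seq[start + offset]
        if code == "1":
            key.append(base)
        elif code == "T":
            key.append("R" if base in "AG" else "Y")  # purine / pyrimidine class
        # code "0": position ignored
    return "".join(key)


def seed_hits(query, target, pattern):
    """All (query_pos, target_pos) pairs whose windows share a seed key."""
    w = len(pattern)
    index = {}
    for j in range(len(target) - w + 1):
        index.setdefault(seed_key(target, j, pattern), []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(seed_key(query, i, pattern), [])]


print(seed_hits("ACGT", "ACAT", "1111"))  # [] : G/A breaks the contiguous seed
print(seed_hits("ACGT", "ACAT", "11T1"))  # [(0, 0)] : G/A is a transition
```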

  12. Complete genome sequence of mitochondrial DNA (mtDNA) of Chlorella sorokiniana.

    Science.gov (United States)

    Orsini, Massimiliano; Costelli, Cristina; Malavasi, Veronica; Cusano, Roberto; Concas, Alessandro; Angius, Andrea; Cao, Giacomo

    2016-01-01

    The complete sequence of the mitochondrial genome of the Chlorella sorokiniana strain (SAG 111-8 k) is presented in this work. Within the Chlorella genus, it represents the second species with a completely sequenced and annotated mitochondrial genome (GenBank accession no. KM241869). The genome consists of a circular chromosome of 52,528 bp and encodes a total of 31 protein-coding genes, 3 rRNAs and 26 tRNAs. The overall AT content of the C. sorokiniana mtDNA is 70.89%, while the coding sequence is of 97.4%.
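    The AT percentage quoted above is the share of A and T among all bases. A minimal sketch, assuming ambiguous symbols such as `N` are left out of the denominator (the record does not say how they were handled):

```python
def at_content(seq: str) -> float:
    """Percentage of A and T among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * at / (at + gc)
```

    For instance, `at_content("AATTGC")` is about 66.7, since four of the six bases are A or T.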

  13. Interspecific "common" repetitive DNA sequences in salamanders of the genus Plethodon.

    Science.gov (United States)

    Mizuno, S; Andrews, C; Macgregor, H C

    1976-10-12

    Intermediate repetitive sequences of Plethodon cinereus, which comprised about 30% of the genomic DNA, were isolated and iodinated with ¹²⁵I. About 5% of the ¹²⁵I-labelled repetitive fraction hybridized with a large excess of DNA from P. dunni at Cot 20. About half of the ¹²⁵I-DNA in the hybrids was resistant to extensive digestion with S1 nuclease. The average molecular size of the S1 nuclease-resistant fraction was about 100 nucleotide pairs. The melting temperature of the S1 nuclease-resistant fraction was about 2 degrees lower than that of the corresponding fraction made with P. cinereus DNA. These results are taken to indicate the presence in the genomes of P. cinereus and P. dunni of evolutionarily stable "common" repetitive sequences. The average repetition frequency of the common repetitive sequences is about 6,000-fold in both species. The common repetitive fraction is also present in the genomes of other species of Plethodon, although the general populations of intermediate repetitive sequences differ markedly from one species to another. The cinereus–dunni common repetitive sequences could not be detected in plethodontids belonging to different tribes, nor in more distantly related amphibians. The profiles of binding of the common repetitive sequences to CsCl or Cs2SO4–Ag+ density gradient fractions of P. dunni DNA suggested that these sequences consisted of components heterogeneous in base composition, and that they did not include large amounts of the genes for ribosomal RNA, 5S RNA, 4S RNA, or histone messenger RNA. In situ hybridization of the ³H-labelled intermediate repetitive sequences of P. cinereus to male meiotic chromosomes of the same species gave autoradiographs, after an exposure of seven days, showing all 14 chromosomes labelled. The pattern of labelling appeared not to be random, but was impossible to analyse on account of the irregular shapes and different degrees of stretching of diplotene and prometaphase chromosomes. In

  14. Highly Iterated Palindromic Sequences (HIPs) and Their Relationship to DNA Methyltransferases

    Directory of Open Access Journals (Sweden)

    Jeff Elhai

    2015-03-01

    The sequence GCGATCGC (Highly Iterated Palindrome, HIP1) is commonly found at high frequency in cyanobacterial genomes. An important clue to its function may be the presence of two orphan DNA methyltransferases that recognize the internal sequences GATC and CGATCG. An examination of genomes from 97 cyanobacteria, both free-living and obligate symbionts, showed that there are exceptional cases in which HIP1 is at a low frequency or nearly absent. In some of these cases, it appears to have been replaced by a different GC-rich palindromic sequence (alternate HIPs). When HIP1 is at a high frequency, GATC- and CGATCG-specific methyltransferases are generally present in the genome. When an alternate HIP is at high frequency, a methyltransferase specific for that sequence is present. The pattern of 1-nt deviations from HIP1 sequences is biased towards the first and last nucleotides, i.e., those that distinguish CGATCG from HIP1. Taken together, the results point to a role of DNA methylation in the creation or functioning of HIP sites. A model is presented that postulates the existence of a GmeC-dependent mismatch repair system whose activity creates and maintains HIP sequences.
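    The counting behind statements such as "HIP1 is at a high frequency" and "1-nt deviations are biased towards the first and last nucleotides" can be sketched as below. This is an illustrative one-strand scan, not the author's pipeline, and the helper names are hypothetical. Because HIP1 is its own reverse complement, each site is found once whichever strand is scanned.

```python
HIP1 = "GCGATCGC"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_occurrences(genome: str, motif: str = HIP1) -> int:
    """Count possibly overlapping occurrences of motif on the given strand."""
    return sum(genome.startswith(motif, i) for i in range(len(genome) - len(motif) + 1))


def deviation_positions(genome: str, motif: str = HIP1) -> list[int]:
    """For each motif position, count windows that differ from the motif
    at exactly that one position (1-nt variants)."""
    tally = [0] * len(motif)
    for i in range(len(genome) - len(motif) + 1):
        window = genome[i : i + len(motif)]
        diffs = [k for k, (a, b) in enumerate(zip(window, motif)) if a != b]
        if len(diffs) == 1:
            tally[diffs[0]] += 1
    return tally
```

    Dividing `count_occurrences` by genome length in kb gives a density that can be compared across genomes. In the `deviation_positions` tally, indices 0 and 7 are the two positions outside the CGATCG core.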

  15. Sequencing of mitochondrial HV1 and HV2 DNA with length heteroplasmy

    DEFF Research Database (Denmark)

    Rasmussen, E. Michael; Eriksen, Birthe; Larsen, Hans Jakob

    2003-01-01

    This study presents a fast method for sequencing the poly C/G regions in HV1 and HV2 in the mitochondrial DNA (mtDNA)...

  16. Sequence-dependent DNA separation by anion-exchange high-performance liquid chromatography

    Energy Technology Data Exchange (ETDEWEB)

    Yamakawa, Hisashi; Higashino, Ken-ich; Ohara, Osamu [Kazusa DNA Research Inst., Chiba (Japan)]

    1996-09-05

    A high-performance liquid chromatography (HPLC) system with a new nonporous anion-exchange resin, DNA-NPR, made it possible to rapidly separate DNA fragments up to 20 kbp with high resolution. To further characterize this chromatographic DNA separation system, we prepared a mixture of double-stranded DNAs of constant length carrying a fully degenerate 50-bp region and analyzed their chromatographic behavior on the DNA-NPR column. The results indicated that the separation of DNA fragments by anion-exchange HPLC was governed not only by size but also by nucleotide sequence: even DNA fragments with the same size and the same base content could be separated on this column. Taking advantage of this characteristic feature of anion-exchange HPLC, we could readily fractionate human cDNAs with practically acceptable recovery and high resolution. Furthermore, combining HPLC with gel electrophoresis achieved the separation of a mixture of DNA fragments in a two-dimensional pattern. 22 refs., 5 figs., 1 tab.

  17. Distinguishing forest and savanna African elephants using short nuclear DNA sequences.

    Science.gov (United States)

    Ishida, Yasuko; Demeke, Yirmed; van Coeverden de Groot, Peter J; Georgiadis, Nicholas J; Leggett, Keith E A; Fox, Virginia E; Roca, Alfred L

    2011-01-01

    A more complete description of African elephant phylogeography would require a method that distinguishes forest and savanna elephants using DNA from low-quality samples. Although mitochondrial DNA is often the marker of choice for species identification, the unusual cytonuclear patterns in African elephants make nuclear markers more reliable. We therefore designed and utilized genetic markers for short nuclear DNA regions that contain fixed nucleotide differences between forest and savanna elephants. We used M13 forward and reverse sequences to increase the total length of PCR amplicons and to improve the quality of sequences for the target DNA. We successfully sequenced fragments of nuclear genes from dung samples of known savanna and forest elephants in the Democratic Republic of Congo, Ethiopia, and Namibia. Elephants at previously unexamined locations were found to have nucleotide character states consistent with their status as savanna or forest elephants. Using these results together with those of previous studies, we estimated that the short-amplicon nuclear markers could distinguish forest from savanna African elephants with more than 99% accuracy. Nuclear genotyping of museum, dung, or ivory samples will provide better-informed conservation management of Africa's elephants.

  18. Developmentally programmed excision of internal DNA sequences in Paramecium aurelia.

    Science.gov (United States)

    Gratias, A; Bétermier, M

    2001-01-01

    The development of a new somatic nucleus (macronucleus) during sexual reproduction of the ciliate Paramecium aurelia involves reproducible chromosomal rearrangements that affect the entire germline genome. Macronuclear development can be induced experimentally, which makes P. aurelia an attractive model for the study of the mechanism and the regulation of DNA rearrangements. Two major types of rearrangements have been identified: the fragmentation of the germline chromosomes, followed by the formation of the new macronuclear chromosome ends in association with imprecise DNA elimination, and the precise excision of internal eliminated sequences (IESs). All IESs identified so far are short, A/T rich and non-coding elements. They are flanked by a direct repeat of a 5'-TA-3' dinucleotide, a single copy of which remains at the macronuclear junction after excision. The number of these single-copy sequences has been estimated to be around 60,000 per haploid genome. This review focuses on the current knowledge about the genetic and epigenetic determinants of IES elimination in P. aurelia, the analysis of excision products, and the tightly regulated timing of excision throughout macronuclear development. Several models for the molecular mechanism of IES excision will be discussed in relation to those proposed for DNA elimination in other ciliates.
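    As a toy illustration of the precise-excision rule described above (an IES flanked by a 5'-TA-3' direct repeat, with a single TA left at the macronuclear junction), the string operation below removes an IES given its coordinates. The function name and coordinates are hypothetical, and the sketch says nothing about the excision machinery itself.

```python
def excise_ies(germline: str, start: int, end: int) -> str:
    """Remove an IES spanning germline[start:end], where the span begins and
    ends with the flanking TA repeats; one TA copy stays at the junction."""
    if germline[start : start + 2] != "TA" or germline[end - 2 : end] != "TA":
        raise ValueError("IES boundaries must be TA dinucleotides")
    # Keep everything before the first TA, then the second TA and what follows it.
    return germline[:start] + germline[end - 2 :]
```

    For example, `excise_ies("CCTAAATTTAGG", 2, 10)` removes `TAAATT` and returns `"CCTAGG"`, with one TA at the joint.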

  19. Fragmentation of contaminant and endogenous DNA in ancient samples determined by shotgun sequencing; prospects for human palaeogenomics.

    Directory of Open Access Journals (Sweden)

    Marc García-Garcerà

    BACKGROUND: Despite the successful retrieval of genomes from past remains, the prospects for human palaeogenomics remain unclear because of the difficulty of distinguishing contaminant from endogenous DNA sequences. Previous sequence data generated on high-throughput sequencing platforms indicate that fragmentation of ancient DNA sequences is a characteristic trait, arising primarily from depurination processes that create abasic sites leading to DNA breaks. METHODOLOGY/PRINCIPAL FINDINGS: To investigate whether this pattern is present in ancient remains from a temperate environment, we have 454-FLX pyrosequenced different samples dated between 5,500 and 49,000 years ago: a bone from an extinct goat (Myotragus balearicus) that was treated with a depurinating agent (bleach), an Iberian lynx bone not subjected to any treatment, a human Neolithic sample from Barcelona (Spain), and a Neandertal sample from the El Sidrón site (Asturias, Spain). The efficiency of retrieval of endogenous sequences is below 1% in all cases. We have used the non-human samples to identify human sequences (0.35 and 1.4%, respectively) that we positively know are contaminants. CONCLUSIONS: We observed that bleach treatment appears to create a depurination-associated fragmentation pattern in the resulting contaminant sequences that is indistinguishable from that previously described for endogenous sequences. Furthermore, the nucleotide composition pattern observed at the 5' and 3' ends of contaminant sequences is much more complex than the flat pattern previously described in some Neandertal contaminants. Although much more research on samples with known contaminant histories is needed, our results suggest that endogenous and contaminant sequences cannot be distinguished by the fragmentation pattern alone.
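    End-composition profiles like those compared above can be computed by tallying base frequencies at each position counted from the 5' end of the reads; applying the same function to reversed reads gives the 3'-end profile. A minimal sketch with hypothetical reads. Note that the depurination signal is often examined at genomic positions just outside the read, which requires mapping reads to a reference and is not attempted here.

```python
from collections import Counter


def end_composition(reads: list[str], n_positions: int) -> list[dict[str, float]]:
    """Base frequencies at positions 0..n_positions-1 from the 5' end.

    Reads shorter than a given position do not contribute to it.
    """
    profile = []
    for pos in range(n_positions):
        counts = Counter(read[pos] for read in reads if len(read) > pos)
        total = sum(counts.values())
        profile.append({base: n / total for base, n in sorted(counts.items())} if total else {})
    return profile
```

    The 3'-end profile is `end_composition([r[::-1] for r in reads], n)`; reversing without complementing keeps the bases on the read's own strand.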

  20. Patterns of DNA structural polymorphism and their evolutionary implications.

    Science.gov (United States)

    Keene, M A; Elgin, S C

    1984-01-01

    The pattern of sites within purified DNA that are highly susceptible to double-stranded cleavage by micrococcal nuclease has been analyzed in the vicinity of over 20 genes from widely separated loci in Drosophila. These genes have uniformly exhibited a distinctive organization of cleavage sites such that at early times of digestion major sites are observed in the spacer regions surrounding the genes, but not within the protein coding regions themselves. Examples examined include Drosophila genes for heat-shock proteins, cytoplasmic actin, ribosomal protein 49, alcohol dehydrogenase, Sgs 4 glue protein, and other developmentally regulated transcripts, a human beta-globin gene, and mouse alpha 3-globin pseudogene. It seems probable that this gene/spacer pattern will be a general one in the genomes of eucaryotes, but not in the genomes of procaryotes, since neither pBR322 nor phage lambda DNA display such a pattern. One observes a nonrandom spacing of strong cleavage sites in Drosophila DNA, with the most frequent intervals being 195 bp and 411 bp. Such a pattern of variation in DNA structure may have evolved to facilitate the packaging of eucaryotic DNA into chromatin.
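    The spacing analysis above (most frequent intervals of 195 bp and 411 bp between strong cleavage sites) amounts to a histogram of distances between mapped sites. A minimal sketch, assuming site coordinates are already available; the bin width and maximum distance are arbitrary choices, not values from the study.

```python
from collections import Counter
from itertools import combinations


def interval_histogram(sites: list[int], bin_width: int = 5, max_dist: int = 1000) -> Counter:
    """Histogram of pairwise distances between cleavage sites.

    Each distance is assigned to the lower edge of its bin_width-bp bin;
    pairs farther apart than max_dist are ignored.
    """
    hist = Counter()
    for a, b in combinations(sorted(sites), 2):
        d = b - a
        if 0 < d <= max_dist:
            hist[(d // bin_width) * bin_width] += 1
    return hist
```

    For example, `interval_histogram([0, 195, 390]).most_common(1)` returns `[(195, 2)]`, since two of the three pairwise distances are 195 bp.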

  1. An intragenic distribution bias of DNA uptake sequences in Pasteurellaceae and Neisseriae

    Directory of Open Access Journals (Sweden)

    van Passel Mark WJ

    2008-03-01

    Most sequenced strains from Pasteurellaceae and Neisseriae contain hundreds to thousands of uptake sequence (US) motifs in their genome, which are associated with natural competence for DNA uptake. The mechanism of their recognition is still unclear, and I searched for intragenic location patterns of these motifs for clues about their distribution. In all cases, one orientation of the US has a higher occurrence in the reading frame, and in all Pasteurellaceae, the US and the reverse complement motifs are biased towards the gene termini. These findings could help design experimental set-ups to study preferential DNA uptake, thereby further unravelling the phenomenon of natural competence. Reviewers: This article was reviewed by Arcady Mushegian and I. King Jordan.
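    One way to quantify the intragenic location and reading-frame bias described above is to record, for every motif hit in a gene, its relative position, its orientation, and its frame. The sketch below is an illustration, not the author's method; it assumes each gene is supplied on its coding strand starting at the start codon. `AAGTGCGGT` is the commonly cited core uptake sequence of Haemophilus influenzae and is used only as example input.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def motif_positions_in_gene(gene: str, motif: str) -> list[tuple[float, str, int]]:
    """Return (relative position, orientation, frame) for every motif hit.

    Relative position is start / len(gene), so 0 is the start codon end;
    frame is start % 3 relative to the start codon; '+' is the motif as
    given and '-' its reverse complement.
    """
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = gene.find(pattern)
        while start != -1:
            hits.append((start / len(gene), strand, start % 3))
            start = gene.find(pattern, start + 1)
    return sorted(hits)
```

    For a non-palindromic motif such as this one, the two orientations never coincide, so no hit is counted twice; a palindromic motif would need deduplication.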

  2. Significance of satellite DNA revealed by conservation of a widespread repeat DNA sequence among angiosperms.

    Science.gov (United States)

    Mehrotra, Shweta; Goel, Shailendra; Raina, Soom Nath; Rajpal, Vijay Rani

    2014-08-01

    The analysis of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of plant nuclear DNA. In the present study, we analyzed the nature of pCtKpnI-I and pCtKpnI-II tandem repeated sequences, reported earlier in Carthamus tinctorius. Interestingly, a homolog of the pCtKpnI-I repeat sequence was also found in widely divergent families of angiosperms. pCtKpnI-I showed high sequence similarity but low copy number among various taxa of different families of angiosperms analyzed. In comparison, pCtKpnI-II was specific to the genus Carthamus and was not present in any other taxa analyzed. The molecular structure of pCtKpnI-I was analyzed in various unrelated taxa of angiosperms to decipher the evolutionarily conserved nature of the sequence and its possible functional role.

  3. Genetic variability of Taenia saginata inferred from mitochondrial DNA sequences.

    Science.gov (United States)

    Rostami, Sima; Salavati, Reza; Beech, Robin N; Babaei, Zahra; Sharbatkhori, Mitra; Harandi, Majid Fasihi

    2015-04-01

    Taenia saginata is an important tapeworm, infecting humans in many parts of the world. The present study was undertaken to identify inter- and intraspecific variation of T. saginata isolated from cattle in different parts of Iran using two mitochondrial genes, CO1 and 12S rRNA. Up to 105 bovine specimens of T. saginata were collected from 20 slaughterhouses in three provinces of Iran. DNA was extracted from the metacestode Cysticercus bovis. After PCR amplification, the CO1 and 12S rRNA genes were sequenced, and two phylogenetic analyses of the sequence data were performed by Bayesian inference on the CO1 and 12S rRNA sequences. Sequence analyses of the CO1 and 12S rRNA genes showed 11 and 29 representative profiles, respectively. The level of pairwise nucleotide variation between individual haplotypes of the CO1 gene was 0.3-2.4%, while the overall nucleotide variation among all 11 haplotypes was 4.6%. For the 12S rRNA sequence data, the level of pairwise nucleotide variation was 0.2-2.5%, and the overall nucleotide variation was determined as 5.8% among the 29 haplotypes of the 12S rRNA gene. Considerable genetic diversity was found in both mitochondrial genes, particularly in the 12S rRNA gene.
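    The pairwise nucleotide variation ranges reported above are most plausibly uncorrected p-distances, i.e. the fraction of aligned sites at which two haplotypes differ; the record does not name the measure. A minimal sketch, assuming pre-aligned sequences and pairwise deletion of gap columns:

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def pairwise_range(haplotypes: list[str]) -> tuple[float, float]:
    """Smallest and largest pairwise p-distance among haplotypes, in percent."""
    distances = [100 * p_distance(a, b) for a, b in combinations(haplotypes, 2)]
    return min(distances), max(distances)
```

    With three toy haplotypes, `pairwise_range(["ACGT", "ACGA", "TCGA"])` gives `(25.0, 50.0)`, the analogue of a "0.3-2.4%" range on real alignments.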

  4. DNA sequencing technology, walking with modular primers. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Ulanovsky, L.

    1996-12-31

    The success of the Human Genome Project depends on the development of adequate technology for rapid and inexpensive DNA sequencing, which will also benefit biomedical research in general. The authors are working on DNA technologies that eliminate primer synthesis, the main bottleneck in sequencing by primer walking. They have developed modular primers that are assembled from three 5-mer, 6-mer or 7-mer modules selected from a presynthesized library of as few as 1,000 oligonucleotides ({double_bond}4, {double_bond}5, {double_bond}7). The three modules anneal contiguously at the selected template site and prime there uniquely, even though each module alone is, for the most part, not unique. This technique is expected to speed up primer walking 30- to 50-fold and reduce the sequencing cost by a factor of 5 to 15. Time and expense will be saved on primer synthesis itself, and even more so through closed-loop automation of primer walking, made possible by the instant availability of primers. Apart from saving time and cost, closed-loop automation would also minimize the errors and complications associated with human intervention between the walks. The author has also developed two additional approaches to primer-library-based sequencing. One involves a branched structure of modular primers, which has a distinctly different mechanism of achieving priming specificity. The other introduces the concept of "Differential Extension with Nucleotide Subsets" as an approach that increases priming specificity and priming strength and allows cycle sequencing. These approaches are expected to be more robust than the original version of the modular primer technique.
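    The central claim of modular priming, that three contiguous modules can define a unique site even when each module alone occurs repeatedly, can be checked by simple string counting. In this toy sketch the modules are written as the template-strand sequence of the binding site, and annealing is reduced to exact matching, whereas real priming depends on hybridization thermodynamics. The template, modules, and function names are hypothetical.

```python
def count_sites(template: str, site: str) -> int:
    """Number of (possibly overlapping) exact occurrences of site in template."""
    return sum(template.startswith(site, i) for i in range(len(template) - len(site) + 1))


def modular_priming_report(template: str, modules: list[str]) -> dict[str, int]:
    """Occurrences of each module alone and of the contiguous module string."""
    report = {m: count_sites(template, m) for m in modules}
    report["+".join(modules)] = count_sites(template, "".join(modules))
    return report
```

    On the template `ACGTAGGTTCACGTACCAATGGTTC` with modules `ACGTA`, `CCAAT`, and `GGTTC`, two of the three modules occur twice, yet their 15-nt concatenation occurs only once.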

  5. Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing

    Directory of Open Access Journals (Sweden)

    Zdepski Anna

    2011-05-01

    Background: High-throughput sequencing (HTS) technologies have revolutionized the field of genomics by drastically reducing the cost of sequencing, making it feasible for individual labs to sequence or resequence plant genomes. Obtaining high-quality, high-molecular-weight DNA from plants poses significant challenges due to the high copy number of chloroplast and mitochondrial DNA, as well as high levels of phenolic compounds and polysaccharides. Multiple methods have been used to isolate DNA from plants; the CTAB method is commonly used to isolate total cellular DNA, which includes nuclear as well as chloroplast and mitochondrial DNA. Alternatively, DNA can be isolated from nuclei to minimize chloroplast and mitochondrial DNA contamination. Results: We describe optimized protocols for isolation of nuclear DNA from eight different plant species encompassing both monocot and eudicot species. These protocols use nuclei isolation to minimize chloroplast and mitochondrial DNA contamination. We also developed a protocol to determine the number of chloroplast and mitochondrial DNA copies relative to the nuclear DNA using quantitative real-time PCR (qPCR). We compared DNA isolated from nuclei to total cellular DNA isolated with the CTAB method. As expected, DNA isolated from nuclei consistently yielded nuclear DNA with fewer chloroplast and mitochondrial DNA copies than the total cellular DNA prepared with the CTAB method. This protocol will allow for analysis of the quality and quantity of nuclear DNA before starting a plant whole genome sequencing or resequencing experiment. Conclusions: Extracting high-quality, high-molecular-weight nuclear DNA in plants has the potential to be a bottleneck in the era of whole genome sequencing and resequencing. The methods described here provide a framework for researchers to extract and quantify nuclear DNA in multiple types of plants.
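    Organellar-to-nuclear copy ratios from qPCR, as described above, are commonly derived from threshold-cycle (Ct) differences. With amplification efficiency E, each cycle multiplies the product by 1 + E, so a target that crosses the threshold ΔCt cycles earlier than a single-copy nuclear reference is present at roughly (1 + E)^ΔCt copies per reference copy. The record does not give the authors' exact calculation; this sketch assumes equal efficiencies for both assays.

```python
def relative_copy_number(ct_target: float, ct_reference: float, efficiency: float = 1.0) -> float:
    """Copies of the target locus per copy of the reference locus (delta-Ct method).

    efficiency = 1.0 means perfect doubling each cycle; both assays are
    assumed to share the same efficiency.
    """
    delta_ct = ct_reference - ct_target  # positive when the target is more abundant
    return (1.0 + efficiency) ** delta_ct
```

    For example, a chloroplast amplicon crossing the threshold at cycle 15 against a nuclear reference at cycle 20 gives 2^5 = 32 copies per nuclear reference copy under perfect efficiency (hypothetical values).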

  6. DNA Sequencing as a Tool to Monitor Marine Ecological Status

    Directory of Open Access Journals (Sweden)

    Kelly D. Goodwin

    2017-05-01

    Many ocean policies mandate integrated, ecosystem-based approaches to marine monitoring, driving a global need for efficient, low-cost bioindicators of marine ecological quality. Most traditional methods to assess biological quality rely on specialized expertise to provide visual identification of a limited set of specific taxonomic groups, a time-consuming process that can provide a narrow view of ecological status. In addition, microbial assemblages drive food webs but are not amenable to visual inspection and thus are largely excluded from detailed inventory. Molecular-based assessments of biodiversity and ecosystem function offer advantages over traditional methods and are increasingly being generated for a suite of taxa using a “microbes to mammals” or “barcodes to biomes” approach. Progress in these efforts, coupled with continued improvements in high-throughput sequencing and bioinformatics, paves the way for sequence data to be employed in formal integrated ecosystem evaluation, including food web assessments, as called for in the European Union Marine Strategy Framework Directive. DNA sequencing of bioindicators, both traditional (e.g., benthic macroinvertebrates, ichthyoplankton) and emerging (e.g., microbial assemblages, fish via eDNA), promises to improve assessment of marine biological quality by increasing the breadth, depth, and throughput of information and by reducing costs and reliance on specialized taxonomic expertise.

  7. 18S ribosomal DNA sequences provide insight into the phylogeny of patellogastropod limpets (Mollusca: Gastropoda).

    Science.gov (United States)

    Yoon, Sook Hee; Kim, Won

    2007-02-28

    To investigate the phylogeny of Patellogastropoda, the complete 18S rDNA sequences of nine patellogastropod limpets Cymbula canescens (Gmelin, 1791), Helcion dunkeri (Krauss, 1848), Patella rustica Linnaeus, 1758, Cellana toreuma (Reeve, 1855), Cellana nigrolineata (Reeve, 1854), Nacella magellanica Gmelin, 1791, Nipponacmea concinna (Lischke, 1870), Niveotectura pallida (Gould, 1859), and Lottia dorsuosa Gould, 1859 were determined. These sequences were then analyzed along with the published 18S rDNA sequences of 35 gastropods, one bivalve, and one chiton species. Phylogenetic trees were constructed by maximum parsimony, maximum likelihood, and Bayesian inference. The results of our 18S rDNA sequence analysis strongly support the monophyly of Patellogastropoda and the existence of three subgroups. Of these, two subgroups, the Patelloidea and Acmaeoidea, are closely related, with branching patterns that can be summarized as [(Cymbula + Helcion) + Patella] and [(Nipponacmea + Lottia) + Niveotectura]. The remaining subgroup, Nacelloidea, emerges as basal and paraphyletic, while its genus Cellana is monophyletic. Our analysis also indicates that the Patellogastropoda have a sister relationship with the order Cocculiniformia within the Gastropoda.

  8. Molecular phylogeny and evolution of Scomber (Teleostei: Scombridae) based on mitochondrial and nuclear DNA sequences

    Institute of Scientific and Technical Information of China (English)

    CHENG Jiao; GAO Tianxiang; MIAO Zhenqing; YANAGIMOTO Takashi

    2011-01-01

    A molecular phylogenetic analysis of the genus Scomber was conducted based on mitochondrial (COI, Cyt b and control region) and nuclear (5S rDNA) DNA sequence data from a multigene perspective. A variety of phylogenetic analytic methods were used to clarify the current taxonomic classification and to assess phylogenetic relationships and the evolutionary history of this genus. The present study produced a well-resolved phylogeny that strongly supported the monophyly of Scomber. We confirmed that S. japonicus and S. colias were genetically distinct. Although S. japonicus is morphologically and ecologically similar to S. colias, the molecular data showed that it has a greater molecular affinity with S. australasicus, which conflicts with the traditional taxonomy. This phylogenetic pattern was corroborated by the mtDNA data, but only partially by the nuclear DNA data. Phylogenetic concordance between the mitochondrial and nuclear DNA regions for the basal nodes supports an Atlantic origin for Scomber. The present-day geographic ranges of the species were compared with the resultant molecular phylogeny derived from partitioned Bayesian analyses of the combined data sets to evaluate possible dispersal routes of the genus. The present-day geographic distribution of Scomber species might be best ascribed to multiple dispersal events. In addition, our results suggest that phylogenies derived from multiple genes and long sequences exhibited improved phylogenetic resolution, from which we conclude that the phylogenetic reconstruction is a reliable representation of the evolutionary history of Scomber.

  9. Discovering motifs in ranked lists of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Eran Eden

    2007-03-01

    Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (chromatin immunoprecipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated to gain new insights into transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic enhancement of TF binding to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked.
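    A rank-based enrichment score in the spirit of the framework above can be sketched with the minimum hypergeometric (mHG) idea: scan every cutoff of the ranked list and keep the smallest hypergeometric tail probability of seeing at least the observed number of motif-containing sequences above that cutoff. This sketch computes only the score; it does not implement an exact p-value that corrects for scanning many cutoffs, and the function names are hypothetical. It needs Python 3.8+ for `math.comb`.

```python
from math import comb


def hypergeom_tail(N: int, B: int, n: int, b: int) -> float:
    """P(X >= b) for X ~ Hypergeometric(N items, B marked, n drawn)."""
    return sum(comb(n, k) * comb(N - n, B - k) for k in range(b, min(n, B) + 1)) / comb(N, B)


def min_hypergeometric(labels: list[int]) -> tuple[float, int]:
    """Minimum hypergeometric tail over all cutoffs of a ranked 0/1 list.

    labels[i] is 1 if the sequence at rank i contains the motif.
    Returns (score, cutoff), where cutoff is the number of top-ranked items.
    """
    N, B = len(labels), sum(labels)
    best, best_n, b = 1.0, N, 0
    for n in range(1, N + 1):
        b += labels[n - 1]  # motif-containing sequences among the top n
        p = hypergeom_tail(N, B, n, b)
        if p < best:
            best, best_n = p, n
    return best, best_n
```

    For the ranked list `[1, 1, 0, 0]`, the best cutoff is the top two sequences, with a tail probability of 1/6.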

  10. A MapReduce Framework for DNA Sequencing Data Processing

    Directory of Open Access Journals (Sweden)

    Samy Ghoneimy

    2016-12-01

    Genomics and next-generation sequencers (NGS) such as the Illumina HiSeq produce data on the order of 200 billion base pairs in a single one-week run for 60x human genome coverage, a volume that can only be tackled with high-performance computing (HPC) and specialized software algorithms called "short read aligners". This paper focuses on implementing DNA sequencing data processing as a set of MapReduce programs that accept a DNA data set as a FASTQ file and finally generate a VCF (variant call format) file containing the variants for that data set. MapReduce/Hadoop, along with the Burrows-Wheeler Aligner (BWA) and the Sequence Alignment/Map (SAM) tools, is fully utilized to provide various utilities for manipulating alignments, including sorting, merging, indexing, and generating alignments. The Map-Sort-Reduce process is designed to suit a Hadoop framework in which each cluster is a traditional N-node Hadoop cluster, so that all Hadoop features such as HDFS, program management and fault tolerance can be used. The Map step runs multiple instances of the short read alignment algorithm (Bowtie) in parallel in Hadoop. The ordered list of sequence reads is used as the input tuples, and the output tuples are the alignments of the short reads. In the Reduce step, many parallel instances of the Short Oligonucleotide Analysis Package for SNP (SOAPsnp) algorithm run in the cluster. Input tuples are sorted alignments for a partition, and the output tuples are SNP calls. Results are stored via HDFS and then archived in SOAPsnp format. The proposed framework enables extremely fast discovery of somatic mutations, inference of population genetic parameters, and association tests performed directly on sequencing data without explicit genotyping or linkage-based imputation. It also demonstrates that this method achieves comparable
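    The Map-Sort-Reduce flow described above can be mimicked on a single machine to show how data move between the steps. In this toy sketch, an exact-seed lookup stands in for Bowtie and a majority-vote pileup stands in for SOAPsnp; neither reflects how those tools actually work, and all names and thresholds are hypothetical. The map phase emits (position, base) pairs, grouping by position plays the role of the shuffle, and the reduce phase turns each group into an optional SNP call.

```python
from collections import Counter, defaultdict
from typing import Optional


def map_read(read: str, reference: str, seed_len: int) -> list[tuple[int, str]]:
    """Map step (toy aligner): place the read where its first seed_len bases
    match the reference exactly, then emit (reference position, base) pairs."""
    pos = reference.find(read[:seed_len])
    if pos == -1 or pos + len(read) > len(reference):
        return []  # unplaced read
    return [(pos + i, base) for i, base in enumerate(read)]


def reduce_position(pos: int, bases: list[str], reference: str, min_depth: int) -> Optional[tuple[int, str, str]]:
    """Reduce step (toy caller): report (position, ref base, alt base) when the
    majority base at a sufficiently covered position differs from the reference."""
    base, _ = Counter(bases).most_common(1)[0]
    if len(bases) >= min_depth and base != reference[pos]:
        return (pos, reference[pos], base)
    return None


def run_pipeline(reads: list[str], reference: str, seed_len: int = 4, min_depth: int = 2) -> list[tuple[int, str, str]]:
    groups: dict[int, list[str]] = defaultdict(list)
    for read in reads:  # map phase
        for pos, base in map_read(read, reference, seed_len):
            groups[pos].append(base)  # shuffle: group values by key
    calls = []
    for pos in sorted(groups):  # sort keys, then reduce each group
        call = reduce_position(pos, groups[pos], reference, min_depth)
        if call is not None:
            calls.append(call)
    return calls
```

    With the reference `AACCGGTTACGT` and reads `AACCGGAT`, `CCGGATAC`, and `GGTTACGT`, two of the three reads covering position 6 carry A instead of T, so the pipeline reports `[(6, 'T', 'A')]`. In Hadoop, the grouping loop is what the framework's shuffle-and-sort does across nodes.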

  11. DNA-Origami-Driven Lithography for Patterning on Gold Surfaces with Sub-10 nm Resolution.

    Science.gov (United States)

    Gállego, Isaac; Manning, Brendan; Prades, Joan Daniel; Mir, Mònica; Samitier, Josep; Eritja, Ramon

    2017-03-01

    Sub-10 nm lithography of DNA patterns is achieved using the DNA-origami stamping method. This new strategy utilizes DNA origami to bind a preprogrammed DNA ink pattern composed of thiol-modified oligonucleotides on gold surfaces. Upon denaturation of the DNA origami, the DNA ink pattern is exposed. The pattern can then be developed by hybridization with complementary strands carrying gold nanoparticles. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. PCR master mixes harbour murine DNA sequences. Caveat emptor!

    Directory of Open Access Journals (Sweden)

    Philip W Tuke

    BACKGROUND: XMRV is the most recently described retrovirus to be found in humans, first in patients with prostate cancer (PC) and second in 67% of patients with chronic fatigue syndrome (CFS) and 3.7% of controls. Both disease associations remain contentious. Indeed, a recent publication has concluded that "XMRV is unlikely to be a human pathogen". Subsequently, related but different polytropic MLV (pMLV) sequences were also reported from the blood of 86.5% of patients with CFS and 6.8% of controls. Consequently, we decided to investigate blood donors for evidence of XMRV/pMLV. METHODOLOGY/PRINCIPAL FINDINGS: Testing of cDNA prepared from the whole blood of 80 random blood donors generated gag PCR signals from two samples (7C and 9C). These had previously tested negative for XMRV by two other PCR-based techniques. To test whether the PCR mix was the source of these sequences, 88 replicates of water were amplified using Invitrogen Platinum Taq (IPT) and Applied Biosystems Taq Gold LD (ABTG). Four gag sequences (2D, 3F, 7H, 12C) were generated with IPT, and a further sequence (12D) by ABTG re-amplification of an IPT first-round product. Sequence comparisons revealed remarkable similarities between these sequences, endogenous MLVs and the pMLV sequences reported in patients with CFS. CONCLUSIONS/SIGNIFICANCE: Methodologies for the detection of viruses highly homologous to endogenous murine viruses require special caution, as the very reagents used in the detection process can be a source of contamination, at a level where it is not immediately apparent. It is suggested that such contamination is likely to explain the apparent presence of pMLV in CFS.

  13. Methylome-wide Sequencing Detects DNA Hypermethylation Distinguishing Indolent from Aggressive Prostate Cancer

    Directory of Open Access Journals (Sweden)

    Jeffrey M. Bhasin

    2015-12-01

    A critical need in understanding the biology of prostate cancer is characterizing the molecular differences between indolent and aggressive cases. Because DNA methylation can capture the regulatory state of tumors, we analyzed differential methylation patterns genome-wide among benign prostatic tissue and low-grade and high-grade prostate cancer and found extensive, focal hypermethylation regions unique to high-grade disease. These hypermethylation regions occurred not only in gene promoters but also in gene bodies and at intergenic regions that are enriched for DNA-protein binding sites. Integration with existing RNA-sequencing (RNA-seq) and survival data revealed regions where DNA methylation correlates with reduced gene expression associated with poor outcome. Regions specific to aggressive disease are proximal to genes whose functions differ from those of genes near regions shared by indolent and aggressive disease. Our compendium of methylation changes reveals crucial molecular distinctions between indolent and aggressive prostate cancer.

  14. Mitochondrial DNA sequence analysis of patients with 'atypical psychosis'.

    Science.gov (United States)

    Kazuno, An-A; Munakata, Kae; Mori, Kanako; Tanaka, Masashi; Nanko, Shinichiro; Kunugi, Hiroshi; Umekage, Tadashi; Tochigi, Mamoru; Kohda, Kazuhisa; Sasaki, Tsukasa; Akiyama, Tsuyoshi; Washizuka, Shinsuke; Kato, Nobumasa; Kato, Tadafumi

    2005-08-01

    Although classical psychopathological studies have shown the presence of an independent diagnostic category, 'atypical psychosis', most psychotic patients are currently classified into two major diagnostic categories, schizophrenia and bipolar disorder, according to the Diagnostic and Statistical Manual of Mental Disorders (4th edn; DSM-IV) criteria. 'Atypical psychosis' is characterized by acute confusion without systematic delusion, emotional instability, and psychomotor excitement or stupor. Such clinical features resemble those seen in organic mental syndrome, and differential diagnosis is often difficult. Because patients with mitochondrial myopathy, encephalopathy, lactic acidosis, and stroke-like episodes (MELAS) sometimes show organic mental disorder, 'atypical psychosis' may be caused by mutations of mitochondrial DNA (mtDNA) in some patients. In the present study, the whole mtDNA was sequenced in seven patients with various psychotic disorders who could be categorized as 'atypical psychosis'. None of them had known mtDNA mutations pathogenic for mitochondrial encephalopathy. Two of the seven patients belonged to F1b1a, a low-frequency subhaplogroup. These results did not support the hypothesis that the clinical presentation of some patients with 'atypical psychosis' reflects subclinical mitochondrial encephalopathy. However, subhaplogroup F1b1a may be a good target for association studies of 'atypical psychosis'.

  15. Retroviral DNA Sequences as a Means for Determining Ancient Diets.

    Directory of Open Access Journals (Sweden)

    Jessica I Rivera-Perez

    For ages, specialists from varying fields have studied the diets of the primeval inhabitants of our planet, detecting diet remains in archaeological specimens using a range of morphological and biochemical methods. Recently, metagenomic ancient DNA studies have allowed the comparison of the fecal and gut microbiomes associated with archaeological specimens from various regions of the world; however, the complex dynamics represented in those microbial communities remain unclear. In theory, as with eukaryotic DNA, the presence of genes from key microbes or enzymes, as well as the presence of DNA from viruses specific to key organisms, may suggest the ingestion of specific diet components. In this study we demonstrate that ancient virus DNA obtained from coprolites also provides information for reconstructing the host's diet, as inferred from sequences obtained from pre-Columbian coprolites. This represents a novel and reliable approach for identifying new diet components as well as validating the previously suggested diets of extinct cultures and animals. Furthermore, to our knowledge this is the first description of the eukaryotic viral diversity found in paleofaeces belonging to pre-Columbian cultures.

  16. Peptide Synthesis on a Next-Generation DNA Sequencing Platform.

    Science.gov (United States)

    Svensen, Nina; Peersen, Olve B; Jaffrey, Samie R

    2016-09-01

    Methods for displaying large numbers of peptides on solid surfaces are essential for high-throughput characterization of peptide function and binding properties. Here we describe a method for converting the >10^7 flow-cell-bound clusters of identical DNA strands generated by the Illumina DNA sequencing technology into clusters of complementary RNA, and subsequently into peptide clusters. We modified the flow-cell-bound primers with ribonucleotides, thus enabling them to be used by the poliovirus polymerase 3D^pol. The primers hybridize to the clustered DNA, thus leading to RNA clusters. The RNAs fold into functional protein- or small-molecule-binding aptamers. We used the mRNA-display approach to synthesize flow-cell-tethered peptides from these RNA clusters. The peptides showed selective binding to cognate antibodies. The methods described here provide an approach for using DNA clusters to template peptide synthesis on an Illumina flow cell, thus providing new opportunities for massively parallel peptide-based assays.

  17. S-sequence patterned illumination iterative photoacoustic tomography.

    Science.gov (United States)

    Harrison, Tyler; Shao, Peng; Zemp, Roger J

    2014-09-01

    Quantitatively reconstructing optical absorption using photoacoustic imaging is nontrivial. Theoretical hurdles, such as nonuniqueness and numerical instability, can be mitigated by using multiple illuminations. However, even with multiple illuminations, using ANSI-safety-limited fluence for practical imaging may result in poor performance owing to limited signal-to-noise ratio (SNR). We demonstrate the use of S-sequence coded patterned illumination to boost SNR while preserving the enhanced stability of multiple-illumination iterative techniques.
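
    S-sequence coding is usually built on S-matrices derived from Hadamard matrices. As a minimal sketch, assuming a Sylvester-type construction (order n = 2^k - 1) rather than whatever pattern set the authors actually used, the code below builds an S-matrix, encodes n single-pattern signals into n multiplexed measurements y = S·x, and decodes them with the closed-form inverse S^-1 = 2/(n+1)·(2S^T - J), where J is the all-ones matrix. The function names are hypothetical, and neither noise nor the photoacoustic forward model is represented.

```python
from fractions import Fraction


def sylvester_hadamard(order):
    """Sylvester Hadamard matrix H of the given order (must be a power of 2)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of 2")
    h = [[1]]
    while len(h) < order:
        # H_2m = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def s_matrix(n):
    """S-matrix of order n = 2^k - 1: drop the first row and column of H_(n+1),
    then map +1 -> 0 (pattern element off) and -1 -> 1 (pattern element on)."""
    h = sylvester_hadamard(n + 1)
    return [[1 if v == -1 else 0 for v in row[1:]] for row in h[1:]]


def s_matrix_inverse(s):
    """Closed-form inverse S^-1 = 2/(n+1) * (2 S^T - J), using exact fractions."""
    n = len(s)
    scale = Fraction(2, n + 1)
    return [[scale * (2 * s[j][i] - 1) for j in range(n)] for i in range(n)]


def mat_vec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


if __name__ == "__main__":
    S = s_matrix(7)
    x = [5, 0, 2, 7, 1, 3, 4]            # hypothetical single-pattern signals
    y = mat_vec(S, x)                     # each measurement sums 4 of the 7 patterns
    x_hat = mat_vec(s_matrix_inverse(S), y)
    print([int(v) for v in x_hat])        # noise-free decoding recovers x
```

    Exact fractions keep the decoding check exact; real, noisy measurements would be decoded in floating point. How much SNR the multiplexing buys depends on the noise model and is not demonstrated by this sketch.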

  18. Entropy and long-range correlations in DNA sequences.

    Science.gov (United States)

    Melnik, S S; Usatenko, O V

    2014-12-01

    We analyze the structure of DNA molecules of different organisms by using the additive Markov chain approach. Transforming nucleotide sequences into binary strings, we perform a statistical analysis of the corresponding "texts". We develop the theory of N-step additive binary stationary ergodic Markov chains and analyze their differential entropy. Assuming that the correlations are weak, we express the conditional probability function of the chain by means of the pair correlation function and represent the entropy as a functional of the pair correlator. Because the model uses two-point correlators instead of block-occurrence probabilities, it makes it possible to calculate the entropy of subsequences at much longer distances than standard methods allow. We use the obtained analytical result for numerical evaluation of the entropy of coarse-grained DNA texts. We believe that the entropy study can be used for the biological classification of living species.
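
    The abstract does not say which binary mapping is used. As a minimal sketch, assuming a purine/pyrimidine mapping (A, G -> 1; C, T -> 0), the code below turns a nucleotide string into a binary "text" and estimates the pair correlation function K(r) = <a_i a_(i+r)> - <a>^2, the two-point quantity from which the chain's conditional probability and entropy are built. The entropy functional itself is not reproduced here.

```python
def to_binary(seq):
    """Map a nucleotide string to a binary 'text': purines (A, G) -> 1,
    pyrimidines (C, T) -> 0. Assumed mapping; other symbols are skipped."""
    table = {"A": 1, "G": 1, "C": 0, "T": 0}
    return [table[base] for base in seq.upper() if base in table]


def pair_correlation(bits, r):
    """Estimate K(r) = <a_i a_(i+r)> - <a>^2 from all pairs at distance r."""
    n = len(bits)
    if not 0 <= r < n:
        raise ValueError("need 0 <= r < len(bits)")
    mean = sum(bits) / n
    pairs = n - r
    return sum(bits[i] * bits[i + r] for i in range(pairs)) / pairs - mean ** 2


if __name__ == "__main__":
    bits = to_binary("ATGCGCGATATTAGCCGA" * 20)  # toy sequence, not real data
    print([round(pair_correlation(bits, r), 4) for r in range(1, 6)])
```

    At r = 0 the estimate reduces to the variance <a>(1 - <a>), which is a convenient sanity check on the mapping.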

  19. Field guide to next-generation DNA sequencers.

    Science.gov (United States)

    Glenn, Travis C

    2011-09-01

    The diversity of available 2nd and 3rd generation DNA sequencing platforms is increasing rapidly. Costs for these systems range from $10/Mb for 454 and some Ion Torrent chips). In terms of cost per nonmultiplexed sample and instrument run time, the Pacific Biosciences and Ion Torrent platforms excel, with the 454 GS Junior and Illumina MiSeq also notable in this regard. All platforms allow multiplexing of samples, but details of library preparation, experimental design and data analysis can constrain the options. The wide range of characteristics among available platforms provides opportunities both to conduct groundbreaking studies and to waste money on scales that were previously infeasible. Thus, careful thought about the desired characteristics of these systems is warranted before purchasing or using any of them. Updated information from this guide will be maintained at: http://dna.uga.edu/ and http://tomato.biol.trinity.edu/blog/. © 2011 Blackwell Publishing Ltd.

  20. Sequence characterization of a human embryonic craniofacial cDNA library

    Energy Technology Data Exchange (ETDEWEB)

    Padanilam, B.J.; Barsel, S.; Solursh, M. [and others]

    1994-09-01

    Broad-based sequencing approaches for the characterization of human cDNA libraries have proven successful in identifying large numbers of novel genes of specific tissue or developmental stages. To pursue our interests in human craniofacial development, we have made use of both subtracted and unsubtracted cDNA libraries constructed from embryonic craniofacial tissue obtained from pooled samples at 42-54 days gestation. Single-pass sequencing was carried out using an ABI automated sequencer and T3 or T7 primers. Sequences were characterized using BLAST and GRAIL, and the identified homologous sequences grouped according to gene class and family. Four genes have been mapped using repeat sequence elements identified in the clones. Using primers developed from sequence data, other genes are being mapped using a panel of somatic cell hybrids. To date, a total of 786 sequences have been returned with 35% identifying no homologies, and 35% with strong homologies to previously identified genes. A number of genes previously identified to play a role in human embryonic development have been returned from the sequence comparisons providing evidence that the library is representative of this tissue and stage of development. Previous characterization of the library has also identified a number of novel embryonically expressed human homeobox genes. Genes felt to be of special relevance based on their homology to characterized genes known to play a role in development or that are members of novel classes but with high scores on GRAIL searches are being characterized using whole mount in situ hybridization with mouse embryos. Characterization of the library with respect to chromosomal mapping, gene types and make-up, and embryonic expression patterns will be presented.

  1. Photocatalytic probing of DNA sequence by using TiO2/dopamine-DNA triads

    Energy Technology Data Exchange (ETDEWEB)

    Liu Jianqin [Center for Nanoscale Materials, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439 (United States); Garza, Linda de la [Chemistry Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439 (United States); Zhang Ligang; Dimitrijevic, Nada M. [Center for Nanoscale Materials, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439 (United States); Zuo Xiaobing; Tiede, David M. [Chemistry Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439 (United States); Rajh, Tijana [Center for Nanoscale Materials, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439 (United States); Chemistry Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439 (United States)], E-mail: rajh@anl.gov

    2007-10-15

    A method to control charge transfer reactions in DNA using hybrid nanometer-sized TiO2 nanoparticles was developed. In this system, extended charge separation reflects the sequence of DNA and was measured using metallic silver deposition or by photocurrent response. Light-induced extended charge separation in these systems was found to depend on the DNA-bridge length and sequence. The yield of photocatalytic deposition of silver was studied in systems having GG accepting sites embedded in AT runs at varying distances from the TiO2 nanoparticle surface. A weak distance dependence of charge separation, indicative of hole hopping through mediating adenine (A) sites, was found. The quantum yield of silver deposition in the system having a GG accepting site placed 8.5 Å from the nanoparticle surface was found to be Φ = 0.70 (70%) and Φ = 0.56 (56%) for (A)_n and (AT)_n/2 bridges, respectively. Hole injection to GG trapping sites as far as 70 Å from the nanoparticle surface in the absence of G hopping sites was measured. Introduction of G hopping sites increased the efficiency of hole injection. The efficiency of photocatalytic deposition of metallic silver was found to be sensitive to the presence of a single nucleobase mismatch in the DNA sequence.

  2. Next-generation DNA barcoding: using next-generation sequencing to enhance and accelerate DNA barcode capture from single specimens.

    Science.gov (United States)

    Shokralla, Shadi; Gibson, Joel F; Nikbakht, Hamid; Janzen, Daniel H; Hallwachs, Winnie; Hajibabaei, Mehrdad

    2014-09-01

    DNA barcoding is an efficient method to identify specimens and to detect undescribed/cryptic species. Sanger sequencing of individual specimens is the standard approach in generating large-scale DNA barcode libraries and identifying unknowns. However, Sanger sequencing technology is, in some respects, inferior to next-generation sequencers, which are capable of producing millions of sequence reads simultaneously. Additionally, direct Sanger sequencing of DNA barcode amplicons, as practiced in most DNA barcoding procedures, is hampered by the need for a relatively high yield of target amplicon, coamplification of nuclear mitochondrial pseudogenes, confusion with sequences from intracellular endosymbiotic bacteria (e.g. Wolbachia) and instances of intraindividual variability (i.e. heteroplasmy). Any of these situations can lead to failed Sanger sequencing attempts or ambiguity of the generated DNA barcodes. Here, we demonstrate the potential application of next-generation sequencing platforms for parallel acquisition of DNA barcode sequences from hundreds of specimens simultaneously. To facilitate retrieval of sequences obtained from individual specimens, we tag individual specimens during PCR amplification using unique 10-mer oligonucleotides attached to DNA barcoding PCR primers. We employ 454 pyrosequencing to recover full-length DNA barcodes of 190 specimens using 12.5% of the capacity of a 454 sequencing run (i.e. two lanes of a 16-lane run). We obtained an average of 143 sequence reads for each individual specimen. The sequences produced are full-length DNA barcodes for all but one of the included specimens. In a subset of samples, we also detected Wolbachia, nontarget species, and heteroplasmic sequences. Next-generation sequencing is of great value because of its protocol simplicity, greatly reduced cost per barcode read, faster throughput and added information content.
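
    Recovering per-specimen reads from a pooled run amounts to demultiplexing on the 5' tag. As a minimal sketch, assuming the tag sits at the very start of each read and must match exactly (real pipelines typically also check the reverse primer and tolerate sequencing errors), the code below bins reads by their leading 10-mer. The tag-to-specimen table and function name are hypothetical.

```python
from collections import defaultdict

TAG_LENGTH = 10  # unique 10-mer tags were attached to the barcoding primers


def demultiplex(reads, tag_to_specimen, tag_length=TAG_LENGTH):
    """Bin reads by the exact tag at their 5' end.

    Assigned reads are stored with the tag trimmed off; reads whose leading
    bases match no known tag go, untrimmed, to the 'unassigned' bin."""
    bins = defaultdict(list)
    for read in reads:
        specimen = tag_to_specimen.get(read[:tag_length])
        if specimen is None:
            bins["unassigned"].append(read)
        else:
            bins[specimen].append(read[tag_length:])
    return dict(bins)


if __name__ == "__main__":
    tags = {"ACGTACGTAC": "specimen_001", "TGCATGCATG": "specimen_002"}  # hypothetical
    reads = ["ACGTACGTACAAATTT", "TGCATGCATGCCCGGG", "NNNNNNNNNNAAAA", "ACGTACGTACGGG"]
    for specimen, seqs in demultiplex(reads, tags).items():
        print(specimen, len(seqs))
```

    Per-specimen read depth (the study reports an average of 143 reads) is then just the length of each bin.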

  3. Indirect readout of DNA sequence by p22 repressor: roles of DNA and protein functional groups in modulating DNA conformation.

    Science.gov (United States)

    Harris, Lydia-Ann; Watkins, Derrick; Williams, Loren Dean; Koudelka, Gerald B

    2013-01-09

    The repressor of bacteriophage P22 (P22R) discriminates between its various DNA binding sites by sensing the identity of non-contacted base pairs at the center of its binding site. The "indirect readout" of these non-contacted bases is apparently based on DNA's sequence-dependent conformational preferences. The structures of P22R-DNA complexes indicate that the non-contacted base pairs at the center of the binding site are in the B' state. This finding suggests that indirect readout and therefore binding site discrimination depend on P22R's ability to either sense and/or impose the B' state on the non-contacted bases of its binding sites. We show here that the affinity of binding sites for P22R depends on the tendency of the central bases to assume the B'-DNA state. Furthermore, we identify functional groups in the minor groove of the non-contacted bases as the essential modulators of indirect readout by P22R. In P22R-DNA complexes, the negatively charged E44 and E48 residues are provocatively positioned near the negatively charged DNA phosphates of the non-contacted nucleotides. The close proximity of the negatively charged groups on protein and DNA suggests that electrostatics may play a key role in the indirect readout process. Changing either of two negatively charged residues to uncharged residues eliminates the ability of P22R to impose structural changes on DNA and to recognize non-contacted base sequence. These findings suggest that these negatively charged amino acids function to force the P22R-bound DNA into the B' state and therefore play a key role in indirect readout by P22R.

  4. Mitochondrial DNA sequence variation in the Anatolian Peninsula (Turkey)

    Indian Academy of Sciences (India)

    Hatice Mergen; Reyhan Öner; Cihan Öner

    2004-04-01

    Throughout human history, the region known today as the Anatolian peninsula (Turkey) has served as a junction connecting the Middle East, Europe and Central Asia, and, thus, has been subject to major population movements. The present study is undertaken to obtain information about the distribution of the existing mitochondrial D-loop sequence variations in the Turkish population of Anatolia. A few studies have previously reported mtDNA sequences in Turks. We attempted to extend these results by analysing a cohort that is not only larger, but also more representative of the Turkish population living in Anatolia. In order to obtain a descriptive picture for the phylogenetic distribution of the mitochondrial genome within Turkey, we analysed mitochondrial D-loop region sequence variations in 75 individuals from different parts of Anatolia by direct sequencing. Analysis of the two hypervariable segments within the noncoding region of the mitochondrial genome revealed the existence of 81 nucleotide mutations at 79 sites. The neighbour-joining tree of Kimura’s distance matrix has revealed the presence of six main clusters, of which H and U are the most common. The data obtained are also compared with several European and Turkic Central Asian populations.
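
    The abstract builds a neighbour-joining tree from "Kimura's distance matrix" without naming the model; for D-loop data this is commonly the Kimura two-parameter (K2P) distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The sketch below, assuming K2P and pre-aligned sequences, computes that distance for one pair; a full analysis would fill a matrix over all individuals and pass it to a neighbour-joining implementation, which is not shown.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    print(kimura_2p("ACCTAGGTCA-TTACG", "ACTTAGGTCAATTGCG"))
```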

  5. Bacterial repetitive extragenic palindromic sequences are DNA targets for Insertion Sequence elements

    Directory of Open Access Journals (Sweden)

    Pareja Eduardo

    2006-03-01

    Background: Mobile elements are involved in genomic rearrangements and virulence acquisition and are hence important elements in bacterial genome evolution. The insertion of some specific Insertion Sequences has been associated with repetitive extragenic palindromic (REP) elements. Considering that a sufficient number of genomes with described REPs are available, and exploiting the traceability of transposition events in genomes, we decided to exhaustively analyze the relationship between REP sequences and mobile elements. Results: This global multigenome study highlights the importance of repetitive extragenic palindromic elements as target sequences for transposases. The study is based on the analysis of the DNA regions surrounding the 981 instances of Insertion Sequence elements with respect to the positioning of REP sequences in the 19 available annotated microbial genomes corresponding to species of bacteria with reported REP sequences. This analysis has allowed the detection of the specific insertion into REP sequences for ISPsy8 in Pseudomonas syringae DC3000, ISPa11 in P. aeruginosa PA01, ISPpu9 and ISPpu10 in P. putida KT2440, and ISRm22 and ISRm19 in the Sinorhizobium meliloti 1021 genome. Preference for insertion in extragenic spaces with REP sequences has also been detected for ISPsy7 in P. syringae DC3000, ISRm5 in S. meliloti and ISNm1106 in the Neisseria meningitidis MC58 and Z2491 genomes. The association with REP elements that we have detected by analyzing genomes is probably only the tip of the iceberg, and this association could be even more frequent in natural isolates. Conclusion: Our findings characterize REP elements as hot spots for transposition and reinforce the relationship between REP sequences and genomic plasticity mediated by mobile elements. 
In addition, this study defines a subset of REP-recognizer transposases with high target selectivity that can be useful in the development of new tools for
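
    The core test in this kind of analysis is whether each insertion site falls inside, or close to, an annotated REP element. As a minimal sketch, assuming 1-based inclusive coordinates on a single replicon and a hypothetical 100 bp flanking window (the study's actual criterion is not given in the abstract), the code below labels each insertion site and tallies the labels.

```python
from collections import Counter


def classify_insertions(insertion_sites, rep_intervals, window=100):
    """Return (site, label) pairs, label in {'inside_REP', 'near_REP', 'elsewhere'}.

    rep_intervals: (start, end) pairs, inclusive coordinates on one replicon.
    window: hypothetical flank size in bp around each REP element."""
    results = []
    for site in insertion_sites:
        label = "elsewhere"
        for start, end in rep_intervals:
            if start <= site <= end:
                label = "inside_REP"
                break  # containment is the strongest label; stop scanning
            if start - window <= site <= end + window:
                label = "near_REP"  # keep scanning: another REP may contain the site
        results.append((site, label))
    return results


if __name__ == "__main__":
    reps = [(1000, 1035), (5000, 5040)]   # hypothetical REP coordinates
    sites = [1010, 1100, 4950, 3000]       # hypothetical IS insertion sites
    labelled = classify_insertions(sites, reps)
    print(labelled)
    print(Counter(label for _, label in labelled))
```

    A brute-force scan is adequate at the scale described (hundreds of IS copies); sorting the REP intervals and using binary search would be the natural refinement for larger inputs.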

  6. Noncontinuously binding loop-out primers for avoiding problematic DNA sequences in PCR and sanger sequencing.

    Science.gov (United States)

    Sumner, Kelli; Swensen, Jeffrey J; Procter, Melinda; Jama, Mohamed; Wooderchak-Donahue, Whitney; Lewis, Tracey; Fong, Michael; Hubley, Lindsey; Schwarz, Monica; Ha, Youna; Paul, Eleri; Brulotte, Benjamin; Lyon, Elaine; Bayrak-Toydemir, Pinar; Mao, Rong; Pont-Kingdon, Genevieve; Best, D Hunter

    2014-09-01

    We present a method in which noncontinuously binding (loop-out) primers are used to exclude regions of DNA that typically interfere with PCR amplification and/or analysis by Sanger sequencing. Several scenarios were tested using this design principle, including M13-tagged PCR primers, non-M13-tagged PCR primers, and sequencing primers. With this technique, a single oligonucleotide is designed in two segments that flank, but do not include, a short region of problematic DNA sequence. During PCR amplification or sequencing, the problematic region is looped-out from the primer binding site, where it does not interfere with the reaction. Using this method, we successfully excluded regions of up to 46 nucleotides. Loop-out primers were longer than traditional primers (27 to 40 nucleotides) and had higher melting temperatures. This method allows the use of a standardized PCR protocol throughout an assay, keeps the number of PCRs to a minimum, reduces the chance for laboratory error, and, above all, does not interrupt the clinical laboratory workflow.
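
    The design principle translates directly into code: the primer is the concatenation of two template segments that flank the problematic region. As a minimal sketch, assuming 0-based half-open coordinates on the forward strand (the function names are hypothetical), the code below builds such a primer and gives a rough melting-temperature estimate with the basic GC-content formula Tm = 64.9 + 41·(G + C - 16.4)/N. That formula ignores salt, primer concentration and nearest-neighbour effects, and says nothing about how the looped-out template behaves, so it is only a first-pass check.

```python
def loop_out_primer(template, seg1_start, loop_start, loop_end, seg2_end):
    """Join template[seg1_start:loop_start] and template[loop_end:seg2_end],
    skipping the problematic region template[loop_start:loop_end].

    Coordinates are 0-based and half-open on the forward strand."""
    if not 0 <= seg1_start < loop_start < loop_end < seg2_end <= len(template):
        raise ValueError("need seg1_start < loop_start < loop_end < seg2_end <= len(template)")
    return template[seg1_start:loop_start] + template[loop_end:seg2_end]


def basic_tm(primer):
    """Rough Tm in degrees C: 64.9 + 41 * (G + C - 16.4) / N."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(seq)


if __name__ == "__main__":
    # Hypothetical template: a poly-T stretch (positions 20-29) is looped out.
    template = "GATCCTGAAGCTGACCTAGC" + "T" * 10 + "CAGGTACCGATTGCAGTTCA"
    primer = loop_out_primer(template, 0, 20, 30, 45)
    print(primer, len(primer), round(basic_tm(primer), 1))
```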

  7. Autoantigenic proteins that bind recombinogenic sequences in Epstein-Barr virus and cellular DNA.

    OpenAIRE

    1994-01-01

    We have identified conserved autoantigenic cellular proteins that bind to G-rich sequence motifs in recombinogenic regions of Epstein-Barr virus (EBV) DNA. This binding activity, called TRBP, recognizes the EBV terminal repeats, a locus responsible for interconversion of linear and circular EBV DNA. We found that TRBP also binds to EBV DNA sequences involved in deletion of EBNA2, a gene product required for immortalization. We show that TRBP binds sequences present in repetitive cellular DNA,...

  8. Two Dimensional Yau-Hausdorff Distance with Applications on Comparison of DNA and Protein Sequences.

    Directory of Open Access Journals (Sweden)

    Kun Tian

    Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Despite the many methods available for sequence comparison, few retain the information content of the sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match of graphical curves of DNA or protein sequences. The complexity of this method is lower than that of any other two-dimensional minimum Hausdorff algorithm. The Yau-Hausdorff method can be used for measuring the similarity of DNA sequences based on two important tools: the Yau-Hausdorff distance and graphical representation of DNA sequences. The graphical representations of DNA sequences conserve all sequence information, and the Yau-Hausdorff distance is mathematically proved to be a true metric. Therefore, the proposed distance can precisely measure the similarity of DNA sequences. The phylogenetic analyses of DNA sequences by the Yau-Hausdorff distance show the accuracy and stability of our approach in similarity comparison of DNA or protein sequences. This study demonstrates that the Yau-Hausdorff distance is a natural metric for DNA and protein sequences with a high level of stability. The approach can also be applied to similarity analysis of protein sequences by graphic representations, as well as general two-dimensional shape matching.

  9. Two Dimensional Yau-Hausdorff Distance with Applications on Comparison of DNA and Protein Sequences.

    Science.gov (United States)

    Tian, Kun; Yang, Xiaoqian; Kong, Qin; Yin, Changchuan; He, Rong L; Yau, Stephen S-T

    2015-01-01

    Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Despite the many methods available for sequence comparison, few retain the information content of the sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match of graphical curves of DNA or protein sequences. The complexity of this method is lower than that of any other two-dimensional minimum Hausdorff algorithm. The Yau-Hausdorff method can be used for measuring the similarity of DNA sequences based on two important tools: the Yau-Hausdorff distance and graphical representation of DNA sequences. The graphical representations of DNA sequences conserve all sequence information, and the Yau-Hausdorff distance is mathematically proved to be a true metric. Therefore, the proposed distance can precisely measure the similarity of DNA sequences. The phylogenetic analyses of DNA sequences by the Yau-Hausdorff distance show the accuracy and stability of our approach in similarity comparison of DNA or protein sequences. This study demonstrates that the Yau-Hausdorff distance is a natural metric for DNA and protein sequences with a high level of stability. The approach can also be applied to similarity analysis of protein sequences by graphic representations, as well as general two-dimensional shape matching.
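
    To make the two ingredients concrete (a 2D graphical curve per sequence and a Hausdorff-type distance between curves), here is a deliberately simpler sketch than the Yau-Hausdorff method. It assumes a hypothetical 2D walk representation (A up, T down, G right, C left) and computes the plain symmetric Hausdorff distance between the two point sets as given, whereas the Yau-Hausdorff distance additionally minimizes over all rotations and translations. The brute-force O(nm) computation is for illustration only and uses `math.dist` (Python 3.8+).

```python
import math

# Hypothetical step vectors for a 2D DNA walk; published representations vary.
STEPS = {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)}


def dna_walk(seq):
    """Return the walk's vertices, starting at the origin (KeyError on non-ACGT)."""
    x = y = 0
    points = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets (no rigid-motion search)."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


if __name__ == "__main__":
    print(hausdorff(dna_walk("ATGCGGA"), dna_walk("ATGCCGA")))
```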

  10. Exact Tandem Repeats Analyzer (E-TRA): A new program for DNA sequence mining

    Indian Academy of Sciences (India)

    Mehmet Karaca; Mehmet Bilgen; A. Naci Onus; Ayse Gul Ince; Safinaz Y. Elmasulu

    2005-04-01

    Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as ‘organs’, ‘tissues’, ‘cell lines’ and ‘development stages’ for finding simple exact tandem repeats as well as non-simple repeats. E-TRA offers more advanced repeat search parameters/options than other repeat finder programs: it not only accepts GenBank, FASTA and expressed sequence tag (EST) sequence files, but also analyzes multiple files containing multiple sequences. E-TRA finds tandem repeats with motif lengths from one to one thousand nucleotides. Advanced user-defined parameters/options let researchers apply different minimum motif-repeat search criteria to different motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some tandem repeats play important roles in the regulation of gene expression, whereas others have no known biological function as yet. Nevertheless, they have proven very beneficial in DNA profiling and genetic linkage analysis studies. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns varying from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and different developmental stages differed in the number of repeats as well as in repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs is not random and thus differs from that of the untranscribed repeats found in genomes.
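
    A bare-bones version of exact tandem-repeat detection scans each motif length k and counts back-to-back identical copies. The sketch below is not E-TRA's algorithm; it assumes 0-based coordinates, A/C/G/T input, and hypothetical default thresholds. To avoid reporting, say, "AA" x 3 for a run of six A's, it skips motifs that are themselves repeats of a shorter unit, using the standard check that a string s is a power of a shorter string exactly when s occurs in s + s with its first and last characters removed. Overlaps between hits of different motif lengths are not resolved.

```python
def is_primitive(motif):
    """True unless motif is a repetition of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def exact_tandem_repeats(seq, min_len=1, max_len=6, min_copies=3):
    """Report exact tandem repeats as (start, motif, copies), 0-based start.

    For each motif length k, a run of at least min_copies identical adjacent
    copies is reported once, and scanning resumes after the run."""
    seq = seq.upper()
    hits = []
    for k in range(min_len, max_len + 1):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            copies = 1
            # A shorter (or empty) slice at the end can never equal the motif
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_copies and is_primitive(motif):
                hits.append((i, motif, copies))
                i += copies * k  # skip past the whole run for this k
            else:
                i += 1
    return hits


if __name__ == "__main__":
    print(exact_tandem_repeats("TTCAGCAGCAGCAGTTAAAAAAG", max_len=4))
```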

  11. Estimation of a Killer Whale (Orcinus orca) Population's Diet Using Sequencing Analysis of DNA from Feces.

    Directory of Open Access Journals (Sweden)

    Michael J Ford

    Estimating diet composition is important for understanding interactions between predators and prey and thus illuminating ecosystem function. The diet of many species, however, is difficult to observe directly. Genetic analysis of fecal material collected in the field is therefore a useful tool for gaining insight into wild animal diets. In this study, we used high-throughput DNA sequencing to quantitatively estimate the diet composition of an endangered population of wild killer whales (Orcinus orca) in their summer range in the Salish Sea. We combined 175 fecal samples collected between May and September from five years between 2006 and 2011 into 13 sample groups. Two control groups of known DNA composition were also created. Each group was sequenced at a ~330 bp segment of the 16S gene in the mitochondrial genome using an Illumina MiSeq sequencing system. After several quality-control steps, 4,987,107 individual sequences were aligned to a custom sequence database containing 19 potential fish prey species, and the most likely species of each fecal-derived sequence was determined. Based on these alignments, salmonids made up >98.6% of the total sequences and thus of the inferred diet. Of the six salmonid species, Chinook salmon made up 79.5% of the sequences, followed by coho salmon (15%). Over all years, a clear pattern emerged, with Chinook salmon dominating the estimated diet early in the summer and coho salmon contributing an average of >40% of the diet in late summer. Sockeye salmon appeared to be occasionally important, at >18% in some sample groups. Non-salmonids were rarely observed. Our results are consistent with earlier results based on surface prey remains, and confirm the importance of Chinook salmon in this population's summer diet.
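
    The study assigned each read to its most likely prey species by alignment and then expressed diet as the share of assigned reads. As a minimal sketch of that bookkeeping, the code below swaps alignment for a simpler, hypothetical k-mer overlap score against a reference dictionary, then converts species counts into percentages; reads sharing no k-mer with any reference are left unassigned. The reference fragments are invented placeholders, not real 16S sequences.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def diet_composition(reads, references, k=8):
    """Percent of assigned reads per species.

    Each read goes to the reference sharing the most k-mers with it (ties go to
    the first reference in dict order); reads sharing none are dropped."""
    ref_kmers = {species: kmers(seq.upper(), k) for species, seq in references.items()}
    counts = Counter()
    for read in reads:
        read_kmers = kmers(read.upper(), k)
        best, best_score = None, 0
        for species, ref in ref_kmers.items():
            score = len(read_kmers & ref)
            if score > best_score:
                best, best_score = species, score
        if best is not None:
            counts[best] += 1
    total = sum(counts.values())
    return {species: 100 * n / total for species, n in counts.items()} if total else {}


if __name__ == "__main__":
    refs = {"chinook": "ACGTACGGTTCAGGCT", "coho": "TTGACCATGGATCCAA"}  # invented placeholders
    reads = ["ACGTACGGTTCA", "GACCATGGATCC", "ACGTACGGTTCAGG", "NNNNNNNNNN"]
    print(diet_composition(reads, refs))
```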

  12. Imprinted genes show unique patterns of sequence conservation

    Directory of Open Access Journals (Sweden)

    Helms Volkhard

    2010-11-01

    Background: Genomic imprinting is an evolutionarily conserved mechanism of epigenetic gene regulation in placental mammals that results in silencing of one of the parental alleles. In order to decipher interactions between allele-specific DNA methylation of imprinted genes and evolutionary conservation, we performed a genome-wide comparative investigation of genomic sequences and highly conserved elements of imprinted genes in human and mouse. Results: Evolutionarily conserved elements in imprinted regions differ from those associated with autosomal genes in various ways. Whereas strong divergence of protein-encoding sequences is most prominent for maternally expressed genes, paternally expressed genes exhibit substantial conservation of coding and noncoding sequences. Conserved elements in imprinted regions are marked by enrichment of CpG dinucleotides, and low (TpG+CpA)/(2·CpG) ratios indicate reduced CpG deamination. Interestingly, paternally and maternally expressed genes can be distinguished by differences in G+C and CpG contents that might be associated with unusual epigenetic features. Noncoding conserved elements of paternally expressed genes in particular are exceptionally G+C and CpG rich. In addition, we confirmed a frequent occurrence of intronic CpG islands and observed a decelerated degeneration of ancient LINE-1 repeats. We also found a moderate enrichment of YY1 and CTCF binding sites in imprinted regions and identified several short sequence motifs in highly conserved elements that might act as additional regulatory elements. Conclusions: We discovered several novel conserved DNA features that might be related to allele-specific DNA methylation. Our results hint at reduced CpG deamination rates in imprinted regions, which mostly affect noncoding conserved elements of paternally expressed genes. 
Pronounced differences between maternally and paternally expressed genes imply specific modes of evolution as a result of differences in
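    The (TpG+CpA)/(2·CpG) ratio above can be computed from plain dinucleotide counts. Deamination of a methylated CpG produces TpG on one strand and CpA on the complementary strand, so a low ratio suggests that fewer CpGs have decayed in this way. The following is a minimal Python sketch, assuming a single-strand sequence string and overlapping dinucleotide counting; the function names `count_dinucleotide` and `cpg_deamination_ratio` are hypothetical and do not come from the study.

```python
from typing import Optional


def count_dinucleotide(seq: str, dinuc: str) -> int:
    """Hypothetical helper: count overlapping occurrences of a dinucleotide (case-insensitive)."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def cpg_deamination_ratio(seq: str) -> Optional[float]:
    """Return (TpG + CpA) / (2 * CpG), or None when the sequence has no CpG."""
    cpg = count_dinucleotide(seq, "CG")
    if cpg == 0:
        return None  # the ratio is undefined without any CpG dinucleotide
    tpg = count_dinucleotide(seq, "TG")
    cpa = count_dinucleotide(seq, "CA")
    return (tpg + cpa) / (2 * cpg)


# Toy element (not study data): 2 CpG, 1 TpG and 1 CpA give (1 + 1) / 4 = 0.5.
print(cpg_deamination_ratio("CGCGTGCA"))  # 0.5
```

    Counting only one strand is a simplification; because TpG on one strand corresponds to CpA on the other, the numerator already captures both products of deamination.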

  13. Complete genome sequence of chloroplast DNA (cpDNA) of Chlorella sorokiniana.

    Science.gov (United States)

    Orsini, Massimiliano; Cusano, Roberto; Costelli, Cristina; Malavasi, Veronica; Concas, Alessandro; Angius, Andrea; Cao, Giacomo

    2016-01-01

    The complete chloroplast genome sequence of Chlorella sorokiniana strain SAG 111-8 k is presented in this study. The genome consists of a circular chromosome of 109,811 bp, which encodes a total of 109 genes, including 74 protein-coding genes, 3 rRNAs and 31 tRNAs. No introns were detected, and all genes are present in a single copy. The overall AT content of the C. sorokiniana cpDNA is 65.9% (59.1% in the coding sequence), and no large inverted repeat (IR) is observed.
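    AT content, as reported above for the whole cpDNA and for its coding sequence, is the share of A and T among the sequenced bases. A minimal sketch follows; `at_content` is a hypothetical helper, and ignoring ambiguous symbols such as N is an assumption. Computing the coding-sequence figure would additionally require restricting the input to annotated CDS coordinates, which are not given here.

```python
def at_content(seq: str) -> float:
    """Hypothetical helper: percentage of A+T among unambiguous bases.

    Symbols other than A, C, G and T (for example N) are ignored.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("A") + seq.count("T")) / acgt


print(round(at_content("AATTGC"), 1))  # 66.7
print(at_content("ATGCNN"))            # 50.0
```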

  14. Distribution patterns of postmortem damage in human mitochondrial DNA

    DEFF Research Database (Denmark)

    Gilbert, M Thomas P; Willerslev, Eske; Hansen, Anders J

    2002-01-01

    The distribution of postmortem damage in mitochondrial DNA retrieved from 37 ancient human DNA samples was analyzed by cloning and was compared with a selection of published animal data. A relative rate of damage (rho(v)) was calculated for nucleotide positions within the human hypervariable region......, such as MT5, have lower in vivo mutation rates and lower postmortem-damage rates. The postmortem data also identify a possible functional subregion of the HVR1, termed "low-diversity 1," through the lack of sequence damage. The amount of postmortem damage observed in mitochondrial coding regions...

  15. Statistical methods for detecting periodic fragments in DNA sequence data

    Directory of Open Access Journals (Sweden)

    Ying Hua

    2011-04-01

    Background: Period-10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to wrap around histone core octamers and form nucleosomes. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail, and confirmatory testing for a priori specified periods has not been developed. Results: We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, the discrete Fourier transform (DFT), the integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated, but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study, where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome-associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~21% and ~19%, respectively, of these genomes as spanned by period-10 nucleosome positioning sequences (NPS). Conclusions: For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of
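    As a rough illustration of exploratory period estimation, the sketch below converts a sequence into a 0/1 dinucleotide indicator signal and scores each integer period p by the DFT power at frequency 1/p. This is a plain periodogram evaluated at integer periods, in the spirit of the DFT-based measures compared above, but it is not the authors' IPDFT or Hybrid implementation and includes no blockwise bootstrap. The AA/AT/TA/TT dinucleotide class and all function names are assumptions for illustration.

```python
import cmath
from typing import Iterable, List, Set

# Assumed dinucleotide class; the abstract does not say which dinucleotides were scored.
WW: Set[str] = {"AA", "AT", "TA", "TT"}


def indicator(seq: str, dinucs: Set[str] = WW) -> List[float]:
    """1.0 at each position where a dinucleotide from `dinucs` starts, else 0.0."""
    seq = seq.upper()
    return [1.0 if seq[i:i + 2] in dinucs else 0.0 for i in range(len(seq) - 1)]


def period_power(x: List[float], period: int) -> float:
    """Mean-centred DFT power of x evaluated at frequency 1/period."""
    mean = sum(x) / len(x)
    s = sum((v - mean) * cmath.exp(-2j * cmath.pi * n / period)
            for n, v in enumerate(x))
    return abs(s) ** 2 / len(x)


def dominant_period(seq: str, periods: Iterable[int] = range(2, 21)) -> int:
    """Hypothetical helper: integer period with the largest DFT power."""
    x = indicator(seq)
    if not x:
        raise ValueError("sequence must contain at least two bases")
    return max(periods, key=lambda p: period_power(x, p))


# Synthetic sequence with an AA-rich block every 10 bp.
print(dominant_period("AAAAAGGGGG" * 30))  # 10
```

    A confirmatory test in the sense of the abstract would additionally compare the period-10 power against a resampling null distribution, which this sketch leaves out.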

  16. DNA Methylation Profiling Reveals Correlation of Differential Methylation Patterns with Gene Expression in Human Epilepsy.

    Science.gov (United States)

    Wang, Liang; Fu, Xinwei; Peng, Xi; Xiao, Zheng; Li, Zhonggui; Chen, Guojun; Wang, Xuefeng

    2016-05-01

    DNA methylation plays important roles in regulating gene expression and has been reported to be related to epilepsy. This study aimed to define differential DNA methylation patterns in patients with drug-refractory epilepsy and to investigate the role of DNA methylation in human epilepsy. We performed DNA methylation profiling in brain tissues from epileptic and control patients via a methylated-cytosine DNA immunoprecipitation microarray chip. Differentially methylated loci were validated by bisulfite sequencing PCR, and the messenger RNA (mRNA) levels of candidate genes were evaluated by reverse transcriptase PCR. We found 224 genes that showed differential DNA methylation between epileptic patients and controls. Among the seven candidate genes, three (TUBB2B, ATPGD1, and HTR6) showed evidence of transcriptional regulation by DNA methylation. TUBB2B and ATPGD1 exhibited hypermethylation and decreased mRNA levels, whereas HTR6 displayed hypomethylation and increased mRNA levels in the epileptic samples. Our findings suggest that certain genes become differentially regulated by DNA methylation in human epilepsy.

  17. Effect of ionic strength and cationic DNA affinity binders on the DNA sequence selective alkylation of guanine N7-positions by nitrogen mustards

    Energy Technology Data Exchange (ETDEWEB)

    Hartley, J.A.; Forrow, S.M.; Souhami, R.L. (Univ. College and Middlesex School of Medicine, London (England))

    1990-03-27

    Large variations in alkylation intensities exist among guanines in a DNA sequence following treatment with chemotherapeutic alkylating agents such as nitrogen mustards, and the substituent attached to the reactive group can impose a distinct sequence preference for reaction. To better understand the structural and electrostatic factors that determine the sequence selectivity of alkylation reactions, the effects of increased ionic strength, the intercalator ethidium bromide, the AT-specific minor groove binders distamycin A and netropsin, and the polyamine spermine on guanine N7-alkylation by L-phenylalanine mustard (L-Pam), uracil mustard (UM), and quinacrine mustard (QM) were investigated with a modification of the guanine-specific chemical cleavage technique for DNA sequencing. The results depended on both the nitrogen mustard and the cationic agent used. The effect, which included both enhancement and suppression of alkylation sites, was most striking for netropsin and distamycin A, which differed from each other. DNA footprinting indicated that selective binding to AT sequences in the minor groove of DNA can have long-range effects on the alkylation pattern of DNA in the major groove.

  18. HLA DNA sequence variation among human populations: molecular signatures of demographic and selective events.

    Directory of Open Access Journals (Sweden)

    Stéphane Buhler

    Molecular differences between HLA alleles range up to 57 nucleotides within the peptide-binding coding region of human Major Histocompatibility Complex (MHC) genes, but it is still unclear whether this variation results from a stochastic process or from selective constraints related to functional differences among HLA molecules. Although HLA alleles are generally treated as equidistant molecular units in population genetic studies, DNA sequence diversity among populations is also crucial to interpret the observed HLA polymorphism. In this study, we used a large dataset of 2,062 DNA sequences defined for the different HLA alleles to analyze nucleotide diversity of seven HLA genes in 23,500 individuals from about 200 populations spread worldwide. We first analyzed the HLA molecular structure and diversity of these populations in relation to geographic variation, and we further investigated possible departures from selective neutrality through Tajima's tests and mismatch distributions. All results were compared to those obtained by classical approaches applied to HLA allele frequencies. Our study shows that the global patterns of HLA nucleotide diversity among populations are significantly correlated with geography, although in some specific cases the molecular information reveals unexpected genetic relationships. At all loci except HLA-DPB1, populations have accumulated a high proportion of very divergent alleles, suggesting an advantage for heterozygotes expressing molecularly distant HLA molecules (asymmetric overdominant selection model). However, both different intensities of selection and unequal levels of gene conversion may explain the heterogeneous mismatch distributions observed among the loci. Also, distinctive patterns of sequence divergence observed at the HLA-DPB1 locus suggest current neutrality but old selective pressures on this gene. We conclude that HLA DNA sequences advantageously complement HLA allele frequencies as a source of data used
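    Tajima's test compares two estimates of diversity: the mean number of pairwise differences (π) and the number of segregating sites S scaled by a1. A negative D typically indicates an excess of rare variants, a positive D an excess of intermediate-frequency variants. The sketch below implements the standard formula from Tajima (1989) for aligned, equal-length sequences; it assumes no gaps or missing data, and `tajimas_d` is a hypothetical helper, not the code used in the HLA study.

```python
from itertools import combinations
from math import sqrt
from typing import Optional, Sequence


def tajimas_d(seqs: Sequence[str]) -> Optional[float]:
    """Hypothetical helper: Tajima's D for aligned sequences (no gap handling).

    Returns None when there are no segregating sites (D is undefined).
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences for a non-zero variance term")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) columns.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        return None

    # pi: mean number of pairwise differences over all n(n-1)/2 pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)

    # Standard constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


print(round(tajimas_d(["A", "A", "A", "T"]), 3))  # -0.612 (one singleton)
print(round(tajimas_d(["A", "A", "T", "T"]), 3))  # 1.633 (intermediate frequency)
```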

  19. Anti-DNA antibodies: Sequencing, cloning, and expression

    Energy Technology Data Exchange (ETDEWEB)

    Barry, M.M.

    1992-01-01

    To gain insight into the mechanism of systemic lupus erythematosus (SLE) and the interactions involved in protein binding to DNA, four anti-DNA antibodies have been investigated. Two of the antibodies, Hed 10 and Jel 242, had previously been prepared from female NZB/NZW mice, which develop an autoimmune disease resembling human SLE. The remaining two antibodies, Jel 72 and Jel 318, had previously been produced by immunization of C57BL/6 mice. The isotypes of the four antibodies investigated in this thesis were determined by an enzyme-linked immunosorbent assay. All four antibodies contained κ light chains and γ2a heavy chains, except Jel 318, which contains a γ2b heavy chain. The complete variable regions of the heavy and light chains of these four antibodies were sequenced from their respective mRNAs. The gene segments and variable gene families expressed in each antibody were identified. Analysis of the genes used in the autoimmune anti-DNA antibodies and in those produced by immunization indicated no obvious differences to account for their different origins. Examination of the amino acid residues present in the complementarity-determining regions of these four antibodies indicates a preference for aromatic amino acids. Jel 72 and Jel 242 contain three arginine residues in the third complementarity-determining region. A single-chain Fv and the variable region of the heavy chain of Hed 10 were expressed in Escherichia coli. Expression resulted in the production of a 26,000 Mr protein and a 15,000 Mr protein. An immunoblot indicated that the 26,000 Mr protein was the Fv for Hed 10, while the 15,000 Mr protein was shown to bind poly(dT). The contribution of the heavy chain to DNA binding was assessed.

  20. The DNA sequence of the human X chromosome.

    Science.gov (United States)

    Ross, Mark T; Grafham, Darren V; Coffey, Alison J; Scherer, Steven; McLay, Kirsten; Muzny, Donna; Platzer, Matthias; Howell, Gareth R; Burrows, Christine; Bird, Christine P; Frankish, Adam; Lovell, Frances L; Howe, Kevin L; Ashurst, Jennifer L; Fulton, Robert S; Sudbrak, Ralf; Wen, Gaiping; Jones, Matthew C; Hurles, Matthew E; Andrews, T Daniel; Scott, Carol E; Searle, Stephen; Ramser, Juliane; Whittaker, Adam; Deadman, Rebecca; Carter, Nigel P; Hunt, Sarah E; Chen, Rui; Cree, Andrew; Gunaratne, Preethi; Havlak, Paul; Hodgson, Anne; Metzker, Michael L; Richards, Stephen; Scott, Graham; Steffen, David; Sodergren, Erica; Wheeler, David A; Worley, Kim C; Ainscough, Rachael; Ambrose, Kerrie D; Ansari-Lari, M Ali; Aradhya, Swaroop; Ashwell, Robert I S; Babbage, Anne K; Bagguley, Claire L; Ballabio, Andrea; Banerjee, Ruby; Barker, Gary E; Barlow, Karen F; Barrett, Ian P; Bates, Karen N; Beare, David M; Beasley, Helen; Beasley, Oliver; Beck, Alfred; Bethel, Graeme; Blechschmidt, Karin; Brady, Nicola; Bray-Allen, Sarah; Bridgeman, Anne M; Brown, Andrew J; Brown, Mary J; Bonnin, David; Bruford, Elspeth A; Buhay, Christian; Burch, Paula; Burford, Deborah; Burgess, Joanne; Burrill, Wayne; Burton, John; Bye, Jackie M; Carder, Carol; Carrel, Laura; Chako, Joseph; Chapman, Joanne C; Chavez, Dean; Chen, Ellson; Chen, Guan; Chen, Yuan; Chen, Zhijian; Chinault, Craig; Ciccodicola, Alfredo; Clark, Sue Y; Clarke, Graham; Clee, Chris M; Clegg, Sheila; Clerc-Blankenburg, Kerstin; Clifford, Karen; Cobley, Vicky; Cole, Charlotte G; Conquer, Jen S; Corby, Nicole; Connor, Richard E; David, Robert; Davies, Joy; Davis, Clay; Davis, John; Delgado, Oliver; Deshazo, Denise; Dhami, Pawandeep; Ding, Yan; Dinh, Huyen; Dodsworth, Steve; Draper, Heather; Dugan-Rocha, Shannon; Dunham, Andrew; Dunn, Matthew; Durbin, K James; Dutta, Ireena; Eades, Tamsin; Ellwood, Matthew; Emery-Cohen, Alexandra; Errington, Helen; Evans, Kathryn L; Faulkner, Louisa; Francis, Fiona; Frankland, John; Fraser, Audrey E; Galgoczy, Petra; Gilbert, James; Gill, Rachel; Glöckner, Gernot; Gregory, Simon G; Gribble, Susan; Griffiths, Coline; Grocock, Russell; Gu, Yanghong; Gwilliam, Rhian; Hamilton, Cerissa; Hart, Elizabeth A; Hawes, Alicia; Heath, Paul D; Heitmann, Katja; Hennig, Steffen; Hernandez, Judith; Hinzmann, Bernd; Ho, Sarah; Hoffs, Michael; Howden, Phillip J; Huckle, Elizabeth J; Hume, Jennifer; Hunt, Paul J; Hunt, Adrienne R; Isherwood, Judith; Jacob, Leni; Johnson, David; Jones, Sally; de Jong, Pieter J; Joseph, Shirin S; Keenan, Stephen; Kelly, Susan; Kershaw, Joanne K; Khan, Ziad; Kioschis, Petra; Klages, Sven; Knights, Andrew J; Kosiura, Anna; Kovar-Smith, Christie; Laird, Gavin K; Langford, Cordelia; Lawlor, Stephanie; Leversha, Margaret; Lewis, Lora; Liu, Wen; Lloyd, Christine; Lloyd, David M; Loulseged, Hermela; Loveland, Jane E; Lovell, Jamieson D; Lozado, Ryan; Lu, Jing; Lyne, Rachael; Ma, Jie; Maheshwari, Manjula; Matthews, Lucy H; McDowall, Jennifer; McLaren, Stuart; McMurray, Amanda; Meidl, Patrick; Meitinger, Thomas; Milne, Sarah; Miner, George; Mistry, Shailesh L; Morgan, Margaret; Morris, Sidney; Müller, Ines; Mullikin, James C; Nguyen, Ngoc; Nordsiek, Gabriele; Nyakatura, Gerald; O'Dell, Christopher N; Okwuonu, Geoffery; Palmer, Sophie; Pandian, Richard; Parker, David; Parrish, Julia; Pasternak, Shiran; Patel, Dina; Pearce, Alex V; Pearson, Danita M; Pelan, Sarah E; Perez, Lesette; Porter, Keith M; Ramsey, Yvonne; Reichwald, Kathrin; Rhodes, Susan; Ridler, Kerry A; Schlessinger, David; Schueler, Mary G; Sehra, Harminder K; Shaw-Smith, Charles; Shen, Hua; Sheridan, Elizabeth M; Shownkeen, Ratna; Skuce, Carl D; Smith, Michelle L; Sotheran, Elizabeth C; Steingruber, Helen E; Steward, Charles A; Storey, Roy; Swann, R Mark; Swarbreck, David; Tabor, Paul E; Taudien, Stefan; Taylor, Tineace; Teague, Brian; Thomas, Karen; Thorpe, Andrea; Timms, Kirsten; Tracey, Alan; Trevanion, Steve; Tromans, Anthony C; d'Urso, Michele; Verduzco, Daniel; Villasana, Donna; Waldron, Lenee; Wall, Melanie; Wang, Qiaoyan; Warren, James; Warry, Georgina L; Wei, Xuehong; West, Anthony; Whitehead, Siobhan L; Whiteley, Mathew N; Wilkinson, Jane E; Willey, David L; Williams, Gabrielle; Williams, Leanne; Williamson, Angela; Williamson, Helen; Wilming, Laurens; Woodmansey, Rebecca L; Wray, Paul W; Yen, Jennifer; Zhang, Jingkun; Zhou, Jianling; Zoghbi, Huda; Zorilla, Sara; Buck, David; Reinhardt, Richard; Poustka, Annemarie; Rosenthal, André; Lehrach, Hans; Meindl, Alfons; Minx, Patrick J; Hillier, Ladeana W; Willard, Huntington F; Wilson, Richard K; Waterston, Robert H; Rice, Catherine M; Vaudin, Mark; Coulson, Alan; Nelson, David L; Weinstock, George; Sulston, John E; Durbin, Richard; Hubbard, Tim; Gibbs, Richard A; Beck, Stephan; Rogers, Jane; Bentley, David R

    2005-03-17

    The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence.

  1. Polymorphic DNA sequences and their application in paternity testing

    Energy Technology Data Exchange (ETDEWEB)

    Slomski, R. [Polska Akademia Nauk, Poznan (Poland). Zaklad Genetyki Czlowieka; Akademia Rolnicza, Poznan (Poland); Laboratorium Genetyki Molekularnej, Poznan (Poland)]; Kwiatkowska, J.; Chlebowska, H. [Polska Akademia Nauk, Poznan (Poland). Zaklad Genetyki Czlowieka]; Siemieniallo, B. [Akademia Rolnicza, Poznan (Poland)]; Slomska, M. [Laboratorium Genetyki Molekularnej, Poznan (Poland)]

    1994-12-31

    Characteristics of polymorphic DNA sequences, especially satellite, minisatellite and microsatellite sequences, are presented. The authors' own experience with multi-locus and single-locus DNA analysis in paternity testing is compared with the results of research in other laboratories. Critical points of both types of analysis are discussed. (author). 53 refs, 4 figs, 2 tabs.

  2. Determination of cDNA and genomic DNA sequences of hevamine, a chitinase from the rubber tree Hevea brasiliensis

    NARCIS (Netherlands)

    Bokma, E; Spiering, M; Chow, KS; Mulder, PPMFA; Subroto, T; Beintema, JJ

    2001-01-01

    Hevamine is a chitinase from the rubber tree Hevea brasiliensis and belongs to the family 18 glycosyl hydrolases. This paper describes the cloning of hevamine DNA and cDNA sequences. Hevamine contains a signal peptide at the N-terminus and a putative vacuolar targeting sequence at the C-terminus whi

  3. Construction of a Sequencing Library from Circulating Cell-Free DNA.

    Science.gov (United States)

    Fang, Nan; Löffert, Dirk; Akinci-Tolun, Rumeysa; Heitz, Katja; Wolf, Alexander

    2016-04-01

    Circulating DNA is cell-free DNA (cfDNA) in serum or plasma that can be used for non-invasive prenatal testing, as well as cancer diagnosis, prognosis, and stratification. High-throughput sequence analysis of the cfDNA with next-generation sequencing technologies has proven to be a highly sensitive and specific method in detecting and characterizing mutations in cancer and other diseases, as well as aneuploidy during pregnancy. This unit describes detailed procedures to extract circulating cfDNA from human serum and plasma and generate sequencing libraries from a wide concentration range of circulating DNA.

  4. Population genetic structure and historical demography of Oratosquilla oratoria revealed by mitochondrial DNA sequences.

    Science.gov (United States)

    Zhang, D; Ding, Ge; Ge, B; Zhang, H; Tang, B

    2012-12-01

    Genetic diversity, population genetic structure and the molecular phylogeographic pattern of the mantis shrimp Oratosquilla oratoria in the Bohai Sea and South China Sea were analyzed using mitochondrial DNA sequences. Nucleotide and haplotype diversities were 0.00409-0.00669 and 0.894-0.953, respectively. The neighbor-joining phylogenetic tree resolved two distinct lineages. Both the phylogenetic tree and the median-joining network showed a consistent genetic structure corresponding to geographical distribution. Mismatch distributions, negative neutrality tests and a "star-like" network supported a sudden population expansion event, estimated to have occurred about 44,000 and 50,000 years ago.
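    The nucleotide and haplotype diversities quoted above follow standard definitions: haplotype diversity is Nei's Hd = n/(n-1)·(1 - Σ p_i²) over haplotype frequencies p_i, and nucleotide diversity is the mean number of pairwise differences per site. A minimal sketch, assuming aligned, gap-free sequences of equal length; the helper names are hypothetical and the toy haplotypes are not data from the study.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def haplotype_diversity(seqs: Sequence[str]) -> float:
    """Hypothetical helper: Nei's Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least 2 sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Hypothetical helper: mean pairwise differences per site (aligned, gap-free input)."""
    if len(seqs) < 2:
        raise ValueError("need at least 2 sequences")
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


sample = ["AAAA", "AAAA", "AAAT", "AATT"]  # toy aligned haplotypes
print(round(haplotype_diversity(sample), 4))   # 0.8333
print(round(nucleotide_diversity(sample), 4))  # 0.2917
```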

  5. Phylogeny of Pelargonium (Geraniaceae) based on DNA sequences from three genomes

    NARCIS (Netherlands)

    Bakker, F.T.; Culham, A.; Hettiarachi, P.; Touloumendidou, T.; Gibby, M.

    2004-01-01

    Phylogenetic hypotheses for the largely South African genus Pelargonium L'Hér. (Geraniaceae) were derived based on DNA sequence data from nuclear, chloroplast and mitochondrial encoded regions. The datasets were unequally represented and comprised cpDNA trnL-F sequences for 152 taxa, nrDNA ITS seque

  6. Effect of base sequence on the DNA cross-linking properties of pyrrolobenzodiazepine (PBD) dimers.

    Science.gov (United States)

    Rahman, Khondaker M; James, Colin H; Thurston, David E

    2011-07-01

    Pyrrolo[2,1-c][1,4]benzodiazepine (PBD) dimers are synthetic sequence-selective DNA minor-groove cross-linking agents that possess two electrophilic imine moieties (or their equivalent) capable of forming covalent aminal linkages with guanine C2-NH2 functionalities. The PBD dimer SJG-136, which has a C8-O-(CH2)3-O-C8' central linker joining the two PBD moieties, is currently undergoing phase II clinical trials, and current research is focused on developing analogues of SJG-136 with different linker lengths and substitution patterns. Using a reversed-phase ion-pair HPLC/MS method to evaluate interaction with oligonucleotides of varying length and sequence, we recently reported (JACS, 2009, 131, 13756) that SJG-136 can form three different types of adducts: interstrand and intrastrand cross-linked adducts, and mono-alkylated adducts. These studies have now been extended to include PBD dimers with a longer central linker (C8-O-(CH2)5-O-C8'), demonstrating that the type and distribution of adducts appear to depend on (i) the length of the C8/C8'-linker connecting the two PBD units, (ii) the positioning of the two reactive guanine bases on the same or opposite strands, and (iii) their separation (i.e. the number of base pairs, usually ATs, between them). Based on these data, a set of rules is emerging that can be used to predict the DNA-interaction behaviour of a PBD dimer of particular C8-C8' linker length towards a given DNA sequence. These observations suggest that it may be possible to design PBD dimers to target specific DNA sequences.

  7. Sequence alignment tools: one parallel pattern to rule them all?

    Science.gov (United States)

    Misale, Claudia; Ferrero, Giulio; Torquati, Massimo; Aldinucci, Marco

    2014-01-01

    In this paper, we advocate a high-level programming methodology for next-generation sequencing (NGS) alignment tools, for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools to their ports onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are freed from the complex aspects of parallel programming, such as synchronisation protocols and task scheduling, and gain more scope for seamless performance tuning. In this work, we show some use cases in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance for all datasets used.
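    The single parallel paradigm described above is, in essence, a task farm: independent reads are dispatched to workers that run the same alignment function, and results are gathered in input order. FastFlow itself is a C++ framework; the Python sketch below only illustrates the farm idea with the standard library's `concurrent.futures.ThreadPoolExecutor`, and the toy ungapped aligner `align_read` is hypothetical, not taken from any of the tools reviewed. Threads keep the example simple; for CPU-bound alignment in CPython, a process pool would be needed for real speed-ups because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import List, Sequence, Tuple


def align_read(read: str, reference: str) -> Tuple[int, int]:
    """Hypothetical toy aligner: best ungapped offset and its mismatch count.

    Returns (-1, len(read) + 1) if the read is longer than the reference.
    """
    best = (-1, len(read) + 1)
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best[1]:
            best = (offset, mismatches)
    return best


def farm_align(reads: Sequence[str], reference: str,
               workers: int = 4) -> List[Tuple[int, int]]:
    """Task farm: the same worker function is applied to independent reads.

    Executor.map yields results in input order, so no reordering step is needed.
    """
    worker = partial(align_read, reference=reference)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, reads))


print(farm_align(["CGGT", "TTCA"], "ACGTACGGTTCA"))  # [(5, 0), (8, 0)]
```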

  8. Sequence Alignment Tools: One Parallel Pattern to Rule Them All?

    Directory of Open Access Journals (Sweden)

    Claudia Misale

    2014-01-01

    In this paper, we advocate a high-level programming methodology for next-generation sequencing (NGS) alignment tools, for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools to their ports onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are freed from the complex aspects of parallel programming, such as synchronisation protocols and task scheduling, and gain more scope for seamless performance tuning. In this work, we show some use cases in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance for all datasets used.

  9. Thermochemical and kinetic evidence for nucleotide-sequence-dependent RecA-DNA interactions.

    Science.gov (United States)

    Wittung, P; Ellouze, C; Maraboeuf, F; Takahashi, M; Nordèn, B

    1997-05-01

    RecA catalyses homologous recombination in Escherichia coli by promoting pairing of homologous DNA molecules after formation of a helical nucleoprotein filament with single-stranded DNA. The primary reaction of RecA with DNA is generally assumed to be nonspecific. We show here, by direct measurement of the interaction enthalpy by means of isothermal titration calorimetry, that the polymerisation of RecA on single-stranded DNA depends on the DNA sequence, with a strongly exothermic preference for thymine bases. This enthalpic preference of RecA for thymines correlates with faster binding kinetics of RecA to thymine DNA. Furthermore, the enthalpy of interaction between the RecA·DNA filament and a second DNA strand is large only when the added DNA is complementary to the DNA bound in RecA. This result suggests a possible mechanism for a rapid search by RecA·DNA filaments for homologous DNA molecules.

  10. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as the total number of nucleotide bases and ATGC base contents with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation, and it provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer, Random DNA Analyser; GUI, graphical user interface; XAML, Extensible Application Markup Language.
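    The Nussinov algorithm that RDNAnalyzer builds on maximises the number of complementary base pairs using a dynamic-programming table N[i][j] over every subsequence. The sketch below is the textbook version, not RDNAnalyzer's C# implementation: it assumes Watson-Crick pairs only, a minimum hairpin loop of three unpaired bases (a conventional choice), and returns one optimal structure in dot-bracket notation. The function name `nussinov` is illustrative.

```python
from typing import List, Tuple

# Watson-Crick pairs only (an assumption; G-T mismatch pairs are not scored).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Maximise base pairs; return (pair count, dot-bracket structure)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    # N[i][j] = maximum number of pairs in seq[i..j]; spans <= min_loop stay 0.
    N: List[List[int]] = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])          # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, N[i + 1][j - 1] + 1)      # i pairs with j
            for k in range(i + 1, j):                      # bifurcation
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best

    # Traceback to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j or N[i][j] == 0:
            continue
        if N[i][j] == N[i + 1][j]:
            stack.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and N[i][j] == N[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return N[0][n - 1], "".join(structure)


print(nussinov("GGGAAAACCC"))  # (3, '(((....)))')
```

    Pair maximisation ignores stacking energies, which is why practical tools usually extend it; the abstract's "uses and extends" suggests RDNAnalyzer does something similar, but its extensions are not described here.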

  11. Molecular phylogeny of parasitic Platyhelminthes based on sequences of partial 28S rDNA D1 and mitochondrial cytochrome c oxidase subunit I.

    Science.gov (United States)

    Lee, Soo-Ung; Chun, Ha-Chung; Huh, Sun

    2007-09-01

    The phylogenetic relationships among 14 parasitic Platyhelminthes in the Republic of Korea were investigated using the partial 28S ribosomal DNA (rDNA) D1 region and partial mitochondrial cytochrome c oxidase subunit 1 (mCOI) DNA sequences. The nucleotide sequences were analyzed by length, G+C%, nucleotide differences and gaps in order to determine phylogenetic relationships. The phylogenetic patterns of the 28S rDNA D1 and mCOI regions were closely related within the same class and order, as analyzed by the PAUP 4.0 program, with the exception of a few species. These findings indicate that the 28S rDNA gene sequence is more highly conserved than the mCOI gene sequences. The 28S rDNA gene may prove useful in studies of the systematics and population genetic structures of parasitic Platyhelminthes.
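    The pairwise comparison described above (nucleotide differences and gaps between aligned sequences) can be sketched as an uncorrected p-distance. This is a minimal illustration assuming two already-aligned sequences with '-' as the gap symbol; it does not reproduce PAUP 4.0's distance options, and `compare_aligned` is a hypothetical helper.

```python
from typing import Dict


def compare_aligned(a: str, b: str, gap: str = "-") -> Dict[str, float]:
    """Hypothetical helper: differences, gap columns and p-distance for two aligned sequences.

    The p-distance is computed only over columns where neither sequence has a gap.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    gaps = differences = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            gaps += 1
            continue
        compared += 1
        differences += x != y
    p = differences / compared if compared else float("nan")
    return {"differences": differences, "gaps": gaps, "p_distance": p}


print(compare_aligned("ACGT-A", "ACTTGA"))
# {'differences': 1, 'gaps': 1, 'p_distance': 0.2}
```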

  12. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human

    Science.gov (United States)

    Wu, Chengchao; Yao, Shixin; Li, Xinghao; Chen, Chujia; Hu, Xuehai

    2017-01-01

    DNA methylation plays a significant role in transcriptional regulation by repressing transcriptional activity. Changes in DNA methylation level are an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can assay only a small proportion of CpG sites in the human genome, there is an urgent need for reliable computational models that predict genome-wide DNA methylation. Here, we propose a novel algorithm that accurately extracts sequence complexity features (seven features) and develop a support-vector-machine-based prediction model that integrates previously reported DNA composition features (trinucleotide frequency and GC content, 65 features), using the methylation profiles of human embryonic stem cells. Prediction results across 22 human chromosomes with windows of varying size showed that the 600-bp window achieved the best average accuracy, 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has some generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation. PMID:28212312
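    The 65 DNA composition features mentioned above are 64 trinucleotide frequencies plus GC content. The sketch below computes them for one sequence window, assuming overlapping trinucleotide counts normalised to frequencies and GC content as a fraction of unambiguous bases; the function name and normalisation are assumptions, and the seven sequence-complexity features of the paper are not reproduced.

```python
from itertools import product
from typing import List

# The 64 trinucleotides in lexicographic order (AAA, AAC, ..., TTT).
TRINUCS = ["".join(p) for p in product("ACGT", repeat=3)]


def composition_features(window: str) -> List[float]:
    """Hypothetical helper: 64 trinucleotide frequencies followed by GC content (65 values)."""
    w = window.upper()
    counts = dict.fromkeys(TRINUCS, 0)
    for i in range(len(w) - 2):
        tri = w[i:i + 3]
        if tri in counts:  # skips trinucleotides containing N or other symbols
            counts[tri] += 1
    total = sum(counts.values())
    freqs = [counts[t] / total if total else 0.0 for t in TRINUCS]
    acgt = sum(w.count(b) for b in "ACGT")
    gc = (w.count("G") + w.count("C")) / acgt if acgt else 0.0
    return freqs + [gc]


features = composition_features("ACGTACGT")
print(len(features), features[-1])  # 65 0.5
```

    For a 600-bp window around each CpG site, the resulting vectors could be stacked into a matrix and passed to a support vector machine such as scikit-learn's `sklearn.svm.SVC`; how the windows were positioned in the study is not stated here.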

  13. Divergent DNA Methylation Patterns Associated with Abiotic Stress in Hevea brasiliensis

    Institute of Scientific and Technical Information of China (English)

    Thomas K. Uthup; Minimol Ravindran; K. Bini; Saha Thakurdas

    2011-01-01

    Cytosine methylation is a fundamental epigenetic mechanism for gene-expression regulation and development in plants. Here, we report for the first time the identification of DNA methylation patterns and their putative relationship with abiotic stress in the tree crop Hevea brasiliensis (the source of 99% of the world's natural rubber). Regulatory sequences of four major genes involved in the mevalonate pathway (the rubber biosynthesis pathway) and one general defense-related gene, from three high-yielding popular rubber clones grown under two different agroclimatic conditions, were analyzed for the presence of methylation. We found several significant variations in the methylation pattern at core DNA binding motifs within all five genes. Several consistent clone-specific and location-specific methylation patterns were identified. The differences in methylation pattern observed at certain pivotal cis-regulatory sites indicate the direct impact of stress on the genome and support the hypothesis of site-specific stress-induced DNA methylation. It is assumed that some of the observed methylation patterns may be involved in the stress-responsive mechanism by which plants adapt to extreme conditions. The study also provides clues to the existence of highly divergent phenotypic characters among Hevea clones despite their very similar genetic make-up. Altogether, the observations from this study prove beyond doubt that epigenetic variations exist in Hevea and that environmental factors play a significant role in the induction of site-specific epigenetic mutations in its genome.

  14. Beyond DNA Sequencing in Space: Current and Future Omics Capabilities of the Biomolecule Sequencer Payload

    Science.gov (United States)

    Wallace, Sarah

    2017-01-01

    Why do we need a DNA sequencer to support the human exploration of space? (A) Operational environmental monitoring: (1) identification of contaminating microbes, (2) infectious disease diagnosis, (3) reduced down mass (sample return for environmental monitoring, crew health, etc.). (B) Research: (1) human, (2) animal, (3) microbes/cell lines, (4) plant. (C) Med Ops: (1) response to countermeasures, (2) radiation, (3) real-time analysis that can influence medical intervention. (D) Support for astrobiology science investigations: (1) technology well suited to in situ nucleic acid-based life detection, (2) functional testing for integration into robotics for extraplanetary exploration missions.

  15. Nucleosome positioning and kinetics near transcription-start-site barriers are controlled by interplay between active remodeling and DNA sequence.

    Science.gov (United States)

    Parmar, Jyotsana J; Marko, John F; Padinhateeri, Ranjith

    2014-01-01

    We investigate how DNA sequence, ATP-dependent chromatin remodeling and nucleosome-depleted 'barriers' co-operate to determine the kinetics of nucleosome organization, in a stochastic model of nucleosome positioning and dynamics. We find that 'statistical' positioning of nucleosomes against 'barriers', hypothesized to control chromatin structure near transcription start sites, requires active remodeling and therefore cannot be described using equilibrium statistical mechanics. We show that, unlike steady-state occupancy, DNA site exposure kinetics near a barrier is dominated by DNA sequence rather than by proximity to the barrier itself. The timescale for formation of positioning patterns near barriers is proportional to the timescale for active nucleosome eviction. We also show that there are strong gene-to-gene variations in nucleosome positioning near barriers, which are eliminated by averaging over many genes. Our results suggest that measurement of nucleosome kinetics can reveal information about sequence-dependent regulation that is not apparent in steady-state nucleosome occupancy.

  16. An improved chloroplast DNA extraction procedure for whole plastid genome sequencing.

    Science.gov (United States)

    Shi, Chao; Hu, Na; Huang, Hui; Gao, Ju; Zhao, You-Jie; Gao, Li-Zhi

    2012-01-01

    Chloroplast genomes supply valuable genetic information for evolutionary and functional studies in plants. The past five years have witnessed a dramatic increase in the number of completely sequenced chloroplast genomes with the application of second-generation sequencing technology to plastid genome sequencing projects. However, cost-effective, high-throughput chloroplast DNA (cpDNA) extraction has become a major bottleneck restricting this application, as conventional methods struggle to balance the quality and yield of cpDNA. We first tested two traditional methods to isolate cpDNA from three species, Oryza brachyantha, Leersia japonica and Prinsepia utilis. Both failed to yield properly defined cpDNA bands. We therefore developed a simple but efficient method based on sucrose gradients and found that the modified protocol efficiently isolated cpDNA from the same three plant species. We sequenced the isolated DNA samples with Illumina (Solexa) sequencing technology and assessed cpDNA purity by aligning sequence reads to the reference chloroplast genomes, which showed that the reference genome was properly covered. We show that 40-50% cpDNA purity is achieved with our method. Here we provide an improved method to isolate cpDNA from angiosperms. The Illumina sequencing results suggest that the isolated cpDNA has sufficient yield and purity for subsequent genome assembly. The cpDNA isolation protocol should therefore be widely applicable to plant chloroplast genome sequencing projects.

  17. PRIMEGENS-v2: genome-wide primer design for analyzing DNA methylation patterns of CpG islands.

    Science.gov (United States)

    Srivastava, Gyan P; Guo, Juyuan; Shi, Huidong; Xu, Dong

    2008-09-01

    DNA methylation plays important roles in biological processes and human diseases, especially cancers. High-throughput bisulfite genomic sequencing based on a new generation of sequencers, such as the 454 sequencing system, provides an efficient method for analyzing DNA methylation patterns. The successful implementation of this approach depends on primer design software capable of performing a genome-wide scan for optimal primers in in silico bisulfite-treated genome sequences. We have developed a method that fulfills this requirement and performs primer design for sequences including regions of given promoter CpG islands. The method has been implemented in the C and Java programming languages. The primer design results were tested in PCR experiments on 96 selected human DNA sequences containing CpG islands in their promoter regions. The results indicate that this method is efficient and reliable for designing sequence-specific primers. Sequence-specific primer design for methylated DNA sequences including CpG islands has been integrated into the second version of PRIMEGENS as one of its primer design features. The software is freely available for academic use at http://digbio.missouri.edu/primegens/.
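
    The abstract relies on "in silico bisulfite-treated genome sequences" but does not state the conversion rules. The following is a minimal Python sketch of the standard idea, not PRIMEGENS code (which is written in C and Java). It assumes that only the top strand is converted and that every CpG cytosine is either fully methylated (protected, stays C) or fully unmethylated. The function name `bisulfite_convert` is hypothetical.

```python
def bisulfite_convert(seq, methylated_cpg=True):
    """Return the in silico bisulfite-converted top strand of seq.

    After bisulfite treatment and PCR, an unmethylated C reads as T.
    If methylated_cpg is True, a C immediately followed by G (CpG) is
    assumed to be methylated and is therefore kept as C.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            is_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("C" if (methylated_cpg and is_cpg) else "T")
        else:
            converted.append(base)
    return "".join(converted)


print(bisulfite_convert("ACGTCCGA"))  # prints ACGTTCGA
```

    Generating both the fully methylated and fully unmethylated versions of a region is one simple way to check that a candidate primer avoids CpG sites whose conversion state is unknown.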

  18. Analysis of mitochondrial DNA sequences in patients with isolated or combined oxidative phosphorylation system deficiency.

    NARCIS (Netherlands)

    Hinttala, R.; Smeets, R.; Moilanen, J.S.; Ugalde, C.; Uusimaa, J.; Smeitink, J.A.M.; Majamaa, K.

    2006-01-01

    BACKGROUND: Enzyme deficiencies of the oxidative phosphorylation (OXPHOS) system may be caused by mutations in the mitochondrial DNA (mtDNA) or in the nuclear DNA. OBJECTIVE: To analyse the sequences of the mtDNA coding region in 25 patients with OXPHOS system deficiency to identify the underlying g

  19. A discriminative approach for unsupervised clustering of DNA sequence motifs.

    Directory of Open Access Journals (Sweden)

    Philip Stegmaier

    Algorithmic comparison of DNA sequence motifs is a problem in bioinformatics that has received increasing attention in recent years. Its main applications are the characterization of potentially novel motifs and the clustering of a motif collection to remove redundancy. Despite growing interest in motif clustering, the question of which motif clusters to aim for has not yet been systematically addressed. Here we analyzed motif similarities in a comprehensive set of vertebrate transcription factor classes. For this we developed enhanced similarity scores by including the information coverage (IC) criterion, which evaluates the fraction of information an alignment covers in aligned motifs. A network-based method enabled us to identify motif clusters with high correspondence to DNA-binding domain phylogenies and prior experimental findings. Based on this analysis we derived a set of motif families representing distinct binding specificities. These motif families were used to train a classifier, which was further integrated into a novel algorithm for unsupervised motif clustering. Application of the new algorithm demonstrated its superiority to previously published methods and its ability to reproduce entrained motif families. As a result, our work proposes a probabilistic approach to decide whether two motifs represent common or distinct binding specificities.

  20. The contrasting structures of mismatched DNA sequences containing looped-out bases (bulges) and multiple mismatches (bubbles).

    Science.gov (United States)

    Bhattacharyya, A; Lilley, D M

    1989-09-12

    We have studied the structure and reactivities of two kinds of mismatched DNA sequences--unopposed bases, or bulges, and multiple mismatched pairs of bases. These were generated in a constant sequence environment, in relatively long DNA fragments, using a technique based on heteroduplex formation between sequences cloned into single-stranded M13 phage. The mismatched sequences were studied from two points of view, viz 1. The mobility of the fragments on gel electrophoresis in polyacrylamide was studied in order to examine possible bending of the DNA due to the presence of the mismatch defect. Such bending would constitute a global effect on the conformation of the molecule. 2. Sequences in and around the mismatches were studied using enzyme and chemical probes of DNA structure. This would reveal more local structural effects of the mismatched sequences. We observed that the structures of the bulges and the multiple mismatches appear to be fundamentally different. The bulged sequences exhibited a large gel retardation, consistent with a significant bending of the DNA at the bulge, and whose magnitude depends on the number of mismatched bases. The larger bulges were sensitive to cleavage by single-strand specific nucleases, and modified by diethyl pyrocarbonate (adenines) or osmium tetroxide (thymines) in a non-uniform way, suggesting that the bulges have a precise structure that leads to exposure of some, but not all, of the bases. In contrast the multiple mismatches ('bubbles') cause very much less bending of the DNA fragment in which they occur, and uniform patterns of chemical reactivity along the length of the mismatched sequences, suggesting a less well defined, and possibly flexible, structure. The precise structure of the bulges suggests that such features may be especially significant for recognition by proteins.

  1. Analysis of cDNA sequence, protein structure and expression of parotid secretory protein in pig

    Institute of Scientific and Technical Information of China (English)

    YIN Haifang; FAN Baoliang; ZHAO Zhihui; LIU Zhaoliang; FEI Jing; LI Ning

    2003-01-01

    Parotid secretory protein (PSP) is secreted abundantly in saliva, and its function is related to antibacterial activity. The PSP cDNA was isolated from pig parotid glands by 3′ and 5′ rapid amplification of cDNA ends (RACE), based on the signal peptide region conserved among the known mammalian PSPs. Homology comparison shows that pig PSP and human PSP share high identity at the primary, secondary and tertiary protein structure levels. A search for functionally significant protein motifs revealed a unique amino acid sequence pattern consisting of the residues Leu-X(6)-Leu-X(6)-Leu-X(7)-Leu-X(6)-Leu-X(6)-Leu near the amino-terminal portion of the protein, which is important to its function. RT-PCR, dot blot and Northern blot analyses demonstrated that PSP was strongly expressed in parotid glands but not in other tissues.
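
    A motif of the form Leu-X(6)-Leu-X(6)-Leu-X(7)-Leu-X(6)-Leu-X(6)-Leu can be searched for with a regular expression over one-letter amino acid codes. The following is a minimal sketch. It assumes that X stands for any residue (including Leu) and that overlapping occurrences should all be reported. The helper `find_leu_pattern` is hypothetical and is not from the study.

```python
import re

# Leu-X(6)-Leu-X(6)-Leu-X(7)-Leu-X(6)-Leu-X(6)-Leu in one-letter code,
# where X is any residue. Wrapping the pattern in a lookahead lets
# overlapping occurrences be reported as separate matches.
LEU_PATTERN = re.compile(r"(?=(L.{6}L.{6}L.{7}L.{6}L.{6}L))")


def find_leu_pattern(protein):
    """Return (0-based start, matched segment) for each occurrence of the motif."""
    return [(m.start(), m.group(1)) for m in LEU_PATTERN.finditer(protein.upper())]
```

    Applied to a full-length PSP sequence, a hit near the N-terminus would correspond to the leucine-spaced pattern described above. No real sequence is assumed here.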

  2. Immunological inter-strain crossreactivity correlated to 18S rDNA sequence types in Acanthamoeba spp.

    Science.gov (United States)

    Walochnik, J; Obwaller, A; Aspöck, H

    2001-02-01

    Various species of the genus Acanthamoeba have been described as potential pathogens; however, differentiation of acanthamoebae remains problematic. The genus has been divided into 12 18S rDNA sequence types, with most keratitis-causing strains exhibiting sequence type T4. We recently isolated a keratitis-causing Acanthamoeba strain showing sequence type T6 but morphologically identical to a T4 strain. The aim of our study was to determine whether 18S rDNA sequence-based identification correlates with immunological differentiation. The protein and antigen profiles of the T6 isolate and three reference Acanthamoeba strains were investigated using two sera from Acanthamoeba keratitis patients and one serum from an asymptomatic individual. The T6 strain produced a distinctly different immunological pattern, while patterns within T4 were identical. Affinity-purified antibodies were used to further explore immunological cross-reactivity between sequence types. Altogether, the results of our study support the Acanthamoeba 18S rDNA sequence type classification in the investigated strains.

  3. Examination of species boundaries in the Acropora cervicornis group (Scleractinia, cnidaria) using nuclear DNA sequence analyses.

    Science.gov (United States)

    Oppen, M J; Willis, B L; Vugt, H W; Miller, D J

    2000-09-01

    Although Acropora is the most species-rich genus of the scleractinian (stony) corals, only three species occur in the Caribbean: A. cervicornis, A. palmata and A. prolifera. Based on overall coral morphology, abundance and distribution patterns, it has been suggested that A. prolifera may be a hybrid between A. cervicornis and A. palmata. The species boundaries among these three morphospecies were examined using DNA sequence analyses of the nuclear Pax-C 46/47 intron and the ribosomal DNA Internal Transcribed Spacer (ITS1 and ITS2) and 5.8S regions. Moderate levels of sequence variability were observed in the ITS and 5.8S sequences (up to 5.2% overall sequence difference), but variability within species was as large as between species and all three species carried similar sequences. Since this is unlikely to represent a shared ancestral polymorphism, the data suggest that introgressive hybridization occurs among the three species. For the Pax-C intron, A. cervicornis and A. palmata had very distinct allele frequencies and A. cervicornis carried a unique allele at a frequency of 0.769 (although sequence differences between alleles were small). All A. prolifera colonies examined were heterozygous for the Pax-C intron, whereas heterozygosity was only 0.286 and 0.333 for A. cervicornis and A. palmata, respectively. These data support the hypothesis that A. prolifera is the product of hybridization between two species that have a different allelic composition for the Pax-C intron, i.e. A. cervicornis and A. palmata. We therefore suggest that A. prolifera is a hybrid between A. cervicornis and A. palmata, which backcrosses with the parental species at low frequency.

  4. Z-DNA-forming sequences generate large-scale deletions in mammalian cells

    OpenAIRE

    Wang, Guliang; Christensen, Laura A.; Vasquez, Karen M.

    2006-01-01

    Spontaneous chromosomal breakages frequently occur at genomic hot spots in the absence of DNA damage and can result in translocation-related human disease. Chromosomal breakpoints are often mapped near purine–pyrimidine Z-DNA-forming sequences in human tumors. However, it is not known whether Z-DNA plays a role in the generation of these chromosomal breakages. Here, we show that Z-DNA-forming sequences induce high levels of genetic instability in both bacterial and mammalian cells. In mammali...

  5. Identification of H. Pylori strain specific DNA sequences between two clinical isolates from NUD and gastric ulcer by SSH

    Institute of Scientific and Technical Information of China (English)

    Feng-Chan Han; Min Gong; Han-Chong Ng; Bow Ho

    2003-01-01

    AIM: The genomes of Helicobacter pylori (H. pylori) from different individuals differ. This project aimed to identify strain-specific DNA sequences between two clinical H. pylori isolates by suppression subtractive hybridization (SSH). METHODS: Two clinical H. pylori isolates, one from gastric ulcer (GU, tester) and the other from non-ulcer dyspepsia (NUD, driver), were cultured, and the genomic DNA was prepared and digested with AluI. Two different adaptors were then ligated respectively to the 5′-ends of two aliquots of the tester DNA fragments, and SSH was performed between the tester and driver DNA. The unhybridized tester DNA sequences were amplified by two sequential PCRs and cloned into the pGEM-T-Easy Vector. The tester strain-specific inserts were screened, and disease-related DNA sequences were identified by dot blotting. RESULTS: Among the 240 colonies randomly chosen, 50 contained tester strain-specific DNA sequences. Twenty-three inserts were sequenced, ranging in size from 261 bp to 1 036 bp. Fifteen inserts belonged to the H. pylori plasmid pHPO100, which is about 3.5 kb and encodes replication protein A. Other inserts had patches of homology to H. pylori genes in GenBank. Various dot-blot patterns were obtained. When 4 inserts were used as probes to screen the genomic DNA of 27 clinical isolates, no GU strain-unique DNA sequences were found. The 27 isolates comprised 8 from GU, 12 from duodenal ulcer (DU), 4 from GU-DU, 2 from NUD and 1 from gastric cancer (GC). However, a 670 bp DNA fragment (GU198), partially homologous to the 3′-end of the thymidylate kinase gene, was positive in 7 GU strains (7/8), 3 GU-DU strains (3/4) and 3 DU strains (3/12). A 384 bp fragment (GU79) of the replication gene A (repA) was positive in only 4 H. pylori isolates, 2 from GU and 2 from GU-DU. CONCLUSION: Differences exist in the genes of different H. pylori isolates. SSH is very effective for screening H. pylori strain-specific DNA sequences between two clinical isolates.

  6. Generating Exome Enriched Sequencing Libraries from Formalin-Fixed, Paraffin-Embedded Tissue DNA for Next-Generation Sequencing.

    Science.gov (United States)

    Marosy, Beth A; Craig, Brian D; Hetrick, Kurt N; Witmer, P Dane; Ling, Hua; Griffith, Sean M; Myers, Benjamin; Ostrander, Elaine A; Stanford, Janet L; Brody, Lawrence C; Doheny, Kimberly F

    2017-01-11

    This unit describes a technique for generating exome-enriched sequencing libraries using DNA extracted from formalin-fixed paraffin-embedded (FFPE) samples. Utilizing commercially available kits, we present a low-input FFPE workflow starting with 50 ng of DNA. This procedure includes a repair step to address damage caused by FFPE preservation that improves sequence quality. Subsequently, libraries undergo an in-solution-targeted selection for exons, followed by sequencing using the Illumina next-generation short-read sequencing platform. © 2017 by John Wiley & Sons, Inc.

  7. Localization of a new highly repeated DNA sequence of Lemur catta (Lemuridae, Strepsirhini).

    Science.gov (United States)

    Boniotto, Michele; Ventura, Mario; Cardone, Maria Francesca; Boaretto, Francesca; Archidiacono, Nicoletta; Rocchi, Mariano; Crovella, Sergio

    2002-10-01

    We have isolated and cloned an 800-bp highly repeated DNA (HRDNA) sequence from Lemur catta (LCA) and described its localization on LCA chromosomes. Lemur catta HRDNA sequences were localized by performing FISH experiments on standard and elongated metaphase chromosomes using an LCA HRDNA probe (LCASAT). A complex hybridization pattern was detected. A strong pericentromeric hybridization signal was observed on most LCA chromosomes. Chromosomes 7 and 13 showed signals in pericentromeric regions as well as in interspersed heterochromatin. Chromosomes 1, 3, 4, 17, 19, X, and the microchromosomes (20, 25, 26, and 27) showed no signals in the pericentromeric region, but chromosomes 3 and 4 showed positive hybridization in heterochromatic regions. The 800-bp L. catta HRDNA was species specific: FISH experiments with the LCASAT probe on Eulemur macaco macaco (EMA) and Eulemur fulvus fulvus (EFU) metaphases detected no positive hybridization signal. These findings were also confirmed by Southern blot analysis and PCR.

  8. Sequence selective naked-eye detection of DNA harnessing extension of oligonucleotide-modified nucleotides.

    Science.gov (United States)

    Verga, Daniela; Welter, Moritz; Marx, Andreas

    2016-02-01

    DNA polymerases can efficiently and sequence-selectively incorporate oligonucleotide (ODN)-modified nucleotides, and the incorporated oligonucleotide strand can be employed as a primer in rolling circle amplification (RCA). Effective amplification of the DNA primer by Φ29 DNA polymerase allows sequence-selective hybridisation of the amplified strand with a G-quadruplex DNA sequence that has horseradish peroxidase-like activity. Based on these findings, we developed a system that allows naked-eye DNA detection with single-base resolution.

  9. True single-molecule DNA sequencing of a pleistocene horse bone

    DEFF Research Database (Denmark)

    Orlando, Ludovic Antoine Alexandre; Ginolhac, Aurélien; Raghavan, Maanasa

    2011-01-01

    -preserved Pleistocene horse bone using the Helicos HeliScope and Illumina GAIIx platforms, respectively. We find that the percentage of endogenous DNA sequences derived from the horse is higher among the Helicos data than Illumina data. This result indicates that the molecular biology tools used to generate sequencing...... to the standard Helicos DNA template preparation protocol further increase the proportion of horse DNA for this sample by 3-fold. Comparison of Helicos-specific biases and sequence errors in modern DNA with those in ancient DNA also reveals extensive cytosine deamination damage at the 3' ends of ancient templates...

  10. A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences.

    Science.gov (United States)

    Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L

    2017-06-19

    An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5'-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5'-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. © 2017 by John Wiley & Sons, Inc.

  11. DNA methyltransferase 1 and DNA methylation patterning contribute to germinal center B-cell differentiation

    DEFF Research Database (Denmark)

    Shaknovich, Rita; Cerchietti, Leandro; Tsikitas, Lucas

    2011-01-01

    The phenotype of germinal center (GC) B cells includes the unique ability to tolerate rapid proliferation and the mutagenic actions of activation induced cytosine deaminase (AICDA). Given the importance of epigenetic patterning in determining cellular phenotypes, we examined DNA methylation and t...

  12. Forensic DNA Banding Patterns: How to Simulate & Explain DNA Fingerprinting in a Classroom with No Budget

    Science.gov (United States)

    Christensen, Doug

    2013-01-01

    Understanding how DNA banding patterns in a gel can aid in the conviction or exoneration of suspects and be utilized for positive identification of biological fathers in paternity cases can be intimidating. In reality, the logistics and technology used in such cases are rather straightforward. This exercise is designed for use in high school…

  14. Sperm DNA fragmentation is related to sperm morphological staining patterns.

    Science.gov (United States)

    Sá, Rosália; Cunha, Mariana; Rocha, Eduardo; Barros, Alberto; Sousa, Mário

    2015-10-01

    In this prospective comparative study, sperm DNA fragmentation (sDNAfrag) was compared at each step of a sequential semen preparation, with semen parameters graded by degree of severity. For 70 infertile men enrolled in intracytoplasmic sperm injection cycles, sDNAfrag was determined at each step (fraction) of the sequential procedure: fresh (Raw), after gradient centrifugation, after washing, and after swim-up (SU). sDNAfrag decreased significantly (P = 0.04; P < 0.0001) throughout the steps of semen preparation, and neither centrifugation nor washing increased it. A negative correlation with sperm motility was observed in the Raw and SU fractions, and higher sDNAfrag was observed in samples with lower semen quality. Our results confirm that the steps of the sequential procedure do not compromise sperm DNA integrity and progressively decrease sDNAfrag regardless of the sperm abnormality. They also confirm that lower-quality semen presents higher sDNAfrag. Four distinct staining patterns were observed, of which entire sperm head staining was the most frequent in all studied fractions. Additionally, the procedure reduced the sperm head gene-rich region staining pattern. This suggests that pattern quantification might be a useful adjunct when performing sDNAfrag testing for male infertility. Copyright © 2015 Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved.

  15. Cloning and molecular genetics analyses of Deschampsia antarctica Desv. chloroplast and mitochondrial DNA sequence

    Directory of Open Access Journals (Sweden)

    O.P. Savchuk

    2012-03-01

    Chloroplast and mitochondrial DNA sequences of Deschampsia antarctica were studied. We compared them with the completely sequenced genomes of other temperate-zone plants to identify homologies.

  16. Human DNA contains sequences homologous to the 5'-non-coding region of hepatitis C virus: characterization with restriction endonucleases reveals individual varieties

    Institute of Scientific and Technical Information of China (English)

    Reinhard H Dennin; Jianer Wo

    2003-01-01

    Objective: To investigate a 272 base pair section of the 5'-non-coding region in genomic DNA from the peripheral blood mononuclear cells of healthy, hepatitis C virus (HCV)-negative human subjects (not patients). Results: The suspected HCV-specific sequence was found in the DNA of every subject tested. The pre-PCR digestion assay revealed individual differences in methylation pattern, which may be due to epigenetic phenomena. Conclusions: The results provide formal proof that these HCV-specific sequences are contained in the genomic or extrachromosomal target DNA, and they probably belong to a new class of endogenous sequences.

  17. The immunogenicity of viral haemorrhagic septicaemia rhabdovirus (VHSV) DNA vaccines can depend on plasmid regulatory sequences.

    Science.gov (United States)

    Chico, V; Ortega-Villaizan, M; Falco, A; Tafalla, C; Perez, L; Coll, J M; Estepa, A

    2009-03-18

    A plasmid DNA encoding the viral hemorrhagic septicaemia virus (VHSV)-G glycoprotein, pAE6-G(VHSV), places the gene under the control of 5' sequences from the carp beta-actin gene (enhancer/promoter sequence plus both the non-coding 1st exon and 1st intron sequences). It was compared with the commonly described vaccine plasmid pMCV1.4-G(VHSV), in which gene expression is regulated by the human cytomegalovirus (CMV) immediate-early promoter. The two plasmids produced markedly different profiles in the level and timing of expression of the encoded antigen, which may directly affect the intensity and suitability of the in vivo immune response. Fish genetic immunisation assays were therefore carried out to study the immune response to both plasmids. A significantly enhanced specific-antibody response against the viral glycoprotein was found in fish immunised with pAE6-G(VHSV). However, the protective efficacy against VHSV challenge conferred by both plasmids was similar. Subsequent analysis of the transcription profile of a set of representative immune-related genes in the DNA-immunised fish suggested that, depending on the regulatory sequences controlling its expression, the plasmid might activate distinct patterns of the immune system. Altogether, the results from this study indicate that the choice of encoded-antigen/vector combination for genetic immunisation is of extraordinary importance in designing optimised DNA vaccines. When required for inducing a protective immune response, such vaccines could elicit responses biased toward antigen-specific antibodies or cytotoxic T cell generation.

  18. Characterization of Expressed Sequence Tags From a Gallus gallus Pineal Gland cDNA Library

    OpenAIRE

    2005-01-01

    The pineal gland is the circadian oscillator in the chicken, regulating diverse functions ranging from egg laying to feeding. Here, we describe the isolation and characterization of expressed sequence tags (ESTs) isolated from a chicken pineal gland cDNA library. A total of 192 unique sequences were analysed and submitted to GenBank; 6% of the ESTs matched neither GenBank cDNA sequences nor the newly assembled chicken genomic DNA sequence, three ESTs aligned with sequences designated to be on...

  19. Sequence Characterization of Microsatellite DNA Sequences in Pacific Abalone (Haliotis discus hannai)

    Institute of Scientific and Technical Information of China (English)

    LI Qi; Kijima Akihiro

    2007-01-01

    A microsatellite-enriched library was constructed using the magnetic bead hybridization selection method, and the microsatellite DNA sequences of the Pacific abalone Haliotis discus hannai were analyzed. Three hundred and fifty white colonies were screened using a PCR-based technique, and 84 clones were identified as potentially containing a microsatellite repeat motif. The 84 clones were sequenced, and 42 microsatellites and 4 minisatellites with a minimum of five repeats were found (13.1% of white colonies screened). Besides the CA motif contained in the oligoprobe, we also found 16 other types of microsatellite repeats, including a dinucleotide repeat, two tetranucleotide repeats, twelve pentanucleotide repeats and a hexanucleotide repeat. Following Weber (1990), the microsatellite sequences obtained could be categorized structurally into perfect repeats (73.3%), imperfect repeats (13.3%), and compound repeats (13.4%). Among the microsatellite repeats, relatively short arrays (< 20 repeats) were the most abundant, accounting for 75.0%. The longest microsatellite had 48 repeats, and the average number of repeats was 13.4. The data on the composition and length distribution of microsatellites obtained in the present study can be useful for choosing repeat motifs for microsatellite isolation in other abalone species.
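
    The screening criterion above (a repeat unit present in at least five copies) can be illustrated with a regular-expression scan for perfect tandem repeats. The following is a minimal Python sketch, not the authors' pipeline. It assumes exact (perfect) repeats of 2 to 6 bp units in an A/C/G/T string. Imperfect and compound repeats in Weber's (1990) sense are not detected. The function names are hypothetical.

```python
import re


def is_primitive(unit):
    """False if unit is itself a repeat of a shorter unit (e.g. 'AA' or 'ATAT')."""
    k = len(unit)
    return all(unit[:d] * (k // d) != unit for d in range(1, k) if k % d == 0)


def find_perfect_microsatellites(seq, min_repeats=5, min_unit=2, max_unit=6):
    """Return (0-based start, repeat unit, copy number) for perfect tandem repeats.

    A unit of min_unit..max_unit bases must be followed by at least
    min_repeats - 1 exact copies. The lazy quantifier makes the regex try
    the shortest unit first at each position.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if not is_primitive(unit):
            continue  # e.g. a mononucleotide run matched as 'AA' x n is skipped
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), unit, copies))
    return hits
```

    Preferring the shortest unit keeps a (CA)n run from being reported as (CACA)n, and the primitivity check drops mononucleotide runs, which fall outside the 2 to 6 bp range.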

  20. DUC-Curve, a highly compact 2D graphical representation of DNA sequences and its application in sequence alignment

    Science.gov (United States)

    Li, Yushuang; Liu, Qian; Zheng, Xiaoqi

    2016-08-01

    A highly compact and simple 2D graphical representation of DNA sequences, named DUC-Curve, is constructed through mapping four nucleotides to a unit circle with a cyclic order. DUC-Curve could directly detect nucleotide, di-nucleotide compositions and microsatellite structure from DNA sequences. Moreover, it also could be used for DNA sequence alignment. Taking geometric center vectors of DUC-Curves as sequence descriptor, we perform similarity analysis on the first exons of β-globin genes of 11 species, oncogene TP53 of 27 species and twenty-four Influenza A viruses, respectively. The obtained reasonable results illustrate that the proposed method is very effective in sequence comparison problems, and will at least play a complementary role in classification and clustering problems.
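
    The abstract describes two steps: mapping the four nucleotides onto a unit circle in a cyclic order, and using the geometric center of the resulting curve as a sequence descriptor. The rough Python sketch below illustrates that general idea only. It assumes that A, C, G and T sit at 0, 90, 180 and 270 degrees and that the curve is a cumulative walk of those unit vectors. Both assumptions are made for illustration; the abstract does not give the published DUC-Curve definition, and the function names are hypothetical.

```python
import math

# Assumed cyclic order A -> C -> G -> T at evenly spaced angles on the unit circle.
ANGLES = {"A": 0.0, "C": math.pi / 2, "G": math.pi, "T": 3 * math.pi / 2}


def walk(seq):
    """Cumulative 2D walk: each nucleotide adds its unit vector to the current point."""
    x = y = 0.0
    points = []
    for base in seq.upper():
        theta = ANGLES[base]  # raises KeyError on non-ACGT symbols
        x += math.cos(theta)
        y += math.sin(theta)
        points.append((x, y))
    return points


def geometric_center(seq):
    """Mean of all points on the curve, used here as a 2D sequence descriptor."""
    points = walk(seq)
    if not points:
        raise ValueError("empty sequence")
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def center_distance(seq1, seq2):
    """Euclidean distance between geometric centers (smaller suggests more similar)."""
    (x1, y1), (x2, y2) = geometric_center(seq1), geometric_center(seq2)
    return math.hypot(x1 - x2, y1 - y2)
```

    A pairwise matrix of `center_distance` values over a set of sequences could then feed a standard clustering or tree-building step, in the spirit of the comparisons described above.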

  1. Unique cell-type-specific patterns of DNA methylation in the root meristem.

    Science.gov (United States)

    Kawakatsu, Taiji; Stuart, Tim; Valdes, Manuel; Breakfield, Natalie; Schmitz, Robert J; Nery, Joseph R; Urich, Mark A; Han, Xinwei; Lister, Ryan; Benfey, Philip N; Ecker, Joseph R

    2016-04-29

    DNA methylation is an epigenetic modification that differs between plant organs and tissues, but the extent of variation between cell types is not known. Here, we report single-base-resolution whole-genome DNA methylomes, mRNA transcriptomes and small RNA transcriptomes for six cell populations covering the major cell types of the Arabidopsis root meristem. We identify widespread cell-type-specific patterns of DNA methylation, especially in the CHH sequence context, where H is A, C or T. The genome of the columella root cap is the most highly methylated Arabidopsis cell characterized so far. It is hypermethylated within transposable elements (TEs), accompanied by increased abundance of transcripts encoding RNA-directed DNA methylation (RdDM) pathway components and 24-nt small RNAs (smRNAs). The columella lacks the nucleosome remodeller DECREASED DNA METHYLATION 1 (DDM1), which is required for maintenance of DNA methylation, and shows low abundance of histone transcripts involved in heterochromatin formation. Together these observations suggest that a loss of heterochromatin may occur in the columella, allowing RdDM factors to access the whole genome and producing an excess of 24-nt smRNAs in this tissue. Together, these maps provide new insights into the epigenomic diversity that exists between distinct plant somatic cell types.
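
    The CG, CHG and CHH labels describe the bases that follow a cytosine on the same strand (H = A, C or T). The following is a minimal Python sketch that classifies each cytosine on one strand into these contexts. It assumes a plain A/C/G/T string and considers the given strand only; the reverse strand is ignored. The function names are hypothetical.

```python
from collections import Counter

H = set("ACT")  # H = A, C or T, i.e. any base except G


def cytosine_context(seq, i):
    """Return 'CG', 'CHG' or 'CHH' for the C at position i of seq (given strand only),
    or None if seq[i] is not C or too few valid downstream bases remain to decide."""
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if seq[i + 1] not in H or i + 2 >= len(seq):
        return None
    if seq[i + 2] == "G":
        return "CHG"
    return "CHH" if seq[i + 2] in H else None


def count_contexts(seq):
    """Count cytosines in each sequence context along one strand."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq)):
        context = cytosine_context(seq, i)
        if context is not None:
            counts[context] += 1
    return counts
```

    In a methylome analysis, the same classification would be applied to both strands. The per-context methylation level is then the fraction of reads calling a cytosine methylated within each context.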

  2. A 28,000 years old Cro-Magnon mtDNA sequence differs from all potentially contaminating modern sequences.

    Directory of Open Access Journals (Sweden)

    David Caramelli

    BACKGROUND: DNA sequences from ancient specimens may in fact result from undetected contamination of the ancient specimens by modern DNA, and the problem is particularly challenging in studies of human fossils. Doubts about the authenticity of the available sequences have so far hampered genetic comparisons between anatomically archaic (Neandertal) and early modern (Cro-Magnoid) Europeans. METHODOLOGY/PRINCIPAL FINDINGS: We typed the mitochondrial DNA (mtDNA) hypervariable region I in a 28,000-year-old Cro-Magnoid individual from the Paglicci cave in Italy (Paglicci 23) and in all the people who had contact with the sample since its discovery in 2003. The Paglicci 23 sequence, determined through the analysis of 152 clones, is the Cambridge reference sequence and cannot possibly reflect contamination, because it differs from all potentially contaminating modern sequences. CONCLUSIONS/SIGNIFICANCE: The Paglicci 23 individual carried an mtDNA sequence that is still common in Europe and that radically differs from those of the almost contemporary Neandertals, demonstrating a genealogical continuity across 28,000 years, from Cro-Magnoid to modern Europeans. Because all potential sources of modern DNA contamination are known, the Paglicci 23 sample will offer a unique opportunity to gain insight, for the first time, into the nuclear genes of early modern Europeans.

  3. Frequent mutations in EGFR, KRAS and TP53 genes in human lung cancer tumors detected by ion torrent DNA sequencing.

    Directory of Open Access Journals (Sweden)

    Xin Cai

    Lung cancer is the most common malignancy and the leading cause of cancer deaths worldwide. While smoking is by far the leading cause of lung cancer, other environmental and genetic factors influence the development and progression of the cancer. Since unique mutation patterns have been observed in individual cancer samples, identification and characterization of the distinctive lung cancer molecular profile is essential for developing more effective, tailored therapies. Until recently, personalized DNA sequencing to identify genetic mutations in cancer was impractical and expensive. Recent technological advances in next-generation DNA sequencing, such as the semiconductor-based Ion Torrent sequencing platform, have made DNA sequencing cost- and time-effective with more reliable results. Using the Ion Torrent Ampliseq Cancer Panel, we sequenced 737 loci from 45 cancer-related genes to identify genetic mutations in 76 human lung cancer samples. The sequencing analysis revealed missense mutations in the KRAS, EGFR, and TP53 genes in lung cancer samples of various histologic types. Thus, this study demonstrates the necessity of sequencing individual human cancers in order to develop personalized drugs or combination therapies that effectively target individual, lung cancer-specific mutations.

  4. PIK3CA and TP53 gene mutations in human breast cancer tumors frequently detected by ion torrent DNA sequencing.

    Directory of Open Access Journals (Sweden)

    Xusheng Bai

    Breast cancer is the most common malignancy and the leading cause of cancer deaths in women worldwide. While specific genetic mutations have been linked to 5-10% of breast cancer cases, other environmental and epigenetic factors influence the development and progression of the cancer. Since unique mutation patterns have been observed in individual cancer samples, identification and characterization of the distinctive breast cancer molecular profile is needed to develop more effective targeted therapies. Until recently, identifying genetic cancer mutations via personalized DNA sequencing was impractical and expensive. Recent technological advances in next-generation DNA sequencing, such as the semiconductor-based Ion Torrent sequencing platform, have made DNA sequencing cost- and time-effective with more reliable results. Using the Ion Torrent Ampliseq Cancer Panel, we sequenced 737 loci from 45 cancer-related genes to identify genetic mutations in 105 human breast cancer samples. The sequencing analysis revealed missense mutations in the PIK3CA and TP53 genes in breast cancer samples of various histologic types. Thus, this study demonstrates the necessity of sequencing individual human cancers in order to develop personalized drugs or combination therapies that effectively target individual, breast cancer-specific mutations.

  5. Sequencing of megabase plus DNA by hybridization: Method development ENT. Final technical progress report

    Energy Technology Data Exchange (ETDEWEB)

    Crkvenjakov, R.; Drmanac, R.

    1991-01-31

    Sequencing by hybridization (SBH) is the only sequencing method based on the experimental determination of the content of oligonucleotide sequences. The data acquisition relies on the natural process of base pairing. It is possible to determine the content of complementary oligosequences in the target DNA by the process of hybridization with oligonucleotide probes of known sequences.
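    Under idealized assumptions (perfect, mismatch-free hybridization and a complete library of probes of a single length k), the "content" that SBH measures is the set of distinct k-mers in the target, and a probe scores positive when it is the reverse complement of one of them. The Python sketch below computes both sets; `k`, `kmer_spectrum`, `reverse_complement` and `hybridizing_probes` are illustrative names chosen here, not taken from the report, and reconstructing the sequence from the set (the actual SBH assembly problem) is not shown.

```python
from __future__ import annotations

# Watson-Crick complement table for the four unambiguous bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_spectrum(target: str, k: int) -> set[str]:
    """Distinct k-mers (the oligonucleotide 'content') of the target, read 5'->3'."""
    target = target.upper()
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an ACGT string (other symbols pass through unchanged)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hybridizing_probes(target: str, k: int) -> set[str]:
    """Probes (5'->3') that would pair perfectly with some k-mer of the target,
    assuming ideal hybridization with no mismatches or secondary structure."""
    return {reverse_complement(kmer) for kmer in kmer_spectrum(target, k)}
```

    For the toy target `ATGCGT` with k = 3, the spectrum is {ATG, TGC, GCG, CGT} and the positive probes are {CAT, GCA, CGC, ACG}.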

  6. One-way sequencing of multiple amplicons from tandem repetitive mitochondrial DNA control region.

    Science.gov (United States)

    Xu, Jiawu; Fonseca, Dina M

    2011-10-01

    Repetitive DNA sequences not only exist abundantly in eukaryotic nuclear genomes, but also occur as tandem repeats in many animal mitochondrial DNA (mtDNA) control regions. Due to concerted evolution, these repetitive sequences are highly similar or even identical within a genome. When long repetitive regions are the targets of amplification for the purpose of sequencing, multiple amplicons may result if one primer has to be located inside the repeats. Here, we show that, without separating these amplicons by gel purification or cloning, directly sequencing the mitochondrial repeats with the primer outside the repetitive region is feasible and efficient. We demonstrate this by sequencing the mtDNA control region of the mosquito Aedes albopictus, which harbors typical large tandem DNA repeats. This one-way sequencing strategy is optimal for population surveys.

  7. DNA interactions with a Methylene Blue redox indicator depend on the DNA length and are sequence specific.

    Science.gov (United States)

    Farjami, Elaheh; Clima, Lilia; Gothelf, Kurt V; Ferapontova, Elena E

    2010-06-01

    A DNA molecular beacon approach was used for the analysis of interactions between DNA and Methylene Blue (MB) as a redox indicator of a hybridization event. DNA hairpin structures of different lengths and guanine (G) contents were immobilized onto gold electrodes in their folded states through an alkanethiol linker at the 5'-end. Binding of MB to the folded hairpin DNA was studied electrochemically and compared with binding to the duplex structure formed by hybridization of the hairpin DNA to a complementary DNA strand. Variation of the electrochemical signal from the DNA-MB complex was shown to depend primarily on the DNA length and sequence used: the G-C base pairs were the preferential sites of MB binding in the duplex. For short, 20-nt-long DNA sequences, the increased electrochemical response from MB bound to the duplex structure was consistent with the increased amount of bound and electrochemically readable MB molecules (i.e. MB molecules that are available for the electron transfer (ET) reaction with the electrode). With longer DNA sequences, the balance between the amounts of electrochemically readable MB molecules bound to the hairpin DNA and to the hybrid was reversed: some of the MB molecules bound to the long DNA duplex seem to be electrochemically mute due to the long ET distance. The increasing electrochemical response from MB bound to the short DNA hybrid contrasts with the decreasing signal from MB bound to the long DNA hybrid and allows the development of an "off"-"on" genosensor.

  8. Isolation of DNA for Sequence Analysis from Herbarium Material of Some Lichen Specimens

    OpenAIRE

    Aras, Sümer; CANSARAN, Demet

    2006-01-01

    An improved protocol for the isolation of DNA from herbarium material of some lichen specimens is described. The isolated DNA is suitable for PCR reactions for DNA sequence analysis. The hexadecyltrimethylammonium bromide (CTAB) based protocol defined in this study provides a number of advantages, mainly speed and reliability. In addition, different DNA extraction protocols were examined to determine the yield of DNA from the thallus of lichen specimens. The methods examined include a CTAB ba...

  9. Isolation of DNA for Sequence Analysis from Herbarium Material of Some Lichen Specimens

    OpenAIRE

    Aras, Sümer; CANSARAN, Demet

    2014-01-01

    An improved protocol for the isolation of DNA from herbarium material of some lichen specimens is described. The isolated DNA is suitable for PCR reactions for DNA sequence analysis. The hexadecyltrimethylammonium bromide (CTAB) based protocol defined in this study provides a number of advantages, mainly speed and reliability. In addition, different DNA extraction protocols were examined to determine the yield of DNA from the thallus of lichen specimens. The methods examined include a CTAB ba...

  10. Evolution in the block: common elements of 5S rDNA organization and evolutionary patterns in distant fish genera.

    Science.gov (United States)

    Campo, Daniel; García-Vázquez, Eva

    2012-01-01

    The 5S rDNA is organized in the genome as tandemly repeated copies of a structural unit composed of a coding sequence plus a nontranscribed spacer (NTS). The coding region is highly conserved during evolution, whereas the NTS varies in both length and sequence. It has been proposed that 5S rRNA genes are members of a gene family that has arisen through concerted evolution. In this study, we describe the molecular organization and evolution of the 5S rDNA in the genera Lepidorhombus and Scophthalmus (Scophthalmidae) and compare it with the already known 5S rDNA of the very different genera Merluccius (Merluccidae) and Salmo (Salmoninae), to identify common structural elements or patterns for understanding 5S rDNA evolution in fish. High intra- and interspecific diversity within the 5S rDNA family in all the genera can be explained by a combination of duplications, deletions, and transposition events. Sequence blocks with high similarity in all the 5S rDNA members across species were identified for the four studied genera, with evidence of intense gene conversion within noncoding regions. We propose a model to explain the evolution of the 5S rDNA in which the evolutionary units are blocks of nucleotides rather than entire sequences or single nucleotides. This model implies a "two-speed" evolution: slow within blocks (homogenized by recombination) and fast within the gene family (diversified by duplications and deletions).

  11. Genome organization and DNA methylation patterns of B chromosomes in the red fox and Chinese raccoon dogs.

    Science.gov (United States)

    Bugno-Poniewierska, Monika; Solek, Przemysław; Wronski, Mariusz; Potocki, Leszek; Jezewska-Witkowska, Grażyna; Wnuk, Maciej

    2014-12-01

    The molecular structure of B chromosomes (Bs) is relatively well studied. Previous research demonstrates that Bs of various species usually contain two types of repetitive DNA sequences, satellite DNA and ribosomal DNA, but Bs also contain genes encoding histone proteins as well as many other genes. However, many questions remain regarding the origin and function of these chromosomes. Here, we focused on the comparative cytogenetic characteristics of the red fox and Chinese raccoon dog B chromosomes, with particular attention to the distribution of repetitive DNA sequences and their methylation status. We confirmed that the small Bs of the red fox show a typical fluorescent telomeric distal signal, whereas the medium-sized Bs of the Chinese raccoon dog were characterized by clusters of telomeric sequences along their length. We also found different DNA methylation patterns for the B chromosomes of the two species. Therefore, we concluded that DNA methylation may maintain the transcriptional inactivation of DNA sequences localized to B chromosomes and may prevent genetic imbalance and several negative phenotypic effects.

  12. RevTrans: multiple alignment of coding DNA from aligned amino acid sequences

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Pedersen, Anders Gorm

    2003-01-01

    The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit...... proteins. It is therefore preferable to align coding DNA at the amino acid level, and it is for this purpose that we have constructed the program RevTrans. RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA...... alignment by 'reverse translation' of the aligned protein sequences. In the resulting DNA alignment, gaps occur in groups of three corresponding to entire codons, and analogous codon positions are therefore always lined up. These features are useful when constructing multiple DNA alignments for phylogenetic...
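    Step (iii), the 'reverse translation', can be sketched in Python as threading each original coding sequence onto its gapped protein row: every residue consumes the next codon and every gap becomes `---`, which is why gaps in the DNA alignment always come in threes. This is a minimal sketch, not RevTrans's own code; it assumes the DNA is in frame with exactly one codon per non-gap residue (no untranslated overhang), and the name `reverse_translate` and the two example sequences are hypothetical. Steps (i) and (ii), translation and protein alignment, are left to other tools.

```python
def reverse_translate(aligned_protein: str, dna: str) -> str:
    """Map a gapped protein alignment row back onto its in-frame coding DNA.

    Each non-gap residue consumes the next codon of `dna`; each '-' becomes '---',
    so codon boundaries stay aligned across all rows.
    """
    residues = sum(1 for ch in aligned_protein if ch != "-")
    if len(dna) != 3 * residues:
        raise ValueError("DNA length must be exactly 3 x the number of non-gap residues")
    codons = iter(dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join("---" if ch == "-" else next(codons) for ch in aligned_protein)


# Hypothetical example: protein rows already aligned in step (ii).
rows = {"seq1": ("M-KV", "ATGAAAGTT"), "seq2": ("MRKV", "ATGCGTAAAGTT")}
dna_alignment = {name: reverse_translate(p, d) for name, (p, d) in rows.items()}
# seq1 -> "ATG---AAAGTT", seq2 -> "ATGCGTAAAGTT"
```

    Because the length check runs before any codon is consumed, a frame or length mismatch raises an error instead of silently producing a misaligned row.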

  13. A tobacco cDNA reveals two different transcription patterns in vegetative and reproductive organs

    Directory of Open Access Journals (Sweden)

    I. da Silva

    2002-08-01

    In order to identify genes expressed in the pistil that may have a role in the reproduction process, we have established an expressed sequence tags project to randomly sequence clones from a Nicotiana tabacum stigma/style cDNA library. A cDNA clone (MTL-8) showing high sequence similarity to genes encoding glycine-rich RNA-binding proteins was chosen for further characterization. Based on the extensive identity of MTL-8 to the RGP-1a sequence of N. sylvestris, a primer was designed to extend the 5' sequence of MTL-8 by RT-PCR from stigma/style RNAs. The amplification product was sequenced, confirming that MTL-8 corresponds to an mRNA encoding a glycine-rich RNA-binding protein. Two transcripts of different sizes and expression patterns were identified when the MTL-8 cDNA insert was used as a probe in RNA blots. The larger is 1,100 nucleotides (nt) long and markedly predominant in ovaries. The smaller transcript, 600 nt long, is ubiquitous in the vegetative and reproductive organs analyzed (roots, stems, leaves, sepals, petals, stamens, stigmas/styles and ovaries). Plants subjected to stress (wounding, virus infection and ethylene treatment) presented an increased level of the 600-nt transcript in leaves, especially after tobacco necrosis virus infection. In contrast, the level of the 1,100-nt transcript seems to be unaffected by the stress conditions tested. Results of Southern blot experiments suggest that MTL-8 is present in one or two copies in the tobacco genome. Our results suggest that the shorter transcript is related to stress, while the larger one is a flower-predominant, non-stress-inducible messenger.

  14. SeeDNA: A Visualization Tool for K-string Content of Long DNA Sequences and Their Randomized Counterparts

    Institute of Scientific and Technical Information of China (English)

    Junjie Shen; Shuyu Zhang; Hoong-Chien Lee; Bailin Hao

    2004-01-01

    An interactive tool to visualize the K-string composition of long DNA sequences, including complete bacterial genomes, is described. It is especially useful for exploring short palindromic structures in the sequences. The SeeDNA program runs on Red Hat Linux with GTK+ support. It displays two-dimensional (2D) or one-dimensional (1D) histograms of the K-string distribution of a given sequence and/or its randomized counterpart. It can also show the difference between the K-string distributions of two sequences. The C source code, which uses the GTK+ package, is freely available.
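    The core computation behind such K-string histograms is counting every overlapping substring of length K and comparing those counts with the counts from a second sequence, such as a randomized one. The Python sketch below only illustrates that computation and is not SeeDNA's C/GTK+ implementation. In particular, the mononucleotide shuffle used as the "randomized counterpart" is an assumption (SeeDNA may randomize differently, for example by preserving shorter-string composition), and all function names are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter
from itertools import product


def kstring_counts(seq: str, k: int) -> dict[str, int]:
    """Count overlapping K-strings over ACGT, listing all 4**k strings (zeros included).

    K-strings containing any other symbol (e.g. N) are not reported.
    """
    seq = seq.upper()
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {s: observed[s] for s in ("".join(p) for p in product("ACGT", repeat=k))}


def shuffled(seq: str, seed: int = 0) -> str:
    """ASSUMED randomized counterpart: a shuffle that keeps the base composition."""
    chars = list(seq.upper())
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def kstring_difference(seq_a: str, seq_b: str, k: int) -> dict[str, int]:
    """Per-K-string count difference, seq_a minus seq_b."""
    a, b = kstring_counts(seq_a, k), kstring_counts(seq_b, k)
    return {s: a[s] - b[s] for s in a}
```

    For example, `kstring_difference(genome, shuffled(genome), k)` highlights K-strings that are over- or under-represented relative to composition alone. Listing all 4**K strings keeps histograms from different sequences directly comparable, at the cost of memory that grows quickly with K.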

  15. Analysis and location of a rice BAC clone containing telomeric DNA sequences

    Institute of Scientific and Technical Information of China (English)

    翟文学; 陈浩; 颜辉煌; 严长杰; 王国梁; 朱立煌

    1999-01-01

    BAC2, a rice BAC clone containing (TTTAGGG)n homologous sequences, was analyzed by Southern hybridization and DNA sequencing of its subclones. This revealed many tandemly repeated satellite DNA sequences, called TA352, as well as simple tandem repeats consisting of TTTAGGG or its variants, within the BAC2 insert. A 0.8 kb (TTTAGGG)n-containing fragment in BAC2 was mapped to the telomere regions of at least 5 pairs of rice chromosomes using fluorescence in situ hybridization (FISH). By RFLP analysis of low-copy sequences, the BAC2 clone was localized to one terminal region of chromosome 6. All the results strongly suggest that the telomeric DNA sequences of rice are TTTAGGG or its variants, and that the linked satellite DNA TA352 sequences belong to telomere-associated sequences.

  16. [A method for determining DNA sequence by labeling the end of the molecule and cleaving at the base. Isolation of DNA fragments, end-labeling, cleavage, electrophoresis in polyacrylamide gel and analysis of results].

    Science.gov (United States)

    Maxam, A M; Gilbert, W

    1986-01-01

    We elaborate basic chemical principles and current laboratory procedures for sequencing end-labeled DNA by partial cleavage and gel electrophoresis (A. M. Maxam and W. Gilbert, Proc. Natl. Acad. Sci. USA, 1977, v. 74, p. 560-564). We provide step-by-step protocols for 32P-labeling DNA ends, segregating the labeled ends by cutting with a second restriction enzyme or separating strands, partially cleaving the DNA at specific bases with reagents, electrophoresing the labeled products of cleavage on sequencing gels, and interpreting sequencing band patterns. Many of these procedures have been condensed, to make them faster and easier, and some are new. We also discuss sequencing strategies, and suggest a technique which will reduce plasmid or viral DNA to a collection of singly-end-labeled fragments in one day, for efficient sequencing of these chromosomes in 250-nucleotide blocks.

  17. Complete sequence analysis of 18S rDNA based on genomic DNA extraction from individual Demodex mites (Acari: Demodicidae).

    Science.gov (United States)

    Zhao, Ya-E; Xu, Ji-Ru; Hu, Li; Wu, Li-Ping; Wang, Zheng-Hang

    2012-05-01

    This study is the first attempt to amplify and analyze the complete 18S ribosomal DNA (rDNA) sequence of three Demodex species (Demodex folliculorum, Demodex brevis and Demodex canis) using gDNA extracted from individual mites. The mites were treated with DNA Release Additive and Hot Start II DNA Polymerase to promote mite disruption and increase PCR specificity. Determination of D. folliculorum gDNA showed that the gDNA yield was highest at 1 mite and tended to decrease as the number of mites increased. The individual mite gDNA was successfully used to amplify an 18S rDNA fragment (about 900 bp). Alignments of the complete 18S rDNA sequences of individual mite samples and of pooled mite samples (≥1,000 mites/sample) showed over 97% identity for each species, indicating that the gDNA extracted from a single mite was as satisfactory as that from pooled mites for PCR amplification. Further pairwise sequence analyses showed that average divergence, genetic distance, transition/transversion or phylogenetic trees could not effectively distinguish the three Demodex species, largely due to differentiation among the D. canis isolates. It can be concluded that gDNA from individual Demodex mites is sufficient for molecular studies of Demodex. The complete 18S rDNA sequence is suitable for interfamily identification in Cheyletoidea, but whether it is suitable for intrafamily identification cannot be confirmed until the types of Demodex mites parasitizing dogs are ascertained.

  18. mtDNA sequences suggest a recent evolutionary divergence for Beringian and Northern American populations

    Energy Technology Data Exchange (ETDEWEB)

    Shields, G.F.; Schmiechen, A.M.; Reed, J.K. (Univ. of Alaska, Fairbanks, AK (United States)); Frazier, B.L.; Redd, A.; Ward, R.H. (Univ. of Utah, Salt Lake City, UT (United States)); Voevoda, M.I. (Institute of Internal Medicine, Novosibirsk (Russian Federation))

    1993-09-01

    Conventional descriptions of the pattern and process of human entry into the New World from Asia are incomplete and controversial. In order to gain evolutionary insight into this process, the authors have sequenced the control region of mtDNA in samples of contemporary tribal populations of eastern Siberia, Alaska, and Greenland and have compared them with those of Amerind speakers of the Pacific Northwest and with those of the Altai of central Siberia. Specifically, they have analyzed sequence diversity in 33 mitochondrial lineages identified in 90 individuals belonging to five Circumpolar populations of Beringia, North America, and Greenland: Chukchi from Siberia, Inupiaq Eskimos and Athapaskans from Alaska, Eskimos from West Greenland, and Haida from Canada. Hereafter, these five populations are referred to as 'Circumarctic peoples'. These data were then compared with the sequence diversity in 47 mitochondrial lineages identified in a sample of 145 individuals from three Amerind-speaking tribes (Bella Coola, Nuu-Chah-Nulth, and Yakima) of the Pacific Northwest, plus 16 mitochondrial lineages identified in a sample of 17 Altai from central Siberia. Sequence diversity within and among Circumarctic populations is considerably less than the sequence diversity observed within and among the three Amerind tribes. The similarity of sequences found among the geographically dispersed Circumarctic groups, plus the small values of mean pairwise sequence differences within Circumarctic populations, suggest a recent and rapid evolutionary radiation of these populations. In addition, Circumarctic populations lack the 9-bp deletion which has been used to trace various migrations out of Asia, while populations of southeastern Siberia possess this deletion. On the basis of these observations, while the evolutionary affinities of Native Americans extend west to the Circumarctic populations of eastern Siberia, they do not include the Altai of central Siberia.

  19. The DNA sequence and biology of human chromosome 19.

    Science.gov (United States)

    Grimwood, Jane; Gordon, Laurie A; Olsen, Anne; Terry, Astrid; Schmutz, Jeremy; Lamerdin, Jane; Hellsten, Uffe; Goodstein, David; Couronne, Olivier; Tran-Gyamfi, Mary; Aerts, Andrea; Altherr, Michael; Ashworth, Linda; Bajorek, Eva; Black, Stacey; Branscomb, Elbert; Caenepeel, Sean; Carrano, Anthony; Caoile, Chenier; Chan, Yee Man; Christensen, Mari; Cleland, Catherine A; Copeland, Alex; Dalin, Eileen; Dehal, Paramvir; Denys, Mirian; Detter, John C; Escobar, Julio; Flowers, Dave; Fotopulos, Dea; Garcia, Carmen; Georgescu, Anca M; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Ho, Isaac; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Larionov, Vladimer; Leem, Sun-Hee; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Malfatti, Stephanie; Martinez, Diego; McCready, Paula; Medina, Catherine; Morgan, Jenna; Nelson, Kathryn; Nolan, Matt; Ovcharenko, Ivan; Pitluck, Sam; Pollard, Martin; Popkie, Anthony P; Predki, Paul; Quan, Glenda; Ramirez, Lucia; Rash, Sam; Retterer, James; Rodriguez, Alex; Rogers, Stephanine; Salamov, Asaf; Salazar, Angelica; She, Xinwei; Smith, Doug; Slezak, Tom; Solovyev, Victor; Thayer, Nina; Tice, Hope; Tsai, Ming; Ustaszewska, Anna; Vo, Nu; Wagner, Mark; Wheeler, Jeremy; Wu, Kevin; Xie, Gary; Yang, Joan; Dubchak, Inna; Furey, Terrence S; DeJong, Pieter; Dickson, Mark; Gordon, David; Eichler, Evan E; Pennacchio, Len A; Richardson, Paul; Stubbs, Lisa; Rokhsar, Daniel S; Myers, Richard M; Rubin, Edward M; Lucas, Susan M

    2004-04-01

    Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high G + C content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in mendelian disorders, including familial hypercholesterolaemia and insulin-resistant diabetes. Nearly one-quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

  20. The DNA sequence and biology of human chromosome 19

    Energy Technology Data Exchange (ETDEWEB)

    Grimwood, J; Gordon, L A; Olsen, A; Terry, A; Schmutz, J; Lamerdin, J; Hellsten, U; Goodstein, D; Couronne, O; Tran-Gyamfi, M

    2004-04-06

    Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high GC content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in Mendelian disorders, including familial hypercholesterolemia and insulin-resistant diabetes. Nearly one quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

  1. Silicon Nanopore Devices for DNA Translocation and Sequencing Studies

    Science.gov (United States)

    Ling, Sean

    2005-03-01

    In this talk, I will discuss the recent progress [1-3] in developing solid-state nanopore devices using silicon technology. We have demonstrated a novel technique for shaping nanopores in the range of 1-10 nm, using surface-tension-driven mass flow with single nanometer precision. This technique overcomes a major technical challenge in silicon technology. I will also discuss the current effort [3] in developing integrated nanopore silicon chips with electrically addressable nanopores. These devices are used for DNA translocation and sequencing studies. This work was done in collaboration with the group of Cees Dekker at TU-Delft with partial support from FOM and Guggenheim Foundation. The work at Brown was supported by NSF-NER and NSF-NIRT. [1] A.J. Storm, J.H. Chen, X.S. Ling, H. Zandbergen, and C. Dekker, "Fabrication of Solid-State Nanopores with Single Nanometer Precision", Nature Materials, 2, 537 (2003). [2] A.J. Storm, J.H. Chen, X.S. Ling, H. Zandbergen, and C. Dekker, "Electron-Beam-Induced Deformations of SiO2 Nanostructures", Journal of Applied Physics (submitted, 2004). [3] X.S. Ling, "Addressable nanopores and micropores" (patent pending).

  2. [Patentability of DNA sequences: the debate remains open].

    Science.gov (United States)

    Martín Uranga, Amelia

    2013-01-01

    From the beginning of the discussion concerning the Directive on the legal protection of biotechnological inventions, the patentability of human genes was an issue that provoked debate among politicians, scientists, lawyers and civil society itself. Directive 98/44 tried to settle the matter by stating that, to support the patentability of a human gene, it must be known what role the gene fulfills and which protein it encodes, as an essential requirement for establishing its industrial application. However, following the judgment of 13 June 2013 (Supreme Court of the United States of America in the case of Association for Molecular Pathology et al. versus Myriad Genetics Inc.), the debate on this issue has been reopened. Several issues must be considered, taking into account that patents on DNA and gene sequences have been an important incentive for increasing interest in biotechnology applied to human health. On the other hand, there has been a paradigm shift in the R&D of biopharmaceutical companies, which have moved from an in-house research model to a model of open innovation, based on collaboration between large corporations, biotech SMEs, and public and private research centers. This model of innovation affects industrial property, and it will therefore be necessary to define clearly what each party brings to the relationship and how the parties are expected to share the results, all with the ultimate goal of giving patients access to the most innovative, safe and effective treatments and medications.

  3. The DNA sequence and biology of human chromosome 19

    Energy Technology Data Exchange (ETDEWEB)

    Grimwood, Jane; Gordon, Laurie A.; Olsen, Anne; Terry, Astrid; Schmutz, Jeremy; Lamerdin, Jane; Hellsten, Uffe; Goodstein, David; Couronne, Olivier; Tran-Gyamfi, Mary; Aerts, Andrea; Altherr, Michael; Ashworth, Linda; Bajorek, Eva; Black, Stacey; Branscomb, Elbert; Caenepeel, Sean; Carrano, Anthony; Caoile, Chenier; Chan, Yee Man; Christensen, Mari; Cleland, Catherine A.; Copeland, Alex; Dalin, Eileen; Dehal, Paramvir; Denys, Mirian; Detter, John C.; Escobar, Julio; Flowers, Dave; Fotopulos, Dea; Garcia, Carmen; Georgescu, Anca M.; Glavina, Tijana; Gomez, Maria; Gonzales, Eldelyn; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Ho, Issac; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Larionov, Vladimer; Leem, Sun-Hee; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Malfatti, Stephanie; Martinez, Diego; McCready, Paula; Medina, Catherine; Morgan, Jenna; Nelson, Kathryn; Nolan, Matt; Ovcharenko, Ivan; Pitluck, Sam; Pollard, Martin; Popkie, Anthony P.; Predki, Paul; Quan, Glenda; Ramirez, Lucia; Rash, Sam; Retterer, James; Rodriguez, Alex; Rogers, Stephanine; Salamov, Asaf; Salazar, Angelica; She, Xinwei; Smith, Doug; Slezak, Tom; Solovyev, Victor; Thayer, Nina; Tice, Hope; Tsai, Ming; Ustaszewska, Anna; Vo, Nu; Wagner, Mark; Wheeler, Jeremy; Wu, Kevin; Xie, Gary; Yang, Joan; Dubchak, Inna; Furey, Terrence S.; DeJong, Pieter; Dickson, Mark; Gordon, David; Eichler, Evan E.; Pennacchio, Len A.; Richardson, Paul; Stubbs, Lisa; Rokhsar, Daniel S.; Myers, Richard M.; Rubin, Edward M.; Lucas, Susan M.

    2003-09-15

    Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high G + C content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9 percent of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in mendelian disorders, including familial hypercholesterolaemia and insulin-resistant diabetes. Nearly one-quarter of these genes belong to tandemly arranged families, encompassing more than 25 percent of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

  4. Genomic libraries: II. Subcloning, sequencing, and assembling large-insert genomic DNA clones.

    Science.gov (United States)

    Quail, Mike A; Matthews, Lucy; Sims, Sarah; Lloyd, Christine; Beasley, Helen; Baxter, Simon W

    2011-01-01

    Sequencing large insert clones to completion is useful for characterizing specific genomic regions, identifying haplotypes, and closing gaps in whole genome sequencing projects. Despite being a standard technique in molecular laboratories, DNA sequencing using the Sanger method can be highly problematic when complex secondary structures or sequence repeats are encountered in genomic clones. Here, we describe methods to isolate DNA from a large insert clone (fosmid or BAC), subclone the sample, and sequence the region to the highest industry standard. Troubleshooting solutions for sequencing difficult templates are discussed.

  5. DNA polymorphism among Fusarium oxysporum f.sp. elaeidis populations from oil palm, using a repeated and dispersed sequence "Palm".

    Science.gov (United States)

    Mouyna, I; Renard, J L; Brygoo, Y

    1996-07-31

    A worldwide collection of 76 F. oxysporum f.sp. elaeidis isolates (Foe) and 21 F. oxysporum isolates from the soil of several palm groves was analysed by RFLP. As a probe, we used a random DNA fragment (probe 46) from a genomic library of a Foe isolate. This probe contains two different types of sequence, one being repeated and dispersed in the genome ("Palm"), the other being a single-copy sequence. All F. oxysporum isolates from the palm-grove soils were non-pathogenic to oil palm. They all had a simple restriction pattern with one band homologous to the single-copy sequence of probe 46. All Foe isolates were pathogenic to oil palm, and they all had complex patterns due to hybridization with "Palm". This repetitive sequence reveals that Foe isolates are distinct from the other F. oxysporum isolates from palm-grove soils. The sequence can reliably discriminate pathogenic from non-pathogenic oil palm isolates. Based on DNA fingerprint similarities, Foe populations were divided into ten groups consisting of isolates with the same geographic origin. Isolates from Brazil and Ecuador were an exception to that rule, as they had the same restriction pattern as a few isolates from the Ivory Coast, suggesting they may have originated from Africa.

  6. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution

    NARCIS (Netherlands)

    Falconer, Ester; Hills, Mark; Naumann, Ulrike; Poon, Steven S. S.; Chavez, Elizabeth A.; Sanders, Ashley D.; Zhao, Yongjun; Hirst, Martin; Lansdorp, Peter M.

    2012-01-01

    DNA rearrangements such as sister chromatid exchanges (SCEs) are sensitive indicators of genomic stress and instability, but they are typically masked by single-cell sequencing techniques. We developed Strand-seq to independently sequence parental DNA template strands from single cells, making it po

  7. DNA sequence-dependent mechanics and protein-assisted bending in repressor-mediated loop formation

    Science.gov (United States)

    Boedicker, James Q.; Garcia, Hernan G.; Johnson, Stephanie; Phillips, Rob

    2013-12-01

    As the chief informational molecule of life, DNA is subject to extensive physical manipulations. The energy required to deform double-helical DNA depends on sequence, and this mechanical code of DNA influences gene regulation, such as through nucleosome positioning. Here we examine the sequence-dependent flexibility of DNA in bacterial transcription factor-mediated looping, a context for which the role of sequence remains poorly understood. Using a suite of synthetic constructs repressed by the Lac repressor and two well-known sequences that show large flexibility differences in vitro, we make precise statistical mechanical predictions as to how DNA sequence influences loop formation and test these predictions using in vivo transcription and in vitro single-molecule assays. Surprisingly, sequence-dependent flexibility does not affect in vivo gene regulation. By theoretically and experimentally quantifying the relative contributions of sequence and the DNA-bending protein HU to DNA mechanical properties, we reveal that bending by HU dominates DNA mechanics and masks intrinsic sequence-dependent flexibility. Such a quantitative understanding of how mechanical regulatory information is encoded in the genome will be a key step towards a predictive understanding of gene regulation at single-base pair resolution.

  8. Sequencing strategy of mitochondrial HV1 and HV2 DNA with length heteroplasmy

    DEFF Research Database (Denmark)

    Rasmussen, Erik Michael; Sørensen, E; Eriksen, Birthe

    2002-01-01

    We describe a method to obtain reliable mitochondrial DNA (mtDNA) sequences downstream of the homopolymeric stretches with length heteroplasmy in the sequencing direction. The method is based on the use of junction primers that bind to a part of the homopolymeric stretch and the first 2-4 bases...

  9. Molecular characterization and physical localization of highly repetitive DNA sequences from Brazilian Alstroemeria species

    NARCIS (Netherlands)

    Kuipers, A.G.J.; Kamstra, S.A.; Jeu, de M.J.; Jacobsen, E.

    2002-01-01

    Highly repetitive DNA sequences were isolated from genomic DNA libraries of Alstroemeria psittacina and A. inodora. Among the repetitive sequences that were isolated, tandem repeats as well as dispersed repeats could be discerned. The tandem repeats belonged to a family of interlinked Sau3A subfragm

  10. MSA-PAD: DNA multiple sequence alignment framework based on PFAM accessed domain information.

    Science.gov (United States)

    Balech, Bachir; Vicario, Saverio; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-08-01

    Here we present the MSA-PAD application, a DNA multiple sequence alignment framework that uses PFAM protein domain information to align DNA sequences encoding either single or multiple protein domains. MSA-PAD has two alignment options: gene mode and genome mode.

  11. Therapeutic modulation of endogenous gene function by agents with designed DNA-sequence specificities

    NARCIS (Netherlands)

    Uil, T.G.; Haisma, H.J.; Rots, Marianne

    2003-01-01

    Designer molecules that can specifically target pre-determined DNA sequences provide a means to modulate endogenous gene function. Different classes of sequence-specific DNA-binding agents have been developed, including triplex-forming molecules, synthetic polyamides and designer zinc finger protein

  12. Methods for sequencing GC-rich and CCT repeat DNA templates

    Science.gov (United States)

    Robinson, Donna L.

    2007-02-20

    The present invention is directed to a PCR-based method of cycle sequencing DNA and other polynucleotide sequences having high GC content and regions of high GC content, and includes, for example, DNA strands with a high cytosine and/or guanine content and repeated motifs such as CCT repeats.

  13. Mapping and Use of a Sequence that Targets DNA Ligase I to Sites of DNA Replication In Vivo

    OpenAIRE

    Cardoso, M. Cristina; Joseph, Cuthbert; Rahn, Hans-Peter; Reusch, Regina; Nadal-Ginard, Bernardo; Leonhardt, Heinrich

    1997-01-01

    The mammalian nucleus is highly organized, and nuclear processes such as DNA replication occur in discrete nuclear foci, a phenomenon often termed “functional organization” of the nucleus. We describe the identification and characterization of a bipartite targeting sequence (amino acids 1–28 and 111–179) that is necessary and sufficient to direct DNA ligase I to nuclear replication foci during S phase. This targeting sequence is located within the regulatory, NH2-terminal domain of the protei...

  14. ACME: A scalable parallel system for extracting frequent patterns from a very long sequence

    KAUST Repository

    Sahli, Majed

    2014-10-02

    Modern applications, including bioinformatics, time series, and web log analysis, require the extraction of frequent patterns, called motifs, from one very long (i.e., several gigabytes) sequence. Existing approaches are either heuristics that are error-prone, or exact (also called combinatorial) methods that are extremely slow and therefore applicable only to very small sequences (i.e., on the order of megabytes). This paper presents ACME, a combinatorial approach that scales to gigabyte-long sequences and is the first to support supermaximal motifs. ACME is a versatile parallel system that can be deployed on desktop multi-core systems, or on thousands of CPUs in the cloud. However, merely using more compute nodes does not guarantee efficiency, because of the related overheads. To this end, ACME introduces an automatic tuning mechanism that suggests the appropriate number of CPUs to utilize, in order to meet the user constraints in terms of run time, while minimizing the financial cost of cloud resources. Our experiments show that, compared to the state of the art, ACME supports three orders of magnitude longer sequences (e.g., DNA for the entire human genome); handles large alphabets (e.g., English alphabet for Wikipedia); scales out to 16,384 CPUs on a supercomputer; and supports elastic deployment in the cloud.
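
    The record describes ACME only at a high level. For orientation, the sketch below shows the simplest exact (combinatorial) baseline for frequent-motif extraction: count every length-k substring in one pass and keep those occurring at least a threshold number of times, overlaps included. It is not ACME's algorithm, which the abstract says supports supermaximal motifs and parallel, cloud-scale deployment; the function name `frequent_kmers` and its parameters are hypothetical, chosen only for illustration.

```python
from collections import Counter


def frequent_kmers(seq: str, k: int, min_count: int) -> dict[str, int]:
    """Return every length-k substring of seq occurring at least min_count times.

    Hypothetical illustration of a naive exact baseline (not ACME):
    a single sliding-window pass, counting overlapping occurrences.
    """
    if k <= 0 or k > len(seq):
        return {}
    # Slide a window of width k across the sequence and tally each substring.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Keep only the substrings that meet the frequency threshold.
    return {kmer: c for kmer, c in counts.items() if c >= min_count}


print(frequent_kmers("ACGTACGTAC", 4, 2))
# {'ACGT': 2, 'CGTA': 2, 'GTAC': 2}
```

    Because this baseline keeps every distinct k-mer in memory on a single core, it hints at one reason exact methods struggle on gigabyte-long inputs, the scaling problem ACME is designed to address.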

  15. Structural biology of disease-associated repetitive DNA sequences and protein-DNA complexes involved in DNA damage and repair

    Energy Technology Data Exchange (ETDEWEB)

    Gupta, G.; Santhana Mariappan, S.V.; Chen, X.; Catasti, P.; Silks, L.A. III; Moyzis, R.K.; Bradbury, E.M.; Garcia, A.E.

    1997-07-01

    This project is aimed at formulating the sequence-structure-function correlations of various microsatellites in the human (and other eukaryotic) genomes. Here the authors have been able to develop and apply structural biology tools to understand the following: the molecular mechanism of length polymorphism in microsatellites; the molecular mechanism by which microsatellites in noncoding regions alter the regulation of the associated gene; and finally, the molecular mechanism by which the expansion of these microsatellites impairs gene expression and causes disease. Their multidisciplinary structural biology approach is quantitative and can be applied to all coding and noncoding DNA sequences associated with any gene. Both NIH and DOE are interested in developing quantitative tools for understanding the function of various human genes to help prevent diseases caused by genetic and environmental effects.

  16. Sequence-Dependent Fluorescence of Cy3- and Cy5-Labeled Double-Stranded DNA.

    Science.gov (United States)

    Kretschy, Nicole; Sack, Matej; Somoza, Mark M

    2016-03-16

    The fluorescence intensity of Cy3 and Cy5 dyes is strongly dependent on the nucleobase sequence of the labeled oligonucleotides. Sequence-dependent fluorescence may significantly influence the data obtained from many common experimental methods based on fluorescence detection of nucleic acids, such as sequencing, PCR, FRET, and FISH. To quantify sequence-dependent fluorescence, we have measured the fluorescence intensity of Cy3 and Cy5 bound to the 5' end of all 1024 possible double-stranded DNA 5mers. The fluorescence intensity was also determined for these dyes bound to the 5' end of fixed-sequence double-stranded DNA with a variable sequence 3' overhang adjacent to the dye. The labeled DNA oligonucleotides were made using light-directed, in situ microarray synthesis. The results indicate that the fluorescence intensity of both dyes is sensitive to all five bases or base pairs, that the sequence dependence is stronger for double- (vs single-) stranded DNA, and that the dyes are sensitive to both the adjacent dsDNA sequence and the 3'-ssDNA overhang. Purine-rich sequences result in higher fluorescence. The results can be used to estimate measurement error in experiments with fluorescent-labeled DNA, as well as to optimize the fluorescent signal by considering the nucleobase environment of the labeling cyanine dye.
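
    The figure of 1024 follows from the four-letter alphabet: there are 4^5 = 1024 distinct 5mers. A minimal Python sketch of that enumeration is below, together with a hypothetical `purine_fraction` helper one might use to group sequences by purine content, given the reported trend that purine-rich sequences fluoresce more strongly. Both functions are illustrative assumptions, not the authors' analysis pipeline.

```python
from itertools import product

BASES = "ACGT"
PURINES = {"A", "G"}


def all_kmers(k: int) -> list[str]:
    """Every DNA sequence of length k, in lexicographic order (4**k of them)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def purine_fraction(seq: str) -> float:
    """Hypothetical helper: fraction of positions occupied by a purine (A or G)."""
    return sum(base in PURINES for base in seq) / len(seq)


five_mers = all_kmers(5)
print(len(five_mers))  # 1024
print(purine_fraction("AAGGT"))  # 0.8
```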

  17. Sequencing the hypervariable regions of human mitochondrial DNA using massively parallel sequencing: Enhanced data acquisition for DNA samples encountered in forensic testing.

    Science.gov (United States)

    Davis, Carey; Peters, Dixie; Warshauer, David; King, Jonathan; Budowle, Bruce

    2015-03-01

    Mitochondrial DNA testing is a useful tool in the analysis of forensic biological evidence. In cases where nuclear DNA is damaged or limited in quantity, the higher copy number of mitochondrial genomes available in a sample can provide information about the source of a sample. Currently, Sanger-type sequencing (STS) is the primary method to develop mitochondrial DNA profiles. This method is laborious and time consuming. Massively parallel sequencing (MPS) can increase the amount of information obtained from mitochondrial DNA samples while improving turnaround time by decreasing the number of manipulations and, more so, by exploiting high-throughput analyses to obtain interpretable results. In this study, 18 buccal swabs, three different tissue samples from five individuals, and four bone samples from casework were sequenced at hypervariable regions I and II using STS and MPS. Sample enrichment for STS and MPS was PCR-based. Library preparation for MPS was performed using Nextera® XT DNA Sample Preparation Kit and sequencing was performed on the MiSeq™ (Illumina, Inc.). MPS yielded full concordance of base calls with STS results, and the newer methodology was able to resolve length heteroplasmy in homopolymeric regions. This study demonstrates short amplicon MPS of mitochondrial DNA is feasible, can provide information not possible with STS, and lays the groundwork for development of a whole genome sequencing strategy for degraded samples.

  18. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules.

    Science.gov (United States)

    Mayjonade, Baptiste; Gouzy, Jérôme; Donnadieu, Cécile; Pouilly, Nicolas; Marande, William; Callot, Caroline; Langlade, Nicolas; Muños, Stéphane

    2016-10-01

    De novo sequencing of complex genomes is one of the main challenges for researchers seeking high-quality reference sequences. Many de novo assemblies are based on short reads, producing fragmented genome sequences. Third-generation sequencing, with read lengths >10 kb, will improve the assembly of complex genomes, but these techniques require high-molecular-weight genomic DNA (gDNA), and gDNA extraction protocols used for obtaining smaller fragments for short-read sequencing are not suitable for this purpose. Methods of preparing gDNA for bacterial artificial chromosome (BAC) libraries could be adapted, but these approaches are time-consuming, and commercial kits for these methods are expensive. Here, we present a protocol for rapid, inexpensive extraction of high-molecular-weight gDNA from bacteria, plants, and animals. Our technique was validated using sunflower leaf samples, producing a mean read length of 12.6 kb and a maximum read length of 80 kb.

  19. Phylogenetic relationships within Pelargonium section Peristera (Geraniaceae) inferred from nrDNA and cpDNA sequence comparisons.

    NARCIS (Netherlands)

    Bakker, F.T.; Helbrugge, D.; Culham, A.; Gibby, M.

    1998-01-01

    Phylogenetic analysis of nrDNA ITS and trnL (UAA) 5' exon-trnF (GAA) chloroplast DNA sequences from 17 species of Pelargonium sect. Peristera, together with nine putative outgroups, suggests paraphyly for the section and a close relationship between the highly disjunct South African and Australian spe

  20. Planktonic foraminifera-derived environmental DNA extracted from abyssal sediments preserves patterns of plankton macroecology

    Science.gov (United States)

    Morard, Raphaël; Lejzerowicz, Franck; Darling, Kate F.; Lecroq-Bennet, Béatrice; Winther Pedersen, Mikkel; Orlando, Ludovic; Pawlowski, Jan; Mulitza, Stefan; de Vargas, Colomban; Kucera, Michal

    2017-06-01

    Deep-sea sediments constitute a unique archive of ocean change, fueled by a permanent rain of mineral and organic remains from the surface ocean. Until now, paleo-ecological analyses of this archive have been mostly based on information from taxa leaving fossils. In theory, environmental DNA (eDNA) in the sediment has the potential to provide information on non-fossilized taxa, allowing more comprehensive interpretations of the fossil record. Yet, the process controlling the transport and deposition of eDNA onto the sediment and the extent to which it preserves the features of past oceanic biota remain unknown. Planktonic foraminifera are the ideal taxa to allow an assessment of the eDNA signal modification during deposition because their fossils are well preserved in the sediment and their morphological taxonomy is documented by DNA barcodes. Specifically, we re-analyze foraminiferal-specific metabarcodes from 31 deep-sea sediment samples, which were shown to contain a small fraction of sequences from planktonic foraminifera. We confirm that the largest portion of the metabarcode originates from benthic bottom-dwelling foraminifera, representing the in situ community, but a small portion (DNA is preserved in a range of marine sediment types, the composition of the recovered eDNA metabarcode is replicable and that both the similarity structure and the diversity pattern are preserved. Our results suggest that sedimentary eDNA could preserve the ecological structure of the entire pelagic community, including non-fossilized taxa, thus opening new avenues for paleoceanographic and paleoecological studies.

  1. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA

    DEFF Research Database (Denmark)

    Poulsen, Jesper Buchhave; Lescai, Francesco; Grove, Jakob;

    2016-01-01

    and WB vs WB sample types yielded similar concordance rates, with values close to 100%. WgaDNA of neonatal DBS samples performs with great accuracy and efficiency in exome sequencing. The wgaDNA performed similarly to matched high-quality reference--whole-blood DNA--based on concordance rates calculated...

  2. Carrier molecules and extraction of circulating tumor DNA for next generation sequencing in colorectal cancer.

    Science.gov (United States)

    Beránek, Martin; Sirák, Igor; Vošmik, Milan; Petera, Jiří; Drastíková, Monika; Palička, Vladimír

    The aims of the study were: i) to compare circulating tumor DNA (ctDNA) yields obtained by different manual extraction procedures, ii) to evaluate the addition of various carrier molecules into the plasma to improve ctDNA extraction recovery, and iii) to use next generation sequencing (NGS) technology to analyze KRAS, BRAF, and NRAS somatic mutations in ctDNA from patients with metastatic colorectal cancer. Venous blood was obtained from patients who suffered from metastatic colorectal carcinoma. For plasma ctDNA extraction, the following carriers were tested: carrier RNA, polyadenylic acid, glycogen, linear acrylamide, yeast tRNA, salmon sperm DNA, and herring sperm DNA. Each extract was characterized by quantitative real-time PCR and next generation sequencing. The addition of polyadenylic acid had a significant positive effect on the amount of ctDNA eluted. The sequencing data revealed five cases of ctDNA mutated in KRAS and one patient with a BRAF mutation. An agreement of 86% was found between tumor tissues and ctDNA. Testing somatic mutations in ctDNA seems to be a promising tool to monitor dynamically changing genotypes of tumor cells circulating in the body. The optimized process of ctDNA extraction should help to obtain more reliable sequencing data in patients with metastatic colorectal cancer.

  3. Raman-based system for DNA sequencing-mapping and other separations

    Science.gov (United States)

    Vo-Dinh, Tuan

    1994-01-01

    DNA sequencing and mapping are performed by using a Raman spectrometer with a surface enhanced Raman scattering (SERS) substrate to enhance the Raman signal. A SERS label is attached to a DNA fragment and then analyzed with the Raman spectrometer to identify the DNA fragment according to characteristics of the Raman spectrum generated.

  4. Effect of intercalator substituent and nucleotide sequence on the stability of DNA- and RNA-naphthalimide complexes.

    Science.gov (United States)

    Johnson, Charles A; Hudson, Graham A; Hardebeck, Laura K E; Jolley, Elizabeth A; Ren, Yi; Lewis, Michael; Znosko, Brent M

    2015-07-01

    DNA intercalators are commonly used as anti-cancer and anti-tumor agents. As a result, it is imperative to understand how changes in intercalator structure affect binding affinity to DNA. Amonafide and mitonafide, two naphthalimide derivatives that are active against HeLa and KB cells in vitro, were previously shown to intercalate into DNA. Here, a systematic study was undertaken to change the 3-substituent on the aromatic intercalator 1,8-naphthalimide to determine how 11 different functional groups with a variety of physical and electronic properties affect binding of the naphthalimide to DNA and RNA duplexes of different sequence compositions and lengths. Wavelength scans, NMR titrations, and circular dichroism were used to investigate the binding mode of 1,8-naphthalimide derivatives to short synthetic DNA. Optical melting experiments were used to measure the change in melting temperature of the DNA and RNA duplexes due to intercalation, which ranged from 0 to 19.4°C. Thermal stabilities were affected by changing the substituent, and several patterns and idiosyncrasies were identified. By systematically varying the 3-substituent, the binding strength of the same derivative to various DNA and RNA duplexes was compared. The binding strength of different derivatives to the same DNA and RNA sequences was also compared. The results of these comparisons shed light on the complexities of site specificity and binding strength in DNA-intercalator complexes. For example, the consequences of adding a 5'-TpG-3' or 5'-GpT-3' step to a duplex is dependent on the sequence composition of the duplex. When added to a poly-AT duplex, naphthalimide binding was enhanced by 5.6-11.5°C, but when added to a poly-GC duplex, naphthalimide binding was diminished by 3.2-6.9°C. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Computational optimisation of targeted DNA sequencing for cancer detection

    DEFF Research Database (Denmark)

    Martinez, Pierre; McGranahan, Nicholas; Birkbak, Nicolai Juul

    2013-01-01

    circulating tumour DNA (ctDNA) might represent a non-invasive method to detect mutations in patients, facilitating early detection. In this article, we define reduced gene panels from publicly available datasets as a first step to assess and optimise the potential of targeted ctDNA scans for early tumour...

  6. High Interlaboratory Reproducibility of DNA Sequence-based Typing of Bacteria in a Multicenter Study

    DEFF Research Database (Denmark)

    Sousa, MA de; Boye, Kit; Lencastre, H de

    2006-01-01

    Current DNA amplification-based typing methods for bacterial pathogens often lack interlaboratory reproducibility. In this international study, DNA sequence-based typing of the Staphylococcus aureus protein A gene (spa, 110 to 422 bp) showed 100% intra- and interlaboratory reproducibility without extensive harmonization of protocols for 30 blind-coded S. aureus DNA samples sent to 10 laboratories. Specialized software for automated sequence analysis ensured a common typing nomenclature.

  7. True single-molecule DNA sequencing of a pleistocene horse bone

    DEFF Research Database (Denmark)

    Orlando, Ludovic Antoine Alexandre; Ginolhac, Aurélien; Raghavan, Maanasa;

    2011-01-01

    Second-generation sequencing platforms have revolutionized the field of ancient DNA, opening access to complete genomes of past individuals and extinct species. However, these platforms are dependent on library construction and amplification steps that may result in sequences that do not reflect......-preserved Pleistocene horse bone using the Helicos HeliScope and Illumina GAIIx platforms, respectively. We find that the percentage of endogenous DNA sequences derived from the horse is higher among the Helicos data than Illumina data. This result indicates that the molecular biology tools used to generate sequencing...... libraries of ancient DNA molecules as required for second-generation sequencing introduce biases into the data, that reduce the efficiency of the sequencing process and limit our ability to fully explore the molecular complexity of ancient DNA extracts. We demonstrate that simple modifications...

  8. cDNA sequencing improves the detection of P53 missense mutations in colorectal cancer

    Directory of Open Access Journals (Sweden)

    Jesionek-Kupnicka Dorota

    2009-08-01

    Full Text Available Abstract Background Recently published data showed discrepancies between P53 cDNA and DNA sequencing in glioblastomas. We hypothesised that similar discrepancies may be observed in other human cancers. Methods To this end, we analyzed 23 colorectal cancers for P53 mutations and gene expression using both DNA and cDNA sequencing, real-time PCR and immunohistochemistry. Results We found P53 gene mutations in 16 cases (15 missense and 1 nonsense). Two of the 15 cases with missense mutations showed alterations only by cDNA sequencing, not by DNA sequencing. Moreover, in 6 of the 15 cases with a cDNA mutation, those mutations were difficult to detect by DNA sequencing, so the results of DNA analysis alone could have been misinterpreted had the cDNA sequencing results not also been available. In all 15 of those cases, we observed a higher ratio of mutated to wild-type template by cDNA analysis, but not by DNA analysis. Interestingly, a similar overexpression of P53 mRNA was present in samples with and without P53 mutations. Conclusion In colorectal cancer, those discrepancies might be explained by three conditions: 1, overexpression of mutated P53 mRNA in cancer cells as compared with normal cells; 2, a higher content of cells without P53 mutation (normal cells and cells showing K-RAS and/or APC but not P53 mutation) in samples presenting P53 mutation; 3, heterozygous or hemizygous mutations of the P53 gene. Additionally, for heterozygous mutations, unknown mechanism(s) causing selective overproduction of the mutated allele should also be considered. Our data offer new clues for studying discrepancies between P53 cDNA and DNA sequencing analysis.

  9. Molecular characterization and phylogeny of whipworm nematodes inferred from DNA sequences of cox1 mtDNA and 18S rDNA.

    Science.gov (United States)

    Callejón, Rocío; Nadler, Steven; De Rojas, Manuel; Zurita, Antonio; Petrášová, Jana; Cutillas, Cristina

    2013-11-01

    A molecular phylogenetic hypothesis is presented for the genus Trichuris based on sequence data from the mitochondrial cytochrome c oxidase 1 (cox1) and ribosomal 18S genes. The taxa consisted of different described species and several host-associated isolates (undescribed taxa) of Trichuris collected from hosts from Spain. Sequence data from mitochondrial cox1 (partial gene) and nuclear 18S near-complete gene were analyzed by maximum likelihood and Bayesian inference methods, as separate and combined datasets, to evaluate phylogenetic relationships among taxa. Phylogenetic results based on 18S ribosomal DNA (rDNA) were robust for relationships among species; cox1 sequences delimited species and revealed phylogeographic variation, but most relationships among Trichuris species were poorly resolved by mitochondrial sequences. The phylogenetic hypotheses for both genes strongly supported monophyly of Trichuris, and distinct genetic lineages corresponding to described species or nematodes associated with certain hosts were recognized based on cox1 sequences. Phylogenetic reconstructions based on concatenated sequences of the two loci, cox1 (mitochondrial DNA (mtDNA)) and 18S rDNA, were congruent with the overall topology inferred from 18S and previously published results based on internal transcribed spacer sequences. Our results demonstrate that the 18S rDNA and cox1 mtDNA genes provide resolution at different levels, but together resolve relationships among geographic populations and species in the genus Trichuris.

  10. Dramatic reduction of sequence artefacts from DNA isolated from formalin-fixed cancer biopsies by treatment with uracil-DNA glycosylase.

    Science.gov (United States)

    Do, Hongdo; Dobrovic, Alexander

    2012-05-01

    Non-reproducible sequence artefacts are frequently detected in DNA from formalin-fixed and paraffin-embedded (FFPE) tissues. However, no rational strategy has been developed for reduction of sequence artefacts from FFPE DNA as the underlying causes of the artefacts are poorly understood. As cytosine deamination to uracil is a common form of DNA damage in ancient DNA, we set out to examine whether treatment of FFPE DNA with uracil-DNA glycosylase (UDG) would lead to the reduction of C>T (and G>A) sequence artefacts. Heteroduplex formation in high resolution melting (HRM)-based assays was used for the detection of sequence variants in FFPE DNA samples. A set of samples that gave false positive HRM results for screening for the E17K mutation in exon 4 of the AKT1 gene were chosen for analysis. Sequencing of these samples showed multiple non-reproducible C:G>T:A artefacts. Treatment of the FFPE DNA with UDG prior to PCR amplification led to a very marked reduction of the sequence artefacts as indicated by both HRM and sequencing analysis, indicating that uracil lesions are the major cause of sequence artefacts. Similar results were shown for the BRAF V600 region in the same sample set and EGFR exon 19 in another sample set. UDG treatment specifically suppressed the formation of artefacts in FFPE DNA as it did not affect the detection of true KRAS codon 12 and true EGFR exon 19 and 20 mutations. We conclude that uracil in FFPE DNA leads to a significant proportion of sequence artefacts. These can be minimised by a simple UDG pretreatment which can be readily carried out, in the same tube, as the PCR immediately prior to commencing thermal cycling. HRM is a convenient way of monitoring both the degree of damage and the effectiveness of the UDG treatment. These findings have immediate and important implications for cancer diagnostics where FFPE DNA is used as the primary genetic material for mutational studies guiding personalised medicine strategies and where simple

  11. What determines water-bridge lifetimes at the surface of DNA? Insight from systematic molecular dynamics analysis of water kinetics for various DNA sequences.

    Science.gov (United States)

    Yonetani, Yoshiteru; Kono, Hidetoshi

    2012-01-01

    The lifetime during which a water molecule resides at the surface of a biomolecule varies according to the hydration site. What determines this variety of lifetimes? Despite many previous studies, there is still no uniform picture quantitatively explaining this phenomenon. Here we calculate the lifetime for a particular hydration pattern in the DNA minor groove, the water bridge, for various DNA sequences to show that the water-bridge lifetime varies from 1 to ~300 ps in a sequence-dependent manner. We find that it follows 1/[k(V_step)·P_m], where P_m and V_step are two crucial factors, namely the probability of forming a specific hydrogen bond in which more than one donor atom participates, and the structural fluctuation of DNA, respectively. This relationship provides a picture of the water kinetics with atomistic detail and shows that water dissociation occurs when a particular hydrogen-bonding pattern appears. The rate constant of water dissociation k can be described as a function of the structural fluctuations of DNA. This picture is consistent with the model of Laage and Hynes proposing that hydrogen-bond switching occurs when an unusual number of hydrogen bonds are formed. The two new factors suggested here are discussed in the context of the surface's geometry and electrostatic nature, which were previously proposed as the determinants of water lifetimes.

  12. An improved chloroplast DNA extraction procedure for whole plastid genome sequencing.

    Directory of Open Access Journals (Sweden)

    Chao Shi

    Full Text Available BACKGROUND: Chloroplast genomes supply valuable genetic information for evolutionary and functional studies in plants. The past five years have witnessed a dramatic increase in the number of completely sequenced chloroplast genomes with the application of second-generation sequencing technology in plastid genome sequencing projects. However, cost-effective high-throughput chloroplast DNA (cpDNA) extraction has become a major bottleneck restricting this application, as conventional methods have difficulty balancing the quality and yield of cpDNA. METHODOLOGY/PRINCIPAL FINDINGS: We first tested two traditional methods to isolate cpDNA from three species, Oryza brachyantha, Leersia japonica and Prinsepia utilis. Both failed to yield properly defined cpDNA bands. We therefore developed a simple but efficient method based on sucrose gradients and found that the modified protocol efficiently isolated cpDNA from the same three plant species. We sequenced the isolated DNA samples with Illumina (Solexa) sequencing technology and assessed cpDNA purity by aligning sequence reads to the reference chloroplast genomes, which showed that the reference genomes were properly covered. We show that 40-50% cpDNA purity is achieved with our method. CONCLUSION: Here we provide an improved method for isolating cpDNA from angiosperms. The Illumina sequencing results suggest that the isolated cpDNA has sufficient yield and purity to perform subsequent genome assembly. The cpDNA isolation protocol thus will be widely applicable to plant chloroplast genome sequencing projects.

  13. DNA methylation patterns in cord blood DNA and body size in childhood.

    Directory of Open Access Journals (Sweden)

    Caroline L Relton

    Full Text Available Epigenetic markings acquired in early life may have phenotypic consequences later in development through their role in transcriptional regulation, with relevance to the developmental origins of diseases including obesity. The goal of this study was to investigate whether DNA methylation levels at birth are associated with body size later in childhood. A study design involving two birth cohorts was used to conduct transcription profiling followed by DNA methylation analysis in peripheral blood. Gene expression analysis was undertaken in 24 individuals whose biological samples and clinical data were collected at a mean ± standard deviation (SD) age of 12.35 (0.95) years; the upper and lower tertiles of body mass index (BMI) were compared, with a mean (SD) BMI difference of 9.86 (2.37) kg/m². This generated a panel of differentially expressed genes for DNA methylation analysis, which was then undertaken in cord blood DNA from 178 individuals with body composition data prospectively collected at a mean (SD) age of 9.83 (0.23) years. Twenty-nine differentially expressed genes (>1.2-fold and p<10⁻⁴) were analysed to determine DNA methylation levels at 1-3 sites per gene. Five genes were unmethylated, and DNA methylation in the remaining 24 genes was analysed using linear regression with bootstrapping. Methylation in 9 of the 24 (37.5%) genes studied was associated with at least one index of body composition (BMI, fat mass, lean mass, height) at age 9 years, although only one of these associations remained after correction for multiple testing (ALPL with height, p(corrected) = 0.017). DNA methylation patterns in cord blood show some association with altered gene expression, body size and composition in childhood. The observed relationship is correlative and, despite the suggestion of a mechanistic epigenetic link between in utero life and later phenotype, further investigation is required to establish causality.
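
The record mentions linear regression with bootstrapping but gives no details of the procedure. As an illustration only, the Python sketch below fits an ordinary least-squares slope (for example, a methylation level against a body-size index) and derives a percentile bootstrap confidence interval by resampling (x, y) pairs. The resampling scheme, number of resamples, seed and function names are all assumptions, not the authors' analysis.

```python
import random

# Hypothetical helpers for illustration; not taken from the study.

def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    if len(set(xs)) < 2:
        raise ValueError("x values must not all be identical")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(xs, ys, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval for the OLS slope."""
    estimate = ols_slope(xs, ys)  # raises early if the data are degenerate
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        # Resample (x, y) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_x = [xs[i] for i in idx]
        if len(set(sample_x)) < 2:
            continue  # skip degenerate resamples with a single x value
        slopes.append(ols_slope(sample_x, [ys[i] for i in idx]))
    slopes.sort()
    lower = slopes[int(n_boot * alpha / 2)]
    upper = slopes[int(n_boot * (1 - alpha / 2)) - 1]
    return estimate, (lower, upper)
```

An interval that excludes zero would suggest an association; with many genes tested, such intervals still need a multiple-testing correction like the one the authors applied.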

  14. Analysis of T-DNA/Host-Plant DNA Junction Sequences in Single-Copy Transgenic Barley Lines

    Directory of Open Access Journals (Sweden)

    Joanne G. Bartlett

    2014-01-01

    Full Text Available Sequencing across the junction between an integrated transfer DNA (T-DNA) and a host plant genome provides two important pieces of information. The junctions themselves provide information regarding the proportion of the T-DNA that has integrated into the host plant genome, whilst the transgene flanking sequences can be used to study the local genetic environment of the integrated transgene. In addition, this information is important in the safety assessment of GM crops and essential for GM traceability. In this study, a detailed analysis was carried out on the right-border T-DNA junction sequences of single-copy independent transgenic barley lines. T-DNA truncations at the right border were found to be relatively common and affected 33.3% of the lines. In addition, 14.3% of lines had rearranged construct sequence after the right-border breakpoint. An in-depth analysis of the host-plant flanking sequences revealed that a significant proportion of the T-DNAs integrated into or close to known repetitive elements. However, this integration into repetitive DNA did not have a negative effect on transgene expression.

  15. Screening for K-Casein (CSN3) Gene Variation in Carpathian Goat Breed by Isoelectric Focusing (IEF) and DNA Sequencing

    Directory of Open Access Journals (Sweden)

    Adrian Valentin Balteanu

    2015-05-01

    Full Text Available In goats, the κ-casein (CSN3) locus is highly polymorphic, with up to 16 alleles currently characterized. They produce 13 protein variants (CSN3) that were classified into two groups (AIEF and BIEF) according to their isoelectric point. Isoelectric focusing (IEF) of milk samples allows the detection of these two CSN3 groups, but DNA-based genotyping methods are needed for correct identification of CSN3 alleles. Therefore, the objective of this study was to identify the alleles occurring at the CSN3 locus in the Carpathian goat breed using a combined IEF and DNA sequencing approach. IEF analysis of milk samples collected from two Carpathian goat populations reared in Romania revealed two distinct CSN3 patterns. Amplification and sequencing of CSN3 cDNA obtained from these goats revealed four polymorphic sites located in exon 4 that are responsible for amino acid substitutions, as compared with the reference sequence of the A allele. By comparative analysis of the IEF and cDNA sequencing data obtained from the two populations, we showed that AIEF alleles are represented by the B allele, while BIEF alleles are represented by the D allele. However, the variation of the CSN3 locus in the Carpathian goat breed could be more complex, and further studies are needed to characterize it.

  16. eDNA Barcoding: Using Next-Generation Sequencing of Environmental DNA for Detection and Identification of Cetacean Species

    Science.gov (United States)

    2015-09-30

    present. The unique amplicon sequences (haplotypes) can be matched back to the reference database or searched against GenBank for species identification...PCR primers designed from a comprehensive reference database of sequences from cetacean species. Figure 2: Deployment of a hydrophone and a...ubiquitous DNA sequencing for surveys of biodiversity more efficient and affordable in the near future. RELATED PROJECTS None to date.

  17. High-throughput sequencing of nematode communities from total soil DNA extractions

    DEFF Research Database (Denmark)

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-01-01

    nematodes without the need for enrichment was developed. Using this strategy on DNA templates from a set of 22 agricultural soils, we obtained 64.4% sequences of nematode origin in total, whereas the remaining sequences were almost entirely from other metazoans. The nematode sequences were derived from...

  18. Pattern analysis approach reveals restriction enzyme cutting abnormalities and other cDNA library construction artifacts using raw EST data

    Directory of Open Access Journals (Sweden)

    Zhou Sun

    2012-05-01

    Full Text Available Abstract Background Expressed Sequence Tag (EST) sequences are widely used in applications such as genome annotation, gene discovery and gene expression studies. However, some GenBank dbEST sequences have proven to be “unclean”. Identification of cDNA termini/ends and their structures in raw ESTs not only facilitates data quality control and accurate delineation of transcription ends, but also furthers our understanding of the potential sources of data abnormalities/errors present in the wet-lab procedures for cDNA library construction. Results After analyzing a total of 309,976 raw Pinus taeda ESTs, we uncovered many distinct variations of cDNA termini, some of which prove to be good indicators of wet-lab artifacts, and characterized each raw EST by its cDNA terminus structure patterns. In contrast to the expected patterns, many ESTs displayed complex and/or abnormal patterns that represent potential wet-lab errors such as: a failure of one or both of the restriction enzymes to cut the plasmid vector; a failure of the restriction enzymes to cut the vector at the correct positions; the insertion of two cDNA inserts into a single vector; the insertion of multiple and/or concatenated adapters/linkers; the presence of 3′-end terminal structures in designated 5′-end sequences or vice versa; and so on. Based on close examination of these artifacts, many problematic ESTs that have been deposited into public databases by conventional bioinformatics pipelines or tools could be cleaned or filtered by our methodology. We developed a software tool for Abnormality Filtering and Sequence Trimming for ESTs (AFST, http://code.google.com/p/afst/) using a pattern analysis approach. To compare AFST with other pipelines that submitted ESTs into dbEST, we reprocessed 230,783 Pinus taeda and 38,709 Arachis hypogaea GenBank ESTs. We found that 7.4% of Pinus taeda and 29.2% of Arachis hypogaea GenBank ESTs are “unclean” or abnormal, all of which could be cleaned

  19. Numerical encoding of DNA sequences by chaos game representation with application in similarity comparison.

    Science.gov (United States)

    Hoang, Tung; Yin, Changchuan; Yau, Stephen S-T

    2016-10-01

    Numerical encoding plays an important role in DNA sequence analysis via computational methods, in which numerical values are associated with corresponding symbolic characters. After numerical representation, digital signal processing methods can be exploited to analyze DNA sequences. To reflect the biological properties of the original sequence, it is vital that the representation is one-to-one. Chaos Game Representation (CGR) is an iterative mapping technique that assigns each nucleotide in a DNA sequence to a respective position on the plane that allows the depiction of the DNA sequence in the form of image. Using CGR, a biological sequence can be transformed one-to-one to a numerical sequence that preserves the main features of the original sequence. In this research, we propose to encode DNA sequences by considering 2D CGR coordinates as complex numbers, and apply digital signal processing methods to analyze their evolutionary relationship. Computational experiments indicate that this approach gives comparable results to the state-of-the-art multiple sequence alignment method, Clustal Omega, and is significantly faster. The MATLAB code for our method can be accessed from: www.mathworks.com/matlabcentral/fileexchange/57152.
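
The CGR mapping and the complex-number encoding described above can be sketched in Python as follows. The corner assignment (A, C, G and T at the four vertices of the unit square) is an assumption, since the abstract does not state it, and the naive discrete Fourier transform only stands in for the digital signal processing the authors apply; this is not their MATLAB code.

```python
import cmath

# Assumed corner assignment on the unit square; CGR implementations differ.
CORNERS = {"A": 0 + 0j, "C": 0 + 1j, "G": 1 + 1j, "T": 1 + 0j}


def cgr_encode(seq: str) -> list[complex]:
    """Map a DNA sequence to CGR points, each stored as a complex number x + iy."""
    point = 0.5 + 0.5j  # start at the centre of the square
    points = []
    for base in seq.upper():
        # Move halfway from the current point towards this nucleotide's corner.
        point = (point + CORNERS[base]) / 2
        points.append(point)
    return points


def cgr_decode(points: list[complex]) -> str:
    """Invert the encoding: each corner is recovered as 2*p_i - p_(i-1)."""
    previous = 0.5 + 0.5j
    bases = []
    for point in points:
        target = 2 * point - previous
        # Snap to the nearest corner to absorb floating-point rounding.
        bases.append(min(CORNERS, key=lambda b: abs(CORNERS[b] - target)))
        previous = point
    return "".join(bases)


def power_spectrum(signal: list[complex]) -> list[float]:
    """Squared magnitude of a naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))) ** 2
        for k in range(n)
    ]
```

Because each point is the midpoint between its predecessor and a nucleotide corner, `cgr_decode` recovers the original sequence, which illustrates the one-to-one property the abstract emphasises. A real analysis would use an FFT rather than the quadratic loop in `power_spectrum`.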

  20. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system.

    Science.gov (United States)

    Schloss, Patrick D; Jenior, Matthew L; Koumpouras, Charles C; Westcott, Sarah L; Highlander, Sarah K

    2016-01-01

    Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.
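
Because the mock community has known sequences, the error rate can be measured directly against them, and reads can be screened using their quality scores. The Python sketch below computes a substitution error rate for reads assumed to be pre-aligned to their references without indels, and applies an expected-error filter derived from Phred scores. The expected-error criterion is a commonly used quality-score filter swapped in for illustration; it is not necessarily the curation method the authors developed, and the default threshold of 1.0 and the function names are assumptions. For scale, an error rate of 0.69% corresponds to 0.0069 errors per base.

```python
# Hypothetical helpers for illustration; not the authors' pipeline.

def error_rate(pairs: list[tuple[str, str]]) -> float:
    """Mismatches per aligned base over (reference, read) pairs.

    Assumes each read is pre-aligned to its mock-community reference with no
    indels, so only substitutions are counted.
    """
    mismatches = total = 0
    for reference, read in pairs:
        if len(reference) != len(read):
            raise ValueError("pairs must be aligned and of equal length")
        total += len(reference)
        mismatches += sum(r != q for r, q in zip(reference, read))
    return mismatches / total if total else 0.0


def expected_errors(quals: list[int]) -> float:
    """Sum of per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_quality_filter(quals: list[int], max_expected_errors: float = 1.0) -> bool:
    """Keep a read only if its expected number of errors is small enough."""
    return expected_errors(quals) <= max_expected_errors
```

Comparing `error_rate` on all reads with `error_rate` on only the reads that pass the filter is one way to see how much a curation step lowers the observed error rate.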

  1. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

    Directory of Open Access Journals (Sweden)

    Patrick D. Schloss

    2016-03-01

    Full Text Available Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina’s MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3–V5, V1–V3, V1–V5, V1–V6, and V1–V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1–V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina’s MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.

  2. Transgenerational inheritance: Models and mechanisms of non-DNA sequence-based inheritance.

    Science.gov (United States)

    Miska, Eric A; Ferguson-Smith, Anne C

    2016-10-07

    Heritability has traditionally been thought to be a characteristic feature of the genetic material of an organism, notably its DNA. However, it is now clear that inheritance not based on DNA sequence exists in multiple organisms, with examples found in microbes, plants, and invertebrate and vertebrate animals. In mammals, the molecular mechanisms have been challenging to elucidate, in part due to difficulties in designing robust models and approaches. Here we review some of the evidence, concepts, and potential mechanisms of non-DNA sequence-based transgenerational inheritance. We highlight model systems and discuss whether phenotypes are replicated or reconstructed over successive generations, as well as whether mechanisms operate at transcriptional and/or posttranscriptional levels. Finally, we explore the short- and long-term implications of non-DNA sequence-based inheritance. Understanding the effects of non-DNA sequence-based mechanisms is key to a full appreciation of heritability in health and disease.

  3. High-throughput sequencing for the identification of binding molecules from DNA-encoded chemical libraries.

    Science.gov (United States)

    Buller, Fabian; Steiner, Martina; Scheuermann, Jörg; Mannocci, Luca; Nissen, Ina; Kohler, Manuel; Beisel, Christian; Neri, Dario

    2010-07-15

    DNA-encoded chemical libraries are large collections of small organic molecules, individually coupled to DNA fragments that serve as amplifiable identification bar codes. The isolation of specific binders requires a quantitative analysis of the distribution of DNA fragments in the library before and after capture on an immobilized target protein of interest. Here, we show how Illumina sequencing can be applied to the analysis of DNA-encoded chemical libraries, yielding over 10 million DNA sequence tags per flow-lane. The technology can be used in a multiplex format, allowing the encoding and subsequent sequencing of multiple selections in the same experiment. The sequence distributions in DNA-encoded chemical library selections were found to be similar to the ones obtained using 454 technology, thus reinforcing the concept that DNA sequencing is an appropriate avenue for the decoding of library selections. The large number of sequences obtained with the Illumina method now enables the study of very large DNA-encoded chemical libraries (>500,000 compounds) and reduces decoding costs.
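
The quantitative comparison of tag distributions before and after capture can be illustrated with a simple enrichment score. The Python sketch below assumes the sequence tags have already been extracted from reads and matched to compound codes, and scores each code by the ratio of its frequency after selection to its frequency before, with a pseudocount so that codes missing from one population do not cause division by zero. The scoring rule, the pseudocount and the function name are hypothetical choices, not the authors' decoding pipeline.

```python
from collections import Counter

# Hypothetical enrichment score for illustration; not the published method.

def enrichment_scores(before_tags: list[str], after_tags: list[str],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Ratio of each code's frequency after selection to its frequency before.

    A pseudocount is added to every code so that codes unseen in one
    population do not cause division by zero.
    """
    before, after = Counter(before_tags), Counter(after_tags)
    codes = set(before) | set(after)
    n_before = sum(before.values()) + pseudocount * len(codes)
    n_after = sum(after.values()) + pseudocount * len(codes)
    return {
        code: ((after[code] + pseudocount) / n_after)
        / ((before[code] + pseudocount) / n_before)
        for code in codes
    }
```

Codes with scores well above 1 would be candidate binders; with millions of tags per lane, the counting itself is cheap and the main cost lies in sequencing and tag extraction.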

  4. DNA Data Visualization (DDV): Software for Generating Web-Based Interfaces Supporting Navigation and Analysis of DNA Sequence Data of Entire Genomes.

    Science.gov (United States)

    Neugebauer, Tomasz; Bordeleau, Eric; Burrus, Vincent; Brzezinski, Ryszard

    2015-01-01

    Data visualization methods are necessary during the exploration and analysis activities of an increasingly data-intensive scientific process. There are few existing visualization methods for raw nucleotide sequences of a whole genome or chromosome. Software for data visualization should allow researchers to create accessible data visualization interfaces that can be exported and shared with others on the web. Herein, novel software developed for generating DNA data visualization interfaces is described. The software converts DNA data sets into images that are further processed as multi-scale images to be accessed through a web-based interface that supports zooming, panning and sequence fragment selection. Nucleotide composition frequencies and GC skew of a selected sequence segment can be obtained through the interface. The software was used to generate DNA data visualizations of human and bacterial chromosomes. Examples of visually detectable features, such as short and long direct repeats, long terminal repeats, mobile genetic elements, and heterochromatic segments in microbial and human chromosomes, are presented. The software and its source code are available for download and further development. The visualization interfaces generated with the software allow the immediate identification and observation of several types of sequence patterns in genomes of various sizes and origins, and are readily accessible through a web browser. This software is a useful research and teaching tool for genetics and structural genomics.
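
The two per-segment statistics this interface reports, nucleotide composition and GC skew, are simple to compute. The Python sketch below uses the standard definition of GC skew, (G - C)/(G + C), and adds a sliding-window variant; it is not DDV's source code, and the window and step parameters and function names are illustrative assumptions.

```python
# Illustrative helpers; not taken from the DDV code base.

def composition(segment: str) -> dict[str, float]:
    """Relative frequency of A, C, G and T in a segment (other symbols ignored)."""
    seg = segment.upper()
    counts = {base: seg.count(base) for base in "ACGT"}
    total = sum(counts.values())
    return {base: (counts[base] / total if total else 0.0) for base in "ACGT"}


def gc_skew(segment: str) -> float:
    """GC skew, (G - C) / (G + C); 0.0 when the segment has no G or C."""
    seg = segment.upper()
    g, c = seg.count("G"), seg.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_gc_skew(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """GC skew in sliding windows, reported as (window start, skew)."""
    return [(start, gc_skew(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]
```

In bacterial chromosomes, a change in the sign of windowed GC skew is commonly used to locate the origin and terminus of replication, which is one reason GC skew is a useful overlay for whole-chromosome views.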

  5. HMGA1a recognition candidate DNA sequences in humans.

    Directory of Open Access Journals (Sweden)

    Takayuki Manabe

    Full Text Available High mobility group protein A1a (HMGA1a) acts as an architectural transcription factor and influences a diverse array of normal biological processes. It binds AT-rich sequences, and previous reports have demonstrated HMGA1a binding to the authentic promoters of various genes. However, the precise sequences that HMGA1a binds remain to be clarified. Therefore, in this study, we searched for the sequences with the highest affinity for human HMGA1a using an existing SELEX method, and then compared the identified sequences with known human promoter sequences. Based on our results, we propose the sequences "-(G/A)-G-(A/T)-(A/T)-A-T-T-T-" as HMGA1a-binding candidate sequences. Furthermore, these candidate sequences bound native human HMGA1a from SK-N-SH cells. When the candidate sequences were analyzed by FASTA searches against all known human promoter sequences, each one hit 500-900 sequences. Some of the extracted genes have promoters already proven or suggested to bind HMGA1a. The candidate sequences presented here represent important information for research into the various roles of HMGA1a, including cell differentiation, death, growth, proliferation, and the pathogenesis of cancer.
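
The proposed consensus -(G/A)-G-(A/T)-(A/T)-A-T-T-T- translates directly into a regular expression. The Python sketch below scans both strands of a sequence for exact matches, using a lookahead so that overlapping sites are reported. It is a simpler stand-in for the FASTA searches used in the study (which tolerate mismatches and gaps), so hit counts would differ; the function name is hypothetical.

```python
import re

# Proposed HMGA1a candidate consensus as a regular expression.
MOTIF = "[GA]G[AT][AT]ATTT"
# Reverse complement of the motif above, for hits on the minus strand.
MOTIF_RC = "AAAT[AT][AT]C[CT]"


def find_candidate_sites(seq: str) -> list[tuple[int, str, str]]:
    """Return (0-based start on the + strand, strand, matched text) for each hit.

    Hypothetical helper; exact matching only. The lookahead (?=...) lets
    overlapping occurrences be reported.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", MOTIF), ("-", MOTIF_RC)):
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((m.start(), strand, m.group(1)))
    return sorted(hits)
```

For example, `find_candidate_sites("CCGGAAATTTCC")` finds GGAAATTT on the plus strand and AAATTTCC, whose reverse complement GGAAATTT matches the motif, on the minus strand.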

  6. The DNA sequence, annotation and analysis of human chromosome 3

    DEFF Research Database (Denmark)

    Muzny, Donna M; Scherer, Steven E; Kaul, Rajinder

    2006-01-01

    After the completion of a draft human genome sequence, the International Human Genome Sequencing Consortium has proceeded to finish and annotate each of the 24 chromosomes comprising the human genome. Here we describe the sequencing and analysis of human chromosome 3, one of the largest human chr...

  7. The impact of DNA input amount and DNA source on the performance of whole-exome sequencing in cancer epidemiology.

    Science.gov (United States)

    Zhu, Qianqian; Hu, Qiang; Shepherd, Lori; Wang, Jianmin; Wei, Lei; Morrison, Carl D; Conroy, Jeffrey M; Glenn, Sean T; Davis, Warren; Kwan, Marilyn L; Ergas, Isaac J; Roh, Janise M; Kushi, Lawrence H; Ambrosone, Christine B; Liu, Song; Yao, Song

    2015-08-01

    Whole-exome sequencing (WES) has recently emerged as an appealing approach to systematically study coding variants. However, the requirement for a large amount of high-quality DNA poses a barrier that may limit its application in large cancer epidemiologic studies. We evaluated the performance of WES with low input amount and saliva DNA as an alternative source material. Five breast cancer patients were randomly selected from the Pathways Study. From each patient, four samples, including 3 μg, 1 μg, and 0.2 μg blood DNA and 1 μg saliva DNA, were aliquoted for library preparation using the Agilent SureSelect Kit and sequencing using Illumina HiSeq2500. Quality metrics of sequencing and variant calling, as well as concordance of variant calls from the whole exome and 21 known breast cancer genes, were assessed by input amount and DNA source. There was little difference by input amount or DNA source in the quality of sequencing and variant calling. The concordance rate was about 98% for single-nucleotide variant calls and 83% to 86% for short insertion/deletion calls. For the 21 known breast cancer genes, WES based on low input amount and saliva DNA identified the same set of variants in samples from the same patient. Low DNA input amount, as well as saliva DNA, can be used to generate WES data of satisfactory quality. Our findings support the expansion of WES applications in cancer epidemiologic studies where only low DNA amount or saliva samples are available. ©2015 American Association for Cancer Research.
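
The abstract reports concordance rates without defining them. As an illustration only, the Python sketch below assumes concordance is the number of shared calls divided by the number of distinct calls in either set, with each variant represented as a hypothetical (chromosome, position, ref, alt) tuple, and splits calls into SNVs and non-SNVs (mostly indels) so the two rates can be reported separately, as in the abstract. The definition and function names are assumptions.

```python
# Assumed variant representation: (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def concordance(calls_a: set[Variant], calls_b: set[Variant]) -> float:
    """Shared calls divided by all distinct calls (an assumed definition)."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else 1.0


def split_snv(calls: set[Variant]) -> tuple[set[Variant], set[Variant]]:
    """Split calls into single-nucleotide variants and everything else."""
    snvs = {v for v in calls if len(v[2]) == 1 and len(v[3]) == 1}
    return snvs, set(calls) - snvs
```

Other definitions, such as the fraction of calls from a reference sample that are recovered in a test sample, give different numbers, so any comparison with the published rates depends on matching their definition.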

  8. Interference of Co-amplified nuclear mitochondrial DNA sequences on the determination of human mtDNA heteroplasmy by Using the SURVEYOR nuclease and the WAVE HS system.

    Science.gov (United States)

    Yen, Hsiu-Chuan; Li, Shiue-Li; Hsu, Wei-Chien; Tang, Petrus

    2014-01-01

    High-sensitivity and high-throughput mutation detection techniques are useful for screening the homoplasmy or heteroplasmy status of mitochondrial DNA (mtDNA), but might be susceptible to interference from nuclear mitochondrial DNA sequences (NUMTs) co-amplified during polymerase chain reaction (PCR). In this study, we first evaluated a platform of SURVEYOR Nuclease digestion of heteroduplexed DNA followed by detection of the cleaved DNA using the WAVE HS System (SN/WAVE-HS) for detecting human mtDNA variants, and found that its performance was slightly better than that of denaturing high-performance liquid chromatography (DHPLC). The potential interference from co-amplified NUMTs on screening mtDNA heteroplasmy with these 2 highly sensitive techniques was further examined using 2 published primer sets containing a total of 65 primer pairs, which were originally designed for use with one of the 2 techniques. By bioinformatic analysis and PCR with DNA from 143B-ρ0 cells, we confirmed that 24 primer pairs could amplify NUMTs. Using mtDNA extracted from the mitochondria of human 143B cells and a cybrid line with the nuclear background of 143B-ρ0 cells, we demonstrated that NUMTs could affect the chromatogram patterns for cell DNA during SN/WAVE-HS analysis of mtDNA, leading to incorrect judgment of mtDNA homoplasmy or heteroplasmy status. However, we observed such interference with only 2 of the 24 selected primer pairs, and did not observe such effects during DHPLC analysis. These results indicate that NUMTs can affect the screening of low-level mtDNA variants, but that this interference might not be predicted by bioinformatic analysis or by amplification of DNA from 143B-ρ0 cells. Therefore, using purified mtDNA of proven purity from cultured cells to evaluate the effects of NUMTs from a primer pair on mtDNA detection by PCR-based high-sensitivity methods, before using that primer pair in real studies, would be a more practical strategy.

  9. Interference of Co-amplified nuclear mitochondrial DNA sequences on the determination of human mtDNA heteroplasmy by Using the SURVEYOR nuclease and the WAVE HS system.

    Directory of Open Access Journals (Sweden)

    Hsiu-Chuan Yen

    Full Text Available High-sensitivity and high-throughput mutation detection techniques are useful for screening the homoplasmy or heteroplasmy status of mitochondrial DNA (mtDNA), but might be susceptible to interference from nuclear mitochondrial DNA sequences (NUMTs) co-amplified during polymerase chain reaction (PCR). In this study, we first evaluated a platform of SURVEYOR Nuclease digestion of heteroduplexed DNA followed by detection of the cleaved DNA using the WAVE HS System (SN/WAVE-HS) for detecting human mtDNA variants, and found that its performance was slightly better than that of denaturing high-performance liquid chromatography (DHPLC). The potential interference from co-amplified NUMTs on screening mtDNA heteroplasmy with these 2 highly sensitive techniques was further examined using 2 published primer sets containing a total of 65 primer pairs, which were originally designed for use with one of the 2 techniques. By bioinformatic analysis and PCR with DNA from 143B-ρ0 cells, we confirmed that 24 primer pairs could amplify NUMTs. Using mtDNA extracted from the mitochondria of human 143B cells and a cybrid line with the nuclear background of 143B-ρ0 cells, we demonstrated that NUMTs could affect the chromatogram patterns for cell DNA during SN/WAVE-HS analysis of mtDNA, leading to incorrect judgment of mtDNA homoplasmy or heteroplasmy status. However, we observed such interference with only 2 of the 24 selected primer pairs, and did not observe such effects during DHPLC analysis. These results indicate that NUMTs can affect the screening of low-level mtDNA variants, but that this interference might not be predicted by bioinformatic analysis or by amplification of DNA from 143B-ρ0 cells. Therefore, using purified mtDNA of proven purity from cultured cells to evaluate the effects of NUMTs from a primer pair on mtDNA detection by PCR-based high-sensitivity methods, before using that primer pair in real studies, would be a more practical

  10. Hierarchical-Multiplex DNA Patterns Mediated by Polymer Brush Nanocone Arrays That Possess Potential Application for Specific DNA Sensing.

    Science.gov (United States)

    Liu, Wendong; Liu, Xueyao; Ge, Peng; Fang, Liping; Xiang, Siyuan; Zhao, Xiaohuan; Shen, Huaizhong; Yang, Bai

    2015-11-11

    This paper provides a facile and cost-efficient method to prepare single-strand DNA (ssDNA) nanocone arrays and hierarchical DNA patterns mediated by poly(2-hydroxyethyl methacrylate) (PHEMA) brushes. PHEMA brush nanocone arrays with different morphologies and periods were fabricated via colloidal lithography. The hierarchical structure was prepared by combining colloidal lithography with traditional photolithography. The DNA patterns were readily achieved by grafting amino-group-modified ssDNA onto the side chains of the polymer brush, and the anchored DNA maintained its reactivity. The as-prepared ssDNA nanocone arrays can be applied to target DNA sensing, with a detection limit of 1.65 nM. In addition, by introducing microfluidics, hierarchical-multiplex DNA patterns could easily be achieved on the same substrate, with each pattern carrying one kind of ssDNA; such surfaces are promising for the preparation of rapid, visible, and multiplex DNA sensors.

  11. Mitochondrial DNA variant discovery and evaluation in human Cardiomyopathies through next-generation sequencing.

    Directory of Open Access Journals (Sweden)

    Michael V Zaragoza

    Full Text Available Mutations in mitochondrial DNA (mtDNA) may cause maternally inherited cardiomyopathy and heart failure. In homoplasmy, all mtDNA copies contain the mutation. In heteroplasmy, there is a mixture of normal and mutant copies of mtDNA. The clinical phenotype of an affected individual depends on the type of genetic defect and the ratio of mutant to normal mtDNA in affected tissues. We aimed to determine the sensitivity of next-generation sequencing compared with Sanger sequencing for mutation detection in patients with mitochondrial cardiomyopathy. We studied 18 patients with mitochondrial cardiomyopathy and two with suspected mitochondrial disease. We "shotgun" sequenced PCR-amplified mtDNA, multiplexing the samples in a single run on Roche's 454 Genome Sequencer. By mapping to the reference sequence, we obtained 1,300x average coverage per case and identified high-confidence variants. Comparing these with >400 mtDNA substitution variants detected by Sanger sequencing, we found 98% concordance in variant detection. Simulation studies showed that >95% of the homoplasmic variants were detected at a minimum sequence coverage of 20x, while heteroplasmic variants required >200x coverage. Several Sanger "misses" were detected by 454 sequencing. These included the novel heteroplasmic 7501T>C in tRNA serine 1 in a patient with sudden cardiac death. These results support a potential role of next-generation sequencing in the discovery of novel mtDNA variants with heteroplasmy below the level reliably detected with Sanger sequencing. We hope that this will assist in the identification of mtDNA mutations and key genetic determinants for cardiomyopathy and mitochondrial disease.
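
Why heteroplasmic variants need much deeper coverage than homoplasmic ones can be illustrated with a simple binomial sampling model: at coverage n and heteroplasmy fraction f, the number of variant-supporting reads is roughly Binomial(n, f). The Python sketch below is a swapped-in toy model, not the authors' simulation. It ignores sequencing error and mapping bias, and the minimum number of supporting reads required to call a variant is an assumption.

```python
from math import comb

# Toy model for illustration; not the simulation used in the study.

def heteroplasmy_fraction(alt_reads: int, total_reads: int) -> float:
    """Observed fraction of reads carrying the variant allele."""
    return alt_reads / total_reads if total_reads else 0.0


def detection_probability(coverage: int, fraction: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) under Binomial(coverage, fraction).

    Ignores sequencing error and mapping bias; purely illustrative.
    """
    p_below = sum(
        comb(coverage, i) * fraction ** i * (1 - fraction) ** (coverage - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_below
```

Under this model, with a hypothetical threshold of 3 supporting reads, a homoplasmic variant (f = 1.0) is always detected at 20x, whereas a 5% heteroplasmic variant is detected less than 10% of the time at 20x but more than 99% of the time at 200x, in line with the coverage requirements the abstract reports.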

  12. Sequence-specific activation of the DNA sensor cGAS by Y-form DNA structures as found in primary HIV-1 cDNA.

    Science.gov (United States)

    Herzner, Anna-Maria; Hagmann, Cristina Amparo; Goldeck, Marion; Wolter, Steven; Kübler, Kirsten; Wittmann, Sabine; Gramberg, Thomas; Andreeva, Liudmila; Hopfner, Karl-Peter; Mertens, Christina; Zillinger, Thomas; Jin, Tengchuan; Xiao, Tsan Sam; Bartok, Eva; Coch, Christoph; Ackermann, Damian; Hornung, Veit; Ludwig, Janos; Barchet, Winfried; Hartmann, Gunther; Schlee, Martin

    2015-10-01

    Cytosolic DNA that emerges during infection with a retrovirus or DNA virus triggers antiviral type I interferon responses. So far, only double-stranded DNA (ds