WorldWideScience

Sample records for region dna sequence

  1. Correlation approach to identify coding regions in DNA sequences

    Science.gov (United States)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
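
    The correlation property this record relies on can be illustrated with a purine/pyrimidine "DNA walk" and its fluctuation exponent. The block below is only a sketch of that underlying statistic, not the authors' coding-region finder; the function names, the +1/-1 mapping convention, and the window sizes are assumptions made for illustration.

```python
import math

def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for anything else."""
    y, pos = [0], 0
    for base in seq.upper():
        pos += 1 if base in "CT" else -1
        y.append(pos)
    return y

def fluctuation(y, l):
    """Root-mean-square fluctuation of walk displacements over distance l."""
    deltas = [y[i + l] - y[i] for i in range(len(y) - l)]
    mean = sum(deltas) / len(deltas)
    return math.sqrt(sum((d - mean) ** 2 for d in deltas) / len(deltas))

def scaling_exponent(seq, window_sizes):
    """Least-squares slope of log F(l) against log l.

    A slope near 0.5 suggests no long-range correlation; a clearly larger
    slope suggests long-range correlation. Raises ValueError if F(l) is 0
    for some l (for example, a strictly periodic sequence).
    """
    y = dna_walk(seq)
    xs = [math.log(l) for l in window_sizes]
    ys = [math.log(fluctuation(y, l)) for l in window_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den
```

    Applied in a sliding window along a chromosome, an exponent close to 0.5 would flag a candidate coding stretch under the assumption stated in the abstract; the thresholds and window lengths used in the original work are not given here.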

  2. Sequence analysis of mitochondrial DNA hypervariable region III of ...

    African Journals Online (AJOL)

    The aims of this research were to study mitochondrial DNA hypervariable region III and establish the degree of variation characteristic of a fragment. The mitochondrial DNA (mtDNA) is a small circular genome located within the mitochondria in the cytoplasm of the cell and a smaller 1.2 kb pair fragment, called the control ...

  3. A fast algorithm for exonic regions prediction in DNA sequences.

    Science.gov (United States)

    Saberkari, Hamidreza; Shamsi, Mousa; Heravi, Hamed; Sedaaghi, Mohammad Hossein

    2013-07-01

    The main purpose of this paper is to introduce a fast method for gene prediction in DNA sequences based on the period-3 property in exons. First, the symbolic DNA sequences were converted to a digital signal using the electron-ion interaction potential method. Then, to reduce the effect of background noise in the period-3 spectrum, we used the discrete wavelet transform at three levels and applied it to the input digital signal. Finally, the Goertzel algorithm was used to extract period-3 components in the filtered DNA sequence. The proposed algorithm reduces the computational complexity and hence increases the speed of the process. Accurate detection of small exons in DNA sequences is another advantage of the algorithm. The ability of the proposed algorithm to predict exons was compared with several existing methods at the nucleotide level using: (i) specificity-sensitivity values; (ii) receiver operating characteristic (ROC) curves; and (iii) area under the ROC curve. Simulation results confirmed that the proposed method can be used as a promising tool for exon prediction in DNA sequences.
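
    Two of the steps named in this record, the numeric mapping and the Goertzel extraction of the period-3 component, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the wavelet denoising step is omitted, the EIIP values are the commonly quoted ones and are not given in the abstract, and all function names are hypothetical.

```python
import math

# Commonly quoted electron-ion interaction potential (EIIP) values.
# Assumption: the abstract does not list the values it used.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

def eiip_signal(seq):
    """Map a DNA string to a numeric signal (raises KeyError on non-ACGT)."""
    return [EIIP[base] for base in seq.upper()]

def goertzel_power(samples, k):
    """Squared magnitude of DFT bin k of `samples`, via the Goertzel recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def period3_power(seq):
    """Power at frequency 1/3 for a window whose length is a multiple of 3."""
    x = eiip_signal(seq)
    if len(x) % 3 != 0:
        raise ValueError("window length must be a multiple of 3")
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]  # remove the DC offset
    return goertzel_power(centered, len(x) // 3)
```

    Because only one frequency bin is needed, the recurrence costs one multiply and two additions per sample, which is the kind of saving over a full Fourier transform that the record alludes to.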

  4. A new database of mitochondrial DNA hypervariable regions I and II sequences from 162 Japanese individuals.

    Science.gov (United States)

    Imaizumi, K; Parsons, T J; Yoshino, M; Holland, M M

    2002-04-01

    A database of mitochondrial DNA (mtDNA) hypervariable region 1 (HV1) and region 2 (HV2) sequences of the mtDNA control region was established from 162 unrelated Japanese individuals. The random match probability and the genetic diversity for this database were 0.96% and 0.997, respectively. Length heteroplasmy in the C-stretch regions located around position 16189 in HV1 and 310 in HV2 was observed in 37% and 38% of the samples, respectively. A strategy using internal sequencing primers was devised to obtain confirmed sequences in these length heteroplasmic individuals. This database, combined with other mtDNA sequence databases from the Japanese population, will permit the significance of mtDNA match results to be properly reported in mtDNA typing casework in Japan.
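
    The two summary statistics quoted in this record are usually computed from haplotype frequencies. The sketch below assumes the conventional definitions (random match probability as the sum of squared frequencies, genetic diversity with the n/(n-1) small-sample correction); the abstract does not state which formulas were used, and the function names are hypothetical.

```python
from collections import Counter

def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())

def genetic_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))

# Toy sample of four individuals carrying three distinct haplotypes.
sample = ["h1", "h1", "h2", "h3"]
print(random_match_probability(sample))  # 0.375
print(genetic_diversity(sample))
```

    Under these assumed definitions, a random match probability of 0.0096 with n = 162 gives 162/161 × (1 − 0.0096) ≈ 0.997, which agrees with the diversity value reported above.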

  5. Sequence analysis of mitochondrial DNA hypervariable region III of ...

    African Journals Online (AJOL)

    Aghomotsegin

    2015-07-01

    Jul 1, 2015 ... degradation; third, higher rate of evolution: DNA alterations (mutations) occur in a number of ... The result is that the rate of change, or evolutionary rate, of mitochondrial DNA is about five times greater .... example mass graves in mass disasters, there are newly discovered forensically validated methods ...

  6. Tandemly repeated sequence in 5' end of mtDNA control region of ...

    African Journals Online (AJOL)

    Extensive length variability was observed in 5' end sequence of the mitochondrial DNA control region of the Japanese Spanish mackerel (Scomberomorus niphonius). This length variability was due to the presence of varying numbers of a 56-bp tandemly repeated sequence and a 46-bp insertion/deletion (indel).

  7. Targeted enrichment of genomic DNA regions for next generation sequencing

    NARCIS (Netherlands)

    Mertens, F.; El-Sharawy, A.; Sauer, S.; Van Helvoort, J.; Van der Zaag, P.J.; Franke, A.; Nilsson, M.; Lehrach, H.; Brookes, A.

    2011-01-01

    In this review we discuss the latest targeted enrichment methods, and aspects of their utilization along with second generation sequencing for complex genome analysis. In doing so we provide an overview of issues involved in detecting genetic variation, for which targeted enrichment has become a

  8. Investigation of length heteroplasmy in mitochondrial DNA control region by massively parallel sequencing.

    Science.gov (United States)

    Lin, Chun-Yen; Tsai, Li-Chin; Hsieh, Hsing-Mei; Huang, Chia-Hung; Yu, Yu-Jen; Tseng, Bill; Linacre, Adrian; Lee, James Chun-I

    2017-09-01

    Accurate sequencing of the control region of the mitochondrial genome is notoriously difficult due to the presence of polycytosine bases, termed C-tracts. The precise number of bases that constitute a C-tract and the bases beyond the poly cytosines may not be accurately defined when analyzing Sanger sequencing data separated by capillary electrophoresis. Massively parallel sequencing has the potential to resolve such poor definition and provides the opportunity to discover variants due to length heteroplasmy. In this study, the control region of mitochondrial genomes from 20 samples was sequenced using both standard Sanger methods with separation by capillary electrophoresis and also using massively parallel DNA sequencing technology. After comparison of the two sets of generated sequence, with the exception of the C-tracts where length heteroplasmy was observed, all sequences were concordant. Sequences of three segments 16184-16193, 303-315 and 568-573 with C-tracts in HVI, II and III can be clearly defined from the massively parallel sequencing data using the program SEQ Mapper. Multiple sequence variants were observed in the length of C-tracts longer than 7 bases. Our report illustrates the accurate designation of all the length variants leading to heteroplasmy in the control region of the mitochondrial genome that can be determined by SEQ Mapper based on data generated by massively parallel DNA sequencing. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Repetitive sequences in Eurasian lynx (Lynx lynx L.) mitochondrial DNA control region.

    Science.gov (United States)

    Sindičić, Magda; Gomerčić, Tomislav; Galov, Ana; Polanc, Primož; Huber, Duro; Slavica, Alen

    2012-06-01

    Mitochondrial DNA (mtDNA) control region (CR) of numerous species is known to include up to five different repetitive sequences (RS1-RS5) that are found at various locations, involving motifs of different length and extensive length heteroplasmy. Two repetitive sequences (RS2 and RS3) on opposite sides of mtDNA central conserved region have been described in domestic cat (Felis catus) and some other felid species. However, the presence of repetitive sequence RS3 has not been detected in Eurasian lynx (Lynx lynx) yet. We analyzed mtDNA CR of 35 Eurasian lynx (L. lynx L.) samples to characterize repetitive sequences and to compare them with those found in other felid species. We confirmed the presence of 80 base pairs (bp) repetitive sequence (RS2) at the 5' end of the Eurasian lynx mtDNA CR L strand and for the first time we described RS3 repetitive sequence at its 3' end, consisting of an array of tandem repeats five to ten bp long. We found that felid species share similar RS3 repetitive pattern and fundamental repeat motif TACAC.

  10. Correcting sequencing errors in DNA coding regions using a dynamic programming approach

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Y.; Mural, R.J.; Uberbacher, E.C.

    1994-12-01

    This paper presents an algorithm for detecting and "correcting" sequencing errors that occur in DNA coding regions. The types of sequencing error addressed include insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of "neutral" bases at a perceived reading frame transition point to make the putative exon candidate frame-consistent. The authors have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant, and also intend to use this as a testbed for further development of sequencing error-correction technology. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false messages on the "corrected" sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the "corrupted" sequences using the standard GRAIL II method. The method uses a dynamic programming algorithm, and runs in time and space linear in the size of the input sequence.
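
    The idea of locating indels as changes in the preferred reading frame can be sketched as a three-state dynamic program. The block below is an illustrative stand-in and not the GRAIL subsystem: it assumes that a per-position score for each of the three frames is already available, and the names `best_frame_path`, `frame_transitions`, and `switch_penalty` are hypothetical.

```python
def best_frame_path(frame_scores, switch_penalty):
    """Return the highest-scoring frame (0, 1 or 2) for every position.

    frame_scores: list of (s0, s1, s2) tuples, one per position.
    switch_penalty: cost charged whenever the path changes frame.
    Runs in time and space linear in the number of positions.
    """
    if not frame_scores:
        return []
    best = list(frame_scores[0])
    back = []  # back[i - 1][f] = frame at position i - 1 on the best path ending in f at i
    for scores in frame_scores[1:]:
        new_best, pointers = [], []
        for f in range(3):
            prev, prev_score = f, best[f]  # staying in frame is free
            for g in range(3):
                cand = best[g] - switch_penalty
                if g != f and cand > prev_score:
                    prev, prev_score = g, cand
            new_best.append(prev_score + scores[f])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    path = [max(range(3), key=lambda f: best[f])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def frame_transitions(path):
    """Positions where the preferred frame changes (candidate indel sites)."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]
```

    Each reported transition would then be a candidate site for inserting or removing bases so that the downstream frame matches the upstream one; how many bases to insert and how the frame scores are derived are left open here.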

  11. Detection of genomic variation by selection of a 9 Mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage ≥ 4-fold, and 97.9% concordant in regions with coverage ≥ 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  12. An iterative algorithm for correcting sequencing errors in DNA coding regions

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Ying; Mural, R.J.; Uberbacher, E.C.

    1995-12-31

    Insertion and deletion (indel) sequencing errors in DNA coding regions disrupt DNA-to-protein translation frames, and hence make most frame-sensitive coding recognition approaches fail. This paper extends the authors' previous work on indel detection and "correction" algorithms, and presents a more effective algorithm for localizing indels that appear in DNA coding regions and "correcting" the located indels by inserting or deleting DNA bases. The algorithm localizes indels by discovering changes of the preferred translation frames within presumed coding regions, and then "corrects" the indel errors to restore a consistent translation frame within each coding region. An iterative strategy is exploited to repeatedly localize and "correct" indel errors until no more indels can be found. Test results have shown that the algorithm can accurately locate the positions of indels. The technology presented here has proved to be very useful for single pass EST/cDNA or genomic sequences, and is also often beneficial for higher quality sequences from large genomic clones.

  13. Regions of the polytene chromosomes of Drosophila virilis carrying multiple dispersed pDv111 DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Gubenko, I.S.; Evgen'ev, M.B.

    1986-09-01

    The cloned sequences of pDv111 DNA hybridized in situ with more than 170 regions of Drosophila virilis salivary gland chromosomes. Comparative autoradiography of in situ hybridization and the nature of pulse ³H-thymidine and ³H-deoxycytidine incorporation into the polytene chromosomes of D. virilis at the puparium formation stage showed that the hybridization sites of pDv111 are distributed not only in the heterochromatic regions but also in the euchromatic regions of the chromosomes that are not late replicating. Two distinct bands of hybridization of pDv111 ³H-DNA were observed in the region of the heat shock puff 20CD. The regions of the distal end of chromosome 2, in which breaks appeared during radiation-induced chromosomal rearrangements, hybridized with the pDv111 DNA.

  14. Sequencing the hypervariable regions of human mitochondrial DNA using massively parallel sequencing: Enhanced data acquisition for DNA samples encountered in forensic testing.

    Science.gov (United States)

    Davis, Carey; Peters, Dixie; Warshauer, David; King, Jonathan; Budowle, Bruce

    2015-03-01

    Mitochondrial DNA testing is a useful tool in the analysis of forensic biological evidence. In cases where nuclear DNA is damaged or limited in quantity, the higher copy number of mitochondrial genomes available in a sample can provide information about the source of a sample. Currently, Sanger-type sequencing (STS) is the primary method to develop mitochondrial DNA profiles. This method is laborious and time consuming. Massively parallel sequencing (MPS) can increase the amount of information obtained from mitochondrial DNA samples while improving turnaround time by decreasing the numbers of manipulations and more so by exploiting high throughput analyses to obtain interpretable results. In this study 18 buccal swabs, three different tissue samples from five individuals, and four bones samples from casework were sequenced at hypervariable regions I and II using STS and MPS. Sample enrichment for STS and MPS was PCR-based. Library preparation for MPS was performed using Nextera® XT DNA Sample Preparation Kit and sequencing was performed on the MiSeq™ (Illumina, Inc.). MPS yielded full concordance of base calls with STS results, and the newer methodology was able to resolve length heteroplasmy in homopolymeric regions. This study demonstrates short amplicon MPS of mitochondrial DNA is feasible, can provide information not possible with STS, and lays the groundwork for development of a whole genome sequencing strategy for degraded samples. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  15. High Sequence Variations in Mitochondrial DNA Control Region among Worldwide Populations of Flathead Mullet Mugil cephalus

    Directory of Open Access Journals (Sweden)

    Brian Wade Jamandre

    2014-01-01

    The sequence and structure of the complete mtDNA control region (CR) of M. cephalus from African, Pacific, and Atlantic populations are presented in this study to assess its usefulness in phylogeographic studies of this species. The mtDNA CR sequence variations among M. cephalus populations largely exceeded intraspecific polymorphisms that are generally observed in other vertebrates. The length of the CR sequence varied among M. cephalus populations due to the presence of indels and a variable number of tandem repeats at the 3′ hypervariable domain. The high evolutionary rate of the CR in this species probably originated from these mutations. However, no excessive homoplasic mutations were noticed. Finally, the star-shaped tree inferred from the CR polymorphism stresses a rapid worldwide radiation in this species. The CR still appears to be a good marker for phylogeographic investigations, and additional worldwide samples are warranted to further investigate the genetic structure and evolution in M. cephalus.

  16. Identifications of captive and wild tilapia species existing in Hawaii by mitochondrial DNA control region sequence.

    Directory of Open Access Journals (Sweden)

    Liang Wu

    BACKGROUND: The tilapia family of the Cichlidae includes many fish species, which live in freshwater and saltwater environments. Several species, such as O. niloticus, O. aureus, and O. mossambicus, are excellent for aquaculture because these fish are easily reproduced and readily adapt to diverse environments. Historically, tilapia species, including O. mossambicus, S. melanotheron, and O. aureus, were introduced to Hawaii many decades ago, and the state of Hawaii uses the import permit policy to prevent O. niloticus from coming into the islands. However, hybrids produced from O. niloticus may already be present in the freshwater and marine environments of the islands. The purpose of this study was to identify tilapia species that exist in Hawaii using mitochondrial DNA analysis. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we analyzed 382 samples collected from 13 farm (captive) and wild tilapia populations in Oahu and the Hawaii Islands. Comparison of intraspecies variation between the mitochondrial DNA control region (mtDNA CR) and cytochrome c oxidase I (COI) gene from five populations indicated that mtDNA CR had higher nucleotide diversity than COI. A phylogenetic tree of all sampled tilapia was generated using mtDNA CR sequences. The neighbor-joining tree analysis identified seven distinctive tilapia species: O. aureus, O. mossambicus, O. niloticus, S. melanotheron, O. urolepis, T. rendalli, and a hybrid of O. mossambicus and O. niloticus. Of all the populations examined, 10 populations consisting of O. aureus, O. mossambicus, O. urolepis, and O. niloticus from the farmed sites were relatively pure, whereas three wild populations showed some degree of introgression and hybridization. CONCLUSIONS/SIGNIFICANCE: This DNA-based tilapia species identification is the first report that confirmed tilapia species identities in the wild and captive populations in Hawaii. The DNA sequence comparisons of mtDNA CR appear to be a valid method for

  17. Polymorphic sequence in the ND3 region of Java endemic Ploceidae birds mitochondrial DNA

    Directory of Open Access Journals (Sweden)

    R. SUSANTI

    2011-04-01

    Susanti R (2011) Polymorphic sequence in the ND3 region of Java endemic Ploceidae birds mitochondrial DNA. Biodiversitas 12: 70-75. As part of biodiversity, the Ploceidae bird family must be kept away from extinction and degradation of gene diversity. This research was aimed at analyzing the ND3 gene from mitochondrial DNA of Ploceidae birds endemic to Java Island. Each species of the Ploceidae bird family was identified based on its morphological characters, then a blood sample was taken from the bird's nail vein. DNA was isolated from blood using the Dixit method. A fragment of the ND3 gene was amplified using the PCR method with specific primer pairs and sequenced using the dideoxy termination method with an ABI automatic sequencer. Multiple alignment of ND3 nucleotide sequences was analyzed using ClustalW in the MEGA-3.1 program. Genetic distances were estimated from a distance matrix calculated with the Kimura 2-parameter model, and the phylogenetic tree was constructed with the Neighbor-Joining method. The exploration of Ploceidae birds endemic to Java Island showed that Erythrura hyperythra and Lonchura ferruginosa cannot be found in nature anymore, but Lonchura malacca, which is not actually endemic to Java Island, was also found. The nucleotide sequence of the mitochondrial ND3 gene of the Ploceidae bird family showed quite high polymorphism, with 122 substitutions in the 334 nucleotides analyzed. The phylogenetic tree of nucleotide sequences of the Ploceidae bird family formed 2 clusters. One cluster consisted of Ploceus hypoxanthus, Ploceus philippinus, Ploceus manyar and Passer montanus, and the other species were included in the second cluster. ND3 gene sequence data from this Ploceidae family need to be analyzed further to see a possible relationship with a particular phenotype.
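
    The Kimura 2-parameter distance mentioned in this record has a closed form, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The block below is a small sketch of that formula for one pair of aligned sequences; it is not MEGA's implementation, and skipping sites with gaps or ambiguous bases is an assumption made here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # for the model (the argument becomes non-positive).
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))
```

    A matrix of such pairwise distances is the usual input to neighbor-joining tree construction, which is the role it plays in the workflow described above.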

  18. DNA Barcoding: Amplification and sequence analysis of rbcL and matK genome regions in three divergent plant species

    Directory of Open Access Journals (Sweden)

    Javed Iqbal Wattoo

    2016-11-01

    Background: DNA barcoding is a novel method of species identification based on nucleotide diversity of conserved sequences. The establishment and refining of plant DNA barcoding systems is more challenging due to high genetic diversity among different species. Therefore, targeting the conserved nuclear transcribed regions would be more reliable for plant scientists to reveal genetic diversity, species discrimination and phylogeny. Methods: In this study, we amplified and sequenced the chloroplast DNA regions (matK+rbcL) of Solanum nigrum, Euphorbia helioscopia and Dalbergia sissoo to study the functional annotation, homology modeling and sequence analysis to allow a more efficient utilization of these sequences among different plant species. These three species represent three families: Solanaceae, Euphorbiaceae and Fabaceae, respectively. Biological sequence homology and divergence of the amplified sequences were studied using the Basic Local Alignment Search Tool (BLAST). Results: Both primers (matK+rbcL) showed good amplification in the three species. The sequenced regions revealed conserved genome information for future identification of different medicinal plants belonging to these species. The amplified conserved barcodes revealed different levels of biological homology after sequence analysis. The results clearly showed that the use of these conserved DNA sequences as barcode primers would be an accurate way for species identification and discrimination. Conclusion: The amplification and sequencing of conserved genome regions identified a novel sequence of matK in native species of Solanum nigrum. The findings of the study would be applicable in the medicinal industry to establish DNA-based identification of different medicinal plant species to monitor adulteration.

  19. Genetic structure of Florida green turtle rookeries as indicated by mitochondrial DNA control region sequences

    Science.gov (United States)

    Shamblin, Brian M.; Bagley, Dean A.; Ehrhart, Llewellyn M.; Desjardin, Nicole A.; Martin, R. Erik; Hart, Kristen M.; Naro-Maciel, Eugenia; Rusenko, Kirt; Stiner, John C.; Sobel, Debra; Johnson, Chris; Wilmers, Thomas; Wright, Laura J.; Nairn, Campbell J.

    2014-01-01

    Green turtle (Chelonia mydas) nesting has increased dramatically in Florida over the past two decades, ranking the Florida nesting aggregation among the largest in the Greater Caribbean region. Individual beaches that comprise several hundred kilometers of Florida’s east coast and Keys support tens to thousands of nests annually. These beaches encompass natural to highly developed habitats, and the degree of demographic partitioning among rookeries was previously unresolved. We characterized the genetic structure of ten Florida rookeries from Cape Canaveral to the Dry Tortugas through analysis of 817 base pair mitochondrial DNA (mtDNA) control region sequences from 485 nesting turtles. Two common haplotypes, CM-A1.1 and CM-A3.1, accounted for 87 % of samples, and the haplotype frequencies were strongly partitioned by latitude along Florida’s Atlantic coast. Most genetic structure occurred between rookeries on either side of an apparent genetic break in the vicinity of the St. Lucie Inlet that separates Hutchinson Island and Jupiter Island, representing the finest scale at which mtDNA structure has been documented in marine turtle rookeries. Florida and Caribbean scale analyses of population structure support recognition of at least two management units: central eastern Florida and southern Florida. More thorough sampling and deeper sequencing are necessary to better characterize connectivity among Florida green turtle rookeries as well as between the Florida nesting aggregation and others in the Greater Caribbean region.

  20. The next generation of target capture technologies - large DNA fragment enrichment and sequencing determines regional genomic variation of high complexity.

    Science.gov (United States)

    Dapprich, Johannes; Ferriola, Deborah; Mackiewicz, Kate; Clark, Peter M; Rappaport, Eric; D'Arcy, Monica; Sasson, Ariella; Gai, Xiaowu; Schug, Jonathan; Kaestner, Klaus H; Monos, Dimitri

    2016-07-09

    The ability to capture and sequence large contiguous DNA fragments represents a significant advancement towards the comprehensive characterization of complex genomic regions. While emerging sequencing platforms are capable of producing several kilobases-long reads, the fragment sizes generated by current DNA target enrichment technologies remain a limiting factor, producing DNA fragments generally shorter than 1 kbp. The DNA enrichment methodology described herein, Region-Specific Extraction (RSE), produces DNA segments in excess of 20 kbp in length. Coupling this enrichment method to appropriate sequencing platforms will significantly enhance the ability to generate complete and accurate sequence characterization of any genomic region without the need for reference-based assembly. RSE is a long-range DNA target capture methodology that relies on the specific hybridization of short (20-25 base) oligonucleotide primers to selected sequence motifs within the DNA target region. These capture primers are then enzymatically extended on the 3'-end, incorporating biotinylated nucleotides into the DNA. Streptavidin-coated beads are subsequently used to pull-down the original, long DNA template molecules via the newly synthesized, biotinylated DNA that is bound to them. We demonstrate the accuracy, simplicity and utility of the RSE method by capturing and sequencing a 4 Mbp stretch of the major histocompatibility complex (MHC). Our results show an average depth of coverage of 164X for the entire MHC. This depth of coverage contributes significantly to a 99.94 % total coverage of the targeted region and to an accuracy that is over 99.99 %. RSE represents a cost-effective target enrichment method capable of producing sequencing templates in excess of 20 kbp in length. The utility of our method has been proven to generate superior coverage across the MHC as compared to other commercially available methodologies, with the added advantage of producing longer sequencing

  1. Sequence analysis of mtDNA COI barcode region revealed three haplotypes within Culex pipiens assemblage.

    Science.gov (United States)

    Koosha, Mona; Oshaghi, Mohammad Ali; Sedaghat, Mohammad Mehdi; Vatandoost, Hassan; Azari-Hamidian, Shahyad; Abai, Mohammad Reza; Hanafi-Bojd, Ahmad Ali; Mohtarami, Fatemeh

    2017-10-01

    Members of the Culex (Culex) pipiens assemblage are known vectors of deadly encephalitides, periodic filariasis, and West Nile virus throughout the world. However, members of this assemblage are morphologically indistinguishable or hard to distinguish and play distinct roles in transmission of the diseases. The current study aimed to provide further evidence on the utility of the two most popular nuclear (ITS2-rDNA) and mitochondrial (COI barcode region) genetic markers to identify members of the assemblage. Culex pipiens assemblage specimens from different climate zones of Iran were collected and identified to species level based on morphological characteristics. Nucleotide sequences of the loci for the specimens plus available data in GenBank were analyzed to find species-specific genetic structures useful for diagnostic purposes. The ITS2 region was highly divergent within species or populations, suggesting a lack of consistency as a reliable molecular marker. In contrast, sequence analysis of 710 bp of the COI gene revealed three fixed haplotypes, named here "C, T, H", within the assemblage, which can be distinguished by the HaeIII and AluI enzymes. There was a correlation between the haplotypes and the world climate regions: haplotypes H/T and C are present mainly in temperate and tropical regions of the world, respectively. In the New World, Australia, and Japan only haplotype H is found. In the transition zones between tropical and temperate regions, such as Iran, China, and Turkey, a mix of C/H or C/H/T is present. Although the haplotypes are not strictly species-specific, Cx. quinquefasciatus was mainly of haplotype C. Due to the lack of a mating barrier and the questionable taxonomic situation of the complex members, the mentioned haplotypes in combination with other morphological and molecular characters might be used to address the genetic structure of the studied populations. Copyright © 2017 Elsevier Inc. All rights reserved.
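
    Distinguishing haplotypes with HaeIII and AluI amounts to comparing restriction fragment patterns. The sketch below performs an in-silico digest of a linear sequence using the standard recognition sites (HaeIII cuts GG^CC, AluI cuts AG^CT, both blunt). The actual haplotype-defining positions in the COI fragment are not given in the abstract, so the example sequence is invented and the function name is hypothetical.

```python
# Recognition site and cut offset (bases from the start of the site).
ENZYMES = {
    "HaeIII": ("GGCC", 2),  # GG^CC
    "AluI": ("AGCT", 2),    # AG^CT
}

def digest(seq, enzyme):
    """Return fragment lengths, in order, for a linear sequence cut by `enzyme`.

    Both sites are palindromic, so scanning one strand is sufficient.
    """
    site, cut_offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]

# Invented 20-base example containing one site for each enzyme.
amplicon = "AAAAGGCCTTTTTTAGCTAA"
print(digest(amplicon, "HaeIII"))  # [6, 14]
print(digest(amplicon, "AluI"))    # [16, 4]
```

    Two amplicons that differ by a single base inside a recognition site would yield different fragment lists, which is the basis of a restriction-based haplotype assay.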

  2. Evolution of DNA sequencing.

    Science.gov (United States)

    Tipu, Hamid Nawaz; Shabbir, Ambreen

    2015-03-01

    Sanger and coworkers introduced DNA sequencing in the 1970s for the first time. It principally relied on termination of the growing nucleotide chain when a dideoxythymidine triphosphate (ddTTP) was inserted in it. Detection of terminated sequences was done radiographically on Polyacrylamide Gel Electrophoresis (PAGE). Improvements that have evolved over time in original Sanger sequencing include replacement of radiography with fluorescence, use of separate fluorescent markers for each nucleotide, use of capillary electrophoresis instead of polyacrylamide gel electrophoresis, and then introduction of capillary array electrophoresis. However, this technique suffered from a few inherent limitations, such as decreased sensitivity for low-level mutant alleles, complexities in analyzing highly polymorphic regions like the Major Histocompatibility Complex (MHC), and the high DNA concentrations required. Several Next Generation Sequencing (NGS) technologies that tend to overcome the limitations of Sanger sequencing have been introduced by Roche, Illumina and other commercial manufacturers, and are reviewed here. Introduction of NGS in clinical research and medical diagnostics is expected to change the entire diagnostic approach. Applications include the study of cancer variants, detection of minimal residual disease, exome sequencing, detection of Single Nucleotide Polymorphisms (SNPs) and their disease association, epigenetic regulation of gene expression, and sequencing of microorganism genomes.

  3. Molecular phylogenetic analysis of Indonesia Solanaceae based on DNA sequences of internal transcribed spacer region

    Science.gov (United States)

    Hidayat, Topik; Priyandoko, Didik; Islami, Dina Karina; Wardiny, Putri Yunitha

    2016-02-01

    Solanaceae is one of the largest families in the angiosperm group and is highly diverse in morphological characters. In Indonesia, this group of plants is very popular due to its usefulness as food, ornamental, and medicinal plants. However, the phylogenetic relationships among the members of this family in Indonesia have received little attention. The purpose of this study was to evaluate the phylogenetic relationships of the family, especially of members distributed in Indonesia. DNA sequences of the Internal Transcribed Spacer (ITS) region of 19 species of Solanaceae and three outgroup species, which belong to the families Convolvulaceae, Apocynaceae, and Plantaginaceae, were isolated, amplified, and sequenced. Phylogenetic tree analysis based on the parsimony method was conducted using data derived from ITS-1, 5.8S, and ITS-2 separately, and from the combination of all three. Results indicated that the phylogenetic tree derived from the combined data established a better pattern of relationships than the separate data. Three major groups were revealed. Group 1 consists of tribes Datureae, Cestreae, and Petunieae, whereas group 2 comprises members of tribe Physaleae. Group 3 belongs to tribe Solaneae. The use of the ITS region as a molecular marker, in general, supports the global Solanaceae relationships that have been previously reported.

  4. In search of coding and non-coding regions of DNA sequences based on balanced estimation of diffusion entropy.

    Science.gov (United States)

    Zhang, Jin; Zhang, Wenqing; Yang, Huijie

    2016-01-01

    Identification of coding regions in DNA sequences remains challenging. Various methods have been proposed, but these are limited by species-dependence and the need for adequate training sets. The elements in DNA coding regions are known to be distributed in a quasi-random way, while those in non-coding regions have typical similar structures. For short sequences, these statistical characteristics cannot be extracted correctly and cannot even be detected. This paper introduces a new way to solve the problem: balanced estimation of diffusion entropy (BEDE).

  5. The nucleotide sequence of the variable region in Trypanosoma brucei completes the sequence analysis of the maxicircle component of mitochondrial kinetoplast DNA

    NARCIS (Netherlands)

    Sloof, P.; de Haan, A.; Eier, W.; van Iersel, M.; Boel, E.; van Steeg, H.; Benne, R.

    1992-01-01

    The nucleotide sequence of two non-contiguous DNA fragments of 4.0 and 2.2 kb, respectively, of the kinetoplast maxicircle of Trypanosoma brucei brucei EATRO strain 427 has been determined, completing the sequence analysis of the so-called variable region (see also de Vries et al., 1988, Mol.

  6. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  7. Simple species identification of Trichinella isolates by amplification and sequencing of the 5S ribosomal DNA intergenic spacer region.

    Science.gov (United States)

    De Bruyne, Aymeric; Yera, Hélène; Le Guerhier, Franck; Boireau, Pascal; Dupouy-Camet, Jean

    2005-09-05

    We developed a PCR-based assay using a single primer pair to amplify the 5S ribosomal DNA intergenic spacer region to identify Trichinella isolates. In our method, amplified products are directly sequenced on both strands and compared to GenBank sequences. Using this method, we were able to identify Trichinella spiralis, T. britovi and T. nativa. This method permits rapid species identification of Trichinella isolates; however, further evaluation is required before recommending this approach for routine use.

  8. Sequence Analysis of the Ribosomal DNA Intergenic Spacer 1 Regions of Trichosporon Species

    Science.gov (United States)

    Sugita, Takashi; Nakajima, Masamitsu; Ikeda, Reiko; Matsushima, Toshiharu; Shinoda, Takako

    2002-01-01

    We determined the sequence of the intergenic spacer (IGS) 1 region, which is located between the 26S and 5S rRNA genes, in 25 species of the genus Trichosporon. IGS 1 sequences varied in length from 195 to 719 bp. Comparative sequence analysis suggested that the divergence of IGS 1 sequences has been greater than that of the internal transcribed spacer regions. We also identified five genotypes of T. asahii, which is a major causative agent of deep-seated trichosporonosis, based on the IGS 1 sequences of 43 strains. Most of the isolates that originated in Japan were of genotype 1, whereas the American isolates were of genotype 3 or 5. Our results suggest that analysis of IGS regions provides a powerful method to distinguish between phylogenetically closely related species and that a geographic substructure may exist among T. asahii clinical isolates. PMID:11980969

  9. DNA sequence of the lactose operon: the lacA gene and the transcriptional termination region.

    OpenAIRE

    Hediger, M A; Johnson, D F; Nierlich, D P; Zabin, I

    1985-01-01

    The lac operon of Escherichia coli spans approximately 5300 base pairs and includes the lacZ, lacY, and lacA genes in addition to the operator, promoter, and transcription termination regions. We report here the sequence of the lacA gene and the region distal to it, confirming the sequence of thiogalactoside transacetylase and completing the sequence of the lac operon. The lacA gene is characterized by use of rare codons, suggesting an origin from a plasmid, transposon, or virus gene. UUG is ...

  10. Automated DNA Sequencing System

    Energy Technology Data Exchange (ETDEWEB)

    Armstrong, G.A.; Ekkebus, C.P.; Hauser, L.J.; Kress, R.L.; Mural, R.J.

    1999-04-25

    Oak Ridge National Laboratory (ORNL) is developing a core DNA sequencing facility to support biological research endeavors at ORNL and to conduct basic sequencing automation research. This facility is novel because its development is based on existing standard biology laboratory equipment; thus, the development process is of interest to the many small laboratories trying to use automation to control costs and increase throughput. Before automation, biology laboratory personnel purified DNA, completed cycle sequencing, and prepared 96-well sample plates with commercially available hardware designed specifically for each step in the process. Following purification and thermal cycling, an automated sequencing machine was used for the sequencing. A technician handled all movement of the 96-well sample plates between machines. To automate the process, ORNL is adding a CRS Robotics A-465 arm, ABI 377 sequencing machine, automated centrifuge, automated refrigerator, and possibly an automated SpeedVac. The entire system will be integrated with one central controller that will direct each machine and the robot. The goal of this system is to completely automate the sequencing procedure from bacterial cell samples through ready-to-be-sequenced DNA and ultimately to completed sequence. The system will be flexible and will accommodate different chemistries than existing automated sequencing lines. The system will be expanded in the future to include colony picking and/or actual sequencing. This discrete event, DNA sequencing system will demonstrate that smaller sequencing labs can achieve cost-effective the laboratory grow.

  11. Large sequence divergence of mitochondrial DNA genotypes of the control region within populations of the African antelope, kob (Kobus kob)

    DEFF Research Database (Denmark)

    Birungi, J.; Arctander, Peter

    2000-01-01

    conservation genetics, control region, Kobus kob, mitochondrial DNA, population expansion, population structure

  12. Single genetic stock of kawakawa Euthynnus affinis (Cantor, 1849) along the Indian coast inferred from sequence analyses of mitochondrial DNA D-loop region

    Digital Repository Service at National Institute of Oceanography (India)

    GirishKumar; Kunal, S.P.; Menezes, M.R.; Meena, R.

    , genetic variation was assessed using sequence analyses of Mitochondrial DNA (mtDNA) D-loop region. A 500 bp segment of D-loop region was sequenced in 400 samples collected from eight localities (Veraval (VE), Ratnagiri (RA), Kochi (KO), Kavaratti (KA...

  13. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  14. Statistical properties of DNA sequences

    Science.gov (United States)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
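
    To make the detrended fluctuation analysis (DFA) mentioned above more concrete, here is a minimal pure-Python sketch of first-order DFA applied to a "DNA walk". It is an illustration under stated assumptions, not the authors' implementation: the mapping of pyrimidines to +1 and purines to -1, the non-overlapping boxes, the chosen box sizes, and all function names are assumptions made for this example.

```python
import math
import random


def dna_walk_steps(sequence):
    """Assumed mapping: pyrimidines (C, T) -> +1, purines (A, G) -> -1."""
    return [1.0 if base in "CT" else -1.0 for base in sequence.upper()]


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fluctuation(profile, box):
    """Root-mean-square residual after removing a linear trend in each box."""
    xs = list(range(box))
    total, count = 0.0, 0
    for start in range(0, len(profile) - box + 1, box):
        window = profile[start:start + box]
        slope, intercept = linear_fit(xs, window)
        total += sum((y - (slope * x + intercept)) ** 2
                     for x, y in zip(xs, window))
        count += box
    return math.sqrt(total / count)


def dfa_exponent(steps, box_sizes):
    """Slope of log F(n) versus log n, the DFA scaling exponent."""
    mean = sum(steps) / len(steps)
    profile, running = [], 0.0
    for step in steps:
        running += step - mean  # integrate the mean-subtracted steps
        profile.append(running)
    log_n = [math.log(n) for n in box_sizes]
    log_f = [math.log(fluctuation(profile, n)) for n in box_sizes]
    slope, _ = linear_fit(log_n, log_f)
    return slope


# Uncorrelated random bases should give an exponent close to 0.5.
random.seed(0)
random_dna = "".join(random.choice("ACGT") for _ in range(20000))
alpha = dfa_exponent(dna_walk_steps(random_dna), [8, 16, 32, 64, 128])
print(round(alpha, 2))
```

    An exponent near 0.5 indicates no long-range correlation, while values clearly above 0.5 over a wide range of box sizes indicate the long-range correlation the abstract reports for non-coding regions.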

  15. Minding the gap: Frequency of indels in mtDNA control region sequence data and influence on population genetic analyses

    Science.gov (United States)

    Pearce, J.M.

    2006-01-01

    Insertions and deletions (indels) result in sequences of various lengths when homologous gene regions are compared among individuals or species. Although indels are typically phylogenetically informative, occurrence and incorporation of these characters as gaps in intraspecific population genetic data sets are rarely discussed. Moreover, the impact of gaps on estimates of fixation indices, such as FST, has not been reviewed. Here, I summarize the occurrence and population genetic signal of indels among 60 published studies that involved alignments of multiple sequences from the mitochondrial DNA (mtDNA) control region of vertebrate taxa. Among 30 studies observing indels, an average of 12% of both variable and parsimony-informative sites were composed of these sites. There was no consistent trend between levels of population differentiation and the number of gap characters in a data block. Across all studies, the average influence on estimates of ΦST was small, explaining only an additional 1.8% of among population variance (range 0.0-8.0%). Studies most likely to observe an increase in ΦST with the inclusion of gap characters were those with control region DNA appears small, dependent upon total number of variable sites in the data block, and related to species-specific characteristics and the spatial distribution of mtDNA lineages that contain indels. © 2006 Blackwell Publishing Ltd.

  16. Identification and verification of hybridoma-derived monoclonal antibody variable region sequences using recombinant DNA technology and mass spectrometry.

    Science.gov (United States)

    Babrak, Lmar; McGarvey, Jeffery A; Stanker, Larry H; Hnasko, Robert

    2017-10-01

    Antibody engineering requires the identification of antigen binding domains or variable regions (VR) unique to each antibody. It is the VR that define the unique antigen binding properties, and proper sequence identification is essential for functional evaluation and performance of recombinant antibodies (rAb). This determination can be achieved by sequence analysis of immunoglobulin (Ig) transcripts obtained from a monoclonal antibody (MAb) producing hybridoma and subsequent expression of a rAb. However, the polyploid nature of a hybridoma cell often results in the added expression of aberrant immunoglobulin-like transcripts or even production of anomalous antibodies, which can confound production of rAb. An incorrect VR sequence will result in a non-functional rAb, and de novo assembly of Ig primary structure without a sequence map is challenging. To address these problems, we have developed a methodology which combines: 1) selective PCR amplification of VR from both the heavy and light chain IgG from hybridoma, 2) molecular cloning and DNA sequence analysis and 3) tandem mass spectrometry (MS/MS) on enzyme digests obtained from the purified IgG. Peptide analysis proceeds by evaluating coverage of the predicted primary protein sequence provided by the initial DNA maps for the VR. This methodology serves to both identify and verify the primary structure of the MAb VR for production as rAb. Published by Elsevier Ltd.

  17. DNA sequence of the lactose operon: the lacA gene and the transcriptional termination region.

    Science.gov (United States)

    Hediger, M A; Johnson, D F; Nierlich, D P; Zabin, I

    1985-10-01

    The lac operon of Escherichia coli spans approximately 5300 base pairs and includes the lacZ, lacY, and lacA genes in addition to the operator, promoter, and transcription termination regions. We report here the sequence of the lacA gene and the region distal to it, confirming the sequence of thiogalactoside transacetylase and completing the sequence of the lac operon. The lacA gene is characterized by use of rare codons, suggesting an origin from a plasmid, transposon, or virus gene. UUG is the translation initiation codon. A preliminary examination of 3' end of the lac messenger in the region distal to the lacA gene indicates several endpoints. A predominant one is located at the 3' end of a G + C-rich hairpin structure, which may be involved in termination of transcription or in post-transcriptional processing. An open reading frame of 702 base pairs is present on the complementary strand downstream from lacA.

  18. Human DNA repair genes possess potential G-quadruplex sequences in their promoters and 5'-untranslated regions.

    Science.gov (United States)

    Fleming, Aaron M; Zhu, Judy; Ding, Yun; Visser, Joshua A; Zhu, Julia; Burrows, Cynthia J

    2018-01-10

    The cellular response to oxidative stress includes transcriptional changes, particularly for genes involved in DNA repair. Recently, our laboratory demonstrated that oxidation of 2'-deoxyguanosine (G) to 8-oxo-7,8-dihydro-2'-deoxyguanosine (OG) in G-rich potential G-quadruplex sequences (PQSs) in gene promoters impacts the level of gene expression up or down depending on the position of the PQS in the promoter. In the present report, bioinformatic analysis found that the 390 human DNA repair genes in the genome ontology initiative harbor 2,936 PQSs in their promoters and 5'-untranslated regions (5'-UTRs). The average density of PQSs in human DNA repair genes was found to be nearly twofold greater than the average density of PQSs in all coding and non-coding human genes (7.5 vs. 4.3 per gene). The distribution of the PQSs in the DNA repair genes on the non-transcribed (coding) vs. transcribed strands reflects that of PQSs in all human genes. Next, literature data were interrogated to select 30 PQSs to catalog their ability to adopt G-quadruplex (G4) folds in vitro using five different experimental tests. The G4 characterization experiments concluded that 26 of the 30 sequences could adopt G4 topologies in solution. Last, four PQSs were synthesized into the promoter of a luciferase plasmid and co-transfected with the G4-specific ligands pyridostatin, Phen-DC3, or BRACO-19 in human cells to determine whether the PQSs could adopt G4 folds. The cell studies identified changes in luciferase expression when the G4 ligands were present, and the magnitude of the expression changes was dependent on the PQS and the coding vs. template strand on which the sequence resided. Our studies demonstrate PQSs exist at a high density in human DNA repair gene promoters and a subset of the identified sequences fold in vitro and in vivo.
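
    The abstract does not state the exact motif used to call a potential G-quadruplex sequence (PQS). As a hedged illustration of how such a bioinformatic scan can work, the sketch below uses one commonly used pattern, four runs of at least three G's separated by loops of 1 to 7 bases, and searches both strands. The pattern, the loop limits, and the names `PQS_PATTERN` and `find_pqs` are assumptions for this example, not the authors' definitions.

```python
import re

# Assumed motif: four G-runs (>= 3 G's each) separated by loops of 1-7 bases.
PQS_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an A/C/G/T sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def find_pqs(sequence):
    """Return (strand, start, match) for non-overlapping hits on both strands.

    Starts for the "-" strand are given in reverse-complement coordinates.
    """
    sequence = sequence.upper()
    hits = [("+", m.start(), m.group())
            for m in PQS_PATTERN.finditer(sequence)]
    hits += [("-", m.start(), m.group())
             for m in PQS_PATTERN.finditer(reverse_complement(sequence))]
    return hits


# Hypothetical toy sequence, not taken from any gene in the study.
toy = "TTGGGAGGGTGGGAGGGTT"
print(find_pqs(toy))
```

    Counting such hits per promoter or 5'-UTR and dividing by the number of genes would give a per-gene density of the kind the abstract compares, although the numbers obtained depend strongly on the motif definition chosen.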

  19. [Database establishment of the whole rDNA ITS region of Dendrobium species of "fengdou" and authentication by analysis of their sequences].

    Science.gov (United States)

    Ding, Xiao-yu; Wang, Zheng-tao; Xu, Hong; Xu, Luo-san; Zhou, Kai-ya

    2002-07-01

    To establish the whole rDNA ITS region sequence database of various Dendrobium species of "Fengdou" and to authenticate exactly the inspected species of "Fengdou". The rDNA ITS regions of various Dendrobium species of "Fengdou" were amplified and sequenced. The database of their rDNA ITS regions was established in order to authenticate the inspected species by means of the CLUSTAL and MEGA software, which were used to analyze the rDNA ITS region. A database of the rDNA ITS sequences of 21 species of Dendrobium has been established. Notable and stable interspecific differences in the rDNA ITS regions have been demonstrated. The numbers of transitions and transversions among the 21 species are 11-122. There are 341 variable sites and 195 informative sites. The ITS sequence differences between the outgroup species (Pholidota yunnanensis) and the species of "Fengdou" are obvious. The numbers of transitions and transversions are 131-161. The population differences in the rDNA ITS region of the various species of "Fengdou" are very small (0-6). On the basis of the database of various Dendrobium species of "Fengdou" and two genetics software packages, the botanical origin of the inspected species of "Fengdou" has been authenticated successfully by sequencing the rDNA ITS regions.
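
    The transition and transversion counts quoted above come from pairwise comparison of aligned sequences. A minimal sketch of that count is shown below, assuming two already-aligned sequences of equal length and skipping gap or ambiguous positions; the function name `count_substitutions` and the toy alignment are hypothetical and do not reproduce the MEGA analysis used in the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned sequences.

    Positions where either sequence has a gap or a non-ACGT symbol are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Hypothetical toy alignment, not real Dendrobium ITS data.
print(count_substitutions("ACGTAC-T", "GCTTAT-A"))
```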

  20. Phylogenetic relations of humans and African apes from DNA sequences in the Psi eta-globin region

    Energy Technology Data Exchange (ETDEWEB)

    Miyamoto, M.M.; Slightom, J.L.; Goodman, M.

    1987-10-16

    Sequences from the upstream and downstream flanking DNA regions of the Psi eta-globin locus in Pan troglodytes (common chimpanzee), Gorilla gorilla (gorilla), and Pongo pygmaeus (orangutan, the closest living relative to Homo, Pan, and Gorilla) provided further data for evaluating the phylogenetic relations of humans and African apes. These newly sequenced orthologs (an additional 4.9 kilobase pairs (kbp) for each species) were combined with published Psi eta-gene sequences and then compared to the same orthologous stretch (a continuous 7.1-kbp region) available for humans. Phylogenetic analysis of these nucleotide sequences by the parsimony method indicated (i) that human and chimpanzee are more closely related to each other than either is to gorilla and (ii) that the slowdown in the rate of sequence evolution evident in higher primates is especially pronounced in humans. These results indicate that features unique to African apes (but not to humans) are primitive and that even local molecular clocks should be applied with caution.

  1. Transposon facilitated DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Berg, D.E.; Berg, C.M.; Huang, H.V.

    1990-01-01

    The purpose of this research is to investigate and develop methods that exploit the power of bacterial transposable elements for large scale DNA sequencing: Our premise is that the use of transposons to put primer binding sites randomly in target DNAs should provide access to all portions of large DNA fragments, without the inefficiencies of methods involving random subcloning and attendant repetitive sequencing, or of sequential synthesis of many oligonucleotide primers that are used to match systematically along a DNA molecule. Two unrelated bacterial transposons, Tn5 and γδ, are being used because they have both proven useful for molecular analyses, and because they differ sufficiently in mechanism and specificity of transposition to merit parallel development.

  2. DNA sequences at a glance.

    Directory of Open Access Journals (Sweden)

    Armando J Pinho

    Data summarization and triage is one of the current top challenges in visual analytics. The goal is to let users visually inspect large data sets and examine or request data with particular characteristics. The need for summarization and visual analytics is also felt when dealing with digital representations of DNA sequences. Genomic data sets are growing rapidly, making their analysis increasingly more difficult, and raising the need for new, scalable tools. For example, being able to look at very large DNA sequences while immediately identifying potentially interesting regions would provide the biologist with a flexible exploratory and analytical tool. In this paper we present a new concept, the "information profile", which provides a quantitative measure of the local complexity of a DNA sequence, independently of the direction of processing. The computation of the information profiles is computationally tractable: we show that it can be done in time proportional to the length of the sequence. We also describe a tool to compute the information profiles of a given DNA sequence, and use the genome of the fission yeast Schizosaccharomyces pombe strain 972 h(-) and five human chromosomes 22 for illustration. We show that information profiles are useful for detecting large-scale genomic regularities by visual inspection. Several discovery strategies are possible, including the standalone analysis of single sequences, the comparative analysis of sequences from individuals from the same species, and the comparative analysis of sequences from different organisms. The comparison scale can be varied, allowing the users to zoom-in on specific details, or obtain a broad overview of a long segment. Software applications have been made available for non-commercial use at http://bioinformatics.ua.pt/software/dna-at-glance.

  3. Molecular Identification of Isolated Fungi from Unopened Containers of Greek Yogurt by DNA Sequencing of Internal Transcribed Spacer Region

    Directory of Open Access Journals (Sweden)

    Irshad M. Sulaiman

    2014-06-01

    In our previous study, we described the development of an internal transcribed spacer (ITS1) sequencing method, and used this protocol in species-identification of isolated fungi collected from the manufacturing areas of a compounding company known to have caused the multistate fungal meningitis outbreak in the United States. In this follow-up study, we have analyzed the unopened vials of Greek yogurt from the recalled batch to determine the possible cause of microbial contamination in the product. A total of 15 unopened vials of Greek yogurt belonging to the recalled batch were examined, following conventional microbiological protocols, for the presence of fungi known to cause foodborne illness. Fungi were isolated from all of the 15 Greek yogurt samples analyzed. The isolated fungi were genetically typed by DNA sequencing of the PCR-amplified ITS1 region of the rRNA gene. Analysis of the data confirmed all of the fungal isolates from the Greek yogurt to be Rhizomucor variabilis. The generated ITS1 sequences matched 100% with the published sequences available in GenBank. In addition, these yogurt samples were also tested for the presence of five types of bacteria (Salmonella, Listeria, Staphylococcus, Bacillus and Escherichia coli) causing foodborne disease in humans, and were found negative for all of them.

  4. Phylogeny and Biogeography of the Genus Ainsliaea (Asteraceae) in the Sino-Japanese Region based on Nuclear rDNA and Plastid DNA Sequence Data

    Science.gov (United States)

    Mitsui, Yuki; Chen, Shao-Tien; Zhou, Zhe-Kun; Peng, Ching-I.; Deng, Yun-Fei; Setoguchi, Hiroaki

    2008-01-01

    Background and Aims The flora of the Sino-Japanese plant region of eastern Asia is distinctively rich compared with other floristic regions in the world. However, knowledge of its floristic evolution is fairly limited. The genus Ainsliaea is endemic to and distributed throughout the Sino-Japanese region. Its interspecific phylogenetic relationships have not been resolved. The aim is to provide insight into floristic evolution in eastern Asia on the basis of a molecular phylogenetic analysis of Ainsliaea species. Methods Cladistic analyses of the sequences of two nuclear (ITS, ETS) and one plastid (ndhF) regions were carried out individually and using the combined data from the three markers. Key Results Phylogenetic analyses of three DNA regions confirmed that Ainsliaea is composed of three major clades that correspond to species distributions. Evolution of the three lineages was estimated to have occurred around 1·1 MYA during the early Pleistocene. Conclusions The results suggest that Ainsliaea species evolved allopatrically and that the descendants were isolated in the eastern (between SE China and Japan, through Taiwan and the Ryukyu Islands) and western (Yunnan Province and its surrounding areas, including the Himalayas, the temperate region of Southeast Asia, and Sichuan Province) sides of the Sino-Japanese region. The results suggest that two distinct lineages of Ainsliaea have independently evolved in environmentally heterogeneous regions within the Sino-Japanese region. These regions have maintained rich and original floras due to their diverse climates and topographies. PMID:17981878

  5. Rice pseudomolecule-anchored cross-species DNA sequence alignments indicate regional genomic variation in expressed sequence conservation

    Directory of Open Access Journals (Sweden)

    Thomas Howard

    2007-08-01

    Background Various methods have been developed to explore inter-genomic relationships among plant species. Here, we present a sequence similarity analysis based upon comparison of transcript-assembly and methylation-filtered databases from five plant species and physically anchored rice coding sequences. Results A comparison of the frequency of sequence alignments, determined by MegaBLAST, between rice coding sequences in TIGR pseudomolecules and annotations vs 4.0 and comprehensive transcript-assembly and methylation-filtered databases from Lolium perenne (ryegrass), Zea mays (maize), Hordeum vulgare (barley), Glycine max (soybean) and Arabidopsis thaliana (thale cress) was undertaken. Each rice pseudomolecule was divided into 10 segments, each containing 10% of the functionally annotated, expressed genes. This indicated a correlation between relative segment position in the rice genome and numbers of alignments with all the queried monocot and dicot plant databases. Colour-coded moving windows of 100 functionally annotated, expressed genes along each pseudomolecule were used to generate 'heat-maps'. These revealed consistent intra- and inter-pseudomolecule variation in the relative concentrations of significant alignments with the tested plant databases. Analysis of the annotations and derived putative expression patterns of rice genes from 'hot-spots' and 'cold-spots' within the heat maps indicated possible functional differences. A similar comparison relating to ancestral duplications of the rice genome indicated that duplications were often associated with 'hot-spots'. Conclusion Physical positions of expressed genes in the rice genome are correlated with the degree of conservation of similar sequences in the transcriptomes of other plant species. This relative conservation is associated with the distribution of different sized gene families and segmentally duplicated loci and may have functional and evolutionary implications.

  6. Plant DNA sequencing for phylogenetic analyses: from plants to sequences.

    Science.gov (United States)

    Neves, Susana S; Forrest, Laura L

    2011-01-01

    DNA sequences are important sources of data for phylogenetic analysis. Nowadays, DNA sequencing is a routine technique in molecular biology laboratories. However, there are specific questions associated with project design and sequencing of plant samples for phylogenetic analysis, which may not be familiar to researchers starting in the field. This chapter gives an overview of methods and protocols involved in the sequencing of plant samples, including general recommendations on the selection of species/taxa and DNA regions to be sequenced, and field collection of plant samples. Protocols of plant sample preparation, DNA extraction, PCR and cloning, which are critical to the success of molecular phylogenetic projects, are described in detail. Common problems of sequencing (using the Sanger method) are also addressed. Possible applications of second-generation sequencing techniques in plant phylogenetics are briefly discussed. Finally, orientation on the preparation of sequence data for phylogenetic analyses and submission to public databases is also given.

  7. Repeated DNA sequences in fungi

    Energy Technology Data Exchange (ETDEWEB)

    Dutta, S.K.

    1974-11-01

    Several fungal species, representatives of all broad groups like basidiomycetes, ascomycetes and phycomycetes, were examined for the nature of repeated DNA sequences by DNA:DNA reassociation studies using hydroxyapatite chromatography. All of the fungal species tested contained 10 to 20 percent repeated DNA sequences. There are approximately 100 to 110 copies of repeated DNA sequences of approximately 4 x 10^7 daltons piece size of each. Repeated DNA sequence homoduplexes showed on average 5 °C difference of T_e50 (temperature at which 50 percent duplexes dissociate) values from the corresponding homoduplexes of unfractionated whole DNA. It is suggested that a part of repetitive sequences in fungi constitutes mitochondrial DNA and a part of it constitutes nuclear DNA. (auth)

  8. [Discrimination of psychoactive fungi (commonly called "magic mushrooms") based on the DNA sequence of the internal transcribed spacer region].

    Science.gov (United States)

    Maruyama, Takuro; Shirota, Osamu; Kawahara, Nobuo; Yokoyama, Kazumasa; Makino, Yukiko; Goda, Yukihiro

    2003-02-01

    'Magic mushrooms' (MMs) are psychoactive fungi containing the hallucinogenic compounds, psilocin (1) and psilocybin (2). Since June 6, 2002, these fungi have been regulated by the Narcotics and Psychotropics Control Law in Japan. Because there are many kinds of MMs and they are sold even as dry powders in local markets, it is very difficult to identify the original species of the MMs by morphological observation. Therefore, we investigated the internal transcribed spacer (ITS) region in the ribosomal RNA gene of MMs obtained in Japanese markets to classify them by a genetic approach. Based on the size and nucleotide sequence of the ITS region amplified by PCR, tested MMs were classified into 6 groups. Furthermore, a comparison of the DNA sequences of the MMs with those of authentic samples or with those found in the databases (GenBank, EMBL and DDBJ) made it possible to identify the species of tested MMs. Analysis by LC revealed that psilocin (1) was contained at the highest level in Panaeolus cyanescens among the MMs, but was absent in the Amanita species.

  9. Entropic fluctuations in DNA sequences

    Science.gov (United States)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that, despite this reduction of information, LSE makes it possible to extract meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
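
    As a rough illustration of the block-entropy idea in this abstract, the sketch below computes a Shannon entropy per non-overlapping block from single-base frequencies. The single-base alphabet, the non-overlapping blocks, and the function names are assumptions made for illustration; the paper's exact LSE definition may differ.

```python
import math
from collections import Counter

def block_entropy(block):
    """Shannon entropy (bits) of the base frequencies in one block."""
    counts = Counter(block)
    total = len(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def local_shannon_entropy(seq, block_size):
    """Entropy of each consecutive, non-overlapping block; a trailing partial block is dropped."""
    return [
        block_entropy(seq[i:i + block_size])
        for i in range(0, len(seq) - block_size + 1, block_size)
    ]
```

    Under these assumptions, a block made of a single repeated base scores 0 bits and a block with all four bases equally represented scores 2 bits.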

  10. Information Theory of DNA Sequencing

    CERN Document Server

    Motahari, Abolfazl; Tse, David

    2012-01-01

    DNA sequencing is the basic workhorse of modern day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and provides a fundamental limit to the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process. Using this framework, we also study the impact of noise in the read process on the sequencing capacity.

  11. Characterization of human control region sequences of the African American SWGDAM forensic mtDNA data set.

    Science.gov (United States)

    Allard, Marc W; Polanskey, Deborah; Miller, Kevin; Wilson, Mark R; Monson, Keith L; Budowle, Bruce

    2005-03-10

    The Scientific Working Group on DNA Analysis Methods (SWGDAM) mitochondrial DNA (mtDNA) population data set is used to infer the relative rarity of control region mtDNA profiles obtained from evidence samples and of profiles used for identification of missing persons. In this study, the African American haplogroup patterns in the SWGDAM data were analyzed in a phylogenetic context to determine relevant single nucleotide polymorphisms (SNPs) and to describe haplogroup distributions for Africans observed in these data sets. Over 200 SNPs (n=217) were observed in the African American data set (n=1148). These SNPs ranged from having 1-39 changes in the phylogenetic tree, with sites 152 and 16519 being the most variable. On average there were 5.8 changes for a character on the tree. The most variable sites (with 19 or more changes each) observed included 16093, 16129, 16189, 16311, 16362, 16519, 146, 150, 152, 189, and 195. These rapidly changing sites are consistent with other published analyses. Only 34 SNPs are needed to identify all clusters containing 10 or more individuals in the African American data set. The results show that the African American SWGDAM mtDNA data set contains variation consistent with that described in continental African populations. Thirteen of the 18 haplogroups previously observed in African populations were observed and include: L1a, L1b, L1c, L2a, L2b, L2c, L3b, L3d, L3e1, L3e2, L3e3, L3e4 and L3f. Haplogroup L2a is the most commonly observed cluster (18.8%) in the African American data set. The next most common haplogroups in the African American data set include the clusters L1c (11.0%), L1b (9.1%), L3e2 (9.0%) and L3b (8.1%). Approximately 8% of the haplogroups observed within African Americans were common in European Caucasians or East Asians; these were H (n=32), J (n=4), K (n=5), T (n=2), U5 (n=6), U6 (n=9, also known from North Africa), A (n=12), B (n=7), C (n=4), and M (n=16), respectively. The European Caucasian and East Asian

  12. Sanger dideoxy sequencing of DNA.

    Science.gov (United States)

    Walker, Sarah E; Lorsch, Jon

    2013-01-01

    While the ease and reduced cost of automated DNA sequencing has largely obviated the need for manual dideoxy sequencing for routine purposes, specific applications require manual DNA sequencing. For instance, in studies of enzymes or proteins that bind or modify DNA, a DNA ladder is often used to map the site at which an enzyme is bound or a modification occurs. In these cases, the Sanger method for dideoxy sequencing provides a rapid and facile method for producing a labeled DNA ladder. Copyright © 2013 Elsevier Inc. All rights reserved.

  13. cDNA sequence, genomic organization, and evolutionary conservation of a novel gene from the WAGR region

    Energy Technology Data Exchange (ETDEWEB)

    Schwartz, F.; Eisenman, R.; Knoll, J.; Bruns, G. [Children's Hospital and Department of Pediatrics, Boston, MA (United States)]

    1995-09-20

    A new gene (239FB) with predominant and differential expression in fetal brain has recently been isolated from a chromosome 11p13-p14 boundary area near FSHB. The corresponding mRNA has an open reading frame of 294 amino acids, a 3' untranslated region of 1247 nucleotides, and a highly GC-rich 5' untranslated region. The coding and 3' UT sequence is specified by 6 exons within nearly 87 kb of isolated genomic locus. The 5' end region of the transcript maps adjacent to the only genomically defined CpG island in a chromosomal subregion that may be associated with part of the mental retardation of some WAGR (Wilms tumor, aniridia, genitourinary anomalies, and mental retardation) syndrome patients. In addition to nucleotide and amino acid similarity to an EST from a normalized infant brain cDNA library, the predicted protein has extensive similarity to Caenorhabditis elegans polypeptides of, as yet, unknown function. The 239FB locus is, therefore, likely part of a family of genes with two members expressed in human brain. The extensive conservation of the predicted protein suggests a fundamental function of the gene product and will enable evaluation of the role of the 239FB gene in neurogenesis in model organisms. 48 refs., 4 figs., 1 tab.

  14. Geographic structure and demographic history of Iranian brown bear (Ursus arctos) based on mtDNA control region sequences

    Directory of Open Access Journals (Sweden)

    Mohammad Reza Ashrafzadeh

    2015-12-01

    In recent years, the brown bear's range has declined and its populations in some areas have faced extinction. Therefore, a comprehensive picture of the genetic diversity and geographic structure of populations is essential for effective conservation strategies. In this research, we sequenced a 271 bp segment of the mtDNA control region of seven Iranian brown bears, and a total dataset of 467 sequences (brown and polar bears) was used in the analyses. Overall, 113 different haplotypes and 77 polymorphic sites were identified within the segment. Based on phylogenetic analyses, Iranian brown bears were not nested in any other clades. The low values of Nm (range=0.014-0.187) and high values of Fst (range=0.728-0.972) between Iranian bears and others revealed a genetically significant differentiation. We did not find any significant signal of demographic reduction in Iranian bears. The time to the most recent common ancestor of Iranian brown bears (Northern Iran) was found to be around 19000 BP.
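
    The reported Nm and Fst ranges are numerically consistent with the standard island-model relation for haploid, maternally inherited markers, Nm = (1 - Fst) / (2 Fst). Whether the authors used exactly this estimator is an assumption, and the function name below is purely illustrative.

```python
def nm_from_fst(fst):
    """Migrants per generation from Fst, assuming the haploid island-model relation."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("Fst must be in the interval (0, 1]")
    return (1.0 - fst) / (2.0 * fst)

# Endpoints of the Fst range quoted in the abstract.
for fst in (0.728, 0.972):
    print(f"Fst = {fst:.3f} -> Nm = {nm_from_fst(fst):.3f}")
```

    Under this relation the Fst endpoints 0.728 and 0.972 map onto the Nm endpoints 0.187 and 0.014 given in the abstract.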

  15. GMDH-GA Hybrid Model Extracting Exon Region from DNA Sequences

    OpenAIRE

    Ohta, Kouji; Yoshihara, Ikuo; Yamamori, Kunihito; Yasunaga, Moritoshi

    2004-01-01

    A model building method based on Group Method of Data Handling (GMDH) optimized by GA is developed for extracting exon regions. GMDH, which was originally a method for constructing higher-order polynomial models, is extended to constructing higher-order logical models. The model built by the proposed method is compared with a Genetic Programming (GP)-based model as to the best, worst and average extraction rates. The proposed method is superior to GP as to extraction rate of al...

  16. The Dynamics of DNA Sequencing.

    Science.gov (United States)

    Morvillo, Nancy

    1997-01-01

    Describes a paper-and-pencil activity that helps students understand DNA sequencing and expands student understanding of DNA structure, replication, and gel electrophoresis. Appropriate for advanced biology students who are familiar with the Sanger method. (DDR)

  17. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  18. Phylogenetic relationships in Solanaceae and related species based on cpDNA sequence from plastid trnE-trnT region

    Directory of Open Access Journals (Sweden)

    Danila Montewka Melotto-Passarin

    2008-01-01

    Intergenic spacers of chloroplast DNA (cpDNA) are very useful in phylogenetic and population genetic studies of plant species, to study their potential integration in phylogenetic analysis. The non-coding trnE-trnT intergenic spacer of cpDNA was analyzed to assess the nucleotide sequence polymorphism of 16 Solanaceae species and to estimate its ability to contribute to the resolution of phylogenetic studies of this group. Multiple alignments of DNA sequences of the trnE-trnT intergenic spacer made the identification of nucleotide variability in this region possible and the phylogeny was estimated by maximum parsimony and rooted with Convolvulaceae Ipomoea batatas, the most closely related family. Besides, this intergenic spacer was tested for the phylogenetic ability to differentiate taxonomic levels. For this purpose, species from four other families were analyzed and compared with Solanaceae species. Results confirmed polymorphism in the trnE-trnT region at different taxonomic levels.

  19. Graphene nanodevices for DNA sequencing

    Science.gov (United States)

    Heerema, Stephanie J.; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology.

  20. Novel sequencing strategy for repetitive DNA in a Drosophila BAC clone reveals that the centromeric region of the Y chromosome evolved from a telomere.

    Science.gov (United States)

    Méndez-Lago, María; Wild, Jadwiga; Whitehead, Siobhan L; Tracey, Alan; de Pablos, Beatriz; Rogers, Jane; Szybalski, Waclaw; Villasante, Alfredo

    2009-04-01

    The centromeric and telomeric heterochromatin of eukaryotic chromosomes is mainly composed of middle-repetitive elements, such as transposable elements and tandemly repeated DNA sequences. Because of this repetitive nature, Whole Genome Shotgun Projects have failed in sequencing these regions. We describe a novel kind of transposon-based approach for sequencing highly repetitive DNA sequences in BAC clones. The key to this strategy is physically mapping the precise position of the transposon insertion, which enables the correct assembly of the repeated DNA. We have applied this strategy to a clone from the centromeric region of the Y chromosome of Drosophila melanogaster. The analysis of the complete sequence of this clone has allowed us to prove that this centromeric region evolved from a telomere, possibly after a pericentric inversion of an ancestral telocentric chromosome. Our results confirm that the use of transposon-mediated sequencing, including positional mapping information, improves current finishing strategies. The strategy we describe could be a universal approach to resolving the heterochromatic regions of eukaryotic genomes.

  1. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled "intractable", resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  2. Section-level relationships of North American Agalinis (Orobanchaceae) based on DNA sequence analysis of three chloroplast gene regions

    Directory of Open Access Journals (Sweden)

    Neel Maile C

    2004-06-01

    Background: The North American Agalinis are representatives of a taxonomically difficult group that has been subject to extensive taxonomic revision from species level through higher sub-generic designations (e.g., subsections and sections). Previous presentations of relationships have been ambiguous and have not conformed to modern phylogenetic standards (e.g., were not presented as phylogenetic trees). Agalinis contains a large number of putatively rare taxa that have some degree of taxonomic uncertainty. We used DNA sequence data from three chloroplast genes to examine phylogenetic relationships among sections within the genus Agalinis Raf. (=Gerardia), and between Agalinis and closely related genera within Orobanchaceae. Results: Maximum likelihood analysis of sequence data from rbcL, ndhF, and matK gene regions (total aligned length 7323 bp) yielded a phylogenetic tree with high bootstrap values for most branches. Likelihood ratio tests showed that all but a few branch lengths were significantly greater than zero, and an additional likelihood ratio test rejected the molecular clock hypothesis. Comparisons of substitution rates between gene regions based on linear models of pairwise distance estimates between taxa show both ndhF and matK evolve more rapidly than rbcL, although there is substantial rate heterogeneity within gene regions due in part to rate differences among codon positions. Conclusions: Phylogenetic analysis supports the monophyly of Agalinis, including species formerly in Tomanthera, and this group is sister to a group formed by the genera Aureolaria, Brachystigma, Dasistoma, and Seymeria. Many of the previously described sections within Agalinis are polyphyletic, although many of the subsections appear to form natural groups. The analysis reveals a single evolutionary event leading to a reduction in chromosome number from n = 14 to n = 13 based on the sister group relationship of section Erectae and section Purpureae

  3. Phylogeny of Cercis based on DNA sequences of nuclear ITS and four plastid regions: implications for transatlantic historical biogeography.

    Science.gov (United States)

    Fritsch, Peter W; Cruz, Boni C

    2012-03-01

    The disjunct genus Cercis has been used to test models of Northern Hemisphere historical biogeography. Previous phylogenetic estimates employing DNA sequences of the ITS region and (in one study) those of ndhF recovered a well supported clade of North American and western Eurasian species that was nested within a paraphyletic group of Chinese species. Resolution and clade support within the tree were otherwise low and the monophyly of Cercis canadensis was uncertain. Here we conduct a phylogenetic analysis of Cercis with a higher number of regions (ITS, ndhF, rpoB-trnC, trnT-trnD, and trnS-trnG) and samples than in previous studies. Results corroborate the initial divergence between the Chinese species Cercis chingii and the rest of the genus. Support is newly found both for a clade of the two North American species as sister to the western Eurasian species, and for the monophyly of C. canadensis. As in a previous study, divergence between North American and western Eurasian Cercis was estimated as mid-Miocene (ca. 13 million years ago), and the ancestor in which this divergence occurred was inferred to be xerophytic. Contrary to previous studies, however, our data infer strictly east-to-west vicariance. The timing of the transatlantic divergence in Cercis is too recent to be explained by a postulated continuous belt of semi-arid vegetation between North America and Europe in the Paleogene, suggesting instead the presence of a Miocene North Atlantic corridor for semi-arid plants. In the absence of strong evidence from other sources, the possibility that Cercis has been able to quickly adapt from mesophytic antecedents to semi-arid conditions whenever the latter have arisen in the Northern Hemisphere can be considered a plausible alternative, although parsimony optimization renders this scenario two steps longer. Copyright © 2011 Elsevier Inc. All rights reserved.

  4. Characterization of a DNA sequence family in the Prader-Willi/Angelman syndrome chromosome region in 15q11-q13

    Energy Technology Data Exchange (ETDEWEB)

    Dittrich, B.; Knoblauch, H.; Buiting, K.; Horsthemke, B. (Universitaetsklinikum Essen (Germany))

    1993-04-01

    IR4-3R (D15S11) is an anonymous DNA sequence from human chromosome 15. Using YAC cloning and restriction enzyme analysis, the authors have found that IR4-3R detects five related DNA sequences, which are spread over 700 kb within the Prader-Willi/Angelman syndrome chromosome region in 15q11-q13. The RsaI and StyI polymorphisms, which were described previously, are associated with the most proximal copy of IR4-3R and are in strong linkage disequilibrium. IR4-3R represents the third DNA sequence family that has been identified in 15q11-q13. 14 refs., 2 figs., 1 tab.

  5. PCR-mediated Detection of Xanthomonas oryzae pv. oryzae by Amplification of the 16S-23S rDNA Spacer Region Sequence

    OpenAIRE

    Naoto, Adachi; Takashi, OKU; Ishikawa Agriculture Research Center; Hiroshima Prefectural University, School of Bioresources

    2000-01-01

    A detection method specific for Xanthomonas oryzae pv. oryzae, the pathogen responsible for bacterial blight of rice, was based on the polymerase chain reaction (PCR) and designed by amplifying the 16S-23S rDNA spacer region from this bacterium. The nucleotide sequence of the spacer region between the 16S and 23S rDNA, consisting of approximately 580 bp, from X. oryzae pv. oryzae, X. campestris pv. alfalfae, X. campestris pv. campestris, X. campestris pv. cannabis, X. campestris pv. citri, X....

  6. DNA Sequence Variants in the Five Prime Untranslated Region of the Cyclooxygenase-2 Gene Are Commonly Found in Healthy Dogs and Gray Wolves.

    Science.gov (United States)

    Safra, Noa; Hayward, Louisa J; Aguilar, Miriam; Sacks, Benjamin N; Westropp, Jodi L; Mohr, F Charles; Mellersh, Cathryn S; Bannasch, Danika L

    2015-01-01

    The aim of this study was to investigate the frequency of regional DNA variants upstream to the translation initiation site of the canine Cyclooxygenase-2 (Cox-2) gene in healthy dogs. Cox-2 plays a role in various disease conditions such as acute and chronic inflammation, osteoarthritis and malignancy. A role for Cox-2 DNA variants in genetic predisposition to canine renal dysplasia has been proposed and dog breeders have been encouraged to select against these DNA variants. We sequenced 272-422 bases in 152 dogs unaffected by renal dysplasia and found 19 different haplotypes including 11 genetic variants which had not been described previously. We genotyped 7 gray wolves to ascertain the wildtype variant and found that the wolves we analyzed had predominantly the second most common DNA variant found in dogs. Our results demonstrate an elevated level of regional polymorphism that appears to be a feature of healthy domesticated dogs.

  7. DNA Sequence Variants in the Five Prime Untranslated Region of the Cyclooxygenase-2 Gene Are Commonly Found in Healthy Dogs and Gray Wolves.

    Directory of Open Access Journals (Sweden)

    Noa Safra

    The aim of this study was to investigate the frequency of regional DNA variants upstream to the translation initiation site of the canine Cyclooxygenase-2 (Cox-2) gene in healthy dogs. Cox-2 plays a role in various disease conditions such as acute and chronic inflammation, osteoarthritis and malignancy. A role for Cox-2 DNA variants in genetic predisposition to canine renal dysplasia has been proposed and dog breeders have been encouraged to select against these DNA variants. We sequenced 272-422 bases in 152 dogs unaffected by renal dysplasia and found 19 different haplotypes including 11 genetic variants which had not been described previously. We genotyped 7 gray wolves to ascertain the wildtype variant and found that the wolves we analyzed had predominantly the second most common DNA variant found in dogs. Our results demonstrate an elevated level of regional polymorphism that appears to be a feature of healthy domesticated dogs.

  8. Statistical and linguistic features of DNA sequences

    Science.gov (United States)

    Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
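
    The detrending step mentioned in this abstract can be sketched as follows. This is a minimal illustration of the general DFA idea, not the authors' implementation: the purine/pyrimidine step rule, the use of non-overlapping boxes with a straight-line fit, and both function names are assumptions.

```python
import math

def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for anything else (assumed rule)."""
    profile, total = [], 0
    for base in seq.upper():
        total += 1 if base in "CT" else -1
        profile.append(total)
    return profile

def dfa_fluctuation(seq, box):
    """RMS deviation of the walk from a least-squares line fitted inside each box."""
    profile = dna_walk(seq)
    n_boxes = len(profile) // box
    if box < 2 or n_boxes == 0:
        raise ValueError("box must be >= 2 and no longer than the sequence")
    xs = range(box)
    x_mean = (box - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sq_sum = 0.0
    for b in range(n_boxes):
        ys = profile[b * box:(b + 1) * box]
        y_mean = sum(ys) / box
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
        # Residual of each point from the local linear trend of this box.
        sq_sum += sum((y - (y_mean + slope * (x - x_mean))) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sq_sum / (n_boxes * box))
```

    In typical use, the fluctuation is computed over a range of box sizes and the slope of log F(n) against log n is taken as the scaling exponent; a value near 0.5 indicates no long-range correlation, while larger values indicate persistent correlations.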

  9. Genetic diversity in captive and wild Matschie's tree kangaroo (Dendrolagus matschiei) from Huon Peninsula, Papua New Guinea, based on mtDNA control region sequences.

    Science.gov (United States)

    McGreevy, Thomas J; Dabek, Lisa; Gomez-Chiarri, Marta; Husband, Thomas P

    2009-05-01

    The Association of Zoos and Aquariums (AZA) Matschie's tree kangaroo (Dendrolagus matschiei) population is at a critical point for assessing long-term viability. This population, established from 19 genetically uncharacterized D. matschiei, has endured a founder effect because only four individuals contributed the majority of offspring. The highly variable mitochondrial DNA (mtDNA) control region was sequenced for five of the female-founders by examining extant representatives of their maternal lineage and compared with wild (n = 13) and captive (n = 18) D. matschiei from Papua New Guinea (PNG). AZA female-founder D. matschiei control region haplotype diversity was low, compared with captive D. matschiei held in PNG. AZA D. matschiei have only two control region haplotypes because four out of five AZA female-founder D. matschiei had an identical sequence. Both AZA haplotypes were identified among the 17 wild and captive D. matschiei haplotypes from PNG. Genomic DNA extracted from wild D. matschiei fecal samples was a reliable source of mtDNA that could be used for a larger scale study. We recommend a nuclear DNA genetic analysis to more fully characterize AZA D. matschiei genetic diversity and to assist their Species Survival Plan®. An improved understanding of D. matschiei genetics will contribute substantially to the conservation of these unique animals both in captivity and the wild.

  10. Procerain B, a cysteine protease from Calotropis procera, requires N-terminus pro-region for activity: cDNA cloning and expression with pro-sequence.

    Science.gov (United States)

    Nandana, Vidhyadhar; Singh, Sushant; Singh, Abhay Narayan; Dubey, Vikash Kumar

    2014-11-01

    We have previously reported isolation and characterization of a novel plant cysteine protease, Procerain B, from the latex of Calotropis procera. Our initial attempts to produce active recombinant Procerain B in an Escherichia coli expression system were not successful. The production of inactive enzyme was attributed to the absence of the 5' pro-region in the Procerain B cDNA, which may be involved in proper folding and production of mature active protein. The current manuscript reports the cloning of full-length Procerain B for the production of the active protein. The complete cDNA sequence of Procerain B with the pro-region sequence was obtained by using RNA ligase mediated rapid amplification of 5' cDNA ends (RLM-RACE). The N-terminus pro-sequence region consists of 127 amino acids and is characterized as a member of the inhibitory I29 family. Further, the three-dimensional structure of full-length Procerain B was modelled by homology modelling using the X-ray crystal structure of procaricain (PDB ID: 1PCI). The N-terminus pro-sequence of full-length Procerain B runs along the active site cleft. Full-length Procerain B was expressed in a prokaryotic system and activated in vitro at pH 4.0. This is the first study reporting the production of active recombinant cysteine protease from C. procera. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. Phylogenetic relationships of Scomberomorus commerson using sequence analysis of the mtDNA D-loop region in the Persian Gulf, Oman Sea and Arabian Sea

    Directory of Open Access Journals (Sweden)

    Ana Mansourkiaei

    2016-04-01

    Narrow-barred Spanish mackerel, Scomberomorus commerson, is an epipelagic and migratory species of the family Scombridae which has a significant role in terms of ecology and fishery. One hundred samples were collected from the Persian Gulf, Oman Sea and Arabian Sea. Part of their dorsal fins was snipped and transferred to micro-tubes containing ethanol; then, DNA was extracted and HRM-Real Time PCR was performed to designate representative specimens for sequencing. Phylogenetic relationships of S. commerson from the Persian Gulf, Oman Sea and Arabian Sea were investigated using sequence data of the mitochondrial DNA D-loop region. The Neighbor Joining tree showed no clustering, indicating the proximity among S. commerson in the four sites. Sequence analyses of the mitochondrial DNA D-loop region revealed a very high degree of genetic similarity among S. commerson from the Persian Gulf and Oman Sea; a single stock structure of S. commerson across the four regions is therefore supported, which can be explained by their migration along the coasts of the Oman Sea and Persian Gulf. Accordingly, the distribution patterns of 20 haplotypes in the phylogenetic tree constructed from mtDNA D-loop sequences showed no significant clustering according to sampling site.

  12. Graphene nanodevices for DNA sequencing

    NARCIS (Netherlands)

    Heerema, S.J.; Dekker, C.

    2016-01-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with

  13. Mutational analysis of the resolution sequence of vaccinia virus DNA: essential sequence consists of two separate AT-rich regions highly conserved among poxviruses.

    Science.gov (United States)

    Merchlinsky, M

    1990-01-01

    In replicative forms of vaccinia virus DNA, the unit genomes are connected by palindromic junction fragments that are resolved into mature viral genomes with hairpin termini. Bacterial plasmids containing the junction fragment for vaccinia virus or Shope fibroma virus were converted into linear minichromosomes of vector sequence flanked by poxvirus hairpin loops after transfection into infected cells. Analysis of a series of symmetrical deletion mutations demonstrated that in vaccinia virus the presence of the DNA sequence ATTTAGTGTCTAGAAAAAAA on both sides of the apical segment of the concatemer junction is crucial for resolution. To determine the precise architecture of the resolution site, a series of site-directed mutations within this tract of nucleotides were made and the relative contribution of each nucleotide to the efficaciousness of resolution was determined. The nucleotide sequence necessary for the resolution of the vaccinia virus concatemer junction, (A/T)TTT(A/G)N7-9AAAAAAA, is highly conserved among poxviruses and found proximal to the hairpin loop in the genomes of members of the Leporipoxvirus, Avipoxvirus, and Capripoxvirus genera. PMID:2398534
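
    The consensus (A/T)TTT(A/G)N7-9AAAAAAA quoted in this abstract can be written as a regular expression, shown here as a small sketch. Reading N as any of A, C, G or T, scanning only the given strand, and the names `RESOLUTION_MOTIF` and `find_resolution_sites` are assumptions made for illustration.

```python
import re

# (A/T)TTT(A/G)N7-9AAAAAAA, with N read as any one of A, C, G, T.
RESOLUTION_MOTIF = re.compile(r"[AT]TTT[AG][ACGT]{7,9}A{7}")

def find_resolution_sites(seq):
    """Return (start, matched_text) for each non-overlapping hit on the given strand."""
    return [(m.start(), m.group()) for m in RESOLUTION_MOTIF.finditer(seq.upper())]
```

    The 20-nucleotide vaccinia sequence ATTTAGTGTCTAGAAAAAAA from the abstract fits this pattern with an 8-nucleotide spacer (GTGTCTAG).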

  14. DNA Sequencing Sensors: An Overview

    Directory of Open Access Journals (Sweden)

    Jose Antonio Garrido-Cardenas

    2017-03-01

    The first sequencing of a complete genome was published forty years ago by the double Nobel Prize in Chemistry winner Frederick Sanger. That corresponded to the small genome of a bacteriophage, but since then the DNA of many complex organisms has been sequenced. This was possible thanks to continuous advances in the fields of biochemistry and molecular genetics, but also in other areas such as nanotechnology and computing. Nowadays, sequencing sensors based on genetic material have little to do with those used by Sanger. The emergence of mass sequencing sensors, or new generation sequencing (NGS), meant a quantitative leap both in the volume of genetic material that could be sequenced in each trial and in the time per run and its cost. One can envisage that incoming technologies, already known as fourth generation sequencing, will continue to cheapen the trials by increasing DNA reading lengths in each run. All of this would be impossible without sensors and detection systems becoming smaller and more precise. This article provides a comprehensive overview of sensors for DNA sequencing developed within the last 40 years.

  15. Nucleotide sequence of the promoter-distal region of the tra operon of plasmid R100, including traI (DNA helicase I) and traD genes.

    Science.gov (United States)

    Yoshioka, Y; Fujita, Y; Ohtsubo, E

    1990-07-05

    The nucleotide sequence of the promoter-distal region of the tra operon of R100 was determined. There are five open reading frames in the region between traT and finO, and their protein products were identified. Nucleotide sequences of plasmid F corresponding to the junction regions among the open reading frames seen in R100 were also determined. Comparison of these nucleotide sequences revealed strong homology in the regions containing traD, traI and an open reading frame (named orfD). The TraD protein (83,899 Da) contains three hydrophobic regions, of which two are located near the amino-terminal region. This protein also contains a possible ATP-binding consensus sequence at the amino-terminal region and a characteristic repeated peptide sequence (Gln-Gln-Pro)10 at the carboxy-terminal region. The TraI protein (191,679 Da) contains the sequence motif conserved in an ATP-dependent DNA helicase superfamily in its carboxy-terminal region. The protein product of orfD, which is probably a new tra gene (named traX), contains 65% hydrophobic amino acids, especially rich in alanine and leucine. There exist non-homologous regions between R100 and F that could be represented as four I-D (insertion or deletion) loops in heteroduplex molecules. Assignment of each loop to the strand of R100 or F was, however, found to be the reverse of that previously assumed. The three I-D loops that were located between traT and traD, between traD and traI, and between traI and finO had no terminal inverted repeat sequences nor had they any homology with known insertion sequences, while the fourth was IS3, located within the finO gene of F. The sequences in the I-D loops, except IS3, may also code for proteins that are, however, likely to be nonessential for transfer of plasmids.

  16. Region segmentation along image sequence

    Energy Technology Data Exchange (ETDEWEB)

    Monchal, L.; Aubry, P.

    1995-12-31

    A method to extract regions in a sequence of images is proposed. Regions are not matched from one image to the following one. Instead, the result of a region segmentation is used as an initialization to segment the following image and to track the region along the sequence. The image sequence is exploited as a spatio-temporal event. (authors). 12 refs., 8 figs.

  17. Temporal transcription of the lactococcal temperate phage TP901-1 and DNA sequence of the early promoter region

    DEFF Research Database (Denmark)

    Madsen, Hans Peter Lynge; Hammer, Karin

    1998-01-01

    Transcriptional analysis by Northern blotting identified clusters of early, middle and late transcribed regions of the temperate lactococcal bacteriophage TP901-1 during one-step growth experiments. The latent period was found to be 65 min and the burst size 40 +/- 10. The eight early transcripts... of which at least two (the integrase gene and putative repressor) are needed for lysogeny, and the divergent and longer transcriptional unit from PL, presumably encoding functions required for the lytic life cycle. ORFs with homology to proteins involved in DNA replication were identified on the latter...

  18. Nanogrid rolling circle DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Church, George M.; Porreca, Gregory J.; Shendure, Jay; Rosenbaum, Abraham Meir

    2017-04-18

    The present invention relates to methods for sequencing a polynucleotide immobilized on an array having a plurality of specific regions each having a defined diameter size, including synthesizing a concatemer of a polynucleotide by rolling circle amplification, wherein the concatemer has a cross-sectional diameter greater than the diameter of a specific region, immobilizing the concatemer to the specific region to make an immobilized concatemer, and sequencing the immobilized concatemer.

  19. A Demonstration of Automated DNA Sequencing.

    Science.gov (United States)

    Latourelle, Sandra; Seidel-Rogol, Bonnie

    1998-01-01

    Details a simulation that employs a paper-and-pencil model to demonstrate the principles behind automated DNA sequencing. Discusses the advantages of automated sequencing as well as the chemistry of automated DNA sequencing. (DDR)

  20. The sequence of sequencers: The history of sequencing DNA

    Science.gov (United States)

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  1. The 172-kb genomic DNA region of the O. rufipogon yld1.1 locus: comparative sequence analysis with O. sativa ssp. japonica and O. sativa ssp. indica.

    Science.gov (United States)

    Song, Beng-Kah; Hein, Ingo; Druka, Arnis; Waugh, Robbie; Marshall, David; Nadarajah, Kalaivani; Yap, Soon-Joo; Ratnam, Wickneswari

    2009-02-01

    Common wild rice (Oryza rufipogon) plays an important role by contributing to modern rice breeding. In this paper, we report the sequence and analysis of a 172-kb genomic DNA region of wild rice around the RM5 locus, which is associated with the yield QTL yld1.1. Comparative sequence analysis between orthologous RM5 regions from Oryza sativa ssp. japonica, O. sativa ssp. indica and O. rufipogon revealed a high level of conserved synteny in the content, homology, structure, orientation, and physical distance of all 14 predicted genes. Twelve of the putative genes were supported by matches to proteins with known function, whereas two were predicted by homology to rice and other plant expressed sequence tags or complementary DNAs. The remarkably high level of conservation found in coding, intronic and intergenic regions may indicate high evolutionary selection on the RM5 region. Although our analysis has not defined which gene(s) determine the yld1.1 phenotype, allelic variation and the insertion of transposable elements, among other nucleotide changes, represent potential variation responsible for the yield QTL. However, as suggested previously, two putative receptor-like protein kinase genes remain the key suspects for yld1.1.

  2. Genetic relationships among some subspecies of the Peregrine Falcon (Falco peregrinus L.), inferred from mitochondrial DNA control-region sequences

    Science.gov (United States)

    White, Clayton M.; Sonsthagen, Sarah A.; Sage, George K.; Anderson, Clifford; Talbot, Sandra L.

    2013-01-01

    The ability to successfully colonize and persist in diverse environments likely requires broad morphological and behavioral plasticity and adaptability, and this may partly explain why the Peregrine Falcon (Falco peregrinus) exhibits a large range of morphological characteristics across their global distribution. Regional and local differences within Peregrine Falcons were sufficiently variable that ∼75 subspecies have been described; many were subsumed, and currently 19 are generally recognized. We used sequence information from the control region of the mitochondrial genome to test for concordance between genetic structure and representatives of 12 current subspecies and from two areas where subspecies distributions overlap. Haplotypes were broadly shared among subspecies, and all geographic locales shared a widely distributed common haplotype (FalconCR2). Haplotypes were distributed in a star-like phylogeny, consistent with rapid expansion of a recently derived species, with observed genetic patterns congruent with incomplete lineage sorting and/or differential rates of evolution on morphology and neutral genetic characters. Hierarchical analyses of molecular variance did not uncover genetic partitioning at the continental level, despite strong population-level structure (FST = 0.228). Similar analyses found weak partitioning, albeit significant, among subspecies (FCT = 0.138). All reconstructions placed the hierofalcons' (Gyrfalcon [F. rusticolus] and Saker Falcon [F. cherrug]) haplotypes in a well-supported clade either basal or unresolved with respect to the Peregrine Falcon. In addition, haplotypes representing Taita Falcon (F. fasciinucha) were placed within the Peregrine Falcon clade.

  3. Mitochondrial DNA control region sequence variation suggests an independent origin of an "Asian-specific" 9-bp deletion in Africans

    Energy Technology Data Exchange (ETDEWEB)

    Soodyall, H.; Redd, A.; Vigilant [Pennsylvania State Univ., University Park, PA (United States)] [and others]

    1994-09-01

    The intergenic noncoding region between the cytochrome oxidase II and lysyl tRNA genes of human mitochondrial DNA (mtDNA) is associated with two tandemly arranged copies of a 9-bp sequence. A deletion of one of these repeats has been found at varying frequencies in populations of Asian descent, and is commonly referred to as an "Asian-specific" marker. We report here that the 9-bp deletion is also found at a frequency of 10.2% (66/649) in some indigenous African populations, with frequencies of 28.6% (20/70) in Pygmies, 26.6% (12/45) in Malawians and 15.4% (31/199) in southeastern Bantu-speaking populations. The deletion was not found in 123 Khoisan individuals nor in 209 western Bantu-speaking individuals, with the exception of 3 individuals from one group that was admixed with Pygmies. Sequence analysis of the two hypervariable segments of the mtDNA control region reveals that the types associated with the African 9-bp deletion are different from those found in Asian-derived populations with the deletion. Phylogenetic analysis separates the "African" and "Asian" 9-bp deletion types into two different clusters which are statistically supported. Mismatch distributions based on the number of differences between pairs of mtDNA types are consistent with this separation. These findings strongly support the view that the 9-bp deletion originated independently in Africa and in Asia.
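
    A mismatch distribution, as mentioned above, is a histogram of the number of differing sites over all pairs of aligned sequences. The following is a minimal sketch under stated assumptions: the sequences are already aligned to equal length, every differing column counts as one difference, and the toy sequences and function names are hypothetical rather than taken from the study.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_distribution(seqs):
    """Map each observed number of differences to the number of pairs showing it."""
    return Counter(pairwise_differences(a, b) for a, b in combinations(seqs, 2))


# Hypothetical toy alignment of four mtDNA types.
toy_types = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACGA"]
print(sorted(mismatch_distribution(toy_types).items()))
```

    Two clusters that are each internally similar but distinct from one another would show up as a bimodal histogram when their sequences are pooled.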

  4. Origin and genetic diversity of Egyptian native chickens based on complete sequence of mitochondrial DNA D-loop region.

    Science.gov (United States)

    Osman, Sayed A-M; Yonezawa, Takahiro; Nishibori, Masahide

    2016-06-01

    Domestic chickens (Gallus gallus) play a significant role, ranging from food and entertainment to religion and ornamentation. However, the details of their domestication process are still controversial, especially the origin and evolution of African chickens. Egypt is thought to be an important place for this event because of its geographic location as well as its long history of civilization. However, the genetic composition and structure of Egyptian native chickens (ENC) have not been studied so far. The aim of this study is to clarify the origin and evolution of African chickens by assessing the genetic diversity and structure of five ENC breeds using mitochondrial D-loop sequences. Our results suggest there is genetic differentiation between the pure native breeds and the improved native breeds. The latter breeds were established by the hybridization of the pure native and the exotic breeds. The pure native breeds were estimated to have been established about 800 years ago. Subsequently, we extensively analyzed the D-loop sequences from the ENC as well as from globally collected chickens (2,010 individuals in total). Our phylogenetic tree of the regional populations shows that African chickens can be separated into two distinct clades. The first clade consists of North African (Egypt), Central African (Sudan and Cameroon), European, and West (and Central) Asian chickens. The second clade consists of East African (Kenya, Malawi, and Zimbabwe) and Pacific chickens. This suggests dual origins of African native chickens. The first group probably originated in South Asia, then migrated to West Asia, and finally arrived in Africa through Egypt. The second group migrated from the Pacific to East Africa via the Indian Ocean, probably with Austronesian people. This dual origin hypothesis, as well as the divergence times estimated in this study, is consistent with the archaeological and historical evidence.
Our migration analysis suggests there is limited gene flow within African

  5. Footprinting with an automated capillary DNA sequencer.

    Science.gov (United States)

    Yindeeyoungyeon, W; Schell, M A

    2000-11-01

    Footprinting is a valuable tool for studying DNA-protein contacts. However, it usually involves expensive, tedious and hazardous steps such as radioactive labeling and analyses on polyacrylamide sequencing gels. We have developed an easy four-step footprinting method involving (i) the generation and purification of a PCR fragment that is fluorescently labeled at one end with 6-carboxyfluorescein; (ii) brief exposure of the fragment to a DNA-binding protein and then DNase I; (iii) spin-column purification; and (iv) analysis of partial digestion products on the ABI Prism 310 capillary DNA sequencer/genetic analyzer. Very detailed and sensitive footprints of large (> 400 bp) DNA fragments can be easily obtained, as illustrated by our use of this method to characterize binding of PhcA, a LysR-type activator, to two sites greater than 100 bp apart in the 5' untranslated region of xpsR, one of its regulated target genes. The advantages of this new method are that it (i) uses long-lived, safe and easy-to-make fluorescently labeled target fragments; (ii) uses sensitive, robust and highly reproducible fragment analysis using an automated DNA sequencer, instead of gel electrophoresis and autoradiography; and (iii) is cost effective.

  6. Perspectives in Biochemistry: Methods for DNA Sequencing.

    Science.gov (United States)

    Wood, Anne T.

    1984-01-01

    Describes two frequently used DNA sequencing methods: Sanger's enzymatic dideoxy method and Maxam and Gilbert's chemical sequencing method. Indicates that studying these methods provides students with knowledge of the chemical structure of DNA and how DNA sequence data are obtained. (JN)

  7. Sequence-Specific DNA Binding by a Short Peptide Dimer

    Science.gov (United States)

    Talanian, Robert V.; McKnight, C. James; Kim, Peter S.

    1990-08-01

    A recently described class of DNA binding proteins is characterized by the "bZIP" motif, which consists of a basic region that contacts DNA and an adjacent "leucine zipper" that mediates protein dimerization. A peptide model for the basic region of the yeast transcriptional activator GCN4 has been developed in which the leucine zipper has been replaced by a disulfide bond. The 34-residue peptide dimer, but not the reduced monomer, binds DNA with nanomolar affinity at 4 °C. DNA binding is sequence-specific as judged by deoxyribonuclease I footprinting. Circular dichroism spectroscopy suggests that the peptide adopts a helical structure when bound to DNA. These results demonstrate directly that the GCN4 basic region is sufficient for sequence-specific DNA binding and suggest that a major function of the GCN4 leucine zipper is simply to mediate protein dimerization. Our approach provides a strategy for the design of short sequence-specific DNA binding peptides.

  8. The historical demography and genetic variation of the endangered Cycas multipinnata (Cycadaceae) in the Red River region, examined by chloroplast DNA sequences and microsatellite markers.

    Directory of Open Access Journals (Sweden)

    Yi-Qing Gong

    Cycas multipinnata C.J. Chen & S.Y. Yang is a cycad endemic to the Red River drainage region that occurs under evergreen forest on steep limestone slopes in Southwest China and northern Vietnam. It is listed as endangered due to habitat loss and over-collecting for the ornamental plant trade, and only a few populations remain. In this study, we assess the genetic variation, population structure, and phylogeography of C. multipinnata populations to help develop strategies for the conservation of the species. 60 individuals from six populations were used for chloroplast DNA (cpDNA) sequencing and 100 individuals from five populations were genotyped using 17 nuclear microsatellites. High genetic differentiation among populations was detected, suggesting that pollen or seed dispersal was restricted within populations. Two main genetic clusters were observed in both the cpDNA and microsatellite loci, corresponding to Yunnan, China and northern Vietnam. These clusters indicated low levels of gene flow between the regions since their divergence in the late Pleistocene, which was inferred from both Bayesian and coalescent analysis. In addition, a Bayesian skyline plot based on cpDNA portrayed a long history of constant population size followed by a decline in the last 50,000 years that was perhaps affected by the Quaternary glaciations, a finding that was also supported by the Garza-Williamson index calculated from the microsatellite data. The genetic consequences produced by climatic oscillations and anthropogenic disturbances are considered key pressures on C. multipinnata. To establish a conservation management plan, each population of C. multipinnata should be recognized as a Management Unit (MU). In situ and ex situ actions, such as controlling overexploitation and creating a germplasm bank with high genetic diversity, should be urgently implemented to preserve this species.

  9. Phylogenetic relationships in Solanaceae and related species based on cpDNA sequence from plastid trnE-trnT region

    OpenAIRE

    Danila Montewka Melotto-Passarin; Irving Joseph Berger; Keini Dressano; Valentina de Fátima De Martin; Giancarlo Conde Xavier Oliveira; Ralph Bock; Helaine Carrer

    2008-01-01

    Intergenic spacers of chloroplast DNA (cpDNA) are very useful in phylogenetic and population genetic studies of plant species, to study their potential integration in phylogenetic analysis. The non-coding trnE-trnT intergenic spacer of cpDNA was analyzed to assess the nucleotide sequence polymorphism of 16 Solanaceae species and to estimate its ability to contribute to the resolution of phylogenetic studies of this group. Multiple alignments of DNA sequences of trnE-trnT intergenic spacer mad...

  10. High-quality mtDNA control region sequences from 680 individuals sampled across the Netherlands to establish a national forensic mtDNA reference database

    NARCIS (Netherlands)

    L.C. Chaitanya (Lakshmi); M. van Oven (Mannis); S. Brauer (Silke); B. Zimmermann (Bettina); G. Huber (Gabriela); C. Xavier (Catarina); W. Parson (Walther); P. de Knijff (Peter); M.H. Kayser (Manfred)

    2016-01-01

    The use of mitochondrial DNA (mtDNA) for maternal lineage identification often marks the last resort when investigating forensic and missing-person cases involving highly degraded biological materials. As with all comparative DNA testing, a match between evidence and reference sample

  11. Identification of sequence polymorphism in the D-Loop region of mitochondrial DNA as a risk factor for hepatocellular carcinoma with distinct etiology

    Directory of Open Access Journals (Sweden)

    Zhang Ruixing

    2010-09-01

    Background: Hepatocellular carcinoma (HCC) is frequently preceded by hepatitis virus infection or alcohol abuse. Genetic backgrounds may increase susceptibility to HCC from these exposures. Methods: Mitochondrial DNA (mtDNA) of peripheral blood, tumor, and/or adjacent non-tumor tissue from 49 hepatitis B virus-related and 11 alcohol-related HCC patients, and from 38 controls without HCC, was examined for single nucleotide polymorphisms (SNPs) and mutations in the D-Loop region. Results: SNPs in the D-loop region of mtDNA were examined in HCC patients. Individual SNPs, namely the 16266C/T, 16293A/G, 16299A/G, 16303G/A, 242C/T, 368A/G, and 462C/T minor alleles, were associated with increased risk for alcohol-HCC, and the 523A/del was associated with increased risks of both HCC types. The mitochondrial haplotypes under the M haplogroup with a defining 489C polymorphism were detected in 27 (55.1%) of HBV-HCC and 8 (72.7%) of alcohol-HCC patients, and in 15 (39.5%) of controls. Frequencies of the 489T/152T, 489T/523A, and 489T/525C haplotypes were significantly reduced in HBV-HCC patients compared with controls. In contrast, the haplotypes of 489C with 152T, 249A, 309C, 523Del, or 525Del were significantly associated with increased alcohol-HCC risk. Mutations in the D-Loop region were detected in 5 adjacent non-tumor tissues and increased in cancer stage (21 of 49 HBV-HCC and 4 of 11 alcohol-HCC, p ...). Conclusions: In sum, mitochondrial haplotypes may differentially predispose patients to HBV-HCC and alcohol-HCC. Mutations of the mitochondrial D-Loop sequence may relate to HCC development.

  12. DNA sequencing technologies: 2006-2016.

    Science.gov (United States)

    Mardis, Elaine R

    2017-02-01

    Recent advances in the field of genomics have largely been due to the ability to sequence DNA at increasing throughput and decreasing cost. DNA sequencing was first introduced in 1977, and next-generation sequencing technologies have been available only during the past decade, but the diverse experiments and corresponding analyses facilitated by these techniques have transformed biological and biomedical research. Here, I review developments in DNA sequencing technologies over the past 10 years and look to the future for further applications.

  13. Investigating the prehistory of Tungusic peoples of Siberia and the Amur-Ussuri region with complete mtDNA genome sequences and Y-chromosomal markers.

    Directory of Open Access Journals (Sweden)

    Ana T Duggan

    Evenks and Evens, Tungusic-speaking reindeer herders and hunter-gatherers, are spread over a wide area of northern Asia, whereas their linguistic relatives the Udegey, sedentary fishermen and hunter-gatherers, are settled to the south of the lower Amur River. The prehistory and relationships of these Tungusic peoples are as yet poorly investigated, especially with respect to their interactions with neighbouring populations. In this study, we analyse over 500 complete mtDNA genome sequences from nine different Evenk and Even subgroups as well as their geographic neighbours from Siberia and their linguistic relatives the Udegey from the Amur-Ussuri region in order to investigate the prehistory of the Tungusic populations. These data are supplemented with analyses of Y-chromosomal haplogroups and STR haplotypes in the Evenks, Evens, and neighbouring Siberian populations. We demonstrate that whereas the North Tungusic Evenks and Evens show evidence of shared ancestry both in the maternal and in the paternal line, this signal has been attenuated by genetic drift and differential gene flow with neighbouring populations, with isolation by distance further shaping the maternal genepool of the Evens. The Udegey, in contrast, appear quite divergent from their linguistic relatives in the maternal line, with a mtDNA haplogroup composition characteristic of populations of the Amur-Ussuri region. Nevertheless, they show affinities with the Evenks, indicating that they might be the result of admixture between local Amur-Ussuri populations and Tungusic populations from the north.

  14. Loss of genetic variability in a hatchery strain of Senegalese sole (Solea senegalensis) revealed by sequence data of the mitochondrial DNA control region and microsatellite markers

    Directory of Open Access Journals (Sweden)

    Pablo Sánchez

    2012-06-01

    Comparisons of the levels of genetic variation within and between a hatchery F1 (FAR, n=116) of Senegalese sole, Solea senegalensis, and its wild donor population (ATL, n=26), both native to the SW Atlantic coast of the Iberian peninsula, as well as between the wild donor population and a wild western Mediterranean sample (MED, n=18), were carried out by characterizing 412 base pairs of the nucleotide sequence of the mitochondrial DNA control region I, and six polymorphic microsatellite loci. FAR showed a substantial loss of genetic variability (haplotypic diversity, h=0.49±0.066; nucleotide diversity, π=0.006±0.004; private allelic richness, pAg=0.28) relative to its donor population ATL (h=0.69±0.114; π=0.009±0.006; pAg=1.21). Pairwise FST values of microsatellite data were highly significant (P < 0.0001) between FAR and ATL (0.053) and FAR and MED (0.055). The comparison of wild samples revealed higher values of genetic variability in MED than in ATL, but only with mtDNA CR-I sequence data (h=0.948±0.033; π=0.030±0.016). However, pairwise ΦST and FST values between ATL and MED were highly significant (P < 0.0001) with mtDNA CR-I (0.228) and with microsatellite data (0.095), respectively. While loss of genetic variability in FAR could be associated with the sampling error when the broodstock was established, the results of parental and sibship inference suggest that most of these losses can be attributed to a high variance in reproductive success among members of the broodstock, particularly among females.
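
    The haplotypic diversity h reported above is commonly computed with Nei's unbiased estimator, h = n/(n-1) * (1 - sum of squared haplotype frequencies). The abstract does not state which estimator was used, so the following is only a sketch under that assumption; the function name and the toy haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i**2)).

    `haplotypes` holds one haplotype label per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample: two individuals share haplotype H1.
sample = ["H1", "H1", "H2", "H3"]
print(round(haplotype_diversity(sample), 4))
```

    The estimator is 0 when every individual carries the same haplotype and 1 when every individual carries a different one, which is why a hatchery stock founded from few females tends to score lower than its wild donor population.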

  15. Chloroplast DNA analysis of Tunisian cork oak populations (Quercus suber L.): sequence variations and molecular evolution of the trnL (UAA)-trnF (GAA) region.

    Science.gov (United States)

    Abdessamad, A; Baraket, G; Sakka, H; Ammari, Y; Ksontini, M; Hannachi, A Salhi

    2016-10-24

    Sequences of the trnL-trnF spacer and combined trnL-trnF region in chloroplast DNA of cork oak (Quercus suber L.) were analyzed to detect polymorphisms and to elucidate molecular evolution and demographic history. The aligned sequences varied in length and nucleotide composition. The overall transition/transversion (ti/tv) ratios were estimated at 0.724 for the intergenic spacer and 0.258 for the pooled sequences, indicating that transversions are more frequent than transitions. The molecular evolution and demographic history of Q. suber were investigated. Neutrality tests (Tajima's D and Fu and Li) ruled out the null hypothesis of a strictly neutral model, and Fu's Fs and Ramos-Onsins and Rozas' R2 confirmed the recent expansion of cork oak trees, validating its persistence in North Africa since the last glaciation during the Quaternary. The observed unimodal mismatch distribution and Harpending's raggedness index confirmed the demographic history model for cork oak. A phylogenetic dendrogram showed that the distribution of Q. suber trees occurs independently of geographical origin, the relief of the population site, and the bioclimatic stages. The molecular history and cytoplasmic diversity suggest that in situ and ex situ conservation strategies can be recommended for preserving landscape value and facing predictable future climatic changes.
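
    A transition/transversion ratio below 1, as reported above, means transversions outnumber transitions. The following sketch counts both classes between two aligned sequences by direct comparison; this simple counting is an assumption for illustration only, since published estimates are usually model-based, and the function name and toy sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ti_tv(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned sequences.

    Columns containing gaps or ambiguity codes are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
        if same_class:
            ti += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ti, tv


# Hypothetical toy pair: A/G and C/T are transitions, G/C and T/A transversions.
ti, tv = count_ti_tv("AACCGGTT", "GACTGCTA")
print(ti, tv, ti / tv if tv else float("inf"))
```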

  16. Evaluation of variation in control region sequences for Hispanic individuals in the SWGDAM mtDNA data set.

    Science.gov (United States)

    Allard, Marc W; Polanskey, Deborah; Wilson, Mark R; Monson, Keith L; Budowle, Bruce

    2006-05-01

    The Scientific Working Group on DNA Analysis Methods (SWGDAM) Hispanic data set was analyzed to determine the diversity, phylogeny, and relevant single nucleotide polymorphisms (SNPs) that describe haplogroup patterns for Hispanic Americans (N=686), and to assess the degree of admixture regarding mitochondrial DNA (mtDNA). The largest component of admixture based on mtDNA analysis derives from the four major haplogroups previously observed in Native American ancestry, including A (29.3%), B (15.7%), C (20.6%), and D (4.8%). European (17.8%) and African (11.8%) haplogroups also were observed within this data set. Hispanic SWGDAM samples from the southwest, compared with other SWGDAM Hispanic samples, were observed to have a greater percent of Native American haplogroups present (79.9%), and fewer African American haplogroups (4.5%). A total of 234 SNPs were observed in the data set, including 36 newly reported variable positions. These SWGDAM Hispanic data set SNPs ranged from having 1 to 31 changes (Length=L) on the phylogenetic tree, with site 16519 being the most variable. On average, there were 3.9 character changes for each variable position on the tree. The most variable sites (with 13 or more changes each listed from fastest to slowest) observed were 16519 (L=31), 16189 (L=23), 152 (L=23), 16311 (L=19), 146 (L=17), 195 (L=17), 16093 (L=15), 16362 (L=14), 16129 (L=13), 150 (L=13), and 153 (L=13). These sites are consistent with other reports on highly variable positions. A total of 27 SNPs were chosen to identify all clusters containing 1% (N=7) or more individuals in the SWGDAM Hispanic data set. The descriptive analyses revealed that the SWGDAM Hispanic data set is similar to published Native American and Hispanic data sets.

  17. Novel Bacteriocinogenic Lactobacillus plantarum Strains and Their Differentiation by Sequence Analysis of 16S rDNA, 16S-23S and 23S-5S Intergenic Spacer Regions and Randomly Amplified Polymorphic DNA Analysis

    Directory of Open Access Journals (Sweden)

    Morteza Shojaei Moghadam

    2010-01-01

    Six strains of bacteriocinogenic Lactobacillus plantarum (TL1, RG11, RS5, UL4, RG14 and RI11) isolated from Malaysian foods were investigated for their structural bacteriocin genes. A new combination of plantaricin EF and plantaricin W bacteriocin structural genes was successfully amplified from all studied strains, suggesting that they were novel bacteriocin-producing L. plantarum strains. A four-base pair variable region was detected in the short 16S-23S intergenic spacer regions of the studied strains by a comparative analysis with 17 L. plantarum strains deposited in GenBank, implying they were new genotypes. The studied L. plantarum strains were subsequently differentiated into four groups on the basis of the detected four-base pair variable region of the short 16S-23S intergenic spacer region. Further analysis of the DNA sequence of the 23S-5S intergenic spacer region revealed only one type of 23S-5S intergenic spacer region present in the studied strains, indicating it was highly conserved among the studied L. plantarum strains. Three randomly amplified polymorphic DNA experiments using three different combinations of arbitrary primers successfully differentiated the studied L. plantarum strains from each other, confirming they were different strains. In conclusion, the studied L. plantarum strains were shown to be novel bacteriocin producers, and a high level of strain discrimination could be achieved with a combination of randomly amplified polymorphic DNA analysis and the analysis of the variable region of the short 16S-23S intergenic spacer region present in L. plantarum strains.

  18. Specific identification of Western Atlantic Ocean scombrids using mitochondrial DNA cytochrome c oxidase subunit I (COI) gene region sequences

    National Research Council Canada - National Science Library

    Paine, Melissa A; McDowell, Jan R; Graves, John E

    2007-01-01

    .... The mitochondrial cytochrome c oxidase subunit I (COI) gene region was evaluated as a molecular marker for the specific identification of the 17 members of the family Scombridae common to the western Atlantic Ocean...

  19. Three genetic stocks of frigate tuna Auxis thazard thazard (Lacepede, 1800) along the Indian coast revealed from sequence analyses of mitochondrial DNA D-loop region

    Digital Repository Service at National Institute of Oceanography (India)

    GirishKumar; Kunal, S.P.; Menezes, M.R.; Meena, R.M.

    , Foster City, CA, USA). Representative sequences have been deposited in GenBank, with accession numbers JN398671- JN399010. Data analyses DNA sequences were edited with the program BioEdit (version 7.0.1, Hall 1999) and aligned using the Clustal.... Molecular diversity indices, such as transitions, transversions, substitutions, and indels were obtained using program Arlequin version 3.11 (Excoffier et al. 2005). The aligned sequences were used to analyze the population structure and genetic variation...

  20. Complexity of a small non-protein coding sequence in chromosomal region 22q11.2: presence of specialized DNA secondary structures and RNA exon/intron motifs.

    Science.gov (United States)

    Delihas, Nicholas

    2015-10-14

    DiGeorge Syndrome is a genetic abnormality involving a ~3 Mb deletion in human chromosome 22, termed 22q11.2. To better understand the non-coding regions of 22q11.2, a small 10,000 bp non-protein-coding sequence close to the DiGeorge Critical Region 6 gene (DGCR6) was chosen for analysis and functional entities as the homologous sequence in the chimpanzee genome could be aligned and used for comparisons. The GenBank database provided genomic sequences. In silico computer programs were used to find homologous DNA sequences in human and chimpanzee genomes, generate random sequences, determine DNA sequence alignments, sequence comparisons and nucleotide repeat copies, and to predict DNA secondary structures. At its 5' half, the 10,000 bp sequence has three distinct sections that represent phylogenetically variable sequences. These Variable Regions contain biased mutations with a very high A + T content, multiple copies of the motif TATAATATA and sequences that fold into long A:T-base-paired stem loops. The 3' half of the 10,000 bp unit, highly conserved between human and chimpanzee, has sequences representing exons of lncRNA genes and segments of introns of protein genes. Central to the 10,000 bp unit are the multiple copies of a sequence that originates from the flanking 5' end of the translocation breakpoint Type A sequence. This breakpoint flanking sequence carries the exon and intron motifs. The breakpoint Type A sequence seems to be a major player in the proliferation of these RNA motifs, as well as the proliferation of Variable Regions in the 10,000 bp segment and other regions within 22q11.2. The data indicate that a non-coding region of the chromosome may be reserved for highly biased mutations that lead to formation of specialized sequences and DNA secondary structures. On the other hand, the highly conserved nucleotide sequence of the non-coding region may form storage sites for RNA motifs.
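
    The two sequence statistics mentioned in this abstract (A + T content and copy number of the motif TATAATATA) can be illustrated with a minimal sketch. The function names `at_content` and `count_motif` and the short example string are assumptions made for illustration; they are not taken from the study and do not reproduce the in silico programs it used.

```python
def at_content(seq: str) -> float:
    """Fraction of bases that are A or T (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("A") + seq.count("T")) / len(seq)


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of `motif`, allowing copies to overlap."""
    seq, motif = seq.upper(), motif.upper()
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        # Step forward by one base so overlapping copies are also found.
        start = seq.find(motif, start + 1)
    return count


# Hypothetical toy sequence, not data from the study.
example = "GCTATAATATAATATAGC"
print(at_content(example))
print(count_motif(example, "TATAATATA"))
```

    Counting overlapping copies is a design choice here; a non-overlapping count would advance by the motif length instead of by one base.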

  1. Concordant genetic distinctness of the phylogroup of the Siberian chipmunk from the Korean peninsula (Tamias sibiricus barberi), reexamined with nuclear DNA c-myc gene exon 2 and mtDNA control region sequences.

    Science.gov (United States)

    Koh, Hung Sun; Zhang, Minghai; Bayarlkhagva, Damdingiin; Ham, Eui Jeong; Kim, Jin Seong; Jang, Kyung Hee; Park, Nam Jeong

    2010-08-01

    We reexamined Tamias sibiricus barberi from Korea by sequencing c-myc exon 2 and the mtDNA control region. In the c-myc exon, the monogenic T. s. barberi differed from the monogenic T. s. orientalis (nucleotide distance 0.48%; 3 variable sites at 168, 306, and 552), whereas T. s. orientalis was identical to T. s. sibiricus. In the control region, T. s. barberi differed from T. s. orientalis (distance 6.84%) and T. s. sibiricus (9.35%). We considered the concordant, extensive gaps between the phylogroup of T. s. barberi and other subspecies of T. sibiricus in the c-myc gene, control region, and cytochrome b gene to be evidence of a lack of intergradation through North Korea from T. s. barberi to T. s. orientalis. Our results, showing the genetic and morphological distinctness of T. s. barberi, support that this phylogroup is a distinct species.

  2. "First generation" automated DNA sequencing technology.

    Science.gov (United States)

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  3. DNA Sequencing in Undergraduate Laboratory Courses.

    Science.gov (United States)

    Hamilton, Robert G.

    1997-01-01

    Discusses strategies to duplicate current research protocols using biochemical methods of analysis. Describes the use of the Silver Sequence kit that provides a technically simple and relatively inexpensive DNA sequencing exercise. (JRH)

  4. Haplogrouping mitochondrial DNA sequences in Legal Medicine/Forensic Genetics.

    Science.gov (United States)

    Bandelt, Hans-Jürgen; van Oven, Mannis; Salas, Antonio

    2012-11-01

    Haplogrouping refers to the classification of (partial) mitochondrial DNA (mtDNA) sequences into haplogroups using the current knowledge of the worldwide mtDNA phylogeny. Haplogroup assignment of mtDNA control-region sequences assists in the focused comparison with closely related complete mtDNA sequences and thus serves two main goals in forensic genetics: first is the a posteriori quality analysis of sequencing results and second is the prediction of relevant coding-region sites for confirmation or further refinement of haplogroup status. The latter may be important in forensic casework where discrimination power needs to be as high as possible. However, most articles published in forensic genetics perform haplogrouping only in a rudimentary or incorrect way. The present study features PhyloTree as the key tool for assigning control-region sequences to haplogroups and elaborates on additional Web-based searches for finding near-matches with complete mtDNA genomes in the databases. In contrast, none of the automated haplogrouping tools available can yet compete with manual haplogrouping using PhyloTree plus additional Web-based searches, especially when confronted with artificial recombinants still present in forensic mtDNA datasets. We review and classify the various attempts at haplogrouping by using a multiplex approach or relying on automated haplogrouping. Furthermore, we re-examine a few articles in forensic journals providing mtDNA population data where appropriate haplogrouping following PhyloTree immediately highlights several kinds of sequence errors.

  5. Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.

    Science.gov (United States)

    Li, Qing; Hermanson, Peter J; Springer, Nathan M

    2018-01-01

    DNA methylation plays an important role in the regulation of the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered to be the "gold standard" for single-base resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated at a genome-wide scale. Here, we described a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. This protocol has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.

  6. Fibonacci Sequence and Supramolecular Structure of DNA.

    Science.gov (United States)

    Shabalkin, I P; Grigor'eva, E Yu; Gudkova, M V; Shabalkin, P I

    2016-05-01

    We proposed a new model of supramolecular DNA structure. As in our previously developed model of primary DNA structure [11-15], the 3D structure of the DNA molecule is assembled in accordance with a mathematical rule known as the Fibonacci sequence. Unlike primary DNA structure, supramolecular 3D structure is assembled from complex moieties including a regular tetrahedron and a regular octahedron consisting of monomers, elements of the primary DNA structure. The moieties of the supramolecular DNA structure forming fragments of a regular spatial lattice are bound via linker (joint) sequences of the DNA chain. The lattice perceives and transmits information signals over a considerable distance without acoustic aberrations. Linker sequences expand conformational space between lattice segments, allowing them to slide relative to each other under the action of external forces. In this case, sliding is provided by stretching of the stacked linker sequences.

  7. Understanding human DNA sequence variation.

    Science.gov (United States)

    Kidd, K K; Pakstis, A J; Speed, W C; Kidd, J R

    2004-01-01

    Over the past century researchers have identified normal genetic variation and studied that variation in diverse human populations to determine the amounts and distributions of that variation. That information is being used to develop an understanding of the demographic histories of the different populations and the species as a whole, among other studies. With the advent of DNA-based markers in the last quarter century, these studies have accelerated. One of the challenges for the next century is to understand that variation. One component of that understanding will be population genetics. We present here examples of many of the ways these new data can be analyzed from a population perspective using results from our laboratory on multiple individual DNA-based polymorphisms, many clustered in haplotypes, studied in multiple populations representing all major geographic regions of the world. These data support an "out of Africa" hypothesis for human dispersal around the world and begin to refine the understanding of population structures and genetic relationships. We are also developing baseline information against which we can compare findings at different loci to aid in the identification of loci subject, now and in the past, to selection (directional or balancing). We do not yet have a comprehensive understanding of the extensive variation in the human genome, but some of that understanding is coming from population genetics.

  8. DNA Sequencing in Cultural Heritage.

    Science.gov (United States)

    Vai, Stefania; Lari, Martina; Caramelli, David

    2016-02-01

    During the last three decades, DNA analysis of degraded samples has proved to be an important research tool in anthropology, archaeozoology, molecular evolution, and population genetics. Its application to topics such as determination of the species origin of prehistoric and historic objects, individual identification of famous personalities, and characterization of particular samples important for historical, archeological, or evolutionary reconstructions also gives paleogenetics an important role in the enhancement of cultural heritage. Rapid improvement in methodologies in recent years led to a revolution that permitted recovery of even complete genomes from highly degraded samples, with the possibility of going back in time 400,000 years for samples from temperate regions and 700,000 years for permafrozen remains, and of analyzing even more recent material that has been subjected to harsh biochemical treatments. Here we review the different methodological approaches used so far for the molecular analysis of degraded samples and their application to some case studies.

  9. Optimization of sequence alignment for simple sequence repeat regions

    Directory of Open Access Journals (Sweden)

    Ogbonnaya Francis C

    2011-07-01

    Background: Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSRs have been used as molecular markers because they are easy to detect and are used in a range of applications, including genetic diversity, genome mapping, and marker-assisted selection. They are also very mutable because of slippage of the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDEL) mutation frequency to a high ratio - more than in other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units, which result in false evolutionary relationships. Findings: To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using a PERL script with a Tk graphical interface. This program is based on aligning sequences after first determining the repeated units and the first and last SSR nucleotide positions. This results in a shifting process according to the inserted repeated unit type. When studying the phylogenetic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different lineages was observed after applying the new algorithm. Conclusions: The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic

  10. Phylogenetic Analysis of a 'Jewel Orchid' Genus Goodyera (Orchidaceae) Based on DNA Sequence Data from Nuclear and Plastid Regions.

    Science.gov (United States)

    Hu, Chao; Tian, Huaizhen; Li, Hongqing; Hu, Aiqun; Xing, Fuwu; Bhattacharjee, Avishek; Hsu, Tianchuan; Kumar, Pankaj; Chung, Shihwen

    2016-01-01

    A molecular phylogeny of Asiatic species of Goodyera (Orchidaceae, Cranichideae, Goodyerinae) based on the nuclear ribosomal internal transcribed spacer (ITS) region and two chloroplast loci (matK and trnL-F) was presented. Thirty-five species represented by 132 samples of Goodyera were analyzed, along with other 27 genera/48 species, using Pterostylis longifolia and Chloraea gaudichaudii as outgroups. Bayesian inference, maximum parsimony and maximum likelihood methods were used to reveal the intrageneric relationships of Goodyera and its intergeneric relationships to related genera. The results indicate that: 1) Goodyera is not monophyletic; 2) Goodyera could be divided into four sections, viz., Goodyera, Otosepalum, Reticulum and a new section; 3) sect. Reticulum can be further divided into two subsections, viz., Reticulum and Foliosum, whereas sect. Goodyera can in turn be divided into subsections Goodyera and a new subsection.

  11. Sequence variants of the CRH 5'-flanking region: effects on DNA-protein interactions studied by EMSA in PC12 cells.

    Science.gov (United States)

    Wagner, Uta; Wahle, Matthias; Malysheva, Olga; Wagner, Ulf; Häntzschel, Holm; Baerwald, Christoph

    2006-06-01

    Recently, studies in adult rheumatoid arthritis patients have shown an association with four single-nucleotide polymorphisms (SNPs) in the 3.7-kb regulatory region of the human corticotropin-releasing hormone (hCRH) gene located at positions -3531, -3371, -2353, and -684 bp. Three of these novel polymorphisms are in absolute linkage disequilibrium, resulting in three combined alleles, named A1B1, A2B1, and A2B2. To study whether the described polymorphic nucleotide sequences in the 5' region of the hCRH gene interfere with binding of nuclear proteins, an electrophoretic mobility shift assay (EMSA) was performed. At position -2353 bp, a specific DNA-protein complex was detected for the wild-type sequence only, possibly interfering with a binding site for the activating transcription factor 6 (ATF6). In contrast, no difference could be detected for the other SNPs. However, at position -684, a quantitative difference in protein binding due to cAMP incubation could be observed. To further investigate whether these SNPs in the CRH promoter are associated with an altered regulation of the CRH gene, we performed a luciferase reporter gene assay with transiently transfected rat pheochromocytoma PC12 cells. Incubation with 8-Br-cAMP alone or in combination with cytokines significantly enhanced the promoter activity in PC12 cells. The promoter haplotypes studied exhibited a differential capacity to modulate CRH gene expression. In all our experiments, haplotype A1B1 showed the most pronounced influence on promoter activity. Taken together, our results demonstrate a differential binding capacity of nuclear proteins for the promoter polymorphisms, resulting in different gene regulation. Most probably the SNP at position -2353 plays a major role in mediating these differences.

  12. Sequencing of mitochondrial HV1 and HV2 DNA with length heteroplasmy

    DEFF Research Database (Denmark)

    Rasmussen, E. Michael; Eriksen, Birthe; Larsen, Hans Jakob

    2003-01-01

    This study presents a fast method for sequencing the poly C/G regions in HV1 and HV2 in the mitochondrial DNA (mtDNA)...

  13. DNA sequence pattern recognition methods in GRAIL

    Energy Technology Data Exchange (ETDEWEB)

    Uberbacher, E.C.; Xu, Ying; Shah, M.; Matis, S.; Guan, X.; Mural, R.J.

    1995-12-31

    The goal of the GRAIL project has been to create a comprehensive analysis environment where a host of questions about genes and genome structure can be answered as quickly and accurately as possible. Constructing this system has entailed solving a number of significant technical challenges including: (a) making coding recognition in sequence more sensitive and accurate, (b) compensating for isochore base compositional effects in coding prediction, (c) developing methods to determine which parts of each strand of a long genomic DNA are the coding strand, (d) improving the accuracy of splice site prediction and recognizing non-consensus sites, and (e) recognizing variable regulatory structures such as polymerase II promoters. An additional challenge has been to construct algorithms which compensate for the deleterious effects of insertion or deletion (indel) errors in the coding region recognition process. This paper addresses progress on these technical issues and the current state of sequence feature recognition methods.

  14. Production and use of bovine DNA libraries: DNA-sequencing.

    Science.gov (United States)

    Sallmann, H P; Fuhrmann, H; Huttel, K; Geldermann, H

    1990-03-01

    An important part in the use of genomic DNA libraries is the sequencing of identified clones for detailed information. In this study, methods for DNA sequence analysis were elaborated and employed for the k-casein gene, a bovine milk protein. The results encourage further research.

  15. Sequence Affects the Cyclization of DNA Minicircles.

    Science.gov (United States)

    Wang, Qian; Pettitt, B Montgomery

    2016-03-17

    Understanding how the sequence of a DNA molecule affects its dynamic properties is a central problem affecting biochemistry and biotechnology. The process of cyclizing short DNA, as a critical step in molecular cloning, lacks a comprehensive picture of the kinetic process containing sequence information. We have elucidated this process by using coarse-grained simulations, enhanced sampling methods, and recent theoretical advances. We are able to identify the types and positions of structural defects during the looping process at a base-pair level. Correlations along a DNA molecule dictate critical sequence positions that can affect the looping rate. Structural defects change the bending elasticity of the DNA molecule from a harmonic to subharmonic potential with respect to bending angles. We explore the subelastic chain as a possible model in loop formation kinetics. A sequence-dependent model is developed to qualitatively predict the relative loop formation time as a function of DNA sequence.

  16. Control region sequences for East Asian individuals in the Scientific Working Group on DNA Analysis Methods forensic mtDNA data set.

    Science.gov (United States)

    Allard, Marc W; Wilson, Mark R; Monson, Keith L; Budowle, Bruce

    2004-03-01

    The Scientific Working Group on DNA Analysis Methods (SWGDAM) mitochondrial DNA (mtDNA) population data set is used to infer the relative rarity of mtDNA profiles obtained from evidence samples and of profiles used to identify missing persons. In this study, the East Asian haplogroup patterns in the SWGDAM data sets were analyzed in a phylogenetic context to determine relevant single nucleotide polymorphisms (SNPs) and to describe haplogroup distributions for Asians (n = 753; with a breakdown of individuals from China n = 356, Korea n = 182, Japan n = 163, and Thailand n = 52). We focus on the patterns observed in the SWGDAM Chinese data set and refer to interesting differences in the smaller subgroup data sets for the other East Asian populations (Japanese, Korean, and Thai). A total of 218 SNPs were observed in the data set, including 37 observed positions not previously reported. In the largest of the East Asian SWGDAM data sets (Chinese), these SNPs ranged from having 1 to 29 changes in the phylogenetic tree, with site 16519 being the most variable. On average there were 4.5 changes for a character on the tree. The most variable sites (with 14 or more changes each listed from fastest to slowest) observed were 16519 (L = 29), 16311 (L = 27), 152 (L = 24), 146 (L = 21), 16172 (L = 17), 16189 (L = 17), 195 (L = 16), 16362 (L = 15), 16093 (L = 14), 16129 (L = 14) and 150 (L = 14). These rapidly changing sites are consistent with other published analyses. Only 28 SNPs are needed to identify all clusters containing 1% (n = 7) or more individuals in the East Asian data set. All 36 haplogroups previously observed in East Asian populations were also seen in the SWGDAM data sets and include: A, B, B4, B4a, B4b, B5a, B5b, C, D, D4, D4a, D4b, D5, D5a, F, F1, F1a, F1b, F1c, F2a, G2, G2a, M, M7a1, M7b, M7b1, M7b2, M7c, M8a, M9, M10, N9a, R, R9a, Y, and Z. 
    Haplogroups A, B4a, D4, and F1a were the most commonly observed clusters in the Chinese data set (the largest of the data

  17. Dynamics and Control of DNA Sequence Amplification

    CERN Document Server

    Marimuthu, Karthikeyan

    2014-01-01

    DNA amplification is the process of replication of a specified DNA sequence in vitro through time-dependent manipulation of its external environment. A theoretical framework for determination of the optimal dynamic operating conditions of DNA amplification reactions, for any specified amplification objective, is presented based on first-principles biophysical modeling and control theory. Amplification of DNA is formulated as a problem in control theory with optimal solutions that can differ considerably from strategies typically used in practice. Using the Polymerase Chain Reaction (PCR) as an example, sequence-dependent biophysical models for DNA amplification are cast as control systems, wherein the dynamics of the reaction are controlled by a manipulated input variable. Using these control systems, we demonstrate that there exists an optimal temperature cycling strategy for geometric amplification of any DNA sequence and formulate optimal control problems that can be used to derive the optimal tempe...
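
    To make the term "geometric amplification" concrete, the textbook growth law for PCR can be written as a short sketch: each cycle multiplies the copy number by (1 + efficiency). This is only the standard idealized relation, not the sequence-dependent control model of the abstract; the function name `copies_after` and the constant per-cycle `efficiency` parameter are assumptions for illustration.

```python
def copies_after(n_cycles: int, n0: float = 1.0, efficiency: float = 1.0) -> float:
    """Idealized copy number after n_cycles of PCR.

    efficiency is the fraction of templates duplicated per cycle
    (1.0 means perfect doubling); it is assumed constant here.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return n0 * (1.0 + efficiency) ** n_cycles


# Perfect doubling versus a reaction running at 90% efficiency.
print(copies_after(30))
print(copies_after(30, efficiency=0.9))
```

    Because the yield depends exponentially on the per-cycle efficiency, small changes in cycling conditions compound over many cycles, which is one way to see why the choice of temperature profile matters.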

  18. Intragenomic sequence variation at the ITS1 - ITS2 region and at the 18S and 28S nuclear ribosomal DNA genes of the New Zealand mud snail, Potamopyrgus antipodarum (Hydrobiidae: mollusca)

    Science.gov (United States)

    Hoy, Marshal S.; Rodriguez, Rusty J.

    2013-01-01

    Molecular genetic analysis was conducted on two populations of the invasive non-native New Zealand mud snail (Potamopyrgus antipodarum), one from a freshwater ecosystem in Devil's Lake (Oregon, USA) and the other from an ecosystem of higher salinity in the Columbia River estuary (Hammond Harbor, Oregon, USA). To elucidate potential genetic differences between the two populations, three segments of nuclear ribosomal DNA (rDNA), the ITS1-ITS2 regions and the 18S and 28S rDNA genes were cloned and sequenced. Variant sequences within each individual were found in all three rDNA segments. Folding models were utilized for secondary structure analysis and results indicated that there were many sequences which contained structure-altering polymorphisms, which suggests they could be nonfunctional pseudogenes. In addition, analysis of molecular variance (AMOVA) was used for hierarchical analysis of genetic variance to estimate variation within and among populations and within individuals. AMOVA revealed significant variation in the ITS region between the populations and among clones within individuals, while in the 5.8S rDNA significant variation was revealed among individuals within the two populations. High levels of intragenomic variation were found in the ITS regions, which are known to be highly variable in many organisms. More interestingly, intragenomic variation was also found in the 18S and 28S rDNA, which has rarely been observed in animals and is so far unreported in Mollusca. We postulate that in these P. antipodarum populations the effects of concerted evolution are diminished due to the fact that not all of the rDNA genes in their polyploid genome should be essential for sustaining cellular function. This could lead to a lessening of selection pressures, allowing mutations to accumulate in some copies, changing them into variant sequences.                   

  19. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Background: Molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that generates sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of a certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results: The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions: We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm generates larger sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.
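
    The kinds of constraints listed in this abstract (adjustable GC content, G/C at both ends, avoidance of self-complementarity, and limited cross hybridization of a defined length) can be illustrated with a small sketch. It is not the EGNAS C++ program: it enumerates all candidates but accepts them greedily in lexicographic order, so the resulting set is not guaranteed to be of maximum size. The names `design_set` and `passes_filters`, the parameter `k`, and the default GC window are assumptions for illustration.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmers(seq: str, k: int) -> set:
    """All substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def passes_filters(seq: str, gc_min: float, gc_max: float) -> bool:
    """G/C at both ends, GC content in range, not its own reverse complement."""
    return (
        seq[0] in "GC"
        and seq[-1] in "GC"
        and gc_min <= gc_fraction(seq) <= gc_max
        and seq != reverse_complement(seq)
    )


def design_set(length: int, k: int, gc_min: float = 0.4, gc_max: float = 0.6):
    """Greedily collect sequences that share no k-mer with any accepted
    sequence or with the reverse complement of any accepted sequence."""
    accepted, used = [], set()
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        if not passes_filters(seq, gc_min, gc_max):
            continue
        own = kmers(seq, k)
        rc = kmers(reverse_complement(seq), k)
        # own & rc: the strand could pair with itself over k bases.
        # own & used: overlap or cross hybridization with the set so far.
        if own & rc or own & used:
            continue
        accepted.append(seq)
        used |= own | rc
    return accepted


seqs = design_set(length=6, k=4)
print(len(seqs), seqs[:3])
```

    Storing both the k-mers of each accepted sequence and those of its reverse complement means a single set lookup covers the uniqueness check and the cross-hybridization check.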

  20. Syntenic homology of human unique DNA sequences within chromossome regions 5q31, 10q22, 13q32-33 and 19q13.1 in the great apes

    Directory of Open Access Journals (Sweden)

    Vallente-Samonte Rhea U.

    2000-01-01

    Homologies between chromosome banding patterns and DNA sequences in the great apes and humans suggest an apparent common origin for these two lineages. The availability of DNA probes for specific regions of human chromosomes (5q31, 10q22, 13q32-33 and 19q13.1) led us to cross-hybridize these to chimpanzee (Pan troglodytes, PTR), gorilla (Gorilla gorilla, GGO) and orangutan (Pongo pygmaeus, PPY) chromosomes in a search for equivalent regions in the great apes. Positive hybridization signals to the chromosome 5q31-specific DNA probe were observed at HSA 5q31, PTR 4q31, GGO 4q31 and PPY 4q31, while fluorescent signals using the chromosome 10q22-specific DNA probe were noted at HSA 10q22, PTR 8q22, GGO 8q22 and PPY 7q22. The chromosome arms showing hybridization signals to the Quint-Essential™ 13-specific DNA probe were identified as HSA 13q32-33, PTR 14q32-33, GGO 14q32-33 and PPY 14q32-33, while those presenting hybridization signals to the chromosome 19q13.1-specific DNA probe were identified as HSA 19q13.1, PTR 20q13, GGO 20q13 and PPY 20q13. All four probes presumably hybridized to homologous chromosomal locations in the apes, which suggests a homology of certain unique DNA sequences among hominoid species.

  1. Inconsistencies in Neanderthal genomic DNA sequences.

    Directory of Open Access Journals (Sweden)

    Jeffrey D Wall

    2007-10-01

    Two recently published papers describe nuclear DNA sequences that were obtained from the same Neanderthal fossil. Our reanalyses of the data from these studies show that they are not consistent with each other and point to serious problems with the data quality in one of the studies, possibly due to modern human DNA contaminants and/or a high rate of sequencing errors.

  2. Mitochondrial DNA sequence evolution in shorebird populations

    NARCIS (Netherlands)

    Wenink, P.W.

    1994-01-01

    This thesis describes the global molecular population structure of two shorebird species, in particular of the dunlin, Calidris alpina, by means of comparative sequence analysis of the most variable part of the mitochondrial DNA (mtDNA) genome. There are several reasons

  3. On site DNA barcoding by nanopore sequencing.

    Directory of Open Access Journals (Sweden)

    Michele Menegon

    Biodiversity research is becoming increasingly dependent on genomics, which allows the unprecedented digitization and understanding of the planet's biological heritage. The use of genetic markers, i.e. DNA barcoding, has proved to be a powerful tool in species identification. However, full exploitation of this approach is hampered by the high sequencing costs and the absence of equipped facilities in biodiversity-rich countries. In the present work, we developed a portable sequencing laboratory based on the portable DNA sequencer from Oxford Nanopore Technologies, the MinION. Complementary laboratory equipment and reagents were selected to be used in remote and tough environmental conditions. The performance of the MinION sequencer and the portable laboratory was tested for DNA barcoding in a simulated tropical environment, as well as in a remote rainforest of Tanzania lacking electricity. Despite the relatively high sequencing error rate of the MinION, the development of a suitable pipeline for data analysis allowed the accurate identification of different species of vertebrates including amphibians, reptiles and mammals. In situ sequencing of a wild frog allowed us to rapidly identify the species captured, thus confirming that effective DNA barcoding in the field is possible. These results open new perspectives for real-time, on-site DNA sequencing, thus potentially increasing opportunities for the understanding of biodiversity in areas lacking conventional laboratory facilities.

  4. PCR primers for metazoan mitochondrial 12S ribosomal DNA sequences.

    Directory of Open Access Journals (Sweden)

    Ryuji J Machida

    BACKGROUND: Assessment of the biodiversity of communities of small organisms is most readily done using PCR-based analysis of environmental samples consisting of mixtures of individuals. Known as metagenetics, this approach has transformed understanding of microbial communities and is beginning to be applied to metazoans as well. Unlike microbial studies, where analysis of the 16S ribosomal DNA sequence is standard, the best gene for metazoan metagenetics is less clear. In this study we designed a set of PCR primers for the mitochondrial 12S ribosomal DNA sequence based on 64 complete mitochondrial genomes and then tested their efficacy. METHODOLOGY/PRINCIPAL FINDINGS: A total of 64 complete mitochondrial genome sequences representing all metazoan classes available in GenBank were downloaded using the NCBI Taxonomy Browser. Alignment of sequences was performed for the excised mitochondrial 12S ribosomal DNA sequences, and conserved regions were identified for all 64 mitochondrial genomes. These regions were used to design a primer pair that flanks a more variable region in the gene. Then all of the complete metazoan mitochondrial genomes available in NCBI's Organelle Genome Resources database were used to determine the percentage of taxa that would likely be amplified using these primers. Results suggest that these primers will amplify target sequences for many metazoans. CONCLUSIONS/SIGNIFICANCE: Newly designed 12S ribosomal DNA primers have considerable potential for metazoan metagenetic analysis because of their ability to amplify sequences from many metazoans.

  5. DNA sequencing using fluorescence background electroblotting membrane

    Science.gov (United States)

    Caldwell, K.D.; Chu, T.J.; Pitt, W.G.

    1992-05-12

    A method for the multiplex sequencing of DNA is disclosed which comprises the electroblotting of specific base-terminated DNA fragments, which have been resolved by gel electrophoresis, onto the surface of a neutral non-aromatic polymeric microporous membrane exhibiting low background fluorescence which has been surface modified to contain amino groups. Polypropylene membranes are preferred, and the introduction of amino groups is accomplished by subjecting the membrane to radio or microwave frequency plasma discharge in the presence of an aminating agent, preferably ammonia. The membrane, containing physically adsorbed DNA fragments on its surface after the electroblotting, is then treated with crosslinking means such as UV radiation or a glutaraldehyde spray to chemically bind the DNA fragments to the membrane through amino groups contained on the surface. The DNA fragments chemically bound to the membrane are subjected to hybridization probing with a tagged probe specific to the sequence of the DNA fragments. The tagging may be by either fluorophores or radioisotopes. The tagged probes hybridized to the target DNA fragments are detected and read by laser-induced fluorescence detection or autoradiograms. The use of aminated low fluorescent background membranes allows the use of fluorescent detection and reading even when the available amount of DNA to be sequenced is small. The DNA bound to the membranes may be reprobed numerous times.

  6. Analysis of a new strain of Euphorbia mosaic virus with distinct replication specificity unveils a lineage of begomoviruses with short Rep sequences in the DNA-B intergenic region

    Directory of Open Access Journals (Sweden)

    Argüello-Astorga Gerardo R

    2010-10-01

    Background: Euphorbia mosaic virus (EuMV) is a member of the SLCV clade, a lineage of New World begomoviruses that display distinctive features in their replication-associated protein (Rep) and virion-strand replication origin. The first entirely characterized EuMV isolate is native to the Yucatan Peninsula, Mexico; subsequently, EuMV was detected in weeds and pepper plants from another region of Mexico, and partial DNA-A sequences revealed significant differences in their putative replication specificity determinants with respect to EuMV-YP. This study aimed to investigate the replication compatibility between two EuMV isolates from the same country. Results: A new isolate of EuMV was obtained from pepper plants collected at Jalisco, Mexico. Full-length clones of both genomic components of EuMV-Jal were biolistically inoculated into plants of three different species, which developed symptoms indistinguishable from those induced by EuMV-YP. Pseudorecombination experiments with EuMV-Jal and EuMV-YP genomic components demonstrated that these viruses do not form infectious reassortants in Nicotiana benthamiana, presumably because of Rep-iteron incompatibility. Sequence analysis of the EuMV-Jal DNA-B intergenic region (IR) led to the unexpected discovery of a 35-nt-long sequence that is identical to a segment of the rep gene in the cognate viral DNA-A. Similar short rep sequences ranging from 35 to 51 nt in length were identified in all EuMV isolates and in three distinct viruses from South America related to EuMV. These short rep sequences in the DNA-B IR are positioned downstream of a ~160-nt non-coding domain highly similar to the CP promoter of begomoviruses belonging to the SLCV clade. Conclusions: EuMV strains are not compatible in replication, indicating that this begomovirus species probably is not a replicating lineage in nature. The genomic analysis of EuMV-Jal led to the discovery of a subgroup of SLCV clade viruses that contain in

  7. Osmylated DNA, a novel concept for sequencing DNA using nanopores.

    Science.gov (United States)

    Kanavarioti, Anastassia

    2015-03-27

    Sanger sequencing has led the advances in molecular biology, while faster and cheaper next generation technologies are urgently needed. A newer approach exploits nanopores, natural or solid-state, set in an electrical field, and obtains base sequence information from current variations due to the passage of a ssDNA molecule through the pore. A hurdle in this approach is the fact that the four bases are chemically comparable to each other, which leads to small differences in current obstruction. 'Base calling' becomes even more challenging because most nanopores sense a short sequence and not individual bases. Perhaps sequencing DNA via nanopores would be more manageable if there were only two bases, chemically very different from each other; a sequence of 1s and 0s comes to mind. Osmylated DNA comes close to such a sequence of 1s and 0s. Osmylation is the addition of osmium tetroxide bipyridine across the C5-C6 double bond of the pyrimidines. Osmylation adds almost 400% mass to the reactive base and creates a sterically and electronically notably different molecule, labeled 1, compared to the unreactive purines, labeled 0. If osmylated DNA were successfully sequenced, the result would be a sequence of osmylated pyrimidines (1) and purines (0), and not of the actual nucleobases. To solve this problem we studied the osmylation reaction with short oligos and with M13mp18, a long ssDNA, developed a UV-vis assay to measure extent of osmylation, and designed two protocols. Protocol A uses mild conditions and yields osmylated thymidines (1), while leaving the other three bases (0) practically intact. Protocol B uses harsher conditions and effectively osmylates both pyrimidines, but not the purines. Applying these two protocols also to the complement of the target polynucleotide yields a total of four osmylated strands that collectively could define the actual base sequence of the target DNA.
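
    The four-strand idea in this abstract can be illustrated with a short sketch. It is not the author's software: the function names are hypothetical, the readouts are assumed to be error-free, and the complement is aligned position by position with the target (real reads would also need to account for strand orientation).

```python
# Hypothetical helpers illustrating the 1/0 labeling described in the abstract.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq):
    """Base-by-base complement, kept aligned with seq (orientation ignored)."""
    return "".join(COMPLEMENT[b] for b in seq)

def protocol_a(seq):
    """Mild conditions: only T is osmylated (1); A, C and G read as 0."""
    return "".join("1" if b == "T" else "0" for b in seq)

def protocol_b(seq):
    """Harsher conditions: both pyrimidines (T, C) read as 1; purines as 0."""
    return "".join("1" if b in "TC" else "0" for b in seq)

def decode(a_target, b_target, a_comp, b_comp):
    """Recover the target bases from the four binary readouts."""
    bases = []
    for at, bt, ac, bc in zip(a_target, b_target, a_comp, b_comp):
        if at == "1":
            bases.append("T")      # T is labeled by Protocol A on the target
        elif bt == "1":
            bases.append("C")      # pyrimidine that is not T
        elif ac == "1":
            bases.append("A")      # T on the complement pairs with A
        elif bc == "1":
            bases.append("G")      # C on the complement pairs with G
        else:
            raise ValueError("inconsistent readouts")
    return "".join(bases)

target = "GATTACA"
comp = complement(target)
readouts = (protocol_a(target), protocol_b(target),
            protocol_a(comp), protocol_b(comp))
decoded = decode(*readouts)
```

    In this idealized setting three readouts would already determine each base; the fourth adds redundancy that could help with noisy calls.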

  8. Improved Algorithm for Analysis of DNA Sequences Using Multiresolution Transformation

    Directory of Open Access Journals (Sweden)

    T. M. Inbamalar

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information allied with genetic materials such as the deoxyribonucleic acid (DNA), the ribonucleic acid (RNA), and the proteins. Fast and precise identification of the protein coding regions in DNA sequence is one of the most important tasks in analysis. Existing digital signal processing (DSP) methods provide less accurate and computationally complex solutions with greater background noise. Hence, improvements in accuracy, computational complexity, and reduction in background noise are essential in identification of the protein coding regions in the DNA sequences. In this paper, a new DSP based method is introduced to detect the protein coding regions in DNA sequences. Here, the DNA sequences are converted into numeric sequences using electron ion interaction potential (EIIP) representation. Then discrete wavelet transformation is taken. Absolute value of the energy is found followed by proper threshold. The test is conducted using the databases available in the National Centre for Biotechnology Information (NCBI) site. The comparative analysis is done and it ensures the efficiency of the proposed system.

  9. Improved algorithm for analysis of DNA sequences using multiresolution transformation.

    Science.gov (United States)

    Inbamalar, T M; Sivakumar, R

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information allied with genetic materials such as the deoxyribonucleic acid (DNA), the ribonucleic acid (RNA), and the proteins. Fast and precise identification of the protein coding regions in DNA sequence is one of the most important tasks in analysis. Existing digital signal processing (DSP) methods provide less accurate and computationally complex solutions with greater background noise. Hence, improvements in accuracy, computational complexity, and reduction in background noise are essential in identification of the protein coding regions in the DNA sequences. In this paper, a new DSP based method is introduced to detect the protein coding regions in DNA sequences. Here, the DNA sequences are converted into numeric sequences using electron ion interaction potential (EIIP) representation. Then discrete wavelet transformation is taken. Absolute value of the energy is found followed by proper threshold. The test is conducted using the databases available in the National Centre for Biotechnology Information (NCBI) site. The comparative analysis is done and it ensures the efficiency of the proposed system.
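
    The pipeline outlined in this abstract (EIIP mapping, wavelet transform, energy, threshold) might be sketched as follows. The abstract does not name the wavelet or the threshold, so the single-level Haar transform, the use of detail coefficients, and the `threshold` parameter here are assumptions chosen for illustration only; the EIIP values are the commonly tabulated ones.

```python
# Commonly tabulated EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

def eiip_encode(seq):
    """Map a DNA string to its numeric EIIP sequence."""
    return [EIIP[b] for b in seq.upper()]

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) lists.

    An odd-length signal is padded by repeating its last sample.
    """
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    s = 2 ** 0.5
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def flag_high_energy(seq, threshold):
    """Flag coefficient positions whose detail energy exceeds the threshold."""
    _, detail = haar_dwt(eiip_encode(seq))
    energy = [d * d for d in detail]
    return [e > threshold for e in energy]

flags = flag_high_energy("AAGA", threshold=1e-4)
```

    A practical detector would use a longer wavelet, several decomposition levels, and a threshold tuned on annotated sequences.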

  10. PREDICTION OF CHROMATIN STATES USING DNA SEQUENCE PROPERTIES

    KAUST Repository

    Bahabri, Rihab R.

    2013-06-01

    Activities of DNA are to a great extent controlled epigenetically through the internal structure of chromatin. This structure is dynamic and is influenced by different modifications of histone proteins. Various combinations of epigenetic modification of histones point to different functional regions of the DNA, determining the so-called chromatin states. However, the characterization of chromatin states by the DNA sequence properties remains largely unknown. In this study we aim to explore whether DNA sequence patterns in the human genome can characterize different chromatin states. Using DNA sequence motifs we built binary classifiers for each chromatin state to evaluate whether a given genomic sequence is a good candidate for belonging to a particular chromatin state. Of four classification algorithms (C4.5, Naive Bayes, Random Forest, and SVM) used for this purpose, the decision tree based classifiers (C4.5 and Random Forest) yielded best results among those we evaluated. Our results suggest that in general these models lack sufficient predictive power, although for four chromatin states (insulators, heterochromatin, and two types of copy number variation) we found that presence of certain motifs in DNA sequences does imply an increased probability that such a sequence is one of these chromatin states.

  11. Nuclear and mitochondrial DNA sequences from two Denisovan individuals.

    Science.gov (United States)

    Sawyer, Susanna; Renaud, Gabriel; Viola, Bence; Hublin, Jean-Jacques; Gansauge, Marie-Theres; Shunkov, Michael V; Derevianko, Anatoly P; Prüfer, Kay; Kelso, Janet; Pääbo, Svante

    2015-12-22

    Denisovans, a sister group of Neandertals, have been described on the basis of a nuclear genome sequence from a finger phalanx (Denisova 3) found in Denisova Cave in the Altai Mountains. The only other Denisovan specimen described to date is a molar (Denisova 4) found at the same site. This tooth carries a mtDNA sequence similar to that of Denisova 3. Here we present nuclear DNA sequences from Denisova 4 and a morphological description, as well as mitochondrial and nuclear DNA sequence data, from another molar (Denisova 8) found in Denisova Cave in 2010. This new molar is similar to Denisova 4 in being very large and lacking traits typical of Neandertals and modern humans. Nuclear DNA sequences from the two molars form a clade with Denisova 3. The mtDNA of Denisova 8 is more diverged and has accumulated fewer substitutions than the mtDNAs of the other two specimens, suggesting Denisovans were present in the region over an extended period. The nuclear DNA sequence diversity among the three Denisovans is comparable to that among six Neandertals, but lower than that among present-day humans.

  12. Visual DNA -- identification of DNA sequence variations by bead trapping.

    Science.gov (United States)

    Ståhl, Patrik L; Gantelius, Jesper; Natanaelsson, Christian; Ahmadian, Afshin; Andersson-Svahn, Helene; Lundeberg, Joakim

    2007-12-01

    In this paper we describe a method that uses the nearly covalent strength biotin-streptavidin interaction to attach a paramagnetic bead of micrometer size to a DNA molecule of nanometer size, scaling up the spatial size of a query DNA strand by a factor of 1000, making it visible to the human eye. The use of magnetic principles enables rapid binding and washing of detector beads, facilitating a readout of amplified DNA sequences in a few minutes. Here we exemplify the method on mitochondrial DNA variations using an array platform. Visual identification and documentation can be performed with an ordinary mobile phone equipped with a built-in camera.

  13. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
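
    The remark about Lempel-Ziv compression of flat files can be made concrete with a small baseline comparison. This sketch does not implement coil or edit-tree coding; it only compares Python's `zlib` (the DEFLATE algorithm also used by gzip) with naive 2-bit packing on a synthetic random sequence, which is an assumption and not representative of real EST data.

```python
import random
import zlib

def two_bit_size(seq):
    """Bytes needed if each of A, C, G, T is packed into 2 bits."""
    return (2 * len(seq) + 7) // 8

def zlib_size(seq):
    """Bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(seq.encode("ascii"), 9))

random.seed(0)  # fixed seed so the synthetic sequence is reproducible
seq = "".join(random.choice("ACGT") for _ in range(100_000))

raw_bytes = len(seq)             # one ASCII byte per base in a flat file
deflate_bytes = zlib_size(seq)   # general-purpose Lempel-Ziv baseline
packed_bytes = two_bit_size(seq) # trivial alphabet-aware baseline

print(raw_bytes, deflate_bytes, packed_bytes)
```

    On sequence with no repeated structure a general-purpose compressor cannot beat the 2-bit bound; gains on real databases come from exploiting similarity between entries, which is the direction the abstract describes.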

  14. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique "electronic fingerprint" of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shifts depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  15. Entire Mitochondrial DNA Sequencing on Massively Parallel Sequencing for the Korean Population.

    Science.gov (United States)

    Park, Sohyung; Cho, Sohee; Seo, Hee Jin; Lee, Ji Hyun; Kim, Moon Young; Lee, Soong Deok

    2017-04-01

    Mitochondrial DNA (mtDNA) genome analysis has been a potent tool in forensic practice as well as in the understanding of human phylogeny in the maternal lineage. The traditional mtDNA analysis is focused on the control region, but the introduction of massive parallel sequencing (MPS) has made the typing of the entire mtDNA genome (mtGenome) more accessible for routine analysis. The complete mtDNA information can provide large amounts of novel genetic data for diverse populations as well as improved discrimination power for identification. The genetic diversity of the mtDNA sequence in different ethnic populations has been revealed through MPS analysis, but the Korean population not only has limited MPS data for the entire mtGenome, the existing data is mainly focused on the control region. In this study, the complete mtGenome data for 186 Koreans, obtained using Ion Torrent Personal Genome Machine (PGM) technology and retrieved from rather common mtDNA haplogroups based on the control region sequence, are described. The results showed that 24 haplogroups, determined with hypervariable regions only, branched into 47 subhaplogroups, and point heteroplasmy was more frequent in the coding regions. In addition, sequence variations in the coding regions observed in this study were compared with those presented in other reports on different populations, and there were similar features observed in the sequence variants for the predominant haplogroups among East Asian populations, such as Haplogroup D and macrohaplogroups M9, G, and D. This study is expected to be the trigger for the development of Korean specific mtGenome data followed by numerous future studies. © 2017 The Korean Academy of Medical Sciences.

  16. Chromosome number 9 specific repetitive DNA sequence

    Energy Technology Data Exchange (ETDEWEB)

    Joste, N.E.; Cram, L.S.; Hildebrand, C.E.; Jones, M.; Longmire, J.; Robinson, T.; Moyzis, R.K.

    1986-05-01

    Human repetitive DNA libraries have been constructed and various recombinant DNA clones isolated that are likely candidates for chromosome specific sequences. The first clone tested (pHuR 98; plasmid human repeat 98) was biotinylated and hybridized to human chromosomes in situ. The hybridized recombinant probe was detected with fluoresceinated avidin, and chromosomes were counter-stained with either propidium iodide or distamycin-DAPI. Specific hybridization to chromosome band 9q1 was obtained. The localization was confirmed by hybridizing radiolabeled pHuR 98 DNA to human chromosomes sorted by flow cytometry. Various methods, including orthogonal field pulsed gel electrophoresis analysis indicate that 75 kilobase blocks of this sequence are interspersed with other repetitive DNA sequences in this chromosome band. This study is the first to report a human repetitive DNA sequence uniquely localized to a specific chromosome. This clone provides an easily detected and highly specific chromosomal marker for molecular cytogenetic analyses in numerous basic research and clinical studies.

  17. Restriction and Sequence Alterations Affect DNA Uptake Sequence-Dependent Transformation in Neisseria meningitidis

    Science.gov (United States)

    Ambur, Ole Herman; Frye, Stephan A.; Nilsen, Mariann; Hovland, Eirik; Tønjum, Tone

    2012-01-01

    Transformation is a complex process that involves several interactions from the binding and uptake of naked DNA to homologous recombination. Some actions affect transformation favourably whereas others act to limit it. Here, meticulous manipulation of a single type of transforming DNA allowed for quantifying the impact of three different mediators of meningococcal transformation: NlaIV restriction, homologous recombination and the DNA Uptake Sequence (DUS). In the wildtype, an inverse relationship between the transformation frequency and the number of NlaIV restriction sites in DNA was observed when the transforming DNA harboured a heterologous region for selection (ermC) but not when the transforming DNA was homologous with only a single nucleotide heterology. The influence of homologous sequence in transforming DNA was further studied using plasmids with a small interruption or larger deletions in the recombinogenic region and these alterations were found to impair transformation frequency. In contrast, a particularly potent positive driver of DNA uptake in Neisseria sp. are short DUS in the transforming DNA. However, the molecular mechanism(s) responsible for DUS specificity remains unknown. Increasing the number of DUS in the transforming DNA was here shown to exert a positive effect on transformation. Furthermore, an influence of variable placement of DUS relative to the homologous region in the donor DNA was documented for the first time. No effect of altering the orientation of DUS was observed. These observations suggest that DUS is important at an early stage in the recognition of DNA, but does not exclude the existence of more than one level of DUS specificity in the sequence of events that constitute transformation. New knowledge on the positive and negative drivers of transformation may in a larger perspective illuminate both the mechanisms and the evolutionary role(s) of one of the most conserved mechanisms in nature: homologous recombination.

  18. Restriction and sequence alterations affect DNA uptake sequence-dependent transformation in Neisseria meningitidis.

    Directory of Open Access Journals (Sweden)

    Ole Herman Ambur

    Transformation is a complex process that involves several interactions from the binding and uptake of naked DNA to homologous recombination. Some actions affect transformation favourably whereas others act to limit it. Here, meticulous manipulation of a single type of transforming DNA allowed for quantifying the impact of three different mediators of meningococcal transformation: NlaIV restriction, homologous recombination and the DNA Uptake Sequence (DUS). In the wildtype, an inverse relationship between the transformation frequency and the number of NlaIV restriction sites in DNA was observed when the transforming DNA harboured a heterologous region for selection (ermC) but not when the transforming DNA was homologous with only a single nucleotide heterology. The influence of homologous sequence in transforming DNA was further studied using plasmids with a small interruption or larger deletions in the recombinogenic region and these alterations were found to impair transformation frequency. In contrast, a particularly potent positive driver of DNA uptake in Neisseria sp. are short DUS in the transforming DNA. However, the molecular mechanism(s) responsible for DUS specificity remains unknown. Increasing the number of DUS in the transforming DNA was here shown to exert a positive effect on transformation. Furthermore, an influence of variable placement of DUS relative to the homologous region in the donor DNA was documented for the first time. No effect of altering the orientation of DUS was observed. These observations suggest that DUS is important at an early stage in the recognition of DNA, but does not exclude the existence of more than one level of DUS specificity in the sequence of events that constitute transformation. New knowledge on the positive and negative drivers of transformation may in a larger perspective illuminate both the mechanisms and the evolutionary role(s) of one of the most conserved mechanisms in nature: homologous

  19. A Bioluminometric Method of DNA Sequencing

    Science.gov (United States)

    Ronaghi, Mostafa; Pourmand, Nader; Stolc, Viktor; Arnold, Jim (Technical Monitor)

    2001-01-01

    Pyrosequencing is a bioluminometric single-tube DNA sequencing method that takes advantage of co-operativity between four enzymes to monitor DNA synthesis. In this sequencing-by-synthesis method, a cascade of enzymatic reactions yields detectable light, which is proportional to incorporated nucleotides. Pyrosequencing has the advantages of accuracy, flexibility and parallel processing. It can be easily automated. Furthermore, the technique dispenses with the need for labeled primers, labeled nucleotides and gel-electrophoresis. In this chapter, the use of this technique for different applications is discussed.
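
    The statement that the light signal is proportional to the number of incorporated nucleotides can be illustrated with an idealized simulation. This is a sketch under stated assumptions, not the authors' method: the function name, the cyclic `ACGT` dispensation order, and the noise-free signal are all hypothetical, and `synthesized` denotes the strand being built (the complement of the template).

```python
def pyrogram(synthesized, dispensation="ACGT", cycles=10):
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run.

    Each dispensed nucleotide is incorporated as many times as it occurs
    consecutively at the current position, and the signal equals that count.
    """
    pos = 0
    peaks = []
    for _ in range(cycles):
        for nt in dispensation:
            run = 0
            while pos < len(synthesized) and synthesized[pos] == nt:
                run += 1
                pos += 1
            peaks.append((nt, run))
            if pos == len(synthesized):
                return peaks  # strand fully extended
    return peaks

peaks = pyrogram("AACGGGT")
# The homopolymer runs AA and GGG give signals of 2 and 3 respectively.
```

    Reading the sequence back amounts to repeating each dispensed nucleotide by its signal, which is why accurate quantification of homopolymer peaks matters in practice.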

  20. DNA watermarks in non-coding regulatory sequences

    Directory of Open Access Journals (Sweden)

    Pyka Martin

    2009-07-01

    Background: DNA watermarks can be applied to identify the unauthorized use of genetically modified organisms. It has been shown that coding regions can be used to encrypt information into living organisms by using the DNA-Crypt algorithm. Yet, if the sequence of interest presents a non-coding DNA sequence, either the function of a resulting functional RNA molecule or a regulatory sequence, such as a promoter, could be affected. For our studies we used the small cytoplasmic RNA 1 in yeast and the lac promoter region of Escherichia coli. Findings: The lac promoter was deactivated by the integrated watermark. In addition, the RNA molecules displayed altered configurations after introducing a watermark, but surprisingly were functionally intact, which has been verified by analyzing the growth characteristics of both wild type and watermarked scR1 transformed yeast cells. In a third approach we introduced a second overlapping watermark into the lac promoter, which did not affect the promoter activity. Conclusion: Even though the watermarked RNA and one of the watermarked promoters did not show any significant differences compared to the wild type RNA and wild type promoter region, respectively, it cannot be generalized that other RNA molecules or regulatory sequences behave accordingly. Therefore, we do not recommend integrating watermark sequences into regulatory regions.

  1. The DNA sequence specificity of bleomycin cleavage in a systematically altered DNA sequence.

    Science.gov (United States)

    Gautam, Shweta D; Chen, Jon K; Murray, Vincent

    2017-08-01

    Bleomycin is an anti-tumour agent that is clinically used to treat several types of cancers. Bleomycin cleaves DNA at specific DNA sequences and recent genome-wide DNA sequencing specificity data indicated that the sequence 5'-RTGT*AY (where T* is the site of bleomycin cleavage, R is G/A and Y is T/C) is preferentially cleaved by bleomycin in human cells. Based on this DNA sequence, we constructed a plasmid clone to explore this bleomycin cleavage preference. By systematic variation of single nucleotides in the 5'-RTGT*AY sequence, we were able to investigate the effect of nucleotide changes on bleomycin cleavage efficiency. We observed that the preferred consensus DNA sequence for bleomycin cleavage in the plasmid clone was 5'-YYGT*AW (where W is A/T). The most highly cleaved sequence was 5'-TCGT*AT and, in fact, the seven most highly cleaved sequences conformed to the consensus sequence 5'-YYGT*AW. A comparison with genome-wide results was also performed and while the core sequence was similar in both environments, the surrounding nucleotides were different.
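
    The consensus sequences in this abstract use IUPAC ambiguity codes (R, Y, W), which translate directly into regular-expression character classes. The helper below is a hypothetical illustration, not the authors' analysis code; it reports the 0-based index of the cleaved T (T*, the fourth base of the six-base consensus) for every match.

```python
import re

# Only the IUPAC codes that appear in the abstract are included.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]"}

def cleavage_sites(seq, consensus, t_star_index=3):
    """Return 0-based positions of T* for each match of the consensus.

    A lookahead is used so that overlapping matches would also be reported.
    """
    body = "".join(IUPAC[c] for c in consensus)
    return [m.start() + t_star_index
            for m in re.finditer("(?=" + body + ")", seq)]

plasmid_like = "AATCGTATCC"   # contains TCGTAT, the most highly cleaved site
sites_plasmid = cleavage_sites(plasmid_like, "YYGTAW")
sites_genome = cleavage_sites(plasmid_like, "RTGTAY")
```

    Here `TCGTAT` fits the plasmid consensus 5'-YYGT*AW but not the genome-wide 5'-RTGT*AY, because its first base is a pyrimidine rather than a purine.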

  2. Vander Lugt correlation of DNA sequence data

    Science.gov (United States)

    Christens-Barry, William A.; Hawk, James F.; Martin, James C.

    1990-12-01

    DNA, the molecule containing the genetic code of an organism, is a linear chain of subunits. It is the sequence of subunits, of which there are four kinds, that constitutes the unique blueprint of an individual. This sequence is the focus of a large number of analyses performed by an army of geneticists, biologists, and computer scientists. Most of these analyses entail searches for specific subsequences within the larger set of sequence data. Thus, most analyses are essentially pattern recognition or correlation tasks. Yet, there are special features to such analysis that influence the strategy and methods of an optical pattern recognition approach. While the serial processing employed in digital electronic computers remains the main engine of sequence analyses, there is no fundamental reason that more efficient parallel methods cannot be used. We describe an approach using optical pattern recognition (OPR) techniques based on matched spatial filtering. This allows parallel comparison of large blocks of sequence data. In this study we have simulated a Vander Lugt architecture implementing our approach. Searches for specific target sequence strings within a block of DNA sequence from the ColE1 plasmid are performed.
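
    Matched filtering of sequence data reduces, in a digital setting, to cross-correlating one-hot encoded channels. The sketch below is a serial software analogue under that assumption, not a simulation of the optical architecture; the function names are hypothetical. The correlation score at each offset equals the number of matching bases, so an exact occurrence of the target produces a peak equal to the target length.

```python
def one_hot(seq):
    """Split a DNA string into four binary channels, one per base."""
    return {b: [1 if c == b else 0 for c in seq] for b in "ACGT"}

def correlate(seq, target):
    """Cross-correlate seq with target; score = matching bases per offset."""
    s, t = one_hot(seq), one_hot(target)
    n, m = len(seq), len(target)
    scores = []
    for k in range(n - m + 1):
        scores.append(sum(s[b][k + j] * t[b][j]
                          for b in "ACGT" for j in range(m)))
    return scores

def find_matches(seq, target):
    """Offsets where the correlation peak reaches the target length."""
    return [k for k, score in enumerate(correlate(seq, target))
            if score == len(target)]

scores = correlate("ACGTACG", "ACG")
hits = find_matches("ACGTACG", "ACG")
```

    Lowering the peak threshold below the target length would tolerate mismatches, which is one reason correlation-based search is attractive for approximate matching.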

  3. Sequence-Dependent Persistence Lengths of DNA.

    Science.gov (United States)

    Mitchell, Jonathan S; Glowacki, Jaroslaw; Grandchamp, Alexandre E; Manning, Robert S; Maddocks, John H

    2017-04-11

    A Monte Carlo code applied to the cgDNA coarse-grain rigid-base model of B-form double-stranded DNA is used to predict a sequence-averaged persistence length of l F = 53.5 nm in the sense of Flory, and of l p = 160 bp or 53.5 nm in the sense of apparent tangent-tangent correlation decay. These estimates are slightly higher than the consensus experimental values of 150 bp or 50 nm, but we believe the agreement to be good given that the cgDNA model is itself parametrized from molecular dynamics simulations of short fragments of length 10-20 bp, with no explicit fit to persistence length. Our Monte Carlo simulations further predict that there can be substantial dependence of persistence lengths on the specific sequence [Formula: see text] of a fragment. We propose, and confirm the numerical accuracy of, a simple factorization that separates the part of the apparent tangent-tangent correlation decay [Formula: see text] attributable to intrinsic shape, from a part [Formula: see text] attributable purely to stiffness, i.e., a sequence-dependent version of what has been called sequence-averaged dynamic persistence length l̅ d (=58.8 nm within the cgDNA model). For ensembles of both random and λ-phage fragments, the apparent persistence length [Formula: see text] has a standard deviation of 4 nm over sequence, whereas our dynamic persistence length [Formula: see text] has a standard deviation of only 1 nm. However, there are notable dynamic persistence length outliers, including poly(A) (exceptionally straight and stiff), poly(TA) (tightly coiled and exceptionally soft), and phased A-tract sequence motifs (exceptionally bent and stiff). The results of our numerical simulations agree reasonably well with both molecular dynamics simulation and diverse experimental data including minicircle cyclization rates and stereo cryo-electron microscopy images.

  4. Is photocleavage of DNA by YOYO-1 using a synchrotron radiation light source sequence dependent?

    DEFF Research Database (Denmark)

    Gilroy, Emma L.; Hoffmann, Søren Vrønning; Jones, Nykola C.

    2011-01-01

    ) throughout the irradiation period. The dependence of LD signals on DNA sequences and on time in the intense light beam was explored and quantified for single-stranded poly(dA), poly[(dA-dT)2], calf thymus DNA (ctDNA) and Micrococcus luteus DNA (mlDNA). The DNA and ligand regions of the spectrum showed...... was predominantly responsible for the catalysis of DNA cleavage. In homopolymeric DNAs, intercalated YOYO was unable to cleave DNA. In mixed-sequence DNAs the data suggest that YOYO in some but not all intercalated binding sites can cause cleavage. It is also likely that cleavage occurs at transient single...

  5. Defining the sequence requirements for the positioning of base J in DNA using SMRT sequencing.

    Science.gov (United States)

    Genest, Paul-Andre; Baugh, Loren; Taipale, Alex; Zhao, Wanqi; Jan, Sabrina; van Luenen, Henri G A M; Korlach, Jonas; Clark, Tyson; Luong, Khai; Boitano, Matthew; Turner, Steve; Myler, Peter J; Borst, Piet

    2015-02-27

    Base J (β-D-glucosyl-hydroxymethyluracil) replaces 1% of T in the Leishmania genome and is only found in telomeric repeats (99%) and in regions where transcription starts and stops. This highly restricted distribution must be co-determined by the thymidine hydroxylases (JBP1 and JBP2) that catalyze the initial step in J synthesis. To determine the DNA sequences recognized by JBP1/2, we used SMRT sequencing of DNA segments inserted into plasmids grown in Leishmania tarentolae. We show that SMRT sequencing recognizes base J in DNA. Leishmania DNA segments that normally contain J also picked up J when present in the plasmid, whereas control sequences did not. Even a segment of only 10 telomeric (GGGTTA) repeats was modified in the plasmid. We show that J modification usually occurs at pairs of Ts on opposite DNA strands, separated by 12 nucleotides. Modifications occur near G-rich sequences capable of forming G-quadruplexes and JBP2 is needed, as it does not occur in JBP2-null cells. We propose a model whereby de novo J insertion is mediated by JBP2. JBP1 then binds to J and hydroxylates another T 13 bp downstream (but not upstream) on the complementary strand, allowing JBP1 to maintain existing J following DNA replication. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Analysis of mitochondrial DNA sequences in patients with isolated or combined oxidative phosphorylation system deficiency.

    NARCIS (Netherlands)

    Hinttala, R.; Smeets, R.; Moilanen, J.S.; Ugalde, C.; Uusimaa, J.; Smeitink, J.A.M.; Majamaa, K.

    2006-01-01

    BACKGROUND: Enzyme deficiencies of the oxidative phosphorylation (OXPHOS) system may be caused by mutations in the mitochondrial DNA (mtDNA) or in the nuclear DNA. OBJECTIVE: To analyse the sequences of the mtDNA coding region in 25 patients with OXPHOS system deficiency to identify the underlying

  7. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    Directory of Open Access Journals (Sweden)

    Kirkness Ewen

    2006-10-01

    Background: Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromosome sequence and the search for polymorphisms therein. The genome has been only partially sequenced for one male dog, disallowing mapping of the sequence into specific chromosomes. However, by comparing the male genome sequence to the complete female dog genome sequence, candidate Y-chromosome sequence may be identified by exclusion. Results: The male dog genome sequence was analysed by Blast search against the human genome to identify sequences with a best match to the human Y chromosome and to the female dog genome to identify those absent in the female genome. Candidate sequences were then tested for male specificity by PCR of five male and five female dogs. 32 sequences from the male genome, with a total length of 24 kbp, were identified as male specific, based on a match to the human Y chromosome, absence in the female dog genome and male specific PCR results. 14437 bp were then sequenced for 10 male dogs originating from Europe, Southwest Asia, Siberia, East Asia, Africa and America. Nine haplotypes were found, which were defined by 14 substitutions. The genetic distance between the haplotypes indicates that they originate from at least five wolf haplotypes. There was no obvious trend in the geographic distribution of the haplotypes. Conclusion: We have identified 24159 bp of dog Y-chromosome sequence to be used for population genetic studies. We sequenced 14437 bp in a worldwide collection of dogs, identifying 14 SNPs for future SNP analyses, and

  8. Physical localization of the 18S-5.8S-26S rDNA and sequence analysis of ITS regions in Thinopyrum ponticum (Poaceae: Triticeae): implications for concerted evolution.

    Science.gov (United States)

    Li, Dayong; Zhang, Xueyong

    2002-10-01

    Fluorescence in situ hybridization was used in Thinopyrum ponticum, a decaploid species, and its related diploid species, to investigate the distribution of the 18S-5.8S-26S rDNA. The distribution of rDNA was similar in all three diploid species (Th. bessarabicum, Th. elongatum and Pseudoroegneria stipifolia). Two pairs of loci were observed in each somatic cell at metaphase and interphase. One pair was located near the terminal end and the other in the interstitial regions of the short arms of one pair of chromosomes. However, all of the major loci in Th. ponticum were located on the terminal end of the short arms of chromosomes, and one chromosome had only one major locus. The maximum number of major loci detected on metaphase spreads was 20, which was the sum of that of its progenitors. The interstitial loci that exist in the possible diploid genome donor species were probably 'lost' during the evolutionary process of the decaploid species. A number of minor loci were also detected on whole regions of two pairs of homologous chromosomes. These results suggested that the position of rDNA loci in the Triticeae might be changeable rather than fixed. Positional changes of 18S-5.8S-26S rDNA loci between Th. ponticum and its candidate genome donors indicate that it is almost impossible to find a genome in the polyploid species that is completely identical to that of its diploid donors. The possible evolutionary significance of the distribution of the rDNA is also discussed. Internal transcribed spacer (ITS) regions of nuclear DNA in Th. ponticum were investigated by PCR amplification and sequencing. The sequence data from five positive clones selected at random, together with restriction site analysis, indicated that the ITS repeated units are nearly homogeneous in this autoallodecapolypoid species. Combined with in situ hybridization results, the data led to the conclusion that the ITS region has experienced interlocus as well as intralocus concerted evolution

  9. Physical Localization of the 18S‐5·8S‐26S rDNA and Sequence Analysis of ITS Regions in Thinopyrum ponticum (Poaceae: Triticeae): Implications for Concerted Evolution

    Science.gov (United States)

    LI, DAYONG; ZHANG, XUEYONG

    2002-01-01

    Fluorescence in situ hybridization was used in Thinopyrum ponticum, a decaploid species, and its related diploid species, to investigate the distribution of the 18S‐5·8S‐26S rDNA. The distribution of rDNA was similar in all three diploid species (Th. bessarabicum, Th. elongatum and Pseudoroegneria stipifolia). Two pairs of loci were observed in each somatic cell at metaphase and interphase. One pair was located near the terminal end and the other in the interstitial regions of the short arms of one pair of chromosomes. However, all of the major loci in Th. ponticum were located on the terminal end of the short arms of chromosomes, and one chromosome had only one major locus. The maximum number of major loci detected on metaphase spreads was 20, which was the sum of that of its progenitors. The interstitial loci that exist in the possible diploid genome donor species were probably ‘lost’ during the evolutionary process of the decaploid species. A number of minor loci were also detected on whole regions of two pairs of homologous chromosomes. These results suggested that the position of rDNA loci in the Triticeae might be changeable rather than fixed. Positional changes of 18S‐5·8S‐26S rDNA loci between Th. ponticum and its candidate genome donors indicate that it is almost impossible to find a genome in the polyploid species that is completely identical to that of its diploid donors. The possible evolutionary significance of the distribution of the rDNA is also discussed. Internal transcribed spacer (ITS) regions of nuclear DNA in Th. ponticum were investigated by PCR amplification and sequencing. The sequence data from five positive clones selected at random, together with restriction site analysis, indicated that the ITS repeated units are nearly homogeneous in this autoallodecapolypoid species. Combined with in situ hybridization results, the data led to the conclusion that the ITS region has experienced interlocus as well as intralocus concerted

  10. Special Issue: Next Generation DNA Sequencing

    Directory of Open Access Journals (Sweden)

    Paul Richardson

    2010-10-01

    Next Generation Sequencing (NGS) refers to technologies that do not rely on traditional dideoxy-nucleotide (Sanger) sequencing where labeled DNA fragments are physically resolved by electrophoresis. These new technologies rely on different strategies, but essentially all of them make use of real-time data collection of a base level incorporation event across a massive number of reactions (on the order of millions versus 96 for capillary electrophoresis for instance). The major commercial NGS platforms available to researchers are the 454 Genome Sequencer (Roche), Illumina (formerly Solexa) Genome analyzer, the SOLiD system (Applied Biosystems/Life Technologies) and the Heliscope (Helicos Corporation). The techniques and different strategies utilized by these platforms are reviewed in a number of the papers in this special issue. These technologies are enabling new applications that take advantage of the massive data produced by this next generation of sequencing instruments. [...

  11. DNA sequencing versus standard prenatal aneuploidy screening.

    Science.gov (United States)

    Bianchi, Diana W; Parker, R Lamar; Wentworth, Jeffrey; Madankumar, Rajeevi; Saffer, Craig; Das, Anita F; Craig, Joseph A; Chudova, Darya I; Devers, Patricia L; Jones, Keith W; Oliver, Kelly; Rava, Richard P; Sehnert, Amy J

    2014-02-27

    In high-risk pregnant women, noninvasive prenatal testing with the use of massively parallel sequencing of maternal plasma cell-free DNA (cfDNA testing) accurately detects fetal autosomal aneuploidy. Its performance in low-risk women is unclear. At 21 centers in the United States, we collected blood samples from women with singleton pregnancies who were undergoing standard aneuploidy screening (serum biochemical assays with or without nuchal translucency measurement). We performed massively parallel sequencing in a blinded fashion to determine the chromosome dosage for each sample. The primary end point was a comparison of the false positive rates of detection of fetal trisomies 21 and 18 with the use of standard screening and cfDNA testing. Birth outcomes or karyotypes were the reference standard. The primary series included 1914 women (mean age, 29.6 years) with an eligible sample, a singleton fetus without aneuploidy, results from cfDNA testing, and a risk classification based on standard screening. For trisomies 21 and 18, the false positive rates with cfDNA testing were significantly lower than those with standard screening (0.3% vs. 3.6% for trisomy 21, P [...] aneuploidy (5 for trisomy 21, 2 for trisomy 18, and 1 for trisomy 13; negative predictive value, 100% [95% confidence interval, 99.8 to 100]). The positive predictive values for cfDNA testing versus standard screening were 45.5% versus 4.2% for trisomy 21 and 40.0% versus 8.3% for trisomy 18. In a general obstetrical population, prenatal testing with the use of cfDNA had significantly lower false positive rates and higher positive predictive values for detection of trisomies 21 and 18 than standard screening. (Funded by Illumina; ClinicalTrials.gov number, NCT01663350.).

  12. Sequence periodicity of Escherichia coli is concentrated in intergenic regions

    Directory of Open Access Journals (Sweden)

    Trifonov Edward N

    2004-08-01

    Background: Sequence periodicity with a period close to the DNA helical repeat is a very basic genomic property. This genomic feature was demonstrated for many prokaryotic genomes. The Escherichia coli sequences display a period close to 11 base pairs. Results: Here we demonstrate that practically only ApA/TpT dinucleotides contribute to overall dinucleotide periodicity in Escherichia coli. The noncoding sequences reveal this periodicity much more prominently than protein-coding sequences. The sequence periodicity of ApC/GpT, ApT and GpC dinucleotides along the Escherichia coli K-12 genome is likewise found mainly within the intergenic regions. Conclusions: The observed concentration of the dinucleotide sequence periodicity in the intergenic regions of E. coli suggests that the periodicity is a typical property of prokaryotic intergenic regions. We suppose that this preferential distribution of dinucleotide periodicity serves many biological functions; first of all, the regulation of transcription.
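
    One simple way to look for this kind of dinucleotide periodicity is a positional autocorrelation: count how often a chosen dinucleotide recurs at each distance and look for peaks. The sketch below assumes that approach; the function name `dinucleotide_distance_counts` and the toy sequence are hypothetical, and this is not necessarily the procedure used in the paper.

```python
from collections import Counter

def dinucleotide_distance_counts(seq, dinucs=("AA", "TT"), max_dist=30):
    """Count pairs of the listed dinucleotides separated by each distance 1..max_dist."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] in dinucs]
    present = set(positions)
    counts = Counter()
    for p in positions:
        for d in range(1, max_dist + 1):
            if p + d in present:
                counts[d] += 1
    return counts

# Toy sequence with an AA dinucleotide planted every 11 bases (hypothetical data).
toy = ("AA" + "GCGCGCGCG") * 20
counts = dinucleotide_distance_counts(toy, dinucs=("AA",), max_dist=25)
best = max(counts, key=counts.get)  # distance with the most recurrences
```

    In a real genome the counts would be compared between intergenic and protein-coding regions, with a background correction for base composition.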

  13. Targeted deep DNA methylation analysis of circulating cell-free DNA in plasma using massively parallel semiconductor sequencing.

    Science.gov (United States)

    Vaca-Paniagua, Felipe; Oliver, Javier; Nogueira da Costa, Andre; Merle, Philippe; McKay, James; Herceg, Zdenko; Holmila, Reetta

    2015-01-01

    To set up a targeted methylation analysis using semiconductor sequencing and evaluate the potential for studying methylation in circulating cell-free DNA (cfDNA). Methylation of VIM, FBLN1, LTBP2, HINT2, h19 and IGF2 was analyzed in plasma cfDNA and white blood cell DNA obtained from eight hepatocellular carcinoma patients and eight controls using Ion Torrent™ PGM sequencer. h19 and IGF2 showed consistent methylation levels and methylation was detected for VIM and FBLN1, whereas LTBP2 and HINT2 did not show methylation for target regions. VIM gene promoter methylation was higher in HCC cfDNA than in cfDNA of controls or white blood cell DNA. Semiconductor sequencing is a suitable method for analyzing methylation profiles in cfDNA. Furthermore, differences in cfDNA methylation can be detected between controls and hepatocellular carcinoma cases, even though due to the small sample set these results need further validation.

  14. [Images of Alu-sequence in 7 DNA clones from the human genome].

    Science.gov (United States)

    Korotkov, E V

    1987-01-01

    Information theory methods were used for a computer search of Alu-like sequences in human DNA and RNA. Eight new regions related to the Alu repeat sequence were revealed in 85 clones from the EMBL-5 data bank. Some of these regions are purine-pyrimidine images of the Alu repeat sequence; the rest are more complex images of the Alu repeat sequence. A new definition for the likeness of different sequences, the information image of a sequence, was introduced. This application of information theory greatly increases the power of computer analysis of DNA sequences.

  15. Genomic signal processing for DNA sequence clustering.

    Science.gov (United States)

    Mendizabal-Ruiz, Gerardo; Román-Godínez, Israel; Torres-Ramos, Sulema; Salido-Ruiz, Ricardo A; Vélez-Pérez, Hugo; Morales, J Alejandro

    2018-01-01

    Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.
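
    The general idea of mapping DNA to numbers and then clustering can be sketched as follows. This assumes a very simple numerical representation (the mean of the four Voss indicator signals, which is just the base composition) and a plain Lloyd's K-means with deterministic initialization. The helper names `voss_features` and `kmeans` are hypothetical, and the paper's actual GSP mapping and features may differ.

```python
def voss_features(seq):
    """Mean of the four Voss indicator signals, i.e. the fraction of A, C, G and T."""
    seq = seq.upper()
    return [seq.count(b) / len(seq) for b in "ACGT"]

def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy AT-rich and GC-rich sequences (hypothetical data).
seqs = ["ATATATATTA", "GCGCGGCCGC", "AATTATATAT", "GGCCGCGCGC"]
labels, centroids = kmeans([voss_features(s) for s in seqs], k=2)
```

    With these toy sequences the AT-rich and GC-rich pairs end up in separate clusters; richer features such as spectral power would replace `voss_features` in practice.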

  16. Targeted DNA methylation analysis by next-generation sequencing.

    Science.gov (United States)

    Masser, Dustin R; Stanford, David R; Freeman, Willard M

    2015-02-24

    The role of epigenetic processes in the control of gene expression has been known for a number of years. DNA methylation at cytosine residues is of particular interest for epigenetic studies as it has been demonstrated to be both a long lasting and a dynamic regulator of gene expression. Efforts to examine epigenetic changes in health and disease have been hindered by the lack of high-throughput, quantitatively accurate methods. With the advent and popularization of next-generation sequencing (NGS) technologies, these tools are now being applied to epigenomics in addition to existing genomic and transcriptomic methodologies. For epigenetic investigations of cytosine methylation where regions of interest, such as specific gene promoters or CpG islands, have been identified and there is a need to examine significant numbers of samples with high quantitative accuracy, we have developed a method called Bisulfite Amplicon Sequencing (BSAS). This method combines bisulfite conversion with targeted amplification of regions of interest, transposome-mediated library construction and benchtop NGS. BSAS offers a rapid and efficient method for analysis of up to 10 kb of targeted regions in up to 96 samples at a time that can be performed by most research groups with basic molecular biology skills. The results provide absolute quantitation of cytosine methylation with base specificity. BSAS can be applied to any genomic region from any DNA source. This method is useful for hypothesis testing studies of target regions of interest as well as confirmation of regions identified in genome-wide methylation analyses such as whole genome bisulfite sequencing, reduced representation bisulfite sequencing, and methylated DNA immunoprecipitation sequencing.

  17. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.
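
    A minimal sketch of such a construction is given below. It assumes words are non-overlapping consecutive blocks of m letters, that transitions run from each word to the next one along the sequence, and that the damping factor is 0.85; these conventions, the function names `google_matrix` and `pagerank`, and the toy sequence are assumptions for illustration rather than details taken from the paper.

```python
from itertools import product

def google_matrix(seq, m=2, alpha=0.85):
    """Column-stochastic Google matrix over all 4**m words of length m."""
    words = ["".join(w) for w in product("ACGT", repeat=m)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    counts = [[0.0] * n for _ in range(n)]  # counts[to][from]
    chunks = [seq[i:i + m] for i in range(0, len(seq) - m + 1, m)]
    for a, b in zip(chunks, chunks[1:]):
        if a in index and b in index:  # skip words with ambiguous letters
            counts[index[b]][index[a]] += 1.0
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(counts[i][j] for i in range(n))
        for i in range(n):
            # Dangling columns (no outgoing transitions) become uniform.
            s = counts[i][j] / col_sum if col_sum else 1.0 / n
            G[i][j] = alpha * s + (1.0 - alpha) / n
    return words, G

def pagerank(G, iters=200):
    """Power iteration for the leading eigenvector of G."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p

words, G = google_matrix("ACGTACGTTTGACCAGTACGATCGATCGGCTA", m=2)
p = pagerank(G)
```

    Sorting `words` by the entries of `p` gives a PageRank ordering of words, which is the kind of quantity the abstract compares across species.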

  18. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  19. Sequencing strategy of mitochondrial HV1 and HV2 DNA with length heteroplasmy

    DEFF Research Database (Denmark)

    Rasmussen, Erik Michael; Sørensen, E; Eriksen, Birthe

    2002-01-01

    We describe a method to obtain reliable mitochondrial DNA (mtDNA) sequences downstream of the homopolymeric stretches with length heteroplasmy in the sequencing direction. The method is based on the use of junction primers that bind to a part of the homopolymeric stretch and the first 2-4 bases...... downstream of the homopolymeric region. This junction primer method gave clear and unambiguous results using samples from 21 individuals with length heteroplasmy in the hypervariable regions HV1, HV2 or both. The method is of special value for forensic casework, because sequencing of both strands of an mtDNA...... region is preferable in order to reduce ambiguities in sequence determination....

  20. What Advances Are Being Made in DNA Sequencing?

    Science.gov (United States)

    ... diagnosis in the future. For more information about DNA sequencing technologies and their use: Genetics Home Reference discusses ... illustration of the decline in the cost of DNA sequencing , including that caused by the introduction of new ...

  1. Next-generation sequencing offers new insights into DNA degradation

    DEFF Research Database (Denmark)

    Overballe-Petersen, Søren; Orlando, Ludovic Antoine Alexandre; Willerslev, Eske

    2012-01-01

    The processes underlying DNA degradation are central to various disciplines, including cancer research, forensics and archaeology. The sequencing of ancient DNA molecules on next-generation sequencing platforms provides direct measurements of cytosine deamination, depurination and fragmentation...

  2. Assessing Symbiodinium diversity in scleractinian corals via next-generation sequencing-based genotyping of the ITS2 rDNA region

    KAUST Repository

    Arif, Chatchanit

    2014-09-01

    The persistence of coral reef ecosystems relies on the symbiotic relationship between scleractinian corals and intracellular, photosynthetic dinoflagellates in the genus Symbiodinium. Genetic evidence indicates that these symbionts are biologically diverse and exhibit discrete patterns of environmental and host distribution. This makes the assessment of Symbiodinium diversity critical to understanding the symbiosis ecology of corals. Here, we applied pyrosequencing to the elucidation of Symbiodinium diversity via analysis of the internal transcribed spacer 2 (ITS2) region, a multicopy genetic marker commonly used to analyse Symbiodinium diversity. Replicated data generated from isoclonal Symbiodinium cultures showed that all genomes contained numerous, yet mostly rare, ITS2 sequence variants. Pyrosequencing data were consistent with more traditional denaturing gradient gel electrophoresis (DGGE) approaches to the screening of ITS2 PCR amplifications, where the most common sequences appeared as the most intense bands. Further, we developed an operational taxonomic unit (OTU)-based pipeline for Symbiodinium ITS2 diversity typing to provisionally resolve ecologically discrete entities from intragenomic variation. A genetic distance cut-off of 0.03 collapsed intragenomic ITS2 variants of isoclonal cultures into single OTUs. When applied to the analysis of field-collected coral samples, our analyses confirm that much of the commonly observed Symbiodinium ITS2 diversity can be attributed to intragenomic variation. We conclude that by analysing Symbiodinium populations in an OTU-based framework, we can improve objectivity, comparability and simplicity when assessing ITS2 diversity in field-based studies.
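
    The effect of a 0.03 distance cut-off can be illustrated with a small greedy clustering sketch. It assumes the ITS2 variants are already aligned to equal length, uses the fraction of mismatching positions as the genetic distance, and lets the most abundant variants seed OTUs first. The function names and toy data are hypothetical; this is not the authors' pipeline.

```python
def p_distance(a, b):
    """Fraction of mismatching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def greedy_otus(variant_counts, cutoff=0.03):
    """Greedy centroid clustering: most abundant variants seed OTUs first."""
    otus = []  # list of (centroid, [members])
    for seq, _ in sorted(variant_counts.items(), key=lambda kv: -kv[1]):
        for centroid, members in otus:
            if p_distance(seq, centroid) <= cutoff:
                members.append(seq)
                break
        else:
            otus.append((seq, [seq]))  # no centroid close enough: new OTU
    return otus

# Toy 100-bp variants with read counts (hypothetical data).
base = "ACGT" * 25
rare = "TT" + base[2:]          # 2 mismatches, distance 0.02
other = "T" * 10 + base[10:]    # 8 mismatches, distance 0.08
otus = greedy_otus({base: 90, other: 7, rare: 3})
```

    Here the rare variant collapses into the OTU of the dominant sequence while the more divergent variant forms its own OTU, mirroring how intragenomic variants are merged.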

  3. Complete nucleotide sequence of minicircle kinetoplast DNA from Trypanosoma equiperdum.

    Science.gov (United States)

    Barrois, M; Riou, G; Galibert, F

    1981-06-01

    The kinetoplast DNA of Trypanosoma equiperdum is composed of about 3000 supercoiled minicircles of 1000 base pairs and about 50 supercoiled maxicircles of 23,000 base pairs topologically interlocked so as to form a compact network. Minicircles of T. equiperdum, which are homogeneous in base sequence, were purified by equilibrium CsCl centrifugation and used as starting material for DNA sequence analysis. One minicircle is composed of 1012 base pairs and has an adenine.thymine base pair content of 72.8%. The termination codons are uniformly distributed along the molecule and restrict the coding potentiality of the molecule to oligopeptides of about 20 amino acids. The molecule contains three dyad symmetries and a sequence of 12 nucleotides is repeated six times. We also noted the presence of a region of about 130 base pairs that is almost perfectly homologous with that of the minicircles from the closely related species T. brucei.

  4. DNA qualification workflow for next generation sequencing of histopathological samples.

    Directory of Open Access Journals (Sweden)

    Michele Simbolo

    Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instrumentations to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard

  5. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    OpenAIRE

    Natanaelsson, Christian; Oskarsson, Mattias CR; Angleby, Helen; Lundeberg, Joakim; Kirkness, Ewen; Savolainen, Peter

    2006-01-01

    Background: Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromo...

  6. cDNA encoding a polypeptide including a hevein sequence

    Science.gov (United States)

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    1993-02-16

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a pu... GOVERNMENT RIGHTS: This application was funded under Department of Energy Contract DE-AC02-76ER01338. The U.S. Government has certain rights under this application and any patent issuing thereon.

  7. Mitochondrial DNA sequencing of cat hair: an informative forensic tool.

    Science.gov (United States)

    Tarditi, Christy R; Grahn, Robert A; Evans, Jeffrey J; Kurushima, Jennifer D; Lyons, Leslie A

    2011-01-01

    Approximately 81.7 million cats are in 37.5 million U.S. households. Shed fur can be criminal evidence because of transfer to victims, suspects, and/or their belongings. To improve cat hairs as forensic evidence, the mtDNA control region from single hairs, with and without root tags, was sequenced. A dataset of a 402-bp control region segment from 174 random-bred cats representing four U.S. geographic areas was generated to determine the informativeness of the mtDNA region. Thirty-two mtDNA mitotypes were observed ranging in frequencies from 0.6-27%. Four common types occurred in all populations. Low heteroplasmy, 1.7%, was determined. Unique mitotypes were found in 18 individuals, 10.3% of the population studied. The calculated discrimination power implied that 8.3 of 10 randomly selected individuals can be excluded by this region. The genetic characteristics of the region and the generated dataset support the use of this cat mtDNA region in forensic applications. 2010 American Academy of Forensic Sciences. Published 2010. This article is a U.S. Government work and is in the public domain in the U.S.A.

  8. cDNA sequence quality data - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Budding yeast cDNA sequencing project: cDNA sequence quality data. Data name: cDNA sequence quality data. DOI: 10.18908/lsdba.nbdc00838-003. Description of data contents: Phred's quality score. P...

  9. Predicting DNA hybridization kinetics from sequence

    Science.gov (United States)

    Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu

    2018-01-01

    Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on similarity reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
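    The weighted neighbour voting idea described above can be sketched as follows. This is a minimal illustration only: the exponential distance weighting, the `bandwidth` parameter, the feature vectors, and the function names are assumptions made for the example, not the six-feature model or the weighting scheme reported in the abstract.

```python
import math


def wnv_predict(query, known, feature_weights, bandwidth=1.0):
    """Predict a log10 rate constant as a distance-weighted vote.

    known: list of (feature_vector, log10_rate_constant) pairs.
    The exponential kernel below is an assumed weighting, chosen for illustration.
    """
    num = 0.0
    den = 0.0
    for features, log_k in known:
        # Weighted squared distance between the query and a known reaction.
        d2 = sum(w * (q - f) ** 2
                 for w, q, f in zip(feature_weights, query, features))
        vote = math.exp(-math.sqrt(d2) / bandwidth)
        num += vote * log_k
        den += vote
    return num / den


def within_factor(pred_log10, true_log10, factor=3.0):
    """True if the prediction is within `factor` of the true rate constant."""
    return abs(pred_log10 - true_log10) <= math.log10(factor)


# Hypothetical reactions: (features, log10 rate constant).
known = [([0.0, 0.0], 6.0), ([1.0, 0.0], 5.0), ([4.0, 4.0], 3.0)]
pred = wnv_predict([0.0, 0.0], known, feature_weights=[1.0, 1.0])
```

    Working in log10 units makes the "within a factor of 3" criterion a simple absolute-difference check.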

  10. Nucleosome DNA sequence structure of isochores

    Directory of Open Access Journals (Sweden)

    Trifonov Edward N

    2011-04-01

    Background: Significant differences in G+C content between different isochore types suggest that the nucleosome positioning patterns in DNA of the isochores should be different as well. Results: Extraction of the patterns from the isochore DNA sequences by Shannon N-gram extension reveals that while the general motif YRRRRRYYYYYR is characteristic for all isochore types, the dominant positioning patterns of the isochores vary between TAAAAATTTTTA and CGGGGGCCCCCG due to the large differences in G+C composition. This is observed in human, mouse and chicken isochores, demonstrating that the variations of the positioning patterns are largely G+C dependent rather than species-specific. The species-specificity of nucleosome positioning patterns is revealed by dinucleotide periodicity analyses in isochore sequences. While human sequences show CG periodicity, chicken isochores display AG (CT) periodicity. Mouse isochores show very weak CG periodicity only. Conclusions: The nucleosome positioning pattern as revealed by Shannon N-gram extension is strongly dependent on G+C content and differs between isochores. Species-specificity of the pattern is subtle. It is reflected in the choice of preferentially periodical dinucleotides.
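    As a small check of the motif relationship stated in this record, the sketch below expands the IUPAC codes R (purine, A or G) and Y (pyrimidine, C or T) into a regular expression and tests the two extreme patterns against the general motif. The function names are illustrative and are not taken from the paper.

```python
import re

# IUPAC ambiguity codes used by the motif: R = purine, Y = pyrimidine.
IUPAC = {"R": "[AG]", "Y": "[CT]"}


def motif_to_regex(motif):
    """Translate an R/Y motif into a compiled regular expression."""
    return re.compile("".join(IUPAC.get(ch, ch) for ch in motif))


def matches(motif, seq):
    """True if `seq` is a full-length instance of `motif`."""
    return motif_to_regex(motif).fullmatch(seq) is not None


def gc_content(seq):
    """Fraction of G and C bases in `seq`."""
    return sum(1 for b in seq if b in "GC") / len(seq)


GENERAL = "YRRRRRYYYYYR"
for pattern in ("TAAAAATTTTTA", "CGGGGGCCCCCG"):
    print(pattern, matches(GENERAL, pattern), gc_content(pattern))
```

    Both patterns satisfy the general motif while sitting at opposite ends of the G+C range, which is the point the abstract makes.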

  11. Rapid quantification of DNA libraries for next-generation sequencing.

    Science.gov (United States)

    Buehler, Bernd; Hogrefe, Holly H; Scott, Graham; Ravi, Harini; Pabón-Peña, Carlos; O'Brien, Scott; Formosa, Rachel; Happe, Scott

    2010-04-01

    The next-generation DNA sequencing workflows require an accurate quantification of the DNA molecules to be sequenced which assures optimal performance of the instrument. Here, we demonstrate the use of qPCR for quantification of DNA libraries used in next-generation sequencing. In addition, we find that qPCR quantification may allow improvements to current NGS workflows, including reducing the amount of library DNA required, increasing the accuracy in quantifying amplifiable DNA, and avoiding amplification bias by reducing or eliminating the need to amplify DNA before sequencing. Copyright 2010. Published by Elsevier Inc.

  12. Detection of regional DNA methylation using DNA-graphene affinity interactions.

    Science.gov (United States)

    Haque, Md Hakimul; Gopalan, Vinod; Yadav, Sharda; Islam, Md Nazmul; Eftekhari, Ehsan; Li, Qin; Carrascosa, Laura G; Nguyen, Nam-Trung; Lam, Alfred K; Shiddiky, Muhammad J A

    2017-01-15

    We report a new method for the detection of regional DNA methylation using base-dependent affinity interaction (i.e., adsorption) of DNA with graphene. Due to the strongest adsorption affinity of guanine bases towards graphene, bisulfite-treated guanine-enriched methylated DNA leads to a larger amount of the adsorbed DNA on the graphene-modified electrodes in comparison to the adenine-enriched unmethylated DNA. The level of the methylation is quantified by monitoring the differential pulse voltammetric current as a function of the adsorbed DNA. The assay is sensitive to distinguish methylated and unmethylated DNA sequences at single CpG resolution by differentiating changes in DNA methylation as low as 5%. Furthermore, this method has been used to detect methylation levels in a collection of DNA samples taken from oesophageal cancer tissues. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. DNA sequencing using biotinylated dideoxynucleotides and mass spectrometry

    Science.gov (United States)

    Edwards, John R.; Itagaki, Yasuhiro; Ju, Jingyue

    2001-01-01

    Matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MS) has been explored widely for DNA sequencing. The major requirement for this method is that the DNA sequencing fragments must be free from alkaline and alkaline earth salts as well as other contaminants for accurately measuring the masses of the DNA fragments. We report here the development of a novel MS DNA sequencing method that generates Sanger-sequencing fragments in one tube using biotinylated dideoxynucleotides. The DNA sequencing fragments that carry a biotin at the 3′-end are made free from salts and other components in the sequencing reaction by capture with streptavidin-coated magnetic beads. Only correctly terminated biotinylated DNA fragments are subsequently released and loaded onto a mass spectrometer to obtain accurate DNA sequencing data. Compared with gel electrophoresis-based sequencing systems, MS produces a very high resolution of DNA-sequencing fragments, fast separation on microsecond time scales, and completely eliminates the compressions associated with gel electrophoresis. The high resolution of MS allows accurate mutation and heterozygote detection. This optimized solid-phase DNA-sequencing chemistry plus future improvements in detector sensitivity for large DNA fragments in MS instrumentation will further improve MS for DNA sequencing. PMID:11691941

  14. Image correlation method for DNA sequence alignment.

    Directory of Open Access Journals (Sweden)

    Millaray Curilem Saldías

    The complexity of searches and the volume of genomic data make sequence alignment one of the most active research areas in bioinformatics. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a fixed gray intensity pixel. Query and known database sequences are coded to their pixel representation, and sequence alignment is handled as an object-recognition-in-a-scene problem. Query and database become object and scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, a one million base pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%) and specificity (98.99%) and outperformed BLAST when mutation numbers increased. However, digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed of a real experimental optical correlator. By doing this, we expect to fully exploit the properties of optical correlation. As the optical correlator works jointly with the computer, digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods for sequence alignment.
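    The gray-level coding and correlation search can be illustrated with a one-dimensional simplification. The sketch below is not the paper's 2-D image pipeline: the gray levels assigned to each base, the use of zero-mean normalized correlation, and the function names are all assumptions made for the example.

```python
import math

# Assumed gray intensities per base; the abstract does not give the actual values.
GRAY = {"A": 0.0, "C": 85.0, "G": 170.0, "T": 255.0}


def encode(seq):
    """Map each base to its gray intensity."""
    return [GRAY[b] for b in seq]


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length signals."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat window carries no pattern to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(query, database):
    """Slide the query over the database and return (offset, score) of the peak."""
    q = encode(query)
    d = encode(database)
    scores = [normalized_correlation(q, d[i:i + len(q)])
              for i in range(len(d) - len(q) + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


offset, score = best_match("GATTACA", "CCGTGATTACAGGT")
```

    An exact occurrence of the query produces a correlation peak of 1 at its offset; mutated copies give lower but still elevated peaks, which is what makes a correlation search tolerant to mismatches.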

  15. Multilocus DNA Sequence Comparisons Rapidly Identify Pathogenic Molds

    Science.gov (United States)

    Rakeman, Jennifer L.; Bui, Uyen; LaFe, Karen; Chen, Yi-Ching; Honeycutt, Rhonda J.; Cookson, Brad T.

    2005-01-01

    The increasing incidence of opportunistic fungal infections necessitates rapid and accurate identification of the associated fungi to facilitate optimal patient treatment. Traditional phenotype-based identification methods utilized in clinical laboratories rely on the production and recognition of reproductive structures, making identification difficult or impossible when these structures are not observed. We hypothesized that DNA sequence analysis of multiple loci is useful for rapidly identifying medically important molds. Our study included the analysis of the D1/D2 hypervariable region of the 28S ribosomal gene and the internal transcribed spacer (ITS) regions 1 and 2 of the rRNA operon. Two hundred one strains, including 143 clinical isolates and 58 reference and type strains, representing 43 recognized species and one possible new species, were examined. We generated a phenotypically validated database of 118 diagnostic alleles. DNA length polymorphisms detected among ITS1 and ITS2 PCR products can differentiate 20 of 33 species of molds tested, and ITS DNA sequence analysis permits identification of all species tested. For 42 of 44 species tested, conspecific strains displayed >99% sequence identity at ITS1 and ITS2; sequevars were detected in two species. For all 44 species, identifications by genotypic and traditional phenotypic methods were 100% concordant. Because dendrograms based on ITS sequence analysis are similar in topology to 28S-based trees, we conclude that ITS sequences provide phylogenetically valid information and can be utilized to identify clinically important molds. Additionally, this phenotypically validated database of ITS sequences will be useful for identifying new species of pathogenic molds. PMID:16000456

  16. Silicene nanoribbon as a new DNA sequencing device

    Science.gov (United States)

    Alesheikh, Sara; Shahtahmassebi, Nasser; Roknabadi, Mahmood Rezaee; Pilevar Shahri, Raheleh

    2018-02-01

    The importance of DNA sequencing in many fields motivates the search for fast and cheap methods. Nanotechnology supports this development by introducing nanostructures that can be used for DNA sequencing. In this work we study the interaction between a zigzag silicene nanoribbon and DNA nucleobases using DFT and the non-equilibrium Green's function approach, to investigate the possibility of using zigzag silicene nanoribbons as a biosensor for DNA sequencing.

  17. Next generation sequencing of DNA-launched Chikungunya vaccine virus

    Energy Technology Data Exchange (ETDEWEB)

    Hidajat, Rachmat; Nickols, Brian [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States)]; Forrester, Naomi [Institute for Human Infections and Immunity, Sealy Center for Vaccine Development and Department of Pathology, University of Texas Medical Branch, GNL, 301 University Blvd., Galveston, TX 77555 (United States)]; Tretyakova, Irina [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States)]; Weaver, Scott [Institute for Human Infections and Immunity, Sealy Center for Vaccine Development and Department of Pathology, University of Texas Medical Branch, GNL, 301 University Blvd., Galveston, TX 77555 (United States)]; Pushko, Peter, E-mail: ppushko@medigen-usa.com [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States)]

    2016-03-15

    Chikungunya virus (CHIKV) represents a pandemic threat with no approved vaccine available. Recently, we described a novel vaccination strategy based on iDNA® infectious clone designed to launch a live-attenuated CHIKV vaccine from plasmid DNA in vitro or in vivo. As a proof of concept, we prepared iDNA plasmid pCHIKV-7 encoding the full-length cDNA of the 181/25 vaccine. The DNA-launched CHIKV-7 virus was prepared and compared to the 181/25 virus. Illumina HiSeq2000 sequencing revealed that with the exception of the 3′ untranslated region, CHIKV-7 viral RNA consistently showed a lower frequency of single-nucleotide polymorphisms than the 181/25 RNA including at the E2-12 and E2-82 residues previously identified as attenuating mutations. In the CHIKV-7, frequencies of reversions at E2-12 and E2-82 were 0.064% and 0.086%, while in the 181/25, frequencies were 0.179% and 0.133%, respectively. We conclude that the DNA-launched virus has a reduced probability of reversion mutations, thereby enhancing vaccine safety. - Highlights: • Chikungunya virus (CHIKV) is an emerging pandemic threat. • In vivo DNA-launched attenuated CHIKV is a novel vaccine technology. • DNA-launched virus was sequenced using HiSeq2000 and compared to the 181/25 virus. • DNA-launched virus has lower frequency of SNPs at E2-12 and E2-82 attenuation loci.

  18. Sequence polymorphisms of mtDNA HV1, HV2, and HV3 regions in the Malay population of Peninsular Malaysia.

    Science.gov (United States)

    Nur Haslindawaty, Abd Rashid; Panneerchelvam, Sundararajulu; Edinur, Hisham Atan; Norazmi, Mohd Nor; Zafarina, Zainuddin

    2010-09-01

    The uniparentally inherited mitochondrial DNA (mtDNA) has been in the limelight for the past two decades in studies of the demographic history of mankind and in forensic kinship testing. In this study, human mtDNA hypervariable segments 1, 2, and 3 (HV1, HV2, and HV3) were analyzed in 248 unrelated Malay individuals in Peninsular Malaysia. Combined analyses of HV1, HV2, and HV3 revealed a total of 180 mtDNA haplotypes, with 149 unique haplotypes and 31 haplotypes occurring in more than one individual. The genetic diversity was estimated to be 99.47%, and the probability of any two individuals sharing the same mtDNA haplotype was 0.93%. The most frequent mtDNA haplotype (73, 146, 150, 195, 263, 315.1C, 16140, 16182C, 16183C, 16189, 16217, 16274, and 16335) was shared by 11 (4.44%) individuals. The nucleotide diversity and mean of pair-wise differences were found to be 0.036063 ± 0.020101 and 12.544022 ± 6.230486, respectively.
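    The two summary statistics in this record are linked by a short calculation. The sketch below uses the commonly applied estimators (random match probability as the sum of squared haplotype frequencies, and the sample-size-corrected diversity n/(n-1) x (1 - sum of squared frequencies)); the abstract does not say which estimators the authors used, so this is an assumption, although the reported 0.93% and 99.47% for n = 248 are consistent with it.

```python
def random_match_probability(counts):
    """Sum of squared haplotype frequencies, given per-haplotype counts."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def genetic_diversity(counts):
    """Sample-size-corrected haplotype diversity (assumed estimator)."""
    n = sum(counts)
    return n / (n - 1) * (1.0 - random_match_probability(counts))


def diversity_from_rmp(rmp, n):
    """Same diversity estimator, starting from a reported match probability."""
    return n / (n - 1) * (1.0 - rmp)


# Check against the figures quoted in the abstract: RMP 0.93%, n = 248.
print(round(diversity_from_rmp(0.0093, 248), 4))

# Toy haplotype counts (hypothetical): one haplotype seen twice, two seen once.
print(genetic_diversity([2, 1, 1]))
```

    The per-haplotype counts are not listed in the abstract, so the toy counts only demonstrate the functions.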

  19. Computational optimisation of targeted DNA sequencing for cancer detection

    Science.gov (United States)

    Martinez, Pierre; McGranahan, Nicholas; Birkbak, Nicolai Juul; Gerlinger, Marco; Swanton, Charles

    2013-12-01

    Despite recent progress thanks to next-generation sequencing technologies, personalised cancer medicine is still hampered by intra-tumour heterogeneity and drug resistance. As most patients with advanced metastatic disease face poor survival, there is a need to improve early diagnosis. Analysing circulating tumour DNA (ctDNA) might represent a non-invasive method to detect mutations in patients, facilitating early detection. In this article, we define reduced gene panels from publicly available datasets as a first step to assess and optimise the potential of targeted ctDNA scans for early tumour detection. Dividing 4,467 samples into one discovery and two independent validation cohorts, we show that up to 76% of 10 cancer types harbour at least one mutation in a panel of only 25 genes, with high sensitivity across most tumour types. Our analyses demonstrate that targeting "hotspot" regions would introduce biases towards in-frame mutations and would compromise the reproducibility of tumour detection.

  20. Identification of Meconopsis species by a DNA barcode sequence ...

    African Journals Online (AJOL)

    Deoxyribonucleic acid (DNA) barcoding is a novel technology that uses a standard DNA sequence to facilitate species identification. Species identification is necessary for the authentication of traditional plant based medicines. Although a consensus has not been agreed regarding which DNA sequences can be used as ...

  1. Structural properties of replication origins in yeast DNA sequences

    Science.gov (United States)

    Cao, Xiao-Qin; Zeng, Jia; Yan, Hong

    2008-09-01

    Sequence-dependent DNA flexibility is an important structural property originating from the DNA 3D structure. In this paper, we investigate the DNA flexibility of the budding yeast (S. cerevisiae) replication origins on a genome-wide scale using flexibility parameters from two different models, the trinucleotide and the tetranucleotide models. Based on analyzing average flexibility profiles of 270 replication origins, we find that yeast replication origins are significantly rigid compared with their surrounding genomic regions. To further understand the highly distinctive property of replication origins, we compare the flexibility patterns between yeast replication origins and promoters, and find that they both contain significantly rigid DNAs. Our results suggest that DNA flexibility is an important factor that helps proteins recognize and bind the target sites in order to initiate DNA replication. Inspired by the role of the rigid region in promoters, we speculate that the rigid replication origins may facilitate binding of proteins, including the origin recognition complex (ORC), Cdc6, Cdt1 and the MCM2-7 complex.

  2. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT), the bionic wavelet transform (BWT), for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural features of the DNA sequence, was introduced into the WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of latent structural features in future DNA sequence analysis.

  3. Short-sequence DNA repeats in prokaryotic genomes

    NARCIS (Netherlands)

    A.F. van Belkum (Alex); S. Scherer; L. van Alphen (Loek); H.A. Verbrugh (Henri)

    1998-01-01

    Short-sequence DNA repeat (SSR) loci can be identified in all eukaryotic and many prokaryotic genomes. These loci harbor short or long stretches of repeated nucleotide sequence motifs. DNA sequence motifs in a single locus can be identical and/or

  4. SWORDS: A statistical tool for analysing large DNA sequences

    Indian Academy of Sciences (India)

    In this article, we present some simple yet effective statistical techniques for analysing and comparing large DNA sequences. These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software called SWORDS. Using sequences available in public domain ...
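    A frequency distribution of DNA words can be computed with a sliding window, as in the sketch below. The abstract does not describe the statistics SWORDS actually applies, so the overlapping-window counting and the simple absolute-difference comparison between two sequences are assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter


def word_frequencies(seq, k):
    """Relative frequencies of overlapping words of length k in `seq`."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def frequency_distance(seq_a, seq_b, k):
    """Sum of absolute differences between two word-frequency distributions."""
    fa = word_frequencies(seq_a, k)
    fb = word_frequencies(seq_b, k)
    words = set(fa) | set(fb)
    return sum(abs(fa.get(w, 0.0) - fb.get(w, 0.0)) for w in words)


freqs = word_frequencies("ACGTACGT", 2)
print(sorted(freqs.items()))
print(frequency_distance("AAAA", "CCCC", 2))
```

    The distance ranges from 0 for identical distributions to 2 for sequences that share no words.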

  5. Considering DNA damage when interpreting mtDNA heteroplasmy in deep sequencing data.

    Science.gov (United States)

    Rathbun, Molly M; McElhoe, Jennifer A; Parson, Walther; Holland, Mitchell M

    2017-01-01

    Resolution of mitochondrial (mt) DNA heteroplasmy is now possible when applying a massively parallel sequencing (MPS) approach, including minor components down to 1%. However, reporting thresholds and interpretation criteria will need to be established for calling heteroplasmic variants that address a number of important topics, one of which is DNA damage. We assessed the impact of increasing amounts of DNA damage on the interpretation of minor component sequence variants in the mtDNA control region, including low-level mixed sites. A passive approach was used to evaluate the impact of storage conditions, and an active approach was employed to accelerate the process of hydrolytic damage (for example, replication errors associated with depurination events). The patterns of damage were compared and assessed in relation to damage typically encountered in poor quality samples. As expected, the number of miscoding lesions increased as conditions worsened. Single nucleotide polymorphisms (SNPs) associated with miscoding lesions were indistinguishable from innate heteroplasmy and were most often observed as 1-2% of the total sequencing reads. Numerous examples of miscoding lesions above 2% were identified, including two complete changes in the nucleotide sequence, presenting a challenge when assessing the placement of reporting thresholds for heteroplasmy. To mitigate the impact, replication of miscoding lesions was not observed in stored samples, and was rarely seen in data associated with accelerated hydrolysis. In addition, a significant decrease in the expected transition:transversion ratio was observed, providing a useful tool for predicting the presence of damage-induced lesions. The results of this study directly impact MPS analysis of minor sequence variants from poorly preserved DNA extracts, and when biological samples have been exposed to agents that induce DNA damage. These findings are particularly relevant to clinical and forensic investigations. 
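    The transition:transversion ratio mentioned in this record can be computed directly from a list of substitutions. The sketch below is a generic illustration with hypothetical variant calls; it is not the authors' pipeline, and the variant list is made up for the example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)


def ts_tv_ratio(substitutions):
    """Ratio of transitions to transversions for (ref, alt) base pairs."""
    ts = sum(1 for ref, alt in substitutions if is_transition(ref, alt))
    tv = sum(1 for ref, alt in substitutions
             if ref != alt and not is_transition(ref, alt))
    return ts / tv if tv else float("inf")


# Hypothetical calls: four transitions and two transversions.
variants = [("C", "T"), ("G", "A"), ("A", "G"), ("T", "C"), ("G", "T"), ("C", "A")]
ratio = ts_tv_ratio(variants)
```

    A ratio well below the value expected for the region would, following the abstract, hint that some of the observed minor variants are damage-induced rather than innate.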

  6. Toward a Better Compression for DNA Sequences Using Huffman Encoding

    National Research Council Canada - National Science Library

    Al-Okaily, Anas; Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

    ... to compress such data into less space and for faster transmission. Different implementations of Huffman encoding incorporating the characteristics of DNA sequences prove to better compress DNA data...
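    For orientation, a plain Huffman coder over single bases is sketched below. It is the textbook algorithm only; the DNA-specific variants that the report refers to are not described in this excerpt, so nothing here should be read as the authors' implementation, and the function names are illustrative.

```python
import heapq
from collections import Counter


def huffman_codes(seq):
    """Build a prefix code for the symbols of `seq` with the classic Huffman merge."""
    freq = Counter(seq)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(seq, codes):
    """Concatenate the code words for each base."""
    return "".join(codes[b] for b in seq)


def decode(bits, codes):
    """Invert the prefix code by reading bits until a code word is recognised."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)


seq = "AAAAAAAACCCGGT"
codes = huffman_codes(seq)
bits = encode(seq, codes)
```

    With a skewed base composition the encoded length falls below the fixed 2 bits per base; for a uniform composition plain Huffman coding gives no gain, which is one reason DNA-specific refinements are of interest.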

  7. Sequencing for complete rDNA sequences (18S, ITS1, 5.8S, ITS2, and 28S rDNA) of Demodex and phylogenetic analysis of Acari based on 18S and 28S rDNA.

    Science.gov (United States)

    Zhao, Ya-E; Wu, Li-Ping; Hu, Li; Xu, Yang; Wang, Zheng-Hang; Liu, Wen-Yan

    2012-11-01

    Due to the difficulty of DNA extraction for Demodex, few studies dealt with the identification and the phyletic evolution of Demodex at molecular level. In this study, we amplified, sequenced, and analyzed a complete (Demodex folliculorum) and an almost complete (D12 missing) (Demodex brevis) ribosomal DNA (rDNA) sequence and also analyzed the primary sequences of divergent domains in small-subunit ribosomal RNA (rRNA) of 51 species and in large-subunit rRNA of 43 species from four superfamilies in Acari (Cheyletoidea, Tetranychoidea, Analgoidea, and Ixodoidea). The results revealed that 18S rDNA sequence was relatively conserved in rDNA-coding regions and was not evolving as rapidly as 28S rDNA sequence. The evolutionary rates of transcribed spacer regions were much higher than those of the coding regions. The maximum parsimony trees of 18S and 28S rDNA appeared to be almost identical, consistent with their morphological classification. Based on the fact that the resolution capability of sequence length and the divergence of the 13 segments (D1-D6, D7a, D7b, and D8-D12) of 28S rDNA were stronger than that of the nine variable regions (V1-V9) of 18S rDNA, we were able to identify Demodex (Cheyletoidea) by the indels occurring in D2, D6, and D8.

  8. Transfection of the inner cell mass and lack of a unique DNA sequence affecting the uptake of exogenous DNA by sperm as shown by dideoxy sequencing analogues.

    Science.gov (United States)

    Cabrera, M; Chan, P J; Kalugdan, T H; King, A

    1997-02-01

    The purpose of this study was to determine whether exogenous DNA internalized into blastocysts after transfer from DNA-carrier sperm is localized in the inner cell mass or the trophoblast cells, and to identify differences in uptake of exogenous DNA fragments by sperm due to unique DNA sequences. Mouse blastocysts at the hatching stage were exposed to migrating human sperm cells carrying exogenous DNA fragments synthesized from the E6-E7 conserved gene regions of human papillomavirus (HPV) types 16 and 18. After an interaction period of 2 hr, the transfected blastocysts were washed several times to remove extraneous sperm and the blastocysts were dissected into groups of cells derived from the inner cell mass and trophoblasts. The cells were analyzed by polymerase chain reaction (PCR) for the presence of HPV DNA fragments. In the second part of the experiment, thawed donor (N = 10) sperm cells were pooled, washed, and divided into two fractions. Formalin was added to the first (control) fraction, which was then further divided and mixed with a 35S-radiolabeled G, A, T, or C sequencing mixture. The second fraction was treated similarly, but the formalin step was omitted. After an hour of incubation at 37 °C, the sperm specimens were washed several times by centrifugation and DNA was extracted by the GeneReleaser method. The extracted DNA was processed on sequencing gels, and the autoradiographs were analyzed. Mouse blastocysts transfected by carrier sperm with DNA from HPV types 16 and 18 showed localization of the HPV DNA to both the inner cell mass and trophoblast cells. Negative controls consisting of untreated human sperm and untreated mouse blastocysts did not reveal any evidence of HPV DNA. The positive sperm control generated expected DNA fragments from HPV types 16 and 18. In the second experiment, the intensities of the DNA fragments in the G, A, T, and C columns from low to high molecular weights were not different from the positive control bands

  9. Complete Genomic DNA Sequence of the East Asian Spotted Fever Disease Agent Rickettsia japonica

    Science.gov (United States)

    Matsutani, Minenosuke; Ogawa, Motohiko; Takaoka, Naohisa; Hanaoka, Nozomu; Toh, Hidehiro; Yamashita, Atsushi; Oshima, Kenshiro; Hirakawa, Hideki; Kuhara, Satoru; Suzuki, Harumi; Hattori, Masahira; Kishimoto, Toshio; Ando, Shuji; Azuma, Yoshinao; Shirai, Mutsunori

    2013-01-01

    Rickettsia japonica is an obligate intracellular alphaproteobacterium that causes tick-borne Japanese spotted fever, which has spread throughout East Asia. We determined the complete genomic DNA sequence of R. japonica type strain YH (VR-1363), which consists of 1,283,087 base pairs (bp) and 971 protein-coding genes. Comparison of the genomic DNA sequence of R. japonica with other rickettsiae in the public databases showed that 2 regions (4,323 and 216 bp) were conserved in a very narrow range of Rickettsia species, and the shorter one was inserted in, and disrupted, a preexisting open reading frame (ORF). While it is unknown how the DNA sequences were acquired in R. japonica genomes, they may be a useful signature for the diagnosis of Rickettsia species. Instead of the species-specific inserted DNA sequences, rickettsial genomes contain Rickettsia-specific palindromic elements (RPEs), which are also capable of locating in preexisting ORFs. Precise alignments of protein and DNA sequences involving RPEs showed that when a gene contains an inserted DNA sequence, each rickettsial ortholog carried an inserted DNA sequence at the same locus. The sequence, ATGAC, was shown to be highly frequent and thus characteristic in certain RPEs (RPE-4, RPE-6, and RPE-7). This finding implies that RPE-4, RPE-6, and RPE-7 were derived from a common inserted DNA sequence. PMID:24039725

  10. Potential use of DNA barcoding for the identification of Salvia based on cpDNA and nrDNA sequences.

    Science.gov (United States)

    Wang, Meng; Zhao, Hong-Xia; Wang, Long; Wang, Tao; Yang, Rui-Wu; Wang, Xiao-Li; Zhou, Yong-Hong; Ding, Chun-Bang; Zhang, Li

    2013-10-10

    An effective DNA marker for authenticating the genus Salvia was screened using seven DNA regions (rbcL, matK, trnL-F, and psbA-trnH from the chloroplast genome, and ITS, ITS1, and ITS2 from the nuclear genome) and three combinations (rbcL+matK, psbA-trnH+ITS1, and trnL-F+ITS1). The present study collected 232 sequences from 27 Salvia species through DNA sequencing and 77 sequences within the same taxa from GenBank. The discriminatory capabilities of these regions were evaluated in terms of PCR amplification success, intraspecific and interspecific divergence, DNA barcoding gaps, and identification efficiency via a tree-based method. ITS1 was superior to the other markers for discriminating between species, with an accuracy of 81.48%. The three combinations did not increase species discrimination. Finally, we found that ITS1 is a powerful barcode for identifying Salvia species, especially Salvia miltiorrhiza. © 2013.

  11. Representation of DNA sequences in genetic codon context with applications in exon and intron prediction.

    Science.gov (United States)

    Yin, Changchuan

    2015-04-01

    To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences must first be mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC), in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction using a Short-Time Fourier Transform (STFT) approach. The results show that the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
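    The 3-periodicity SNR can be illustrated with the 4D binary (indicator) representation that the abstract uses as its baseline. The sketch below is not the GCC mapping, whose optimized values are not given here. It assumes a common definition of the SNR, namely the spectral power at frequency index N/3 divided by the mean power over indices 1 to N-1, and the function names are illustrative.

```python
import cmath


def indicator_sequences(seq):
    """4D binary representation: one 0/1 channel per base."""
    return {b: [1.0 if s == b else 0.0 for s in seq] for b in "ACGT"}


def power_at(x, k):
    """Squared magnitude of the k-th DFT coefficient of x."""
    n = len(x)
    coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, v in enumerate(x))
    return abs(coeff) ** 2


def period3_snr(seq):
    """Power at N/3 over mean power for k = 1..N-1 (assumed SNR definition)."""
    n = len(seq)
    if n % 3:
        raise ValueError("sequence length must be a multiple of 3")
    channels = indicator_sequences(seq)

    def total_power(k):
        return sum(power_at(x, k) for x in channels.values())

    mean = sum(total_power(k) for k in range(1, n)) / (n - 1)
    if mean == 0.0:
        return 0.0
    return total_power(n // 3) / mean


strong = period3_snr("ATG" * 20)   # perfectly codon-periodic toy sequence
weak = period3_snr("AC" * 30)      # period-2 sequence, no peak at N/3
```

    This direct DFT costs O(N^2) operations and is meant for short windows; a sliding-window (STFT-style) scan along a genome would call `period3_snr` on successive segments.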

  12. Genetic homogeneity in longtail tuna Thunnus tonggol (Bleeker, 1851) from the northwest coast of India inferred from direct sequencing analysis of the mitochondrial DNA D-loop region

    Digital Repository Service at National Institute of Oceanography (India)

    Kunal, S.P.; GirishKumar; Menezes, M.R.; Meena, R.M.

    Longtail tuna Thunnus tonggol is a neritic species of the family Scombridae, having a coastal distribution confined to tropical and temperate waters of the Indo-Pacific region. In the present study, the population structure of longtail tuna...

  13. Methylation-capture and Next-Generation Sequencing of free circulating DNA from human plasma.

    Science.gov (United States)

    Warton, Kristina; Lin, Vita; Navin, Tina; Armstrong, Nicola J; Kaplan, Warren; Ying, Kevin; Gloss, Brian; Mangs, Helena; Nair, Shalima S; Hacker, Neville F; Sutherland, Robert L; Clark, Susan J; Samimi, Goli

    2014-06-15

    Free circulating DNA (fcDNA) has many potential clinical applications, due to the non-invasive way in which it is collected. However, because of the low concentration of fcDNA in blood, genome-wide analysis carries many technical challenges that must be overcome before fcDNA studies can reach their full potential. There are currently no definitive standards for fcDNA collection, processing and whole-genome sequencing. We report novel detailed methodology for the capture of high-quality methylated fcDNA, library preparation and downstream genome-wide Next-Generation Sequencing. We also describe the effects of sample storage, processing and scaling on fcDNA recovery and quality. Use of serum versus plasma, and storage of blood prior to separation resulted in genomic DNA contamination, likely due to leukocyte lysis. Methylated fcDNA fragments were isolated from 5 donors using a methyl-binding protein-based protocol and appear as a discrete band of ~180 bases. This discrete band allows minimal sample loss at the size restriction step in library preparation for Next-Generation Sequencing, allowing for high-quality sequencing from minimal amounts of fcDNA. Following sequencing, we obtained 37 × 10^6 to 86 × 10^6 unique mappable reads, representing more than 50% of total mappable reads. The methylation status of 9 genomic regions as determined by DNA capture and sequencing was independently validated by clonal bisulphite sequencing. Our optimized methods provide high-quality methylated fcDNA suitable for whole-genome sequencing, and allow good library complexity and accurate sequencing, despite using less than half of the recommended minimum input DNA.

  14. An automated annotation tool for genomic DNA sequences using ...

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  15. Taxonomic identity of type E botulinum toxin-producing Clostridium butyricum strains by sequencing of a short 16S rDNA region.

    Science.gov (United States)

    Pourshaban, Manoocheher; Franciosa, Giovanna; Fenicia, Lucia; Aureli, Paolo

    2002-08-27

    Several micro-organisms capable of producing botulinum neurotoxin type E, though phenotypically similar to Clostridium butyricum (a normally non-neurotoxigenic organism), have recently been isolated in Italy and China. Some of these micro-organisms had been implicated in food-borne botulism, a serious neuroparalytic disease. The taxonomic identity of the type E botulinum toxin-producing strains is confirmed here, through sequencing of a genus- and species-specific segment of the 16S rRNA gene. Confirmation leads to the conclusion that neurotoxigenic C. butyricum must be regarded as an emergent food-borne pathogen.

  16. Random Coding Bounds for DNA Codes Based on Fibonacci Ensembles of DNA Sequences

    Science.gov (United States)

    2008-07-01

    Dates covered: 6 Jul 08 – 11 Jul 08. ...sequences which are generalizations of the Fibonacci sequences. Subject terms: DNA Codes, Fibonacci Ensembles, DNA Computing, Code Optimization.

  17. [Heteroplasmy in human mtDNA control region].

    Science.gov (United States)

    Cao, Yang; Wan, Li-Hua; Gu, Lin-Gang; Huang, Ying-Xue; Xiu, Cong-Xian; Hu, Shu-Hui; Mi, Can

    2006-06-01

    To observe the length heteroplasmy and point heteroplasmy in the human mtDNA control region. Peripheral blood, buccal cells, and single hair shafts from 50 individuals and 16 family members related in their maternal lineage were analyzed by direct sequencing, and clones from 20 individuals whose mtDNA sequences have a T-C transition at 16189 nt were sequenced. No point heteroplasmy was observed in peripheral blood, buccal cells, or single hair shafts from the same individual, nor in maternally related individuals. Length heteroplasmy was observed in those individuals with a homopolymeric tract, and different clones from the same individual had different proportions of length variants, but the hair shafts from the same individual were very similar to the measurements made from blood DNA. No length heteroplasmy was observed between different tissues from the same individual. mtDNA sequences show high consistency and genetic stability, and mtDNA sequencing is a suitable tool for forensic applications such as individual identification.

  18. Rediscovery of historical Vitis vinifera varieties from the South Anatolia region by using amplified fragment length polymorphism and simple sequence repeat DNA fingerprinting methods.

    Science.gov (United States)

    Yilancioglu, Kaan; Cetiner, Selim

    2013-05-01

    Anatolia played an important role in the diversification and spread of economically important Vitis vinifera varieties. Although several biodiversity studies have been conducted with local cultivars in different regions of Anatolia, our aim is to gain a better knowledge on the biodiversity of endangered historical V. vinifera varieties in the northern Adana region of southern Anatolia, particularly those potentially displaying viticulture characteristics. We also demonstrate the genetic relatedness in a selected subset of widely cultivated and commercialized V. vinifera collection cultivars, which were obtained from the National Grapevine Germplasm located at the Institute of Viticulture, Turkey. In the present study, microsatellites were used in narrowing the sample size from 72 accessions down to a collection of 27 varieties. Amplified fragment length polymorphisms were then employed to determine genetic relatedness among this collection and local V. vinifera cultivars. The unweighted pair group method with arithmetic mean cluster and principal component analyses revealed that Saimbeyli local cultivars form a distinct group, which is distantly related to a selected subset of V. vinifera collection varieties from all over Turkey. To our knowledge, this is the first study conducted with these cultivars. Further preservation and use of these potential viticultural varieties will be helpful to avoid genetic erosion and to promote continued agriculture in the region.

  19. Levenshtein error-correcting barcodes for multiplexed DNA sequencing

    NARCIS (Netherlands)

    Buschmann, Tilo; Bystrykh, Leonid V.

    2013-01-01

    Background: High-throughput sequencing technologies are improving in quality, capacity and costs, providing versatile applications in DNA and RNA research. For small genomes or fraction of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called

  20. Revealing large metagenomic regions through long DNA fragment hybridization capture.

    Science.gov (United States)

    Gasc, Cyrielle; Peyret, Pierre

    2017-03-14

    High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes from single organisms or metagenomic samples. However, due to the limited capacity of short-read sequence data to assemble complex or low coverage regions, genomes are typically fragmented, leading to draft genomes with numerous underexplored large genomic regions. Revealing these missing sequences is a major goal to resolve concerns in numerous biological studies. To overcome these limitations, we developed an innovative target enrichment method for the reconstruction of large unknown genomic regions. Based on a hybridization capture strategy, this approach enables the enrichment of large genomic regions allowing the reconstruction of tens of kilobase pairs flanking a short, targeted DNA sequence. Applied to a metagenomic soil sample targeting the linA gene, the biomarker of hexachlorocyclohexane (HCH) degradation, our method permitted the enrichment of the gene and its flanking regions leading to the reconstruction of several contigs and complete plasmids exceeding tens of kilobase pairs surrounding linA. Thus, through gene association and genome reconstruction, we identified microbial species involved in HCH degradation which constitute targets to improve biostimulation treatments. This new hybridization capture strategy makes surveying and deconvoluting complex genomic regions possible through the enrichment of large genomic regions and allows the efficient exploration of metagenomic diversity. Indeed, this approach makes it possible to assign identity and function to microorganisms in natural environments, one of the ultimate goals of microbial ecology.

  1. Cloning of rat aorta lysyl oxidase cDNA: Complete codons and predicted amino acid sequence

    Energy Technology Data Exchange (ETDEWEB)

    Trackman, P.C.; Pratt, A.M.; Wolanski, A.; Tang, Shiowshih; Offner, G.D.; Troxler, R.F.; Kagan, H.M. (Boston Univ. School of Medicine, MA (USA))

    1990-05-22

    Lysyl oxidase cDNA clones were identified by their reactivity with anti-bovine lysyl oxidase in a neonatal rat aorta cDNA λgt11 expression library. A 500-bp cDNA sequence encoding four of six peptides derived from proteolytic digests of bovine aorta lysyl oxidase was found from the overlapping cDNA sequences of two positive clones. The library was rescreened with a radiolabeled cDNA probe made from one of these clones, thus identifying an additional 13 positive clones. Sequencing of the largest two of these overlapping clones resulted in 2,672 bp of cDNA sequence containing partial 5′- and 3′-untranslated sequences of 286 and 1,159 nucleotides, respectively, and a complete open reading frame of 1,227 bp encoding a polypeptide of 409 amino acids (46 kDa), consistent with the 48 ± 3 kDa cell-free translation product of rat smooth muscle cell RNA that was immunoprecipitated by anti-bovine lysyl oxidase. The rat aorta cDNA-derived amino acid sequence contains the sequence of each of the six peptides isolated and sequenced from the 32-kDa bovine aorta enzyme, including the C-terminal peptide with sequence identity of 96%. Southern blotting of rat genomic DNA with lysyl oxidase cDNA probes indicated that the lysyl oxidase gene is located at a single locus and does not appear to be a member of a multigene family. A potential stem-loop structure was found in the 3′-untranslated region of the cDNA. The deduced amino acid sequence contains a putative signal peptide, in addition to sequences that are similar to those of other known copper proteins.

  2. Design of sequence-specific DNA-binding molecules.

    Science.gov (United States)

    Dervan, P B

    1986-04-25

    Base sequence information can be stored in the local structure of right-handed double-helical DNA (B-DNA). The question arises as to whether a set of rules for the three-dimensional readout of the B-DNA helix can be developed. This would allow the design of synthetic molecules that bind DNA of any specific sequence and site size. There are four stages of development for each new synthetic sequence-specific DNA-binding molecule: design, synthesis, testing for sequence specificity, and reevaluation of the design. This approach has produced bis(distamycin)fumaramide, a synthetic, crescent-shaped oligopeptide that binds nine contiguous adenine-thymine base pairs in the minor groove of double-helical DNA.

  3. Protocols for 16S rDNA Array Analyses of Microbial Communities by Sequence-Specific Labeling of DNA Probes

    Directory of Open Access Journals (Sweden)

    Knut Rudi

    2003-01-01

    Analyses of complex microbial communities are becoming increasingly important. Bottlenecks in these analyses, however, are the tools to actually describe the biodiversity. Novel protocols for DNA array-based analyses of microbial communities are presented. In these protocols, the specificity obtained by sequence-specific labeling of DNA probes is combined with the possibility of detecting several different probes simultaneously by DNA array hybridization. The gene encoding 16S ribosomal RNA was chosen as the target in these analyses. This gene contains both universally conserved regions and regions with relatively high variability. The universally conserved regions are used for PCR amplification primers, while the variable regions are used for the specific probes. Protocols are presented for DNA purification, probe construction, probe labeling, and DNA array hybridizations.

  4. Food Fish Identification from DNA Extraction through Sequence Analysis

    Science.gov (United States)

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd- and 4th-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  5. 18S rDNA sequences and the holometabolous insects.

    Science.gov (United States)

    Carmean, D; Kimsey, L S; Berbee, M L

    1992-12-01

    The Holometabola (insects with complete metamorphosis: beetles, wasps, flies, fleas, butterflies, lacewings, and others) is a monophyletic group that includes the majority of the world's animal species. Holometabolous orders are well defined by morphological characters, but relationships among orders are unclear. In a search for a region of DNA that will clarify the interordinal relationships we sequenced approximately 1080 nucleotides of the 5' end of the 18S ribosomal RNA gene from representatives of 14 families of insects in the orders Hymenoptera (sawflies and wasps), Neuroptera (lacewing and antlion), Siphonaptera (flea), and Mecoptera (scorpionfly). We aligned the sequences with the published sequences of insects from the orders Coleoptera (beetle) and Diptera (mosquito and Drosophila), and the outgroups aphid, shrimp, and spider. Unlike the other insects examined in this study, the neuropterans have A-T rich insertions or expansion regions: one in the antlion was approximately 260 bp long. The dipteran 18S rDNA evolved rapidly, with over 3 times as many substitutions among the aligned sequences, and 2-3 times more unalignable nucleotides than other Holometabola, in violation of an insect-wide molecular clock. When we excluded the long-branched taxa (Diptera, shrimp, and spider) from the analysis, the most parsimonious (minimum-length) trees placed the beetle basal to other holometabolous orders, and supported a morphologically monophyletic clade including the fleas+scorpionflies (96% bootstrap support). However, most interordinal relationships were not significantly supported when tested by maximum likelihood or bootstrapping and were sensitive to the taxa included in the analysis. The most parsimonious and maximum-likelihood trees both separated the Coleoptera and Neuroptera, but this separation was not statistically significant.(ABSTRACT TRUNCATED AT 250 WORDS)

  6. Affordable Hands-On DNA Sequencing and Genotyping: An Exercise for Teaching DNA Analysis to Undergraduates

    Science.gov (United States)

    Shah, Kushani; Thomas, Shelby; Stein, Arnold

    2013-01-01

    In this report, we describe a 5-week laboratory exercise for undergraduate biology and biochemistry students in which students learn to sequence DNA and to genotype their DNA for selected single nucleotide polymorphisms (SNPs). Students use miniaturized DNA sequencing gels that require approximately 8 min to run. The students perform G, A, T, C…

  7. Mesoscopic Model for Free Energy Landscape Analysis of DNA sequences

    CERN Document Server

    Tapia-Rojo, R; Mazo, J J; Falo, F; 10.1103/PhysRevE.86.021908

    2012-01-01

    A mesoscopic model which allows us to identify and quantify the strength of binding sites in DNA sequences is proposed. The model is based on the Peyrard-Bishop-Dauxois model for the DNA chain coupled to a Brownian particle which explores the sequence interacting more importantly with open base pairs of the DNA chain. We apply the model to promoter sequences of different organisms. The free energy landscape obtained for these promoters shows a complex structure that is strongly connected to their biological behavior. The analysis method used is able to quantify free energy differences of sites within genome sequences.

  8. A 28,000 years old Cro-Magnon mtDNA sequence differs from all potentially contaminating modern sequences.

    Directory of Open Access Journals (Sweden)

    David Caramelli

    BACKGROUND: DNA sequences from ancient specimens may in fact result from undetected contamination of the ancient specimens by modern DNA, and the problem is particularly challenging in studies of human fossils. Doubts on the authenticity of the available sequences have so far hampered genetic comparisons between anatomically archaic (Neandertal) and early modern (Cro-Magnoid) Europeans. METHODOLOGY/PRINCIPAL FINDINGS: We typed the mitochondrial DNA (mtDNA) hypervariable region I in a 28,000 years old Cro-Magnoid individual from the Paglicci cave, in Italy (Paglicci 23) and in all the people who had contact with the sample since its discovery in 2003. The Paglicci 23 sequence, determined through the analysis of 152 clones, is the Cambridge reference sequence, and cannot possibly reflect contamination because it differs from all potentially contaminating modern sequences. CONCLUSIONS/SIGNIFICANCE: The Paglicci 23 individual carried a mtDNA sequence that is still common in Europe, and which radically differs from those of the almost contemporary Neandertals, demonstrating a genealogical continuity across 28,000 years, from Cro-Magnoid to modern Europeans. Because all potential sources of modern DNA contamination are known, the Paglicci 23 sample will offer a unique opportunity to get insight for the first time into the nuclear genes of early modern Europeans.

  9. DNA Polymerases Drive DNA Sequencing-by-Synthesis Technologies: Both Past and Present

    Directory of Open Access Journals (Sweden)

    Cheng-Yao eChen

    2014-06-01

    Next-generation sequencing (NGS) technologies have revolutionized modern biological and biomedical research. The engines responsible for this innovation are DNA polymerases; they catalyze the biochemical reaction for deriving template sequence information. In fact, DNA polymerase has been a cornerstone of DNA sequencing from the very beginning. E. coli DNA polymerase I proteolytic (Klenow) fragment was originally utilized in Sanger's dideoxy chain terminating DNA sequencing chemistry. From these humble beginnings followed an explosion of organism-specific, genome sequence information accessible via public database. Family A/B DNA polymerases from mesophilic/thermophilic bacteria/archaea were modified and tested in today's standard capillary electrophoresis (CE) and NGS sequencing platforms. These enzymes were selected for their efficient incorporation of bulky dye-terminator and reversible dye-terminator nucleotides respectively. Third generation, real-time single molecule sequencing platform requires slightly different enzyme properties. Enterobacterial phage ϕ29 DNA polymerase copies long stretches of DNA and possesses a unique capability to efficiently incorporate terminal phosphate-labeled nucleoside polyphosphates. Furthermore, ϕ29 enzyme has also been utilized in emerging DNA sequencing technologies including nanopore-, and protein-transistor-based sequencing. DNA polymerase is, and will continue to be, a crucial component of sequencing technologies.

  10. DNA polymerases drive DNA sequencing-by-synthesis technologies: both past and present

    Science.gov (United States)

    Chen, Cheng-Yao

    2014-01-01

    Next-generation sequencing (NGS) technologies have revolutionized modern biological and biomedical research. The engines responsible for this innovation are DNA polymerases; they catalyze the biochemical reaction for deriving template sequence information. In fact, DNA polymerase has been a cornerstone of DNA sequencing from the very beginning. Escherichia coli DNA polymerase I proteolytic (Klenow) fragment was originally utilized in Sanger’s dideoxy chain-terminating DNA sequencing chemistry. From these humble beginnings followed an explosion of organism-specific, genome sequence information accessible via public database. Family A/B DNA polymerases from mesophilic/thermophilic bacteria/archaea were modified and tested in today’s standard capillary electrophoresis (CE) and NGS sequencing platforms. These enzymes were selected for their efficient incorporation of bulky dye-terminator and reversible dye-terminator nucleotides respectively. Third generation, real-time single molecule sequencing platform requires slightly different enzyme properties. Enterobacterial phage ϕ29 DNA polymerase copies long stretches of DNA and possesses a unique capability to efficiently incorporate terminal phosphate-labeled nucleoside polyphosphates. Furthermore, ϕ29 enzyme has also been utilized in emerging DNA sequencing technologies including nanopore-, and protein-transistor-based sequencing. DNA polymerase is, and will continue to be, a crucial component of sequencing technologies. PMID:25009536

  11. Retroviral DNA Sequences as a Means for Determining Ancient Diets.

    Directory of Open Access Journals (Sweden)

    Jessica I Rivera-Perez

    For ages, specialists from varying fields have studied the diets of the primeval inhabitants of our planet, detecting diet remains in archaeological specimens using a range of morphological and biochemical methods. Recently, metagenomic ancient DNA studies have allowed for the comparison of the fecal and gut microbiomes associated with archaeological specimens from various regions of the world; however, the complex dynamics represented in those microbial communities still remain unclear. Theoretically, similar to eukaryote DNA, the presence of genes from key microbes or enzymes, as well as the presence of DNA from viruses specific to key organisms, may suggest the ingestion of specific diet components. In this study we demonstrate that ancient virus DNA obtained from coprolites also provides information for reconstructing the host's diet, as inferred from sequences obtained from pre-Columbian coprolites. This represents a novel and reliable approach to determine new components as well as validate the previously suggested diets of extinct cultures and animals. Furthermore, to our knowledge this represents the first description of the eukaryotic viral diversity found in paleofaeces belonging to pre-Columbian cultures.

  12. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human

    Directory of Open Access Journals (Sweden)

    Chengchao Wu

    2017-02-01

    DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.
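
    The 65 DNA composition features mentioned in this abstract (64 trinucleotide frequencies plus GC content) can be illustrated with a minimal Python sketch. The function name, the overlapping-window counting, and the handling of ambiguous bases are assumptions made for illustration; the abstract does not describe the authors' exact feature extraction.

```python
from itertools import product

BASES = "ACGT"
# All 64 possible trinucleotides in a fixed (lexicographic) order.
TRINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=3)]


def composition_features(window):
    """Return 65 values: 64 trinucleotide frequencies followed by GC content.

    Hypothetical helper: trinucleotides are counted with overlap, and any
    trinucleotide containing a non-ACGT symbol is skipped (an assumption).
    """
    seq = window.upper()
    counts = {k: 0 for k in TRINUCLEOTIDES}
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    freqs = [counts[k] / total if total else 0.0 for k in TRINUCLEOTIDES]

    # GC content over unambiguous bases only.
    acgt = sum(seq.count(b) for b in BASES)
    gc = (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
    return freqs + [gc]


features = composition_features("ACGTACGTGGCC")
print(len(features))  # 65
```

    In a setting like the one described, such a vector would be computed per window (for example 600 bp) and passed to a classifier; that downstream step is not shown here.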

  13. Next-generation sequencing technologies for environmental DNA research.

    Science.gov (United States)

    Shokralla, Shadi; Spall, Jennifer L; Gibson, Joel F; Hajibabaei, Mehrdad

    2012-04-01

    Since 2005, advances in next-generation sequencing technologies have revolutionized biological science. The analysis of environmental DNA through the use of specific gene markers such as species-specific DNA barcodes has been a key application of next-generation sequencing technologies in ecological and environmental research. Access to parallel, massive amounts of sequencing data, as well as subsequent improvements in read length and throughput of different sequencing platforms, is leading to a better representation of sample diversity at a reasonable cost. New technologies are being developed rapidly and have the potential to dramatically accelerate ecological and environmental research. The fast pace of development and improvements in next-generation sequencing technologies can reflect on broader and more robust applications in environmental DNA research. Here, we review the advantages and limitations of current next-generation sequencing technologies in regard to their application for environmental DNA analysis. © 2012 Blackwell Publishing Ltd.

  14. The Study of Correlation Structures of DNA Sequences A Critical Review

    CERN Document Server

    Li, W

    1997-01-01

    The study of correlation structure in the primary sequences of DNA is reviewed. The issues reviewed include: symmetries among 16 base-base correlation functions, accurate estimation of correlation measures, the relationship between $1/f$ and Lorentzian spectra, heterogeneity in DNA sequences, different modeling strategies of the correlation structure of DNA sequences, the difference of correlation structure between coding and non-coding regions (besides the period-3 pattern), and source of broad distribution of domain sizes. Although some of the results remain controversial, a body of work on this topic constitutes a good starting point for future studies.

  15. Phylogenetic Analysis of a ‘Jewel Orchid’ Genus Goodyera (Orchidaceae) Based on DNA Sequence Data from Nuclear and Plastid Regions

    Science.gov (United States)

    Hu, Chao; Tian, Huaizhen; Li, Hongqing; Hu, Aiqun; Xing, Fuwu; Bhattacharjee, Avishek; Hsu, Tianchuan; Kumar, Pankaj; Chung, Shihwen

    2016-01-01

    A molecular phylogeny of Asiatic species of Goodyera (Orchidaceae, Cranichideae, Goodyerinae) based on the nuclear ribosomal internal transcribed spacer (ITS) region and two chloroplast loci (matK and trnL-F) was presented. Thirty-five species represented by 132 samples of Goodyera were analyzed, along with other 27 genera/48 species, using Pterostylis longifolia and Chloraea gaudichaudii as outgroups. Bayesian inference, maximum parsimony and maximum likelihood methods were used to reveal the intrageneric relationships of Goodyera and its intergeneric relationships to related genera. The results indicate that: 1) Goodyera is not monophyletic; 2) Goodyera could be divided into four sections, viz., Goodyera, Otosepalum, Reticulum and a new section; 3) sect. Reticulum can be further divided into two subsections, viz., Reticulum and Foliosum, whereas sect. Goodyera can in turn be divided into subsections Goodyera and a new subsection. PMID:26927946

  16. Development of an automated procedure for fluorescent DNA sequencing.

    Science.gov (United States)

    Wilson, R K; Chen, C; Avdalovic, N; Burns, J; Hood, L

    1990-04-01

    We describe here the development of a procedure for complete automation of the dideoxynucleotide DNA sequencing chemistry using fluorescent dye-labeled oligonucleotide primers. This procedure combines rapid preparation of template DNA using a modification of the polymerase chain reaction, automation of the DNA sequencing reactions using a robotic laboratory workstation, and subsequent analysis of the fluorescent-labeled reaction products on a commercial automated fluorescent sequencer. Using this procedure, we were able to produce sufficient quantities of template DNA directly from bacterial colonies or bacteriophage plaques, perform the DNA sequencing reactions on these templates, and load the reaction products on the fluorescent DNA sequencer in a single work day. This scheme for automation of the fluorescent DNA sequencing method allows the fluorescent sequencer to be run at its full capacity every day and eliminates much of the labor required to obtain a high level of data output. Currently, we are able to perform and analyze 16 fluorescent-labeled reactions every day, with an average output of over 7000 bp per sequencer run.

  17. [Study on factors influencing DNA sequencing by automatic genetic analyzer].

    Science.gov (United States)

    Yan, Shaofei; Wang, Wei; Xu, Jin; Bai, Li; Gan, Xin; Li, Fengqin

    2015-05-01

    To acquire accurate and successful DNA sequencing in a cost-effective way by ABI3500xl automatic genetic analyzer. BigDye was diluted to 8, 16 and 32 times in PCR product sequencing. Three different methods including CENTRI-SEP kit, BigDye cleaning beads and ethanol-NaAc-EDTA were used to purify the sequencing PCR products. The results of DNA sequencing were correct when BigDye was diluted up to 16 times. The misreading of nucleic acid bases was found as BigDye was diluted to 32 times. All three purification methods provided acceptable DNA sequencing results. In terms of method for purification of PCR products, the CENTRI-SEP Kit was the most expensive but time-saving (0.5 h), while ethanol-NaAc-EDTA method was the most economical but time-consuming (2 h). The BigDye cleaning beads method was of a suitable purification time (1 h) but not fit for high-throughput DNA sequencing. BigDye should be diluted up to 16 times in DNA sequencing by ABI3500xl DNA analyzer. Although all three purification methods may promise DNA sequencing results with good quality, it is necessary to choose an appropriate one to keep the balance between time and cost on the basis of the lab condition.

  18. Collection and extraction of saliva DNA for next generation sequencing.

    Science.gov (United States)

    Goode, Michael R; Cheong, Soo Yeon; Li, Ning; Ray, William C; Bartlett, Christopher W

    2014-08-27

    The preferred source of DNA in human genetics research is blood, or cell lines derived from blood, as these sources yield large quantities of high quality DNA. However, DNA extraction from saliva can yield high quality DNA with little to no degradation/fragmentation that is suitable for a variety of DNA assays without the expense of a phlebotomist and can even be acquired through the mail. However, at present, no saliva DNA collection/extraction protocols for next generation sequencing have been presented in the literature. This protocol optimizes parameters of saliva collection/storage and DNA extraction to be of sufficient quality and quantity for DNA assays with the highest standards, including microarray genotyping and next generation sequencing.

  19. WEB-THERMODYN: sequence analysis software for profiling DNA helical stability

    OpenAIRE

    Huang, Yanlin; Kowalski, David

    2003-01-01

    WEB-THERMODYN analyzes DNA sequences and computes the DNA helical stability, i.e. the free energy required to unwind and separate the strands of the double helix. A helical stability profile across a selected DNA region or the entire sequence is generated by sliding-window analysis. WEB-THERMODYN can predict sites of low helical stability present at regulatory regions for transcription and replication and can be used to test the influence of mutations. The program can be accessed at: http://w...
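
    The sliding-window idea described above can be sketched in Python as follows. This is not WEB-THERMODYN's energy model: as a loudly labeled stand-in, each base pair is scored by its number of hydrogen bonds (2 for A/T, 3 for G/C), which is only a crude proxy for the free energy of strand separation. The function names and the window size are likewise assumptions.

```python
# Crude stability proxy (assumption): hydrogen bonds per base pair.
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def stability_profile(seq, window=4, step=1):
    """Slide a window along seq and return (start, score) pairs.

    A lower score means the window is, under this proxy, easier to unwind.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        score = sum(HYDROGEN_BONDS.get(base, 0) for base in chunk)
        profile.append((start, score))
    return profile


def least_stable(profile):
    """Return the first (start, score) pair with the lowest score."""
    return min(profile, key=lambda item: item[1])


profile = stability_profile("GGCCAATTGGCC", window=4)
print(least_stable(profile))  # (4, 8): the AATT window
```

    A real helical stability calculation would replace the per-base scores with experimentally derived nearest-neighbor free energies; those values are deliberately not reproduced here.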

  20. Simulating DNA coding sequence evolution with EvolveAGene 3.

    Science.gov (United States)

    Hall, Barry G

    2008-04-01

    Phylogenetic reconstruction based upon multiple alignments of molecular sequences is important to most branches of modern biology and is central to molecular evolution. Understanding the historical relationships among macromolecules depends upon computer programs that implement a variety of analytical methods. Because it is impossible to know those historical relationships with certainty, assessment of the accuracy of methods and the programs that implement them requires the use of programs that realistically simulate the evolution of DNA sequences. EvolveAGene 3 is a realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions, including variable regions of selection intensity within the sequence and variation in intensity of selection over branches. Variation includes base substitutions, insertions, and deletions. To the best of my knowledge, it is the only program available that simulates the evolution of intact coding sequences. Output includes the true tree and true alignments of the resulting coding sequence and corresponding protein sequences. A log file reports the frequencies of each kind of base substitution, the ratio of transition to transversion substitutions, the ratio of indel to base substitution mutations, and the numbers of silent and amino acid replacement mutations. The realism of the data sets has been assessed by comparing the d(N)/d(S) ratio, the ratio of transition to transversion substitutions, and the ratio of indel to base substitution mutations of the simulated data sets with those parameters of real data sets from the "gold standard" BaliBase collection of structural alignments. Results show that the data sets produced by EvolveAGene 3 are very similar to real data sets, and EvolveAGene 3 is therefore a realistic simulation program that can be used to evaluate a variety of programs and methods in molecular evolution.
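
    One of the log-file statistics named above, the ratio of transition to transversion substitutions, can be illustrated with a short Python sketch. It simply compares two aligned sequences position by position; it is not EvolveAGene 3 code, and the function name and gap handling are assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq_a, seq_b):
    """Count (transitions, transversions) between two aligned sequences.

    Positions with a gap or an ambiguous symbol in either sequence are
    skipped (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    return transitions, transversions


ts, tv = count_substitutions("ACGTAC-T", "GCGAATCT")
print(ts, tv)  # 2 1
```

    The ratio itself is `ts / tv` when `tv` is nonzero; the sketch returns the raw counts so the caller can decide how to handle the zero case.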

  1. Cross-utilizing hyperchaotic and DNA sequences for image encryption

    Science.gov (United States)

    Zhan, Kun; Wei, Dong; Shi, Jinhui; Yu, Jun

    2017-01-01

    The hyperchaotic sequence and the DNA sequence are utilized jointly for image encryption. A four-dimensional hyperchaotic system is used to generate a pseudorandom sequence. The main idea is to apply the hyperchaotic sequence to almost all steps of the encryption. All intensity values of an input image are converted to a serial binary digit stream, and the bitstream is scrambled globally by the hyperchaotic sequence. DNA algebraic operation and complementation are performed between the hyperchaotic sequence and the DNA sequence to obtain a robust encryption performance. The experimental results demonstrate that the encryption algorithm achieves the performance of the state-of-the-art methods in terms of quality, security, and robustness against noise and cropping attacks.

  2. An auditory display tool for DNA sequence analysis.

    Science.gov (United States)

    Temple, Mark D

    2017-04-24

    DNA Sonification refers to the use of an auditory display to convey the information content of DNA sequence data. Six sonification algorithms are presented that each produce an auditory display. These algorithms are logically designed from the simple through to the more complex. Three of these parse individual nucleotides, nucleotide pairs or codons into musical notes to give rise to 4, 16 or 64 notes, respectively. Codons may also be parsed degenerately into 20 notes with respect to the genetic code. Lastly, nucleotide pairs can be parsed as two separate frames or codons can be parsed as three reading frames, giving rise to multiple streams of audio. The most informative sonification algorithm reads the DNA sequence as codons in three reading frames to produce three concurrent streams of audio in an auditory display. This approach is advantageous since start and stop codons in any one frame directly start or stop the audio in that frame, leaving the other frames unaffected. Using these methods, DNA sequences such as open reading frames or repetitive DNA sequences can be distinguished from one another. These sonification tools are available through a webpage interface in which an input DNA sequence can be processed in real time to produce an auditory display playable directly within the browser. The potential of this approach as an analytical tool is discussed with reference to auditory displays derived from test sequences including simple nucleotide sequences, repetitive DNA sequences and coding or non-coding genes. This study presents a proof-of-concept that some properties of a DNA sequence can be identified through sonification alone and argues for their inclusion within the toolkit of DNA sequence browsers as an adjunct to existing visual and analytical tools.
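
    The three-reading-frame parsing described above can be sketched as follows. The codon-to-note mapping here (a base-4 index added to a hypothetical lowest MIDI note of 36) is an assumption chosen only to yield 64 distinct notes; the abstract does not specify the mapping actually used, and start/stop-codon gating of the audio is left out.

```python
# Illustrative sketch: split a DNA sequence into codons in each of the three
# forward reading frames and map every codon to one of 64 note numbers.
# The mapping is hypothetical, not the one used by the published tool.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def codon_to_note(codon, lowest_note=36):
    """Map a codon to a note number in lowest_note .. lowest_note + 63."""
    index = 0
    for base in codon:
        index = index * 4 + BASE_INDEX[base]  # read the codon as a base-4 number
    return lowest_note + index


def frame_streams(seq):
    """Return three lists of note numbers, one per forward reading frame."""
    seq = seq.upper()
    streams = []
    for frame in range(3):
        # Only complete codons are used; trailing bases are dropped.
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        streams.append([codon_to_note(c) for c in codons])
    return streams


for frame, notes in enumerate(frame_streams("ATGAAATAA")):
    print(f"frame {frame}: {notes}")
```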

  3. Simulations Using Random-Generated DNA and RNA Sequences

    Science.gov (United States)

    Bryce, C. F. A.

    1977-01-01

    Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…
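
    A modern counterpart of the classroom exercise can be sketched in a few lines. The original program was written in BASIC and is not reproduced here; the function names below are hypothetical, and only the tasks named in the abstract (random sequences, complementary sequences, base composition, triplet codon frequencies) are covered.

```python
# Illustrative sketch of the exercise in Python; not the original BASIC program.
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_dna(length, seed=None):
    """Generate a random DNA sequence; pass a seed for reproducibility."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def transcribe(seq):
    """Return the RNA version of a DNA coding strand (T replaced by U)."""
    return seq.replace("T", "U")


def base_composition(seq):
    """Return the fraction of each base in the sequence."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}


def codon_counts(seq):
    """Count complete triplet codons in reading frame 0."""
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


dna = random_dna(30, seed=1)
print(dna, reverse_complement(dna), transcribe(dna))
print(base_composition(dna))
print(codon_counts(dna).most_common(3))
```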

  4. Illumina Sequencing of Bisulfite-Converted DNA Libraries.

    Science.gov (United States)

    Lizardi, Paul M; Yan, Qin; Wajapeyee, Narendra

    2017-11-01

    Here we describe a standard MethylC-seq protocol using single-read sequencing on an Illumina Genome Analyzer II platform. The protocol involves ligation of methylated sequencing adaptors to sonicated genomic DNA, gel purification, sodium bisulfite conversion, polymerase chain reaction (PCR) amplification, and sequencing. © 2017 Cold Spring Harbor Laboratory Press.

  5. Rapid and accurate identification of microorganisms contaminating cosmetic products based on DNA sequence homology.

    Science.gov (United States)

    Fujita, Y; Shibayama, H; Suzuki, Y; Karita, S; Takamatsu, S

    2005-12-01

    The aim of this study was to develop rapid and accurate procedures to identify microorganisms contaminating cosmetic products, based on the identity of the nucleotide sequences of the internal transcribed spacer (ITS) region of the ribosomal RNA coding DNA (rDNA). Five types of microorganisms were isolated from the inner portion of lotion bottle caps, skin care lotions, and cleansing gels. The rDNA ITS region of microorganisms was amplified through the use of colony-direct PCR or ordinal PCR using DNA extracts as templates. The nucleotide sequences of the amplified DNA were determined and subjected to homology search of a publicly available DNA database. Thereby, we obtained DNA sequences possessing high similarity with the query sequences from the databases of all the five organisms analyzed. The traditional identification procedure requires expert skills and approximately 1 month to identify the microorganisms. In contrast, 3-7 days were sufficient to complete all the procedures employed in the current method, including isolation and cultivation of organisms, DNA sequencing, and the database homology search. Moreover, it was possible to develop the skills necessary to perform the molecular techniques required for the identification procedures within 1 week. Consequently, the current method is useful for rapid and accurate identification of microorganisms contaminating cosmetics.

  6. A mathematical model and numerical method for thermoelectric DNA sequencing

    Science.gov (United States)

    Shi, Liwei; Guilbeau, Eric J.; Nestorova, Gergana; Dai, Weizhong

    2014-05-01

    Single nucleotide polymorphisms (SNPs) are single base pair variations within the genome that are important indicators of genetic predisposition towards specific diseases. This study explores the feasibility of SNP detection using a thermoelectric sequencing method that measures the heat released when DNA polymerase inserts a deoxyribonucleoside triphosphate into a DNA strand. We propose a three-dimensional mathematical model that governs the DNA sequencing device with a reaction zone that contains DNA template/primer complex immobilized to the surface of the lower channel wall. The model is then solved numerically. Concentrations of reactants and the temperature distribution are obtained. Results indicate that when the nucleoside is complementary to the next base in the DNA template, polymerization occurs lengthening the complementary polymer and releasing thermal energy with a measurable temperature change, implying that the thermoelectric conceptual device for sequencing DNA may be feasible for identifying specific genes in individuals.

  7. Detection and mapping of mtDNA SNPs in Atlantic salmon using high throughput DNA sequencing

    Directory of Open Access Journals (Sweden)

    Olafsdottir Gudbjorg

    2011-04-01

    Background: Approximately half of the mitochondrial genome inherent within 546 individual Atlantic salmon (Salmo salar) derived from across the species' North Atlantic range was selectively amplified with a novel combination of standard PCR and pyrosequencing in a single run using 454 Titanium FLX technology (Roche, 454 Life Sciences). A unique combination of barcoded primers and a partitioned sequencing plate was employed to assign each sequence read to its original sample. The sequence reads were aligned to the S. salar mitochondrial reference sequence (NC_001960.1), with the objective of identifying single nucleotide polymorphisms (SNPs). SNPs were validated if they met the following three stringent criteria: (i) sequence reads were produced from both DNA strands; (ii) SNPs were confirmed in a minimum of 90% of replicate sequence reads; and (iii) SNPs occurred in more than one individual. Results: Pyrosequencing generated a total of 179,826,884 bp of data, and 10,765 of the total 10,920 S. salar sequences (98.6%) were assigned back to their original samples. The approach taken resulted in a total of 216 SNPs and 2 indels, which were validated and mapped onto the S. salar mitochondrial genome, including 107 SNPs and one indel not previously reported. An average of 27.3 sequence reads with a standard deviation of 11.7 supported each SNP per individual. Conclusion: The study generated a mitochondrial SNP panel from a large sample group across a broad geographical area, reducing the potential for ascertainment bias, which has hampered previous studies. The SNPs identified here validate those identified in previous studies, and also contribute additional potentially informative loci for the future study of phylogeography and evolution in the Atlantic salmon. The overall success experienced with this novel application of HT sequencing of targeted regions suggests that the same approach could be successfully applied for SNP mining
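
    The three validation criteria listed in the abstract can be expressed as a small filter. This is a sketch under stated assumptions: the `Observation` record, its field names, and the reading of criterion (ii) as "variant reads divided by all reads covering the site in one individual" are hypothetical and are not taken from the study's pipeline.

```python
# Illustrative sketch of the three SNP validation criteria; the data layout
# and the per-individual interpretation of the 90% rule are assumptions.
from dataclasses import dataclass


@dataclass
class Observation:
    individual: str      # sample identifier recovered from the barcode
    forward_reads: int   # reads covering the site on the forward strand
    reverse_reads: int   # reads covering the site on the reverse strand
    variant_reads: int   # reads (either strand) showing the candidate SNP


def supports_snp(obs, min_fraction=0.90):
    """Criteria (i) and (ii): both strands covered, >= 90% of reads agree."""
    if obs.forward_reads == 0 or obs.reverse_reads == 0:
        return False
    total = obs.forward_reads + obs.reverse_reads
    return obs.variant_reads / total >= min_fraction


def is_validated(observations, min_individuals=2):
    """Criterion (iii): the SNP is supported in more than one individual."""
    carriers = {o.individual for o in observations if supports_snp(o)}
    return len(carriers) >= min_individuals


candidate = [
    Observation("fish_001", forward_reads=15, reverse_reads=15, variant_reads=29),
    Observation("fish_002", forward_reads=12, reverse_reads=10, variant_reads=22),
    Observation("fish_003", forward_reads=20, reverse_reads=0, variant_reads=20),
]
print(is_validated(candidate))
```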

  8. Next Generation Sequencing of Ancient DNA: Requirements, Strategies and Perspectives

    Directory of Open Access Journals (Sweden)

    Michael Knapp

    2010-07-01

    The invention of next-generation-sequencing has revolutionized almost all fields of genetics, but few have profited from it as much as the field of ancient DNA research. From its beginnings as an interesting but rather marginal discipline, ancient DNA research is now on its way into the centre of evolutionary biology. In less than a year from its invention next-generation-sequencing had increased the amount of DNA sequence data available from extinct organisms by several orders of magnitude. Ancient DNA research is now not only adding a temporal aspect to evolutionary studies and allowing for the observation of evolution in real time, it also provides important data to help understand the origins of our own species. Here we review progress that has been made in next-generation-sequencing of ancient DNA over the past five years and evaluate sequencing strategies and future directions.

  9. An Evolution Based Biosensor Receptor DNA Sequence Generation Algorithm

    Directory of Open Access Journals (Sweden)

    Yupeng Zang

    2009-12-01

    A biosensor is composed of a bioreceptor, an associated recognition molecule, and a signal transducer that can selectively detect target substances for analysis. DNA based biosensors utilize receptor molecules that allow hybridization with the target analyte. However, most DNA biosensor research uses oligonucleotides as the target analytes and does not address the potential problems of real samples. The identification of recognition molecules suitable for real target analyte samples is an important step towards further development of DNA biosensors. This study examines the characteristics of DNA used as bioreceptors and proposes a hybrid evolution-based DNA sequence generating algorithm, based on DNA computing, to identify suitable DNA bioreceptor recognition molecules for stable hybridization with real target substances. The Traveling Salesman Problem (TSP) approach is applied in the proposed algorithm to evaluate the safety and fitness of the generated DNA sequences. This approach improves efficiency and stability for enhanced and variable-length DNA sequence generation and allows extension to generation of variable-length DNA sequences with diverse receptor recognition requirements.

  10. An evolution based biosensor receptor DNA sequence generation algorithm.

    Science.gov (United States)

    Kim, Eungyeong; Lee, Malrey; Gatton, Thomas M; Lee, Jaewan; Zang, Yupeng

    2010-01-01

    A biosensor is composed of a bioreceptor, an associated recognition molecule, and a signal transducer that can selectively detect target substances for analysis. DNA based biosensors utilize receptor molecules that allow hybridization with the target analyte. However, most DNA biosensor research uses oligonucleotides as the target analytes and does not address the potential problems of real samples. The identification of recognition molecules suitable for real target analyte samples is an important step towards further development of DNA biosensors. This study examines the characteristics of DNA used as bioreceptors and proposes a hybrid evolution-based DNA sequence generating algorithm, based on DNA computing, to identify suitable DNA bioreceptor recognition molecules for stable hybridization with real target substances. The Traveling Salesman Problem (TSP) approach is applied in the proposed algorithm to evaluate the safety and fitness of the generated DNA sequences. This approach improves efficiency and stability for enhanced and variable-length DNA sequence generation and allows extension to generation of variable-length DNA sequences with diverse receptor recognition requirements.

  11. Mitochondrial DNA sequence-based phylogenetic relationship ...

    Indian Academy of Sciences (India)

    Introduction. Mitochondrial DNA (mtDNA) has been one of the most widely used molecular markers for phylogenetic studies in animals, because of its simple genomic structure (Avise 2004). Among insects, the maximum .... 2007 Population structure of the malaria vector Anopheles darlingi in Rondonia, Brazilian Amazon, ...

  12. Sequence polymorphism of mitochondrial DNA in Japanese individuals from Gifu Prefecture.

    Science.gov (United States)

    Nagai, Atsushi; Nakamura, Isao; Shiraki, Futoru; Bunai, Yasuo; Ohya, Isao

    2003-03-01

    Sequence polymorphisms of the hypervariable region HV1 in mitochondrial DNA (mtDNA) were analyzed in a sample of 137 unrelated Japanese individuals living in Gifu Prefecture (central region of Japan) using polymerase chain reaction amplification and direct sequencing. Eighty-two different haplotypes resulting from 81 variable sites were found in the mtDNA HV1 region between positions 16061 and 16450. The most frequent haplotype (16223T, 16362C) was shared by ten individuals. The genetic diversity and the genetic identity were 0.985 and 0.022, respectively. The C-stretch region located around position 16189 was observed in 23.4% of this population sample. Sequence heteroplasmy at the position 16103 (A/G) was found in one individual.
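
    The two summary statistics quoted above are commonly computed from haplotype frequencies. The sketch below assumes the standard definitions, genetic identity as the sum of squared haplotype frequencies and genetic diversity as n/(n-1) times one minus that sum; the abstract does not state which formulas the authors used. Under that assumption the reported figures are mutually consistent: with n = 137 and identity 0.022, diversity is (137/136) x (1 - 0.022), approximately 0.985.

```python
# Illustrative sketch: haplotype-based genetic identity and diversity under
# the assumed standard definitions (not confirmed by the abstract).
from collections import Counter


def identity_and_diversity(haplotypes):
    """Return (genetic identity, genetic diversity) for a list of haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(haplotypes)
    # Probability that two individuals drawn with replacement share a haplotype.
    identity = sum((c / n) ** 2 for c in counts.values())
    # Unbiased estimate of the probability that two individuals differ.
    diversity = n / (n - 1) * (1 - identity)
    return identity, diversity


# Toy sample: haplotype labels are placeholders, not data from the study.
sample = ["H1", "H1", "H2", "H3"]
print(identity_and_diversity(sample))
```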

  13. Repetitive sequence analysis and karyotyping reveals centromere-associated DNA sequences in radish (Raphanus sativus L.).

    Science.gov (United States)

    He, Qunyan; Cai, Zexi; Hu, Tianhua; Liu, Huijun; Bao, Chonglai; Mao, Weihai; Jin, Weiwei

    2015-04-18

    Radish (Raphanus sativus L., 2n = 2x = 18) is a major root vegetable crop especially in eastern Asia. Radish root contains various nutrients which play an important role in strengthening immunity. Repetitive elements are primary components of the genomic sequence and the most important factors in genome size variations in higher eukaryotes. To date, studies about repetitive elements of radish are still limited. To better understand genome structure of radish, we undertook a study to evaluate the proportion of repetitive elements and their distribution in radish. We conducted genome-wide characterization of repetitive elements in radish with low coverage genome sequencing followed by similarity-based cluster analysis. Results showed that about 31% of the genome was composed of repetitive sequences. Satellite repeats were the most dominant elements of the genome. The distribution pattern of three satellite repeat sequences (CL1, CL25, and CL43) on radish chromosomes was characterized using fluorescence in situ hybridization (FISH). CL1 was predominantly located at the centromeric region of all chromosomes, CL25 was located at the subtelomeric region, and CL43 was a telomeric satellite. FISH signals of two satellite repeats, CL1 and CL25, together with 5S rDNA and 45S rDNA, provide useful cytogenetic markers to identify each individual somatic metaphase chromosome. The centromere-specific histone H3 (CENH3) has been used as a marker to identify centromere DNA sequences. One putative CENH3 (RsCENH3) was characterized and cloned from radish. Its deduced amino acid sequence shares high similarities to those of the CENH3s in Brassica species. An antibody against B. rapa CENH3 specifically stained radish centromeres. Immunostaining and chromatin immunoprecipitation (ChIP) tests with anti-BrCENH3 antibody demonstrated that both the centromere-specific retrotransposon (CR-Radish) and satellite repeat (CL1) are directly associated with RsCENH3 in radish. Proportions

  14. Mylodon darwinii DNA sequences from ancient fecal hair shafts.

    Science.gov (United States)

    Clack, Andrew A; MacPhee, Ross D E; Poinar, Hendrik N

    2012-01-20

    Preserved hair has been increasingly used as an ancient DNA source in high throughput sequencing endeavors, and it may actually offer several advantages compared to more traditional ancient DNA substrates like bone. However, cold environments have yielded the most informative ancient hair specimens, while its preservation, and thus utility, in temperate regions is not well documented. Coprolites could represent a previously underutilized preservation substrate for hairs, which, if present therein, represent macroscopic packages of specific cells that are relatively simple to separate, clean and process. In this pilot study, we report amplicons 147-152 base pairs in length (w/primers) from hair shafts preserved in a south Chilean coprolite attributed to Darwin's extinct ground sloth, Mylodon darwinii. Our results suggest that hairs preserved in coprolites from temperate cave environments can serve as an effective source of ancient DNA. This bodes well for potential molecular-based population and phylogeographic studies on sloths, several species of which have been understudied despite leaving numerous coprolites in caves across the Americas. Copyright © 2011. Published by Elsevier GmbH.

  15. Efficiency of methylated DNA immunoprecipitation bisulphite sequencing for whole-genome DNA methylation analysis.

    Science.gov (United States)

    Jeong, Hae Min; Lee, Sangseon; Chae, Heejoon; Kim, RyongNam; Kwon, Mi Jeong; Oh, Ensel; Choi, Yoon-La; Kim, Sun; Shin, Young Kee

    2016-08-01

    We compared four common methods for measuring DNA methylation levels and recommended the most efficient method in terms of cost and coverage. The DNA methylation status of liver and stomach tissues was profiled using four different methods, whole-genome bisulphite sequencing (WG-BS), targeted bisulphite sequencing (Targeted-BS), methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA immunoprecipitation bisulphite sequencing (MeDIP-BS). We calculated DNA methylation levels using each method and compared the results. MeDIP-BS yielded the most similar DNA methylation profile to WG-BS, with 20 times less data, suggesting remarkable cost savings and coverage efficiency compared with the other methods. MeDIP-BS is a practical cost-effective method for analyzing whole-genome DNA methylation that is highly accurate at base-pair resolution.

  16. Gene Expression Divergence is Coupled to Evolution of DNA Structure in Coding Regions

    Science.gov (United States)

    Dai, Zhiming; Dai, Xianhua

    2011-01-01

    Sequence changes in the coding and regulatory regions of the gene itself (cis) determine most of gene expression divergence between closely related species. But gene expression divergence between yeast species is not correlated with evolution of primary nucleotide sequence. This indicates that other factors in cis direct gene expression divergence. Here, we studied the contribution of DNA three-dimensional structural evolution as cis to gene expression divergence. We found that the evolution of DNA structure in coding regions and gene expression divergence are correlated in yeast. A similar result was also observed between Drosophila species. DNA structure is associated with the binding of chromatin remodelers and histone modifiers to DNA sequences in coding regions, which influence RNA polymerase II occupancy that controls gene expression level. We also found that genes with similar DNA structures are involved in the same biological process and function. These results reveal the previously unappreciated roles of DNA structure as cis-effects in gene expression. PMID:22125484

  17. Plasmonic Nanopores for Trapping, Controlling Displacement, and Sequencing of DNA.

    Science.gov (United States)

    Belkin, Maxim; Chao, Shu-Han; Jonsson, Magnus P; Dekker, Cees; Aksimentiev, Aleksei

    2015-11-24

    With the aim of developing a DNA sequencing methodology, we theoretically examine the feasibility of using nanoplasmonics to control the translocation of a DNA molecule through a solid-state nanopore and to read off sequence information using surface-enhanced Raman spectroscopy. Using molecular dynamics simulations, we show that high-intensity optical hot spots produced by a metallic nanostructure can arrest DNA translocation through a solid-state nanopore, thus providing a physical knob for controlling the DNA speed. Switching the plasmonic field on and off can displace the DNA molecule in discrete steps, sequentially exposing neighboring fragments of a DNA molecule to the pore as well as to the plasmonic hot spot. Surface-enhanced Raman scattering from the exposed DNA fragments contains information about their nucleotide composition, possibly allowing the identification of the nucleotide sequence of a DNA molecule transported through the hot spot. The principles of plasmonic nanopore sequencing can be extended to detection of DNA modifications and RNA characterization.

  18. Plasmonic Nanopores for Trapping, Controlling Displacement, and Sequencing of DNA

    Science.gov (United States)

    2015-01-01

    With the aim of developing a DNA sequencing methodology, we theoretically examine the feasibility of using nanoplasmonics to control the translocation of a DNA molecule through a solid-state nanopore and to read off sequence information using surface-enhanced Raman spectroscopy. Using molecular dynamics simulations, we show that high-intensity optical hot spots produced by a metallic nanostructure can arrest DNA translocation through a solid-state nanopore, thus providing a physical knob for controlling the DNA speed. Switching the plasmonic field on and off can displace the DNA molecule in discrete steps, sequentially exposing neighboring fragments of a DNA molecule to the pore as well as to the plasmonic hot spot. Surface-enhanced Raman scattering from the exposed DNA fragments contains information about their nucleotide composition, possibly allowing the identification of the nucleotide sequence of a DNA molecule transported through the hot spot. The principles of plasmonic nanopore sequencing can be extended to detection of DNA modifications and RNA characterization. PMID:26401685

  19. Detecting DNA modifications from SMRT sequencing data by modeling sequence context dependence of polymerase kinetic.

    Directory of Open Access Journals (Sweden)

    Zhixing Feng

    DNA modifications such as methylation and DNA damage can play critical regulatory roles in biological systems. Single molecule, real time (SMRT) sequencing technology generates DNA sequences as well as DNA polymerase kinetic information that can be used for the direct detection of DNA modifications. We demonstrate that local sequence context has a strong impact on DNA polymerase kinetics in the neighborhood of the incorporation site during the DNA synthesis reaction, allowing for the possibility of estimating the expected kinetic rate of the enzyme at the incorporation site using kinetic rate information collected from existing SMRT sequencing data (historical data) covering the same local sequence contexts of interest. We develop an Empirical Bayesian hierarchical model for incorporating historical data. Our results show that the model could greatly increase DNA modification detection accuracy and reduce the required control data coverage. For some DNA modifications that have a strong signal, a control sample is not even needed when historical data are used as an alternative to the control. Thus, sequencing costs can be greatly reduced by using the model. We implemented the model in an R package named seqPatch, which is available at https://github.com/zhixingfeng/seqPatch.

  20. Semiconductor-based DNA sequencing of histone modification states

    Science.gov (United States)

    Cheng, Christine S.; Rai, Kunal; Garber, Manuel; Hollinger, Andrew; Robbins, Dana; Anderson, Scott; Macbeth, Alyssa; Tzou, Austin; Carneiro, Mauricio O.; Raychowdhury, Raktima; Russ, Carsten; Hacohen, Nir; Gershenwald, Jeffrey E.; Lennon, Niall; Nusbaum, Chad; Chin, Lynda; Regev, Aviv; Amit, Ido

    2013-01-01

    The recent development of a semiconductor-based, non-optical DNA sequencing technology promises scalable, low-cost and rapid sequence data production. The technology has previously been applied mainly to genomic sequencing and targeted re-sequencing. Here we demonstrate the utility of Ion Torrent semiconductor-based sequencing for sensitive, efficient and rapid chromatin immunoprecipitation followed by sequencing (ChIP-seq) through the application of sample preparation methods that are optimized for ChIP-seq on the Ion Torrent platform. We leverage this method for epigenetic profiling of tumour tissues. PMID:24157732

  1. Sequences from first settlers reveal rapid evolution in Icelandic mtDNA pool.

    Science.gov (United States)

    Helgason, Agnar; Lalueza-Fox, Carles; Ghosh, Shyamali; Sigurðardóttir, Sigrún; Sampietro, Maria Lourdes; Gigli, Elena; Baker, Adam; Bertranpetit, Jaume; Arnadóttir, Lilja; Thorsteinsdottir, Unnur; Stefánsson, Kári

    2009-01-01

    A major task in human genetics is to understand the nature of the evolutionary processes that have shaped the gene pools of contemporary populations. Ancient DNA studies have great potential to shed light on the evolution of populations because they provide the opportunity to sample from the same population at different points in time. Here, we show that a sample of mitochondrial DNA (mtDNA) control region sequences from 68 early medieval Icelandic skeletal remains is more closely related to sequences from contemporary inhabitants of Scotland, Ireland, and Scandinavia than to those from the modern Icelandic population. Due to a faster rate of genetic drift in the Icelandic mtDNA pool during the last 1,100 years, the sequences carried by the first settlers were better preserved in their ancestral gene pools than among their descendants in Iceland. These results demonstrate the inferential power gained in ancient DNA studies through the application of population genetics analyses to relatively large samples.

  2. ATRF Houses the Latest DNA Sequencing Technologies | Poster

    Science.gov (United States)

    By Ashley DeVine, Staff Writer By the end of October, the Advanced Technology Research Facility (ATRF) will be one of the few facilities in the world to house all of the latest DNA sequencing technologies.

  3. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    2010-02-08

    Feb 8, 2010 ... In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional ..... silk-producing insects based on 16S ribosomal RNA and cytochrome oxidase subunit I genes. J. Genet.

  4. DNA sequencing using polymerase substrate-binding kinetics.

    Science.gov (United States)

    Previte, Michael John Robert; Zhou, Chunhong; Kellinger, Matthew; Pantoja, Rigo; Chen, Cheng-Yao; Shi, Jin; Wang, BeiBei; Kia, Amirali; Etchin, Sergey; Vieceli, John; Nikoomanzar, Ali; Bomati, Erin; Gloeckner, Christian; Ronaghi, Mostafa; He, Molly Min

    2015-01-23

    Next-generation sequencing (NGS) has transformed genomic research by decreasing the cost of sequencing. However, whole-genome sequencing is still costly and complex for diagnostics purposes. In the clinical space, targeted sequencing has the advantage of allowing researchers to focus on specific genes of interest. Routine clinical use of targeted NGS mandates inexpensive instruments, fast turnaround time and an integrated and robust workflow. Here we demonstrate a version of the Sequencing by Synthesis (SBS) chemistry that potentially can become a preferred targeted sequencing method in the clinical space. This sequencing chemistry uses natural nucleotides and is based on real-time recording of the differential polymerase/DNA-binding kinetics in the presence of correct or mismatch nucleotides. This ensemble SBS chemistry has been implemented on an existing Illumina sequencing platform with integrated cluster amplification. We discuss the advantages of this sequencing chemistry for targeted sequencing as well as its limitations for other applications.

  5. Levenshtein error-correcting barcodes for multiplexed DNA sequencing.

    Science.gov (United States)

    Buschmann, Tilo; Bystrykh, Leonid V

    2013-09-11

    High-throughput sequencing technologies are improving in quality, capacity and costs, providing versatile applications in DNA and RNA research. For small genomes or fractions of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called multiplexing approach relies on a specific DNA tag or barcode that is attached to the sequencing or amplification primer and hence appears at the beginning of the sequence in every read. After sequencing, each sample read is identified on the basis of the respective barcode sequence. Alterations of DNA barcodes during synthesis, primer ligation, DNA amplification, or sequencing may lead to incorrect sample identification unless the error is revealed and corrected. This can be accomplished by implementing error correcting algorithms and codes. This barcoding strategy increases the total number of correctly identified samples, thus improving overall sequencing efficiency. Two popular sets of error-correcting codes are Hamming codes and Levenshtein codes. Levenshtein codes operate only on words of known length. Since a DNA sequence with an embedded barcode is essentially one continuous long word, application of the classical Levenshtein algorithm is problematic. In this paper we demonstrate the decreased error correction capability of Levenshtein codes in a DNA context and suggest an adaptation of Levenshtein codes that is proven to efficiently correct nucleotide errors in DNA sequences. In our adaptation we take the DNA context into account and redefine the word length whenever an insertion or deletion is revealed. In simulations we show the superior error correction capability of the new method compared to traditional Levenshtein and Hamming based codes in the presence of multiple errors. We present an adaptation of Levenshtein codes to DNA contexts capable of correction of a pre-defined number of insertion, deletion, and substitution mutations. 
Our improved method is additionally capable
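
    For orientation, the classical Levenshtein (edit) distance that the abstract takes as its starting point can be sketched as below, together with a naive nearest-barcode assignment. This is the textbook dynamic-programming algorithm, not the authors' adapted code; `assign_barcode`, its `max_dist` threshold, and the fixed-length read prefix are assumptions for illustration. Comparing a fixed-length prefix is exactly the simplification the abstract identifies as problematic when insertions or deletions shift the barcode boundary.

```python
# Illustrative sketch: classical Levenshtein distance and naive barcode
# assignment. This is not the adapted code described in the paper.
def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        cur = [i]
        for j, char_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                        # deletion from a
                cur[j - 1] + 1,                     # insertion into a
                prev[j - 1] + (char_a != char_b),   # match or substitution
            ))
        prev = cur
    return prev[-1]


def assign_barcode(read_prefix, barcodes, max_dist=1):
    """Return the unique closest barcode within max_dist, else None."""
    scored = sorted((levenshtein(read_prefix, bc), bc) for bc in barcodes)
    best_dist, best = scored[0]
    if best_dist > max_dist:
        return None                     # too many errors to correct
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None                     # ambiguous between two barcodes
    return best


print(levenshtein("ACGT", "AGT"))
print(assign_barcode("ACGT", ["ACGA", "TTTT"]))
```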

  6. Levenshtein error-correcting barcodes for multiplexed DNA sequencing

    Science.gov (United States)

    2013-01-01

    Background: High-throughput sequencing technologies are improving in quality, capacity and costs, providing versatile applications in DNA and RNA research. For small genomes or fractions of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called multiplexing approach relies on a specific DNA tag or barcode that is attached to the sequencing or amplification primer and hence appears at the beginning of the sequence in every read. After sequencing, each sample read is identified on the basis of the respective barcode sequence. Alterations of DNA barcodes during synthesis, primer ligation, DNA amplification, or sequencing may lead to incorrect sample identification unless the error is revealed and corrected. This can be accomplished by implementing error correcting algorithms and codes. This barcoding strategy increases the total number of correctly identified samples, thus improving overall sequencing efficiency. Two popular sets of error-correcting codes are Hamming codes and Levenshtein codes. Results: Levenshtein codes operate only on words of known length. Since a DNA sequence with an embedded barcode is essentially one continuous long word, application of the classical Levenshtein algorithm is problematic. In this paper we demonstrate the decreased error correction capability of Levenshtein codes in a DNA context and suggest an adaptation of Levenshtein codes that is proven to efficiently correct nucleotide errors in DNA sequences. In our adaptation we take the DNA context into account and redefine the word length whenever an insertion or deletion is revealed. In simulations we show the superior error correction capability of the new method compared to traditional Levenshtein and Hamming based codes in the presence of multiple errors. Conclusion: We present an adaptation of Levenshtein codes to DNA contexts capable of correction of a pre-defined number of insertion, deletion, and substitution mutations. 
Our improved
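
    The word-length problem described in this abstract can be illustrated with a minimal sketch. It uses the classical Levenshtein (edit) distance on a fixed-length read prefix, which is the traditional approach and not the authors' adaptation. The barcode set, the sample names, and the helper `assign_barcode` are hypothetical and chosen only for illustration.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Classical edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def assign_barcode(read: str, barcodes: dict, max_dist: int = 1) -> Optional[str]:
    """Assign a read to a sample by comparing a fixed-length prefix to each barcode.

    Returns None when no barcode is within max_dist or when the best hit is tied.
    """
    scored = sorted(
        (levenshtein(read[:len(bc)], bc), sample) for sample, bc in barcodes.items()
    )
    best_dist, best_sample = scored[0]
    tied = len(scored) > 1 and scored[1][0] == best_dist
    if best_dist > max_dist or tied:
        return None
    return best_sample


# Hypothetical barcode set (illustration only).
BARCODES = {"s1": "ACGTAC", "s2": "TGCATG", "s3": "GATCGA"}

# A substitution inside the barcode stays at distance 1 and is recovered.
substituted = assign_barcode("ACGAACGGTT", BARCODES)

# A single deletion shifts a downstream base into the fixed window, so the
# window is at distance 2 from the true barcode and the read is rejected.
deleted = assign_barcode("ACGACGGTT", BARCODES)
```

    In this toy case the barcode with one deleted base ("ACGAC") is at distance 1 from "ACGTAC" when its true length is known, but the six-base window "ACGACG" taken from the read is at distance 2, which is the loss of correction capability that the abstract attributes to unknown word length.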

  7. Agarose Gel Size Selection for DNA Sequencing Libraries.

    Science.gov (United States)

    Mardis, Elaine; McCombie, W Richard

    2017-08-01

    Agarose gel electrophoresis may be used to purify fragmented genomic DNA after ligation of adaptors. After electrophoresis, the region of the gel containing the desired size range of DNA is excised, and the DNA is subsequently extracted from the gel and purified by passage through a spin column. © 2017 Cold Spring Harbor Laboratory Press.

  8. Isolation and sequence analysis of a chalcone synthase cDNA of Matthiola incana R. Br. (Brassicaceae).

    Science.gov (United States)

    Epping, B; Kittel, M; Ruhnau, B; Hemleben, V

    1990-06-01

    A cDNA clone (pcM12) of the chalcone synthase (CHS) of Matthiola incana R. Br. (Brassicaceae) was isolated from a cDNA library, sequenced and analysed. It comprises the complete coding sequence for the CHS and 5' and 3' untranslated regions. The deduced amino acid sequence shows that the Matthiola incana CHS consists of 394 amino acid residues. Comparison with CHS amino acid sequences of other plants indicates more than 82% homology.

  9. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system.

    Science.gov (United States)

    Schloss, Patrick D; Jenior, Matthew L; Koumpouras, Charles C; Westcott, Sarah L; Highlander, Sarah K

    2016-01-01

    Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.

  10. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

    Directory of Open Access Journals (Sweden)

    Patrick D. Schloss

    2016-03-01

    Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina’s MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3–V5, V1–V3, V1–V5, V1–V6, and V1–V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1–V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina’s MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.

  11. Primary structure of a lipoxygenase from barley grain as deduced from its cDNA sequence

    NARCIS (Netherlands)

    Mechelen, J.R. van; Smits, M.; Douma, A.C.; Rouster, J.; Cameron-Mills, V.; Heidekamp, F.; Valk, B.E.

    1995-01-01

    A full length cDNA sequence for a barley grain lipoxygenase was obtained. It includes a 5' untranslated region of 69 nucleotides, an open reading frame of 2586 nucleotides encoding a protein of 862 amino acid residues and a 3' untranslated region of 142 nucleotides. The molecular mass of the encoded

  12. Directed Evolution of DNA Polymerases for Next Generation Sequencing

    Science.gov (United States)

    Leconte, Aaron M.; Patel, Maha P.; Sass, Lauryn E.; McInerney, Peter; Jarosz, Mirna; Kung, Li; Bowers, Jayson L.; Buzby, Philip R.; Efcavitch, J. William; Romesberg, Floyd E.

    2011-01-01

    We present the application of an activity-based phage display method to identify DNA polymerases tailored for next generation sequencing applications. Using this approach, we identify a mutant of Taq DNA polymerase that incorporates the fluorophore-labeled dA, dT, dC, and dG substrates ~50 to 400-fold more efficiently into scarred primers in solution and that also demonstrates significantly improved performance under actual sequencing conditions. PMID:20629059

  13. Local alignment of two-base encoded DNA sequence.

    Science.gov (United States)

    Homer, Nils; Merriman, Barry; Nelson, Stanley F

    2009-06-09

    DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data.
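
    The statement that decoding is deterministic but sensitive to measurement errors can be illustrated with a minimal sketch. It assumes the common color-space scheme in which each base gets a 2-bit code and each "color" is the XOR of two adjacent base codes; the abstract does not spell out its encoding, so this scheme and the function names are assumptions. The sketch shows only naive decoding, not the authors' dynamic programming alignment.

```python
# Assumed 2-bit codes; a color is the XOR of the codes of two adjacent bases.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(seq: str) -> list:
    """Encode a base sequence as colors, one per overlapping pair of bases."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list) -> str:
    """Naively decode colors, given the known first base of the read."""
    out = [first_base]
    for color in colors:
        out.append(BITS_TO_BASE[BASE_TO_BITS[out[-1]] ^ color])
    return "".join(out)


original = "ACGTTA"
colors = encode(original)            # [1, 3, 1, 0, 3]
round_trip = decode("A", colors)     # "ACGTTA"

# Simulate one measurement error in the second color.
corrupted = list(colors)
corrupted[1] ^= 1
misdecoded = decode("A", corrupted)  # "ACTGGC"

mismatches = sum(x != y for x, y in zip(original, misdecoded))
```

    Under this assumed scheme a single wrong color changes every decoded base downstream of it (4 of 6 bases here), which is why decoding and alignment are better treated jointly than in sequence.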

  14. Tracking of intercalary DNA sequences integrated into tandem repeat arrays in rye Secale vavilovii

    Directory of Open Access Journals (Sweden)

    Magdalena Achrem

    2017-06-01

    The structure of repetitive sequences of the JNK block present in the pericentromeric region of the 2RL chromosome was studied in Secale vavilovii. Amplification of sequences present between the JNK sequences led to the identification of seven abnormal DNA fragments. Two of these fragments showed high similarity to the glutamate 5-kinase gene and putative alcohol dehydrogenase gene of trypanosomatid from the genus Leishmania, whose presence can be explained by horizontal gene transfer (HGT). Other fragments were similar to mitochondrial gene for ribosomal protein S4 in plants and to the glycoprotein (G) gene of the IHNV virus. Presumably, they are pseudogenes inserted into the JNK heterochromatin region. Within this region, fragments similar to the rye repetitive sequence and chromosome 3B in wheat were also found. There is no known mechanism that would explain how foreign sequences were inserted into the block region of tandem repetitive sequences of the JNK family.

  15. SWORDS: A statistical tool for analysing large DNA sequences

    Indian Academy of Sciences (India)

    Unknown

    Unusual frequencies of certain DNA words in Escherichia coli and virus genomes and possible statistical and biological implications of such over- and under-representation of those words have been studied in the literature based on Markov chain models for DNA sequences (Phillips et al 1987a,b; Prum et al 1995; Leung.

  16. Maternal Plasma DNA and RNA Sequencing for Prenatal Testing

    NARCIS (Netherlands)

    Tamminga, Saskia; van Maarle, Merel; Henneman, Lidewij; Oudejans, Cees B. M.; Cornel, Martina C.; Sistermans, Erik A.

    2016-01-01

    Cell-free DNA (cfDNA) testing has recently become indispensable in diagnostic testing and screening. In the prenatal setting, this type of testing is often called noninvasive prenatal testing (NIPT). With a number of techniques, using either next-generation sequencing or single nucleotide

  17. PNA Directed Sequence Addressed Self-Assembly of DNA Nanostructures

    DEFF Research Database (Denmark)

    Nielsen, Peter E.

    2008-01-01

    sequence specifically recognize another PNA oligomer. We describe how such three domain PNAs have utility for assembling dsDNA grid and clover leaf structures, and in combination with SNAP-tag technology of protein dsDNA structures. (c) 2008 American Institute of Physics.

  18. DNA Sequences of RAPD Fragments in the Egyptian cotton ...

    African Journals Online (AJOL)

    Random Amplified Polymorphic DNAs (RAPDs) is a DNA polymorphism assay based on the amplification of random DNA segments with single primers of arbitrary nucleotide sequence. Despite the fact that the RAPD technique has become a very powerful tool and has found use in numerous applications, yet, the nature of ...

  19. A novel DNA sequence database for analyzing human demographic history.

    Science.gov (United States)

    Wall, Jeffrey D; Cox, Murray P; Mendez, Fernando L; Woerner, August; Severson, Tesa; Hammer, Michael F

    2008-08-01

    While there are now extensive databases of human genomic sequences from both private and public efforts to catalog human nucleotide variation, there are very few large-scale surveys designed for the purpose of analyzing human population history. Demographic inference from patterns of SNP variation in current large public databases is complicated by ascertainment biases associated with SNP discovery and the ways that populations and regions of the genome are sampled. Here, we present results from a resequencing survey of 40 independent intergenic regions on the autosomes and X chromosome comprising ~210 kb from each of 90 humans from six geographically diverse populations (i.e., a total of ~18.9 Mb). Unlike other public DNA sequence databases, we include multiple indigenous populations that serve as important reservoirs of human genetic diversity, such as the San of Namibia, the Biaka of the Central African Republic, and Melanesians from Papua New Guinea. In fact, only 20% of the SNPs that we find are contained in the HapMap database. We identify several key differences in patterns of variability in our database compared with other large public databases, including higher levels of nucleotide diversity within populations, greater levels of differentiation between populations, and significant differences in the frequency spectrum. Because variants at loci included in this database are less likely to be subject to ascertainment biases or linked to sites under selection, these data will be more useful for accurately reconstructing past changes in size and structure of human populations.

  20. Sequence dependence of electron-induced DNA strand breakage revealed by DNA nanoarrays

    DEFF Research Database (Denmark)

    Keller, Adrian; Rackwitz, Jenny; Cauët, Emilie

    2014-01-01

    The electronic structure of DNA is determined by its nucleotide sequence, which is for instance exploited in molecular electronics. Here we demonstrate that the DNA strand breakage induced by low-energy electrons (18 eV) also depends on the nucleotide sequence. To determine the absolute cross sec...

  1. Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels

    Science.gov (United States)

    Britten, Roy J.

    2002-01-01

    Five chimpanzee bacterial artificial chromosome (BAC) sequences (described in GenBank) have been compared with the best matching regions of the human genome sequence to assay the amount and kind of DNA divergence. The conclusion is that the old saw that we share 98.5% of our DNA sequence with chimpanzee is probably in error. For this sample, a better estimate would be that 95% of the base pairs are exactly shared between chimpanzee and human DNA. In this sample of 779 kb, the divergence due to base substitution is 1.4%, and there is an additional 3.4% difference due to the presence of indels. The gaps in alignment are present in about equal amounts in the chimp and human sequences. They occur equally in repeated and nonrepeated sequences, as detected by RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html). PMID:12368483
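
    The two divergence components reported in this abstract add up as 1.4% + 3.4% = 4.8%, or roughly 5%. A minimal sketch of how such components might be tallied from a pairwise alignment follows. It assumes a simple per-column count with `-` marking a gap; the paper's exact counting rules are not given in the abstract, and the toy alignment and function name are hypothetical.

```python
def divergence(aligned_a: str, aligned_b: str) -> dict:
    """Per-column tally of identical, substituted, and gapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = substitutions = gaps = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            gaps += 1            # column belongs to an indel
        elif x == y:
            identical += 1
        else:
            substitutions += 1
    total = len(aligned_a)
    return {
        "identical": identical / total,
        "substitution": substitutions / total,
        "indel": gaps / total,
    }


# Toy alignment (illustration only): 1 substitution and 4 gap columns out of 12.
toy = divergence("ACGTACGT--AC",
                 "ACGAAC--TTAC")

# Totals reported in the abstract, in percent of the 779 kb sample.
reported_total = 1.4 + 3.4   # about 4.8, i.e. roughly 5% divergence
```

    Counting gap columns alongside substitutions is what moves the estimate from about 1.4% to about 5% in the abstract's sample.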

  2. A method for cloning and sequencing long palindromic DNA junctions.

    Science.gov (United States)

    Rattray, Alison J

    2004-11-08

    DNA sequences containing long adjacent inverted repeats (palindromes) are inherently unstable and are associated with many types of chromosomal rearrangements. The instability associated with palindromic sequences also creates difficulties in their molecular analysis: long palindromes (>250 bp/arm) are highly unstable in Escherichia coli, and cannot be directly PCR amplified or sequenced due to their propensity to form intra-strand hairpins. Here, we show that DNA molecules containing long palindromes (>900 bp/arm) can be transformed and stably maintained in Saccharomyces cerevisiae cells lacking a functional SAE2 gene. Treatment of the palindrome-containing DNA with sodium bisulfite at high temperature results in deamination of cytosine, converting it to uracil and thus reducing the propensity to form intra-strand hairpins. The bisulfite-treated DNA can then be PCR amplified, cloned and sequenced, allowing determination of the nucleotide sequence of the junctions. Our data demonstrate that long palindromes with either no spacer (perfect) or a 2 bp spacer can be stably maintained, recovered and sequenced from sae2Δ yeast cells. Since DNA sequences from mammalian cells can be gap repaired by their co-transformation into yeast cells with an appropriate vector, the methods described in this manuscript should provide some of the necessary tools to isolate and characterize palindromic junctions from mammalian cells.
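
    The reasoning that cytosine deamination reduces hairpin formation can be illustrated with a minimal sketch. It assumes complete conversion of every cytosine on one strand, counts only Watson-Crick pairs (G-U wobble pairing is ignored), and uses a short hypothetical palindrome arm; none of these simplifications come from the abstract.

```python
# Watson-Crick pairs, with U allowed to pair with A.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def hairpin_pairing_fraction(strand: str) -> float:
    """Fraction of positions in the first half that pair with their mirror position."""
    half = len(strand) // 2
    paired = sum((strand[i], strand[-1 - i]) in PAIRS for i in range(half))
    return paired / half


def bisulfite_convert(strand: str) -> str:
    """Assume every cytosine on this strand is deaminated to uracil."""
    return strand.replace("C", "U")


arm = "ACGGTCCA"                              # hypothetical palindrome arm
palindrome = arm + reverse_complement(arm)    # perfect palindrome, no spacer

before = hairpin_pairing_fraction(palindrome)                    # 1.0
after = hairpin_pairing_fraction(bisulfite_convert(palindrome))  # 0.375
```

    After conversion only the A-T positions of the arm can still pair with their mirror positions, so under these assumptions the intra-strand complementarity drops from 100% to the A+T fraction of the arm (3 of 8 here).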

  3. Sequence of the dog immunoglobulin alpha and epsilon constant region genes

    Energy Technology Data Exchange (ETDEWEB)

    Patel, M.; Selinger, D.; Mark, G.E.; Hollis, G.F.; Hickey, G.J. [Merck Research Labs., Rahway, NJ (United States)

    1995-03-01

    The immunoglobulin alpha (IGHAC) and epsilon (IGHEC) germline constant region genes were isolated from a dog liver genomic DNA library. Sequence analysis indicates that the dog IGHEC gene is encoded by four exons spread out over 1.7 kilobases (kb). The IGHAC sequence encompasses 1.5 kb and includes all three constant region coding exons. The complete exon/intron sequence of these genes is described. 28 refs., 2 figs., 2 tabs.

  4. Sequence Capture versus Restriction Site Associated DNA Sequencing for Shallow Systematics.

    Science.gov (United States)

    Harvey, Michael G; Smith, Brian Tilston; Glenn, Travis C; Faircloth, Brant C; Brumfield, Robb T

    2016-09-01

    Sequence capture and restriction site associated DNA sequencing (RAD-Seq) are two genomic enrichment strategies for applying next-generation sequencing technologies to systematics studies. At shallow timescales, such as within species, RAD-Seq has been widely adopted among researchers, although there has been little discussion of the potential limitations and benefits of RAD-Seq and sequence capture. We discuss a series of issues that may impact the utility of sequence capture and RAD-Seq data for shallow systematics in non-model species. We review prior studies that used both methods, and investigate differences between the methods by re-analyzing existing RAD-Seq and sequence capture data sets from a Neotropical bird (Xenops minutus). We suggest that the strengths of RAD-Seq data sets for shallow systematics are the wide dispersion of markers across the genome, the relative ease and cost of laboratory work, the deep coverage and read overlap at recovered loci, and the high overall information that results. Sequence capture's benefits include flexibility and repeatability in the genomic regions targeted, success using low-quality samples, more straightforward read orthology assessment, and higher per-locus information content. The utility of a method in systematics, however, rests not only on its performance within a study, but on the comparability of data sets and inferences with those of prior work. In RAD-Seq data sets, comparability is compromised by low overlap of orthologous markers across species and the sensitivity of genetic diversity in a data set to an interaction between the level of natural heterozygosity in the samples examined and the parameters used for orthology assessment. In contrast, sequence capture of conserved genomic regions permits interrogation of the same loci across divergent species, which is preferable for maintaining comparability among data sets and studies for the purpose of drawing general conclusions about the impact of

  5. Polymorphisms of mitochondrial DNA control region are associated to endometriosis.

    Science.gov (United States)

    Andres, Marina Paula; Cardena, Mari Maki Siria Godoy; Fridman, Cintia; Podgaec, Sergio

    2017-11-09

    Polymorphisms in the control region of mitochondrial DNA (mtDNA) can affect generation of reactive oxygen species and influence the pathogenesis of endometriosis. This study investigated the association of mtDNA polymorphisms with endometriosis. Patients were divided into two groups: endometriosis (n = 90) and control (n = 92). Inclusion criteria were as follows: women between 18 and 50 years, with histological diagnosis and surgical staging of endometriosis (endometriosis group) or undergoing gynecological surgery for tubal ligation, leiomyoma, or ovarian cysts, with no evidence of endometriosis (control group). DNA extraction was performed from peripheral blood. Sanger sequencing of mtDNA control region was performed, and polymorphisms were determined comparing the sequences obtained with the Cambridge Reference Sequence. The frequency of polymorphisms T16217C (14.4 and 5.4% of endometriosis and control group, respectively; p = 0.049) and G499A (13.3 vs. 4.3%; p = 0.038) was higher in the endometriosis group, while T146C (32.6 vs. 18.9%; p = 0.042) and 573.2C (5.6 vs. 29.3%; p < 0.001) were lower. No difference was observed in haplogroups between groups. mtDNA polymorphisms T16217C and G499A were associated with endometriosis, while T146C and 573.2C were shown to be associated with an absence of disease.

  6. Z-DNA-forming sequences generate large-scale deletions in mammalian cells.

    Science.gov (United States)

    Wang, Guliang; Christensen, Laura A; Vasquez, Karen M

    2006-02-21

    Spontaneous chromosomal breakages frequently occur at genomic hot spots in the absence of DNA damage and can result in translocation-related human disease. Chromosomal breakpoints are often mapped near purine-pyrimidine Z-DNA-forming sequences in human tumors. However, it is not known whether Z-DNA plays a role in the generation of these chromosomal breakages. Here, we show that Z-DNA-forming sequences induce high levels of genetic instability in both bacterial and mammalian cells. In mammalian cells, the Z-DNA-forming sequences induce double-strand breaks nearby, resulting in large-scale deletions in 95% of the mutants. These Z-DNA-induced double-strand breaks in mammalian cells are not confined to a specific sequence but rather are dispersed over a 400-bp region, consistent with chromosomal breakpoints in human diseases. This observation is in contrast to the mutations generated in Escherichia coli that are predominantly small deletions within the repeats. We found that the frequency of small deletions is increased by replication in mammalian cell extracts. Surprisingly, the large-scale deletions generated in mammalian cells are, at least in part, replication-independent and are likely initiated by repair processing cleavages surrounding the Z-DNA-forming sequence. These results reveal that mammalian cells process Z-DNA-forming sequences in a strikingly different fashion from that used by bacteria. Our data suggest that Z-DNA-forming sequences may be causative factors for gene translocations found in leukemias and lymphomas and that certain cellular conditions such as active transcription may increase the risk of Z-DNA-related genetic instability.

  7. Sequencing and Analysis of Neanderthal Genomic DNA

    OpenAIRE

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library a...

  8. Cloning, sequencing and expression of cDNA encoding growth ...

    Indian Academy of Sciences (India)

    Journal of Biosciences, Volume 26, Issue 3. The full-length cDNA clone is 1132 bp in length, coding for an open reading frame (ORF) of 603 bp; the reading frame encodes a putative polypeptide of 200 amino acids including the signal sequence of 22 amino acids. The 5′ and 3′ ...

  9. The properties and applications of single-molecule DNA sequencing

    Science.gov (United States)

    2011-01-01

    Single-molecule sequencing enables DNA or RNA to be sequenced directly from biological samples, making it well-suited for diagnostic and clinical applications. Here we review the properties and applications of this rapidly evolving and promising technology. PMID:21349208

  10. Mitochondrial DNA sequence-based phylogenetic relationship ...

    Indian Academy of Sciences (India)

    The phylogenetic relationships among flesh flies of the family Sarcophagidae have been based mainly on the morphology of male genitalia. However, the male genitalic character-based relationships are far from satisfactory. Therefore, in the present study mitochondrial DNA has been used as a marker to unravel genetic ...

  11. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2012-11-01

    Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb, and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.

  12. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing

    Science.gov (United States)

    Lou, Dianne I.; Hussmann, Jeffrey A.; McBee, Ross M.; Acevedo, Ashley; Andino, Raul; Press, William H.; Sawyer, Sara L.

    2013-01-01

    A major limitation of high-throughput DNA sequencing is the high rate of erroneous base calls produced. For instance, Illumina sequencing machines produce errors at a rate of ∼0.1–1 × 10⁻² per base sequenced. These technologies typically produce billions of base calls per experiment, translating to millions of errors. We have developed a unique library preparation strategy, “circle sequencing,” which allows for robust downstream computational correction of these errors. In this strategy, DNA templates are circularized, copied multiple times in tandem with a rolling circle polymerase, and then sequenced on any high-throughput sequencing machine. Each read produced is computationally processed to obtain a consensus sequence of all linked copies of the original molecule. Physically linking the copies ensures that each copy is independently derived from the original molecule and allows for efficient formation of consensus sequences. The circle-sequencing protocol precedes standard library preparations and is therefore suitable for a broad range of sequencing applications. We tested our method using the Illumina MiSeq platform and obtained errors in our processed sequencing reads at a rate as low as 7.6 × 10⁻⁶ per base sequenced, dramatically improving the error rate of Illumina sequencing and putting error on par with low-throughput, but highly accurate, Sanger sequencing. Circle sequencing also had substantially higher efficiency and lower cost than existing barcode-based schemes for correcting sequencing errors. PMID:24243955
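
    The consensus step described in this abstract can be illustrated with a minimal sketch. It assumes the repeat length is already known and that the copies contain substitution errors only; a real pipeline would have to infer the repeat period and handle indels. The function name and the toy read are hypothetical, and a per-position majority vote is used here as one plausible consensus rule, not necessarily the authors' method.

```python
from collections import Counter


def tandem_consensus(read: str, period: int) -> str:
    """Majority-vote consensus across tandem copies of a known period."""
    # Split the read into complete copies; a trailing partial copy is ignored.
    copies = [read[i:i + period] for i in range(0, len(read) - period + 1, period)]
    consensus = []
    for column in zip(*copies):
        # most_common(1) returns the most frequent base in this column.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three linked copies of "ACGTAC"; the middle copy carries one miscalled base.
toy_read = "ACGTAC" + "ACCTAC" + "ACGTAC"
consensus = tandem_consensus(toy_read, period=6)   # "ACGTAC"
```

    With three or more copies, a base call that is wrong in only one copy is outvoted, which is the intuition behind the drop in error rate reported above.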

  13. Comparison of DNA fragments as donor DNAs upon sequence conversion of cleaved target DNA.

    Science.gov (United States)

    Suzuki, Tetsuya; Imada, Takashi; Komatsu, Yasuo; Kamiya, Hiroyuki

    2017-06-03

    Pinpoint sequence alteration (genome editing) by the combination of the site-specific cleavage of a target DNA and a donor nucleic acid has attracted much attention and the sequence of the target DNA is expected to be changed to that of a donor nucleic acid. In most cases, oligodeoxyribonucleotides (ODNs) and plasmid DNAs have been used as donors. However, a several hundred-base single-stranded (ss) DNA fragment and a 5'-tailed duplex (TD) accomplished the desired sequence changes without DNA cleavage, and might serve as better donors for the cleaved target DNA than ODNs and plasmid DNAs. In this study, sequence conversion efficiencies were compared with various donor DNAs in model sequence alteration experiments, using episomal DNA. The efficiencies with the ss and TD fragments were higher than those with the ODN and plasmid DNA. The sequence change by the TD seemed somewhat less efficient but slightly more accurate than that by the ss DNA fragment. These results suggested that the ss and TD fragments are better donors for targeted sequence alteration.

  14. Sequencing of adenine in DNA by scanning tunneling microscopy

    Science.gov (United States)

    Tanaka, Hiroyuki; Taniguchi, Masateru

    2017-08-01

    The development of DNA sequencing technology utilizing the detection of a tunnel current is important for next-generation sequencer technologies based on single-molecule analysis technology. Using a scanning tunneling microscope, we previously reported that dI/dV measurements and dI/dV mapping revealed that the guanine base (purine base) of DNA adsorbed onto the Cu(111) surface has a characteristic peak at V_s = -1.6 V. If, in addition to guanine, the other purine base of DNA, namely, adenine, can be distinguished, then by reading all the purine bases of each single strand of a DNA double helix, the entire base sequence of the original double helix can be determined due to the complementarity of the DNA base pair. Therefore, the ability to read adenine is important from the viewpoint of sequencing. Here, we report on the identification of adenine by STM topographic and spectroscopic measurements using a synthetic DNA oligomer and viral DNA.

  15. cDNA cloning and sequencing of ostrich Growth hormone

    Directory of Open Access Journals (Sweden)

    Doosti Abbas

    2012-01-01

    In recent years, industrial breeding of ostrich (Struthio camelus) has been widely developed in Iran. Growth hormone (GH) is a peptide hormone that stimulates growth and cell reproduction in different animals. The aim of this study was to clone and sequence the ostrich growth hormone gene in E. coli, done for the first time in Iran. The cDNA that encodes ostrich growth hormone was isolated from total mRNA of the pituitary gland and amplified by RT-PCR using GH specific PCR primers. Then GH cDNA was cloned by T/A cloning technique and the construct was transformed into E. coli. Finally, GH cDNA sequence was submitted to the GenBank (Accession number: JN559394). The results of present study showed that GH cDNA was successfully cloned in E. coli. Sequencing confirmed that GH cDNA was cloned and that the length of ostrich GH cDNA was 672 bp; BLAST search showed that the sequence of growth hormone cDNA of the ostrich from Iran has 100% homology with other records existing in GenBank.

  16. Sequence-Dependent Persistence Length of Long DNA

    Science.gov (United States)

    Chuang, Hui-Min; Reifenberger, Jeffrey G.; Cao, Han; Dorfman, Kevin D.

    2017-12-01

    Using a high-throughput genome-mapping approach, we obtained circa 50 million measurements of the extension of internal human DNA segments in a 41 nm × 41 nm nanochannel. The underlying DNA sequences, obtained by mapping to the reference human genome, are 2.5-393 kilobase pairs long and contain percent GC contents between 32.5% and 60%. Using Odijk's theory for a channel-confined wormlike chain, these data reveal that the DNA persistence length increases by almost 20% as the percent GC content increases. The increased persistence length is rationalized by a model, containing no adjustable parameters, that treats the DNA as a statistical terpolymer with a sequence-dependent intrinsic persistence length and a sequence-independent electrostatic persistence length.

  17. Thermodynamics of sequence-specific binding of PNA to DNA

    DEFF Research Database (Denmark)

    Ratilainen, T; Holmén, A; Tuite, E

    2000-01-01

    For further characterization of the hybridization properties of peptide nucleic acids (PNAs), the thermodynamics of hybridization of mixed sequence PNA-DNA duplexes have been studied. We have characterized the binding of PNA to DNA in terms of binding affinity (perfectly matched duplexes) and seq... relative to that of the perfectly matched sequence with a corresponding free energy penalty of about 15 kJ mol⁻¹ bp⁻¹. The average cost of a single mismatch is therefore estimated to be on the order of or larger than the gain of two matched base pairs, resulting in an apparent binding constant of only...
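
    To give a feel for what a free energy penalty of about 15 kJ mol⁻¹ means for binding, the sketch below converts it into a ratio of binding constants using the standard relation K_mismatch / K_match = exp(-ΔΔG / RT). The temperature of 298.15 K is an assumption made here for illustration; the abstract does not state the experimental temperature, and the function name is hypothetical.

```python
import math

R = 8.314  # gas constant in J mol^-1 K^-1


def binding_constant_ratio(ddg_kj_per_mol, temperature_k=298.15):
    """Return K_mismatch / K_match for a free energy penalty ddG.

    ddg_kj_per_mol: destabilisation relative to the matched duplex (kJ/mol).
    temperature_k:  assumed temperature in kelvin (not taken from the abstract).
    """
    return math.exp(-ddg_kj_per_mol * 1000.0 / (R * temperature_k))


# Penalty of about 15 kJ/mol, as quoted in the abstract.
ratio = binding_constant_ratio(15.0)
print(f"K_mismatch / K_match = {ratio:.4f}")
```

    Under this assumed temperature the penalty lowers the binding constant by a factor of several hundred, which is consistent with the abstract's point that a single mismatch is costly.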

  18. Reduced-stringency DNA reassociation: sequence specific duplex formation.

    OpenAIRE

    Burr, H E; Schimke, R T

    1982-01-01

    Reduced-stringency DNA reassociation conditions allow low stability duplexes to be detected in prokaryotic, plant, fish, avian, mammalian, and primate genomes. Highly diverged families of sequences can be detected in avian, mouse, and human unique sequence DNAs. Such a family has been described among twelve species of birds; based on species specific melting profiles and fractionation of sequences belonging to this family, it was concluded that permissive reassociation conditions did not arti...

  19. Is photocleavage of DNA by YOYO-1 using a synchrotron radiation light source sequence dependent?

    Science.gov (United States)

    Gilroy, Emma L; Hoffmann, Søren Vrønning; Jones, Nykola C; Rodger, Alison

    2011-10-01

    The photocleavage of double-stranded and single-stranded DNA by the fluorescent dye YOYO-1 was investigated in real time by using the synchrotron radiation light source ASTRID (ISA, Denmark) both to initiate the reaction and to monitor its progress using Couette flow linear dichroism (LD) throughout the irradiation period. The dependence of LD signals on DNA sequences and on time in the intense light beam was explored and quantified for single-stranded poly(dA), poly[(dA-dT)₂], calf thymus DNA (ctDNA) and Micrococcus luteus DNA (mlDNA). The DNA and ligand regions of the spectrum showed different LD kinetic behaviors, and there was significant sequence dependence of the kinetics. However, in contrast to expectations from the literature, we found that poly(dA), mlDNA, low salt ctDNA and low salt poly[(dA-dT)₂] all had significant populations of groove-bound YOYO. It seems that this mode was predominantly responsible for the catalysis of DNA cleavage. In homopolymeric DNAs, intercalated YOYO was unable to cleave DNA. In mixed-sequence DNAs the data suggest that YOYO in some but not all intercalated binding sites can cause cleavage. It is also likely that cleavage occurs at transient single-stranded regions. The reaction rates for a 100 mA beam current of 0.5-μW power varied from 0.6 h⁻¹ for single-stranded poly(dA) to essentially zero for low salt poly[(dG-dC)₂] and high salt poly[(dA-dT)₂]. At the conclusion of the experiments with each kind of DNA, uncleaved DNA with intercalated YOYO remained.

  20. Efficient depletion of host DNA contamination in malaria clinical sequencing.

    Science.gov (United States)

    Oyola, Samuel O; Gu, Yong; Manske, Magnus; Otto, Thomas D; O'Brien, John; Alcock, Daniel; Macinnis, Bronwyn; Berriman, Matthew; Newbold, Chris I; Kwiatkowski, Dominic P; Swerdlow, Harold P; Quail, Michael A

    2013-03-01

    The cost of whole-genome sequencing (WGS) is decreasing rapidly as next-generation sequencing technology continues to advance, and the prospect of making WGS available for public health applications is becoming a reality. So far, a number of studies have demonstrated the use of WGS as an epidemiological tool for typing and controlling outbreaks of microbial pathogens. Success of these applications is hugely dependent on efficient generation of clean genetic material that is free from host DNA contamination for rapid preparation of sequencing libraries. The presence of large amounts of host DNA severely affects the efficiency of characterizing pathogens using WGS and is therefore a serious impediment to clinical and epidemiological sequencing for health care and public health applications. We have developed a simple enzymatic treatment method that takes advantage of the methylation of human DNA to selectively deplete host contamination from clinical samples prior to sequencing. Using malaria clinical samples with over 80% human host DNA contamination, we show that the enzymatic treatment enriches Plasmodium falciparum DNA up to ∼9-fold and generates high-quality, nonbiased sequence reads covering >98% of 86,158 catalogued typeable single-nucleotide polymorphism loci.

  1. Directed PCR-free engineering of highly repetitive DNA sequences

    Directory of Open Access Journals (Sweden)

    Preissler Steffen

    2011-09-01

    Background: Highly repetitive nucleotide sequences are commonly found in nature, e.g. in telomeres, microsatellite DNA, and polyadenine (poly(A)) tails of eukaryotic messenger RNA, as well as in several inherited human disorders linked to trinucleotide repeat expansions in the genome. Therefore, studying repetitive sequences is of biological, biotechnological and medical relevance. However, cloning of such repetitive DNA sequences is challenging because specific PCR-based amplification is hampered by the lack of unique primer binding sites, resulting in unspecific products. Results: For the PCR-free generation of repetitive DNA sequences we used antiparallel oligonucleotides flanked by restriction sites of Type IIS endonucleases. The arrangement of recognition sites allowed for stepwise and seamless elongation of repetitive sequences. This facilitated the assembly of repetitive DNA segments and open reading frames encoding polypeptides with periodic amino acid sequences of any desired length. By this strategy we cloned a series of polyglutamine encoding sequences as well as highly repetitive polyadenine tracts. Such repetitive sequences can be used for diverse biotechnological applications. As an example, the polyglutamine sequences were expressed as His6-SUMO fusion proteins in Escherichia coli cells to study their aggregation behavior in vitro. The His6-SUMO moiety enabled affinity purification of the polyglutamine proteins, increased their solubility, and allowed controlled induction of the aggregation process. We successfully purified the fusion proteins and provide an example for their applicability in filter retardation assays. Conclusion: Our seamless cloning strategy is PCR-free and allows the directed and efficient generation of highly repetitive DNA sequences of defined lengths by simple standard cloning procedures.

  2. Experimental and theoretical studies of sequence effects on the fluctuation and melting of short DNA molecules

    Energy Technology Data Exchange (ETDEWEB)

    Peyrard, M; Cuesta-Lopez, S [Universite de Lyon, Ecole Normale Superieure de Lyon, Laboratoire de Physique, CNRS UMR 5672, 46 allee d' Italie, F-69364 Lyon Cedex 07 (France); Angelov, D [Universite de Lyon, Ecole Normale Superieure de Lyon, Laboratoire de Biologie Moleculaire de la Cellule, CNRS UMR 5239, 46 allee d' Italie, F-69364 Lyon Cedex 07 (France)], E-mail: Michel.Peyrard@ens-lyon.fr

    2009-01-21

    Understanding the melting of short DNA sequences probes DNA at the scale of the genetic code and raises questions which are very different from those posed by very long sequences, which have been extensively studied. We investigate this problem by combining experiments and theory. A new experimental method allows us to make a mapping of the opening of the guanines along the sequence as a function of temperature. The results indicate that non-local effects may be important in DNA because an AT-rich region is able to influence the opening of a base pair which is about 10 base pairs away. An earlier mesoscopic model of DNA is modified to correctly describe the timescales associated with the opening of individual base pairs well below melting, and to properly take into account the sequence. Using this model to analyze some characteristic sequences for which detailed experimental data on the melting is available (Montrichok et al 2003 Europhys. Lett. 62 452), we show that we have to introduce non-local effects of AT-rich regions to get acceptable results. This brings a second indication that the influence of these highly fluctuating regions of DNA on their neighborhood can extend to some distance.

  3. Low fluorescence background electroblotting membrane for DNA sequencing.

    Science.gov (United States)

    Chu, T J; Caldwell, K D; Weiss, R B; Gesteland, R F; Pitt, W G

    1992-03-01

    A low fluorescence background polypropylene (PP) membrane has been developed for ultimate use as an electroblotting membrane in DNA sequencing based on fluorescence detection. The DNA binding capacity of this membrane is improved by a surface modification using radio frequency plasma discharge (RFPD) in ammonia gas. The RFPD operational parameters are evaluated both in terms of membrane nitrogen content and in terms of the product's capacity for binding radioisotope-labeled DNA fragments. The surface morphologies of the derivatized membranes are examined by scanning electron microscopy; their mechanical and electrical properties, which are important for the subsequent sequencing procedures, are likewise established. Because the goal is a membrane suitable for multiplex processing, in which the electroblotted DNA must withstand dozens of hybridization/stripping cycles, special attention is given to the covalent attachment of DNA to the membrane. The modified PP membrane is evaluated in a multiplex sequencing application using radioisotope-labeled DNA probes, and found to yield somewhat better binding of a given amount of electroblotted DNA than the commonly used GeneScreen membrane. A tenfold repetition of the probing indicates little loss of signal; the membrane-bound DNA is stable upon storage and shows no detectable loss in probing efficiency after one month.

  4. Bioinformatics analysis of circulating cell-free DNA sequencing data.

    Science.gov (United States)

    Chan, Landon L; Jiang, Peiyong

    2015-10-01

    The discovery of cell-free DNA molecules in plasma has opened up numerous opportunities in noninvasive diagnosis. Cell-free DNA molecules have become increasingly recognized as promising biomarkers for detection and management of many diseases. The advent of next generation sequencing has provided unprecedented opportunities to scrutinize the characteristics of cell-free DNA molecules in plasma in a genome-wide fashion and at single-base resolution. Consequently, clinical applications of circulating cell-free DNA analysis have not only revolutionized noninvasive prenatal diagnosis but also facilitated cancer detection and monitoring toward an era of blood-based personalized medicine. With the remarkably increasing throughput and lowering cost of next generation sequencing, bioinformatics analysis becomes increasingly demanding to understand the large amount of data generated by these sequencing platforms. In this Review, we highlight the major bioinformatics algorithms involved in the analysis of cell-free DNA sequencing data. Firstly, we briefly describe the biological properties of these molecules and provide an overview of the general bioinformatics approach for the analysis of cell-free DNA. Then, we discuss the specific upstream bioinformatics considerations concerning the analysis of sequencing data of circulating cell-free DNA, followed by further detailed elaboration on each key clinical situation in noninvasive prenatal diagnosis and cancer management where downstream bioinformatics analysis is heavily involved. We also discuss bioinformatics analysis as well as clinical applications of the newly developed massively parallel bisulfite sequencing of cell-free DNA. Finally, we offer our perspectives on the future development of bioinformatics in noninvasive diagnosis. Copyright © 2015 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.

  5. Identification of repeats in DNA sequences using nucleotide distribution uniformity.

    Science.gov (United States)

    Yin, Changchuan

    2017-01-07

    Repetitive elements are important in genomic structures, functions and regulations, yet effective methods in precisely identifying repetitive elements in DNA sequences are not fully accessible, and the relationship between repetitive elements and periodicities of genomes is not clearly understood. We present an ab initio method to quantitatively detect repetitive elements and infer the consensus repeat pattern in repetitive elements. The method uses the measure of the distribution uniformity of nucleotides at periodic positions in DNA sequences or genomes. It can identify periodicities, consensus repeat patterns, copy numbers and perfect levels of repetitive elements. The results of using the method on different DNA sequences and genomes demonstrate efficacy and accuracy in identifying repeat patterns and periodicities. The complexity of the method is linear with respect to the lengths of the analyzed sequences. The Python programs in this study are freely available to the public upon request or at https://github.com/cyinbox/DNADU. Copyright © 2016 Elsevier Ltd. All rights reserved.
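
    The idea of measuring how unevenly nucleotides are distributed at periodic positions can be illustrated with a short sketch. This is not the DNADU implementation referenced in the abstract; the scoring function (mean squared deviation from a uniform 25% frequency per phase) and all function names are assumptions chosen here for illustration. Each pass over the sequence for a given period is linear in the sequence length.

```python
from collections import Counter


def periodic_nonuniformity(seq, period):
    """Score how non-uniform the nucleotide distribution is at each phase
    of the given period, and return (score, consensus pattern).

    The score is the squared deviation of base frequencies from 0.25,
    averaged over the phases 0..period-1 (an illustrative measure only).
    Requires period <= len(seq).
    """
    total = 0.0
    consensus = []
    for phase in range(period):
        column = seq[phase::period]  # bases at positions phase, phase+period, ...
        counts = Counter(column)
        n = len(column)
        total += sum((counts.get(base, 0) / n - 0.25) ** 2 for base in "ACGT")
        consensus.append(counts.most_common(1)[0][0])
    return total / period, "".join(consensus)


def best_period(seq, max_period):
    """Return the smallest period in 1..max_period with the highest score."""
    scores = {p: periodic_nonuniformity(seq, p)[0] for p in range(1, max_period + 1)}
    return max(scores, key=scores.get)


sequence = "ACG" * 20  # toy input: a perfect trinucleotide repeat
period = best_period(sequence, 6)
print(period, periodic_nonuniformity(sequence, period)[1])
```

    On real sequences, multiples of the true period score similarly, so a practical tool would need a rule for choosing among them; here ties are simply resolved in favour of the smallest period.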

  6. Properties of CENP-B and its target sequence in a satellite DNA

    Energy Technology Data Exchange (ETDEWEB)

    Masumoto, H.; Yoda, K.; Ikeno, M.; Kitagawa, K.; Muro, Y.; Okazaki, T. [Nagoya Univ. (Japan)

    1993-12-31

    The centromere plays an essential role in the proper segregation of eukaryotic chromosomes at mitosis and meiosis. The centromere is the multifunctional domain of the chromosome responsible for sister chromatid association at the inner site and for microtubule attachment at the outer surface. It also acts as a mechanochemical motor for chromosome movement. These multiple centromere functions must, in some way, be directed by a cis-acting DNA sequence located in the centromere region. Indeed, specific centromere DNA sequences (CEN-DNA) were identified in two yeast species. In Saccharomyces cerevisiae, CEN-DNA consists of a roughly 125 bp sequence composed of three conserved elements. In contrast, the centromere sequence of S. pombe is quite different from that of S. cerevisiae in length and sequence organization. The molecular bases for understanding the structure and function of the centromere/kinetochore domain have not been elucidated in higher eukaryotes. In mammalian cells, satellite DNAs are localized in the centromeric heterochromatin or heterochromatic arm. In all human chromosomes, the alpha satellite or alphoid DNA family, a highly repetitive DNA composed of fundamental monomer repeating units of about 170 bp, is found at the primary constriction. Its function, however, has not been established.

  7. Palindromic sequence artifacts generated during next generation sequencing library preparation from historic and ancient DNA.

    Directory of Open Access Journals (Sweden)

    Bastiaan Star

    Degradation-specific processes and variation in laboratory protocols can bias the DNA sequence composition from samples of ancient or historic origin. Here, we identify a novel artifact in sequences from historic samples of Atlantic cod (Gadus morhua), which forms interrupted palindromes consisting of reverse complementary sequence at the 5' and 3'-ends of sequencing reads. The palindromic sequences themselves have specific properties: the bases at the 5'-end align well to the reference genome, whereas extensive misalignments exist among the bases at the terminal 3'-end. The terminal 3' bases are artificial extensions likely caused by the occurrence of hairpin loops in single stranded DNA (ssDNA), which can be ligated and amplified in particular library creation protocols. We propose that such hairpin loops allow the inclusion of erroneous nucleotides, specifically at the 3'-end of DNA strands, with the 5'-end of the same strand providing the template. We also find these palindromes in previously published ancient DNA (aDNA) datasets, albeit at varying and substantially lower frequencies. This artifact can negatively affect the yield of endogenous DNA in these types of samples and introduces sequence bias.
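
    A read carrying the kind of interrupted palindrome described above ends with the reverse complement of its own 5' start. The sketch below shows one simple way such reads could be flagged; it is an illustration only, not the authors' detection pipeline, and the function names and the minimum stem length of 4 bases are assumptions. It also requires an exact match, whereas real reads would likely need some tolerance for mismatches.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_palindrome_length(read, min_len=4):
    """Length of the longest 3' suffix that exactly equals the reverse
    complement of the 5' prefix of the same length.

    Returns 0 if no such stem of at least min_len bases exists. Only
    stems up to half the read length are considered, so the two arms
    never overlap.
    """
    best = 0
    for k in range(min_len, len(read) // 2 + 1):
        if read[-k:] == reverse_complement(read[:k]):
            best = k
    return best


# Toy read: 5' arm + unrelated spacer + reverse complement of the 5' arm.
arm = "AAGGCT"
read = arm + "TTTTTTTT" + reverse_complement(arm)
print(terminal_palindrome_length(read))
```

    Reads flagged this way could then have the 3' arm trimmed before alignment, since the abstract reports that it is the terminal 3' bases that misalign.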

  8. Isolation and enrichment of Cryptosporidium DNA and verification of DNA purity for whole-genome sequencing.

    Science.gov (United States)

    Guo, Yaqiong; Li, Na; Lysén, Colleen; Frace, Michael; Tang, Kevin; Sammons, Scott; Roellig, Dawn M; Feng, Yaoyu; Xiao, Lihua

    2015-02-01

    Whole-genome sequencing of Cryptosporidium spp. is hampered by difficulties in obtaining sufficient, highly pure genomic DNA from clinical specimens. In this study, we developed procedures for the isolation and enrichment of Cryptosporidium genomic DNA from fecal specimens and verification of DNA purity for whole-genome sequencing. The isolation and enrichment of genomic DNA were achieved by a combination of three oocyst purification steps and whole-genome amplification (WGA) of DNA from purified oocysts. Quantitative PCR (qPCR) analysis of WGA products was used as an initial quality assessment of amplified genomic DNA. The purity of WGA products was assessed by Sanger sequencing of cloned products. Next-generation sequencing tools were used in final evaluations of genome coverage and of the extent of contamination. Altogether, 24 fecal specimens of Cryptosporidium parvum, C. hominis, C. andersoni, C. ubiquitum, C. tyzzeri, and Cryptosporidium chipmunk genotype I were processed with the procedures. As expected, WGA products with low (sequences in Sanger sequencing. The cloning-sequencing analysis, however, showed significant contamination in 5 WGA products (proportion of positive colonies derived from Cryptosporidium genomic DNA, ≤25%). Following this strategy, 20 WGA products from six Cryptosporidium species or genotypes with low (mostly sequencing, generating sequence data covering 94.5% to 99.7% of Cryptosporidium genomes, with mostly minor contamination from bacterial, fungal, and host DNA. These results suggest that the described strategy can be used effectively for the isolation and enrichment of Cryptosporidium DNA from fecal specimens for whole-genome sequencing. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  9. Structural biology of disease-associated repetitive DNA sequences and protein-DNA complexes involved in DNA damage and repair

    Energy Technology Data Exchange (ETDEWEB)

    Gupta, G.; Santhana Mariappan, S.V.; Chen, X.; Catasti, P.; Silks, L.A. III; Moyzis, R.K.; Bradbury, E.M.; Garcia, A.E.

    1997-07-01

    This project is aimed at formulating the sequence-structure-function correlations of various microsatellites in the human (and other eukaryotic) genomes. Here the authors have been able to develop and apply structural biology tools to understand the following: the molecular mechanism of length polymorphism in microsatellites; the molecular mechanism by which the microsatellites in the noncoding regions alter the regulation of the associated gene; and finally, the molecular mechanism by which the expansion of these microsatellites impairs gene expression and causes disease. Their multidisciplinary structural biology approach is quantitative and can be applied to all coding and noncoding DNA sequences associated with any gene. Both NIH and DOE are interested in developing quantitative tools for understanding the function of various human genes for prevention against diseases caused by genetic and environmental effects.

  10. Mitochondrial DNA sequence diversity in a sedentary population from Egypt.

    Science.gov (United States)

    Stevanovitch, A; Gilles, A; Bouzaid, E; Kefi, R; Paris, F; Gayraud, R P; Spadoni, J L; El-Chenawi, F; Béraud-Colomb, E

    2004-01-01

    The mitochondrial DNA (mtDNA) diversity of 58 individuals from Upper Egypt, more than half (34 individuals) from Gurna, whose population has an ancient cultural history, was studied by sequencing the control region and screening diagnostic RFLP markers. This sedentary population presented similarities to the Ethiopian population by the L1 and L2 macrohaplogroup frequency (20.6%), by the West Eurasian component (defined by haplogroups H to K and T to X) and particularly by a high frequency (17.6%) of haplogroup M1. We statistically and phylogenetically analysed and compared the Gurna population with other Egyptian, Near East and sub-Saharan Africa populations; AMOVA and Minimum Spanning Network analysis showed that the Gurna population was not isolated from neighbouring populations. Our results suggest that the Gurna population has conserved the trace of an ancestral genetic structure from an ancestral East African population, characterized by a high M1 haplogroup frequency. The current structure of the Egyptian population may be the result of further influence of neighbouring populations on this ancestral population.

  11. Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight

    Science.gov (United States)

    Shi, Jinming

    In this study, cytosine methylation at CCGG sites and genomic DNA sequence changes in adult rice leaves after seed space flight were detected by methylation-sensitive amplification polymorphism (MSAP) and amplified fragment length polymorphism (AFLP) techniques, respectively. Rice seeds were planted in the trial field after a 4-day space flight on the Shenzhou-6 spaceship of China. Adult leaves of space-treated rice, including 8 plants chosen randomly and 2 plants with phenotypic mutation, were used for AFLP and MSAP analysis. Polymorphism of both DNA sequence and cytosine methylation was detected. For MSAP analysis, the average polymorphic frequencies of the on-ground controls, space-treated plants and mutants were 1.3%, 3.1% and 11%, respectively. For AFLP analysis, the average polymorphic frequencies were 1.4%, 2.9% and 8%, respectively. In total, 27 and 22 polymorphic fragments were cloned and sequenced from the MSAP and AFLP analyses, respectively. Nine of the 27 fragments from MSAP analysis show homology to coding sequence. Of the 22 polymorphic fragments from AFLP analysis, none shows homology to mRNA sequence and eight show homology to repeat regions or retrotransposon sequences. These results suggest that although both genomic DNA sequence and cytosine methylation status can be affected by space flight, the genomic regions homologous to the fragments from the genomic DNA analysis and from the cytosine methylation analysis were different.

  12. Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform.

    Science.gov (United States)

    Shokralla, Shadi; Porter, Teresita M; Gibson, Joel F; Dobosz, Rafal; Janzen, Daniel H; Hallwachs, Winnie; Golding, G Brian; Hajibabaei, Mehrdad

    2015-04-17

    Genetic information is a valuable component of biosystematics, especially specimen identification through the use of species-specific DNA barcodes. Although many genomics applications have shifted to High-Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies, sample identification (e.g., via DNA barcoding) is still most often done with Sanger sequencing. Here, we present a scalable double dual-indexing approach using an Illumina MiSeq platform to sequence DNA barcode markers. We achieved 97.3% success by using half of an Illumina MiSeq flowcell to obtain 658 base pairs of the cytochrome c oxidase I DNA barcode in 1,010 specimens from eleven orders of arthropods. Our approach recovers a greater proportion of DNA barcode sequences from individuals than does conventional Sanger sequencing, while at the same time reducing both per-specimen costs and labor time by nearly 80%. In addition, the use of HTS allows the recovery of multiple sequences per specimen, for deeper analysis of genetic variation in target gene regions.

  13. Automated tools for comparative sequence analysis of genic regions using the GenePalette application.

    Science.gov (United States)

    Smith, Andrew F; Posakony, James W; Rebeiz, Mark

    2017-09-01

    Comparative sequence analysis methods, such as phylogenetic footprinting, represent one of the most effective ways to decode regulatory sequence functions based upon DNA sequence information alone. The laborious task of assembling orthologous sequences to perform these comparisons is a hurdle to these analyses, which is further aggravated by the relative paucity of tools for visualization of sequence comparisons in large genic regions. Here, we describe a second-generation implementation of the GenePalette DNA sequence analysis software to facilitate comparative studies of gene function and regulation. We have developed an automated module called OrthologGrabber (OG) that performs BLAT searches against the UC Santa Cruz genome database to identify and retrieve segments homologous to a region of interest. Upon acquisition, sequences are compared to identify high-confidence anchor-points, which are graphically displayed. The visualization of anchor-points alongside other DNA features, such as transcription factor binding sites, allows users to precisely examine whether a binding site of interest is conserved, even if the surrounding region exhibits poor sequence identity. This approach also aids in identifying orthologous segments of regulatory DNA, facilitating studies of regulatory sequence evolution. As with previous versions of the software, GenePalette 2.1 takes the form of a platform-independent, single-windowed interface that is simple to use. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Probabilistic models for semisupervised discriminative motif discovery in DNA sequences.

    Science.gov (United States)

    Kim, Jong Kyoung; Choi, Seungjin

    2011-01-01

    Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs), searching only for patterns that differentiate two sets (positive and negative sets) of sequences. On one hand, discriminative methods increase the sensitivity and specificity of motif discovery, compared to generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model which enables us to make use of unlabeled sequences in the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained between the purely generative and the purely discriminative extremes, and that semisupervised learning improves performance when labeled sequences are limited.

  15. High-throughput sequencing in mitochondrial DNA research.

    Science.gov (United States)

    Ye, Fei; Samuels, David C; Clark, Travis; Guo, Yan

    2014-07-01

    Next-generation sequencing, also known as high-throughput sequencing, has greatly enhanced researchers' ability to conduct biomedical research on all levels. Mitochondrial research has also benefitted greatly from high-throughput sequencing; sequencing technology now allows for screening of all 16,569 base pairs of the mitochondrial genome simultaneously for SNPs and low level heteroplasmy and, in some cases, the estimation of mitochondrial DNA copy number. It is important to realize the full potential of high-throughput sequencing for the advancement of mitochondrial research. To this end, we review how high-throughput sequencing has impacted mitochondrial research in the categories of SNPs, low level heteroplasmy, copy number, and structural variants. We also discuss the different types of mitochondrial DNA sequencing and their pros and cons. Based on previous studies conducted by various groups, we provide strategies for processing mitochondrial DNA sequencing data, including assembly, variant calling, and quality control. Copyright © 2014 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
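
    Two of the quantities mentioned above, low level heteroplasmy and mitochondrial DNA copy number, reduce to simple arithmetic on read counts and coverage. The sketch below is a minimal illustration and is not taken from the review: the thresholds (1% minor allele fraction, 100x minimum depth) are hypothetical, and the copy number formula is a commonly used approximation that assumes a diploid nuclear genome.

```python
def minor_allele_fraction(base_counts):
    """Fraction of reads supporting the second most common base at a site.

    base_counts: dict mapping base -> read count at one mtDNA position.
    """
    depth = sum(base_counts.values())
    if depth == 0:
        raise ValueError("no reads cover this position")
    ordered = sorted(base_counts.values(), reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0
    return second / depth


def is_heteroplasmic(base_counts, min_fraction=0.01, min_depth=100):
    """Flag a site as heteroplasmic using hypothetical thresholds."""
    depth = sum(base_counts.values())
    return depth >= min_depth and minor_allele_fraction(base_counts) >= min_fraction


def mtdna_copy_number(mt_mean_coverage, nuclear_mean_coverage):
    """Approximate mtDNA copies per cell, assuming a diploid nuclear genome."""
    return 2.0 * mt_mean_coverage / nuclear_mean_coverage


site = {"A": 95, "G": 5, "C": 0, "T": 0}
print(minor_allele_fraction(site), is_heteroplasmic(site))
print(mtdna_copy_number(1000.0, 10.0))
```

    In practice the heteroplasmy threshold has to sit above the sequencing error rate, which is one reason quality control is listed among the processing steps.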

  16. [PCR, clone and sequence analysis of rDNA-ITS of Nelumbo nucifera from different geographical origins in China].

    Science.gov (United States)

    Lin, Shan; Zheng, Wei-wen; Wu, Jin-zhong; Zhou, Li-juan; Song, Ya-na

    2007-04-01

    The objective was to provide a DNA molecular marker for identification of Nelumbo nucifera by exploring differences in the nrDNA-ITS sequence of N. nucifera originating from different habitats. nrDNA-ITS base sequences were compared using specific PCR-ITS. The complete sequence of ITS and 5.8S rDNA, and partial sequences of 18S rDNA and 26S rDNA, totalling 750 bp, were obtained from N. nucifera. Differences were found among N. nucifera from different habitats and from different cultivars. The method can be used to distinguish N. nucifera from other species and from counterfeits. It provides a basis for identifying N. nucifera from different geographical regions by comparison of their ITS sequences.

  17. Predicting target DNA sequences of DNA-binding proteins based on unbound structures.

    Directory of Open Access Journals (Sweden)

    Chien-Yu Chen

    DNA-binding proteins such as transcription factors use DNA-binding domains (DBDs) to bind to specific sequences in the genome to initiate many important biological functions. Accurate prediction of such target sequences, often represented by position weight matrices (PWMs), is an important step to understand many biological processes. Recent studies have shown that knowledge-based potential functions can be applied on protein-DNA co-crystallized structures to generate PWMs that are considerably consistent with experimental data. However, this success has not been extended to DNA-binding proteins lacking co-crystallized structures. This study aims at investigating the possibility of predicting the DNA sequences bound by DNA-binding proteins from the proteins' unbound structures (structures of the unbound state). Given an unbound query protein and a template complex, the proposed method first employs structure alignment to generate synthetic protein-DNA complexes for the query protein. Once a complex is available, an atomic-level knowledge-based potential function is employed to predict PWMs characterizing the sequences to which the query protein can bind. The evaluation of the proposed method is based on seven DNA-binding proteins, which have structures of both DNA-bound and unbound forms for prediction as well as annotated PWMs for validation. Since this work is the first attempt to predict target sequences of DNA-binding proteins from their unbound structures, three types of structural variations that presumably influence the prediction accuracy were examined and discussed. Based on the analyses conducted in this study, the conformational change of proteins upon binding DNA was shown to be the key factor. This study sheds light on the challenge of predicting the target DNA sequences of a protein lacking co-crystallized structures, which encourages more efforts on the structure alignment-based approaches in addition to docking- and homology
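
    For readers unfamiliar with position weight matrices, the sketch below shows what a PWM is and how it is used to score candidate target sequences. It builds the matrix from a handful of aligned example sites, which is the conventional sequence-based route and not the structure-based, knowledge-based potential approach described in the abstract. The example sites, the pseudocount of 1 and the uniform background of 0.25 are assumptions made for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (list of per-position dicts) from aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm, seq):
    """Sum of log-odds scores of seq, which must be as long as the PWM."""
    return sum(column[base] for column, base in zip(pwm, seq))


def best_hit(pwm, seq):
    """Start index of the highest-scoring window of PWM width in seq."""
    width = len(pwm)
    return max(range(len(seq) - width + 1),
               key=lambda i: score(pwm, seq[i:i + width]))


# Hypothetical aligned binding sites, used only to populate the matrix.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_pwm(sites)
print(best_hit(pwm, "GGGCTATAATGGG"))
```

    A predicted PWM, however it is derived, can be compared with an annotated one by scoring the same set of sequences with both, which is the kind of validation the abstract describes.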

  18. How effective is graphene nanopore geometry on DNA sequencing?

    CERN Document Server

    Satarifard, Vahid; Ejtehadi, Mohammad Reza

    2015-01-01

    In this paper we investigate the effects of graphene nanopore geometry on the pulling of homopolymer ssDNA through a nanopore using steered molecular dynamics (SMD) simulations. Different graphene nanopores are examined, including axially symmetric and asymmetric monolayer graphene nanopores as well as five-layer graphene polyhedral crystals (GPC). The pulling force profile, the mode of ssDNA motion, the work done in irreversible DNA pulling and the orientations of DNA bases near the nanopore are assessed. Simulation results demonstrate the strong effect of the pore shape as well as geometrical symmetry on the free energy barrier, orientations and dynamics of DNA translocation through a graphene nanopore. Our study proposes that the symmetric circular geometry of a monolayer graphene nanopore with high pulling velocity can be used for DNA sequencing.
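
    The work done in pulling is the integral of the pulling force over the displacement, W = ∫ F dx. As a minimal sketch of how that quantity could be obtained from a sampled force profile, the code below applies the trapezoidal rule. The force and position values are made-up numbers, not data from the paper, and the function name is hypothetical.

```python
def pulling_work(positions, forces):
    """Trapezoidal estimate of W = integral of F dx.

    positions: pulling coordinate samples (e.g. in nm), in increasing order.
    forces:    force at each position (e.g. in pN); same length as positions.
    The result is in force units times length units (e.g. pN nm).
    """
    if len(positions) != len(forces):
        raise ValueError("positions and forces must have the same length")
    work = 0.0
    for i in range(len(positions) - 1):
        dx = positions[i + 1] - positions[i]
        work += 0.5 * (forces[i] + forces[i + 1]) * dx
    return work


# Made-up force profile with a single peak, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
f = [0.0, 100.0, 300.0, 100.0, 0.0]
print(pulling_work(x, f))
```

    Comparing this number across pore geometries at the same pulling velocity is one way to summarise a force profile as a single value.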

  19. Development of Active DNA Control Technique for DNA Sequencer With a Solid-state Nanopore

    Science.gov (United States)

    Akahori, Rena; Harada, Kunio; Goto, Yusuke; Yanagi, Itaru; Yokoi, Takahide; Oura, Takeshi; Shibahara, Masashi; Takeda, Ken-Ichi

    We have developed a technique that can control the arbitrary speeds of DNA passing through a solid-state nanopore of a DNA sequencer. For this active DNA control technique, we used a DNA-immobilized Si probe, larger than the membrane with a nanopore, and used a piezoelectric actuator and stepper motor to drive the probe. This probe enables a user to adjust the relative position between the nanopore and DNA immobilized on the probe without the need for precise lateral control. In this presentation, we demonstrate how DNA (block copolymer ([(dT)25-(dC)25-(dA)50]m)), immobilized on the probe, slid through a nanopore and was pulled out using the active DNA control technique. As the DNA-immobilized probe was being pulled out, we obtained various ion-current signal levels corresponding to the number of different nucleotides in a single strand of DNA.

  20. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Directory of Open Access Journals (Sweden)

    Baldwin Stephen A

    2011-03-01

    Background: Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results: The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions: PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  1. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities.

    Science.gov (United States)

    Troshin, Peter V; Postis, Vincent Lg; Ashworth, Denise; Baldwin, Stephen A; McPherson, Michael J; Barton, Geoffrey J

    2011-03-07

    Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  2. VoSeq: a voucher and DNA sequence web application.

    Science.gov (United States)

    Peña, Carlos; Malm, Tobias

    2012-01-01

    There is an ever growing number of molecular phylogenetic studies published, due to, in part, the advent of new techniques that allow cheap and quick DNA sequencing. Hence, the demand for relational databases with which to manage and annotate the amassing DNA sequences, genes, voucher specimens and associated biological data is increasing. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically constructed with the aim of being an organizational tool for researchers working in phylogenetic inference. We here report a new software facilitating easy management of voucher and sequence data, consisting of a relational database as back-end for a graphic user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready to be used in commonly used phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publishing. It also has inbuilt BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL) and GBIF (by generating data-dumps that can be processed with GBIF's Integrated Publishing Toolkit).

  3. Analysis of human accelerated DNA regions using archaic hominin genomes.

    Directory of Open Access Journals (Sweden)

    Hernán A Burbano

    Several previous comparisons of the human genome with other primate and vertebrate genomes identified genomic regions that are highly conserved in vertebrate evolution but fast-evolving on the human lineage. These human accelerated regions (HARs) may be regions of past adaptive evolution in humans. Alternatively, they may be the result of non-adaptive processes, such as biased gene conversion. We captured and sequenced DNA from a collection of previously published HARs using DNA from an Iberian Neandertal. Combining these new data with shotgun sequence from the Neandertal and Denisova draft genomes, we determine at least one archaic hominin allele for 84% of all positions within HARs. We find that 8% of HAR substitutions are not observed in the archaic hominins and are thus recent in the sense that the derived allele had not come to fixation in the common ancestor of modern humans and archaic hominins. Further, we find that recent substitutions in HARs tend to have come to fixation faster than substitutions elsewhere in the genome and that substitutions in HARs tend to cluster in time, consistent with an episodic rather than a clock-like process underlying HAR evolution. Our catalog of sequence changes in HARs will help prioritize them for functional studies of genomic elements potentially responsible for modern human adaptations.

  4. Analysis of Human Accelerated DNA Regions Using Archaic Hominin Genomes

    Science.gov (United States)

    Burbano, Hernán A.; Green, Richard E.; Maricic, Tomislav; Lalueza-Fox, Carles; de la Rasilla, Marco; Rosas, Antonio; Kelso, Janet; Pollard, Katherine S.; Lachmann, Michael; Pääbo, Svante

    2012-01-01

    Several previous comparisons of the human genome with other primate and vertebrate genomes identified genomic regions that are highly conserved in vertebrate evolution but fast-evolving on the human lineage. These human accelerated regions (HARs) may be regions of past adaptive evolution in humans. Alternatively, they may be the result of non-adaptive processes, such as biased gene conversion. We captured and sequenced DNA from a collection of previously published HARs using DNA from an Iberian Neandertal. Combining these new data with shotgun sequence from the Neandertal and Denisova draft genomes, we determine at least one archaic hominin allele for 84% of all positions within HARs. We find that 8% of HAR substitutions are not observed in the archaic hominins and are thus recent in the sense that the derived allele had not come to fixation in the common ancestor of modern humans and archaic hominins. Further, we find that recent substitutions in HARs tend to have come to fixation faster than substitutions elsewhere in the genome and that substitutions in HARs tend to cluster in time, consistent with an episodic rather than a clock-like process underlying HAR evolution. Our catalog of sequence changes in HARs will help prioritize them for functional studies of genomic elements potentially responsible for modern human adaptations. PMID:22412940

  5. Methylome-wide Sequencing Detects DNA Hypermethylation Distinguishing Indolent from Aggressive Prostate Cancer

    Directory of Open Access Journals (Sweden)

    Jeffrey M. Bhasin

    2015-12-01

    A critical need in understanding the biology of prostate cancer is characterizing the molecular differences between indolent and aggressive cases. Because DNA methylation can capture the regulatory state of tumors, we analyzed differential methylation patterns genome-wide among benign prostatic tissue and low-grade and high-grade prostate cancer and found extensive, focal hypermethylation regions unique to high-grade disease. These hypermethylation regions occurred not only in the promoters of genes but also in gene bodies and at intergenic regions that are enriched for DNA-protein binding sites. Integration with existing RNA-sequencing (RNA-seq) and survival data revealed regions where DNA methylation correlates with reduced gene expression associated with poor outcome. Regions specific to aggressive disease are proximal to genes with distinct functions from regions shared by indolent and aggressive disease. Our compendium of methylation changes reveals crucial molecular distinctions between indolent and aggressive prostate cancer.

  6. Comparison of sequencing (barcode region) and sequence-tagged-site PCR for Blastocystis subtyping.

    Science.gov (United States)

    Stensvold, Christen Rune

    2013-01-01

    Blastocystis is the most common nonfungal microeukaryote of the human intestinal tract and comprises numerous subtypes (STs), nine of which have been found in humans (ST1 to ST9). While efforts continue to explore the relationship between human health status and subtypes, no consensus regarding subtyping methodology exists. It has been speculated that differences detected in subtype distribution in various cohorts may to some extent reflect different approaches. Blastocystis subtypes have been determined primarily in one of two ways: (i) sequencing of small subunit rRNA gene (SSU-rDNA) PCR products and (ii) PCR with subtype-specific sequence-tagged-site (STS) diagnostic primers. Here, STS primers were evaluated against a panel of samples (n = 58) already subtyped by SSU-rDNA sequencing (barcode region), including subtypes for which STS primers are not available, and a small panel of DNAs from four other eukaryotes often present in feces (n = 18). Although the STS primers appeared to be highly specific, their sensitivity was only moderate, and the results indicated that some infections may go undetected when this method is used. False-negative STS results were not linked exclusively to certain subtypes or alleles, and evidence of substantial genetic variation in STS loci was obtained. Since the majority of DNAs included here were extracted from feces, it is possible that STS primers may generally work better with DNAs extracted from Blastocystis cultures. In conclusion, due to its higher applicability and sensitivity, and since sequence information is useful for other forms of research, SSU-rDNA barcoding is recommended as the method of choice for Blastocystis subtyping.

  7. Dialects of the DNA uptake sequence in Neisseriaceae.

    Directory of Open Access Journals (Sweden)

    Stephan A Frye

    2013-04-01

    In all sexual organisms, adaptations exist that secure the safe reassortment of homologous alleles and prevent the intrusion of potentially hazardous alien DNA. Some bacteria engage in a simple form of sex known as transformation. In the human pathogen Neisseria meningitidis and in related bacterial species, transformation by exogenous DNA is regulated by the presence of a specific DNA Uptake Sequence (DUS), which is present in thousands of copies in the respective genomes. DUS affects transformation by limiting DNA uptake and recombination in favour of homologous DNA. The specific mechanisms of DUS-dependent genetic transformation have remained elusive. Bioinformatic analyses of family Neisseriaceae genomes reveal eight distinct variants of DUS. These variants are here termed DUS dialects, and their effect on interspecies commutation is demonstrated. Each of the DUS dialects is remarkably conserved within each species and is distributed consistent with a robust Neisseriaceae phylogeny based on core genome sequences. The impact of individual single nucleotide transversions in DUS on meningococcal transformation and on DNA binding and uptake is analysed. The results show that a DUS core 5'-CTG-3' is required for transformation and that transversions in this core reduce DNA uptake more than two orders of magnitude although the level of DNA binding remains less affected. Distinct DUS dialects are efficient barriers to interspecies recombination in N. meningitidis, N. elongata, Kingella denitrificans, and Eikenella corrodens, despite the presence of the core sequence. The degree of similarity between the DUS dialect of the recipient species and the donor DNA directly correlates with the level of transformation and DNA binding and uptake. Finally, DUS-dependent transformation is documented in the genera Eikenella and Kingella for the first time. The results presented here advance our understanding of the function and evolution of DUS and genetic

  8. Dialects of the DNA Uptake Sequence in Neisseriaceae

    Science.gov (United States)

    Frye, Stephan A.; Nilsen, Mariann; Tønjum, Tone; Ambur, Ole Herman

    2013-01-01

    In all sexual organisms, adaptations exist that secure the safe reassortment of homologous alleles and prevent the intrusion of potentially hazardous alien DNA. Some bacteria engage in a simple form of sex known as transformation. In the human pathogen Neisseria meningitidis and in related bacterial species, transformation by exogenous DNA is regulated by the presence of a specific DNA Uptake Sequence (DUS), which is present in thousands of copies in the respective genomes. DUS affects transformation by limiting DNA uptake and recombination in favour of homologous DNA. The specific mechanisms of DUS–dependent genetic transformation have remained elusive. Bioinformatic analyses of family Neisseriaceae genomes reveal eight distinct variants of DUS. These variants are here termed DUS dialects, and their effect on interspecies commutation is demonstrated. Each of the DUS dialects is remarkably conserved within each species and is distributed consistent with a robust Neisseriaceae phylogeny based on core genome sequences. The impact of individual single nucleotide transversions in DUS on meningococcal transformation and on DNA binding and uptake is analysed. The results show that a DUS core 5′-CTG-3′ is required for transformation and that transversions in this core reduce DNA uptake more than two orders of magnitude although the level of DNA binding remains less affected. Distinct DUS dialects are efficient barriers to interspecies recombination in N. meningitidis, N. elongata, Kingella denitrificans, and Eikenella corrodens, despite the presence of the core sequence. The degree of similarity between the DUS dialect of the recipient species and the donor DNA directly correlates with the level of transformation and DNA binding and uptake. Finally, DUS–dependent transformation is documented in the genera Eikenella and Kingella for the first time. The results presented here advance our understanding of the function and evolution of DUS and genetic transformation

  9. Statistical assignment of DNA sequences using Bayesian phylogenetics

    DEFF Research Database (Denmark)

    Terkelsen, Kasper Munch; Boomsma, Wouter Krogh; Huelsenbeck, John P.

    2008-01-01

    We provide a new automated statistical method for DNA barcoding based on a Bayesian phylogenetic analysis. The method is based on automated database sequence retrieval, alignment, and phylogenetic analysis using a custom-built program for Bayesian phylogenetic analysis. We show on real data that the method outperforms Blast searches as a measure of confidence and can help eliminate 80% of all false assignment based on best Blast hit. However, the most important advance of the method is that it provides statistically meaningful measures of confidence. We apply the method to a re-analysis of previously published ancient DNA data and show that, with high statistical confidence, most of the published sequences are in fact of Neanderthal origin. However, there are several cases of chimeric sequences that are comprised of a combination of both Neanderthal and modern human DNA.

  10. Multiplexed DNA sequence capture of mitochondrial genomes using PCR products.

    Directory of Open Access Journals (Sweden)

    Tomislav Maricic

    BACKGROUND: To utilize the power of high-throughput sequencers, target enrichment methods have been developed. The majority of these require reagents and equipment that are only available from commercial vendors and are not suitable for the targets that are a few kilobases in length. METHODOLOGY/PRINCIPAL FINDINGS: We describe a novel and economical method in which custom made long-range PCR products are used to capture complete human mitochondrial genomes from complex DNA mixtures. We use the method to capture 46 complete mitochondrial genomes in parallel and we sequence them on a single lane of an Illumina GA(II) instrument. CONCLUSIONS/SIGNIFICANCE: This method is economical and simple and particularly suitable for targets that can be amplified by PCR and do not contain highly repetitive sequences such as mtDNA. It has applications in population genetics and forensics, as well as studies of ancient DNA.

  11. Noninvasive prenatal paternity testing (NIPAT) through maternal plasma DNA sequencing

    DEFF Research Database (Denmark)

    Jiang, Haojun; Xie, Yifan; Li, Xuchao

    2016-01-01

    Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) have been already used to perform noninvasive prenatal paternity testing from maternal plasma DNA. The frequently used technologies were PCR followed by capillary electrophoresis and SNP typing array, respectively. Here, we developed a noninvasive prenatal paternity testing (NIPAT) based on SNP typing with maternal plasma DNA sequencing. We evaluated the influence factors (minor allele frequency (MAF), the number of total SNP, fetal fraction and effective sequencing depth) and designed three different selective SNP panels...... paternity test using STR multiplex system. Our study here proved that the maternal plasma DNA sequencing-based technology is feasible and accurate in determining paternity, which may provide an alternative in forensic application in the future.

  12. Identification of Chinese Herbs Using a Sequencing-Free Nanostructured Electrochemical DNA Biosensor

    Directory of Open Access Journals (Sweden)

    Yan Lei

    2015-11-01

    Due to the nearly identical phenotypes and chemical constituents, it is often very challenging to accurately differentiate diverse species of a Chinese herbal genus. Although technologies including DNA barcoding have been introduced to help address this problem, they are generally time-consuming and require expensive sequencing. Herein, we present a simple sequencing-free electrochemical biosensor, which enables easy differentiation between two closely related Fritillaria species. To improve its differentiation capability using trace amounts of DNA sample available from herbal extracts, a stepwise electrochemical deposition of reduced graphene oxide (RGO) and gold nanoparticles (AuNPs) was adopted to engineer a synergistic nanostructured sensing interface. By using such a nanofeatured electrochemical DNA (E-DNA) biosensor, two Chinese herbal species of Fritillaria (F. thunbergii and F. cirrhosa) were successfully discriminated at the DNA level, because a fragment of 16-mer sequence at the spacer region of the 5S-rRNA only exists in F. thunbergii. This E-DNA sensor was capable of identifying the target sequence in the range from 100 fM to 10 nM, and a detection limit as low as 11.7 fM (S/N = 3) was obtained. Importantly, this sensor was applied to detect the unique fragment of the PCR products amplified from F. thunbergii and F. cirrhosa, respectively. We anticipate that such a direct, sequencing-free sensing mode will ultimately pave the way towards a new generation of herb-identification strategies.

  13. Clinical DNA Sequencer for Ultra-Low Cost Testing

    Science.gov (United States)

    Church, George; Olejnik, Jerzy; Werner, Martina; Guggenheim, Evan; DiMeo, James; Marma, Mong Sano; Visalakshi, Visa; Hagerott, Thomas; Golaski, Edmund; Veatch, Philip; Stoops, David; Gordon, Steven

    2012-01-01

    We present a new sequencing instrument, the MINI, for sequencing DNA in the clinic or core research laboratory. Unlike all other DNA sequencing systems, which run only one or two samples at a time, the MINI can simultaneously run any number of flow cells between one and twenty. Each flow cell is designed to be disposable, low-cost and use very little reagent; thus, DNA from a single patient or specimen may be cost effectively sequenced without the need for indexing multiple samples in a single flow cell. This is an important feature for the clinic, as in addition to simplifying the sample preparation process, different samples may be kept physically separate (meters) from one another, thereby significantly reducing the chance of contamination or false diagnosis. Low cost (about $100 per sequencing test) is achieved through a unique sequencing by synthesis chemistry and low reagent consumption. Parallel flow cell processing and fluidics design results in high throughput (tens of tests per day). In addition to sequence-based clinical testing, the system supports targeted resequencing up to an exome per flow cell. Read lengths are driven by application requirements and are between 35-100 bp.

  14. Genome DNA Sequence Variation, Evolution, and Function in Bacteria and Archaea.

    Science.gov (United States)

    Nishida, Hiromi

    2013-01-01

    Comparative genomics has revealed that variations in bacterial and archaeal genome DNA sequences cannot be explained by only neutral mutations. Virus resistance and plasmid distribution systems have resulted in changes in bacterial and archaeal genome sequences during evolution. The restriction-modification system, a virus resistance system, leads to avoidance of palindromic DNA sequences in genomes. Clustered, regularly interspaced, short palindromic repeats (CRISPRs) found in genomes represent yet another virus resistance system. Comparative genomics has shown that bacteria and archaea have failed to gain any DNA with GC content higher than the GC content of their chromosomes. Thus, horizontally transferred DNA regions have lower GC content than the host chromosomal DNA does. Some nucleoid-associated proteins bind DNA regions with low GC content and inhibit the expression of genes contained in those regions. This form of gene repression is another type of virus resistance system. On the other hand, bacteria and archaea have used plasmids to gain additional genes. Virus resistance systems influence plasmid distribution. Interestingly, the restriction-modification system and nucleoid-associated protein genes have been distributed via plasmids. Thus, GC content and genomic signatures do not reflect bacterial and archaeal evolutionary relationships.
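
    The GC-content comparison described above can be illustrated with a minimal sketch that computes the GC content of a sequence and flags windows whose GC content falls well below the sequence-wide value. The function names, window size, margin and toy sequence are assumptions made for illustration and are not taken from the article.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (ambiguous bases count toward the length)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """GC content of each full-length window, as (start, gc) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def low_gc_windows(seq, window, step, margin=0.1):
    """Windows whose GC content is more than `margin` below the overall value."""
    overall = gc_content(seq)
    return [(start, gc) for start, gc in windowed_gc(seq, window, step)
            if gc < overall - margin]


# Toy sequence, invented for this example: a GC-rich backbone with an AT-rich insert.
toy_genome = "GC" * 20 + "AT" * 10 + "GC" * 20
candidates = low_gc_windows(toy_genome, window=20, step=20)
```

    On real genomes a low-GC window is only a hint that a region may have been acquired horizontally; codon usage and other genomic signatures would normally be examined as well.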

  15. Comment on "Linguistic features of noncoding DNA sequences"

    CERN Document Server

    Israeloff, N E; Kagalenko, M; Chan, K

    1995-01-01

    In a recent Physical Review Letter, Mantegna et al. report that certain statistical signatures of natural language can be found in non-coding DNA sequences. In this comment we show that random noise with power-law correlation, similar to 1/f noise, exhibits the same "linguistic" signatures as those found in non-coding DNA. We conclude that these signatures cannot distinguish languages from noise.

  16. Anaplasma phagocytophilum in Danish sheep: confirmation by DNA sequencing

    Directory of Open Access Journals (Sweden)

    Thamsborg Stig M

    2009-12-01

    Background: The presence of Anaplasma phagocytophilum, an Ixodes ricinus transmitted bacterium, was investigated in two flocks of Danish grazing lambs. Direct PCR detection was performed on DNA extracted from blood and serum with subsequent confirmation by DNA sequencing. Methods: 31 samples obtained from clinically normal lambs in 2000 from Fussingø, Jutland and 12 samples from ten lambs and two ewes from a clinical outbreak at Feddet, Zealand in 2006 were included in the study. Some of the animals from Feddet had shown clinical signs of polyarthritis and general unthriftiness prior to sampling. DNA extraction was optimized from blood and serum and detection achieved by a 16S rRNA targeted PCR with verification of the product by DNA sequencing. Results: Five DNA extracts were found positive by PCR, including two samples from 2000 and three from 2006. For both series of samples the product was verified as A. phagocytophilum by DNA sequencing. Conclusions: A. phagocytophilum was detected by molecular methods for the first time in Danish grazing lambs during the two seasons investigated (2000 and 2006).

  17. Identification of Bacterial Species in Kuwaiti Waters Through DNA Sequencing

    Science.gov (United States)

    Chen, K.

    2017-01-01

    With the objective of identifying the bacterial diversity associated with the ecosystems of various Kuwaiti seas, bacteria were cultured and isolated from 3 water samples. Because of difficulties in culturing and isolating fecal coliforms on the selective agar plates, bacterial isolates from marine agar plates were selected for molecular identification. 16S rRNA genes were successfully amplified from the genomes of the selected isolates using universal eubacterial 16S rRNA primers. The resulting amplification products were subjected to automated DNA sequencing. The partial 16S rDNA sequences obtained were compared directly with sequences in the NCBI database using BLAST as well as with the sequences available from the Ribosomal Database Project (RDP).

  18. Performance Characteristics of the TRUGENE HIV-1 Genotyping Kit and the Opengene DNA Sequencing System

    OpenAIRE

    Kuritzkes, Daniel R.; Grant, Robert M.; Feorino, Paul; Griswold, Marshal; Hoover, Marie; Young, Russell; Day, Stephen; Lloyd, Jr., Robert M.; Reid, Caroline; Morgan, Gillian F.; Winslow, Dean L.

    2003-01-01

    The TRUGENE HIV-1 Genotyping Kit and OpenGene DNA Sequencing System are designed to sequence the protease (PR)- and reverse transcriptase (RT)-coding regions of human immunodeficiency virus type 1 (HIV-1) pol. Studies were undertaken to determine the accuracy of this assay system in detecting resistance-associated mutations and to determine the effects of RNA extraction methods, anticoagulants, specimen handling, and potentially interfering substances. Samples were plasma obtained from HIV-in...

  19. Cladistic biogeography of Juglans (Juglandaceae) based on chloroplast DNA intergenic spacer sequences

    Science.gov (United States)

    The phylogenetic utility of sequence variation from five chloroplast DNA intergenic spacer (IGS) regions: trnT-trnF, psbA-trnH, atpB-rbcL, trnV-16S rRNA, and trnS-trnfM was examined in the genus Juglans. A total of seventeen taxa representing the four sections within Juglans and an outgroup taxon, ...

  20. Base-pair resolution DNA methylation sequencing reveals profoundly divergent epigenetic landscapes in acute myeloid leukemia

    NARCIS (Netherlands)

    A. Akalin (Altuna); F.E. Garrett-Bakelman (Francine); M. Kormaksson (Matthias); J. Busuttil (Jennifer); L. Zhang (Lingling); I. Khrebtukova (Irina); T.A. Milne (Thomas); Y. Huang (Yongsheng); R.S. Biswas (Rajat); J.L. Hess (Jay); C.D. Allis (C. David); R.G. Roeder (Robert); P.J.M. Valk (Peter); B. Löwenberg (Bob); H.R. Delwel (Ruud); H.F. Fernandez (Hugo); E. Paietta (Elisabeth); M.S. Tallman (Martin); G.P. Schroth (Gary P); C.E. Mason (Christopher); A. Melnick (Ari); M.E. Figueroa (Maria Eugenia)

    2012-01-01

    We have developed an enhanced form of reduced representation bisulfite sequencing with extended genomic coverage, which resulted in greater capture of DNA methylation information of regions lying outside of traditional CpG islands. Applying this method to primary human bone marrow

  1. Reduced representation bisulphite sequencing of the cattle genome reveals DNA methylation patterns

    Science.gov (United States)

    Using reduced representation bisulphite sequencing (RRBS), we obtained the first single-base-resolution maps of bovine DNA methylation in ten somatic tissues. In total, we observed 1,868,049 cytosines in the CG-enriched regions. Similar to the methylation patterns in other species, the CG context wa...

  2. Genetic selection and DNA sequences of 4.5S RNA homologs

    DEFF Research Database (Denmark)

    Brown, S; Thon, G; Tolentino, E

    1989-01-01

    A general strategy for cloning the functional homologs of an Escherichia coli gene was used to clone homologs of 4.5S RNA from other bacteria. The genes encoding these homologs were selected by their ability to complement a deletion of the gene for 4.5S RNA. DNA sequences of the regions encoding...

  3. Massively parallel DNA sequencing: the new frontier in biogeography

    Directory of Open Access Journals (Sweden)

    Luiz A. Rocha

    2013-04-01

    The advent of Sanger sequencing represented a scientific breakthrough that greatly advanced biogeographic studies. However, this technology has several limitations that have hampered more advanced studies in the field. The development of novel techniques which more fully exploit the potential of Massively Parallel Sequencing (MPS) to deliver sequence data at a fraction of the cost of Sanger sequencing promises to revolutionize biogeographic studies. Approaches like Restriction-site Associated DNA sequencing (RADseq) and UltraConserved Element (UCE) sequencing enable the collection of unprecedented amounts of data for multi-locus studies of population genetics and phylogenetics respectively, which in turn can be used for biogeographic analysis. Here we review those and other methods related to MPS, and provide examples of how they can be used in tropical Atlantic biogeography.

  4. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    Science.gov (United States)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
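
    The n-gram entropy measurement mentioned in this abstract can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, overlapping n-grams are assumed, and the redundancy is assumed here to be 1 minus the entropy divided by its maximum possible value (n times log2 of the alphabet size), which may differ in detail from the definition used in the paper.

```python
from collections import Counter
from math import log2


def ngram_entropy(seq: str, n: int) -> float:
    """Shannon entropy, in bits, of the overlapping n-gram distribution of seq."""
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def ngram_redundancy(seq: str, n: int, alphabet_size: int = 4) -> float:
    """Assumed definition: 1 - H_n / (n * log2(alphabet_size)).

    0 means the n-grams are as unpredictable as possible;
    1 means a single n-gram accounts for the whole sequence.
    """
    return 1.0 - ngram_entropy(seq, n) / (n * log2(alphabet_size))


if __name__ == "__main__":
    balanced = "ACGT" * 10      # every base equally frequent
    repetitive = "AAAAAAAA"     # a single base only
    print(ngram_entropy(balanced, 1), ngram_redundancy(balanced, 1))
    print(ngram_entropy(repetitive, 2), ngram_redundancy(repetitive, 2))
```

    Under this assumed definition, a lower n-gram entropy translates directly into a larger redundancy, which is the relationship the abstract states for noncoding regions.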

  5. Efficiency of ITS Sequences for DNA Barcoding in Passiflora (Passifloraceae)

    Directory of Open Access Journals (Sweden)

    Giovanna Câmara Giudicelli

    2015-04-01

    DNA barcoding is a technique for discriminating and identifying species using short, variable, and standardized DNA regions. Here, we tested for the first time the performance of plastid and nuclear regions as DNA barcodes in Passiflora. This genus is largely variable, with more than 900 species of high ecological, commercial, and ornamental importance. We analyzed 1034 accessions of 222 species representing the four subgenera of Passiflora and evaluated the effectiveness of five plastid regions and three nuclear datasets currently employed as DNA barcodes in plants using barcoding gap, applied similarity-, and tree-based methods. The plastid regions were able to identify less than 45% of species, whereas the nuclear datasets were efficient for more than 50% using “best match” and “best close match” methods of TaxonDNA software. All subgenera presented higher interspecific pairwise distances and did not fully overlap with the intraspecific distance, and similarity-based methods showed better results than tree-based methods. The nuclear ribosomal internal transcribed spacer 1 (ITS1) region presented a higher discrimination power than the other datasets and also showed other desirable characteristics as a DNA barcode for this genus. Therefore, we suggest that this region should be used as a starting point to identify Passiflora species.

  6. A new family of satellite DNA sequences as a major component of centromeric heterochromatin in owls (Strigiformes).

    Science.gov (United States)

    Yamada, Kazuhiko; Nishida-Umehara, Chizuko; Matsuda, Yoichi

    2004-03-01

    We isolated a new family of satellite DNA sequences from HaeIII- and EcoRI-digested genomic DNA of the Blakiston's fish owl (Ketupa blakistoni). The repetitive sequences were organized in tandem arrays of the 174 bp element, and localized to the centromeric regions of all macrochromosomes, including the Z and W chromosomes, and microchromosomes. This hybridization pattern was consistent with the distribution of C-band-positive centromeric heterochromatin, and the satellite DNA sequences occupied 10% of the total genome as a major component of centromeric heterochromatin. The sequences were homogenized between macro- and microchromosomes in this species, and therefore intraspecific divergence of the nucleotide sequences was low. The 174 bp element cross-hybridized to the genomic DNA of six other Strigidae species, but not to that of the Tytonidae, suggesting that the satellite DNA sequences are conserved in the same family but fairly divergent between the different families in the Strigiformes. Secondly, the centromeric satellite DNAs were cloned from eight Strigidae species, and the nucleotide sequences of 41 monomer fragments were compared within and between species. Molecular phylogenetic relationships of the nucleotide sequences were highly correlated with both the taxonomy based on morphological traits and the phylogenetic tree constructed by DNA-DNA hybridization. These results suggest that the satellite DNA sequence has evolved by concerted evolution in the Strigidae and that it is a good taxonomic and phylogenetic marker to examine genetic diversity between Strigiformes species.

  7. Alignment of DNA and protein sequences containing frameshift errors

    Energy Technology Data Exchange (ETDEWEB)

    Guan, X.; Uberbacher, E.C.

    1995-04-01

    Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have very significant error rates and often generate artifactual insertions and deletions of bases (indels) which corrupt the translation of sequences and compromise the detection of protein homologies. The impact of these errors on the utility of molecular sequence data is dependent on the analytic technique used to interpret the data. In the presence of frameshift errors, standard algorithms using six frame translation can miss important homologies because only sub-fragments of the correct translation are available in any given frame. We present a new algorithm which can detect and correct frameshift errors in DNA sequences during comparison of translated sequences with protein sequences in the databases. This algorithm can recognize homologous proteins sharing 30% identity even in the presence of a 7% frameshift error rate. Our algorithm uses dynamic programming, producing a guaranteed optimal alignment in the presence of frameshifts, and has a sensitivity equivalent to Smith-Waterman. The computational efficiency of the algorithm is O(nm) where n and m are the sizes of two sequences being compared. The algorithm does not rely on prior knowledge or heuristic rules and performs significantly better than any previously reported method.
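
    The problem this abstract describes, that a frameshift leaves only sub-fragments of the correct translation in any single reading frame, can be seen in a small sketch. This is not the authors' alignment algorithm: the toy sequence, the inserted base, and the helper name `translate` are assumptions made purely for illustration, and only the three forward frames of the standard genetic code are considered.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame (0, 1 or 2); a trailing partial codon is dropped."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    original = "ATGGCCAAGCTGACCGAT"                   # toy coding sequence (assumed)
    corrupted = original[:9] + "G" + original[9:]     # one artifactual inserted base

    print("original, frame 0:", translate(original, 0))
    for frame in range(3):
        print(f"corrupted, frame {frame}:", translate(corrupted, frame))
```

    In this toy case the residues before the insertion survive only in frame 0 of the corrupted read and the residues after it only in frame 1, so a comparison restricted to one frame at a time sees at most half of the protein.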

  8. A Transposon-Based Strategy for Sequencing Repetitive DNA in Eukaryotic Genomes

    Science.gov (United States)

    Devine, Scott E.; Chissoe, Stephanie L.; Eby, Yolanda; Wilson, Richard K.; Boeke, Jef D.

    1997-01-01

    Repetitive DNA is a significant component of eukaryotic genomes. We have developed a strategy to efficiently and accurately sequence repetitive DNA in the nematode Caenorhabditis elegans using integrated artificial transposons and automated fluorescent sequencing. Mapping and assembly tools represent important components of this strategy and facilitate sequence assembly in complex regions. We have applied the strategy to several cosmid assembly gaps resulting from repetitive DNA and have accurately recovered the sequences of these regions. Analysis of these regions revealed six novel transposon-like repetitive elements, IR-1, IR-2, IR-3, IR-4, IR-5, and TR-1. Each of these elements represents a middle-repetitive DNA family in C. elegans containing at least 3–140 copies per genome. Copies of IR-1, IR-2, IR-4, and IR-5 are located on all (or most) of the six nematode chromosomes, whereas IR-3 is predominantly located on chromosome X. These elements are almost exclusively interspersed between predicted genes or within the predicted introns of these genes, with the exception of a single IR-5 element, which is located within a predicted exon. IR-1, IR-2, and IR-3 are flanked by short sequence duplications resembling the target site duplications of transposons. We have established a website database (http://www.welch.jhu.edu/~devine/RepDNAdb.html) to track and cross-reference these transposon-like repetitive elements that contains detailed information on individual element copies and provides links to appropriate GenBank records. This set of tools may be used to sequence, track, and study repetitive DNA in model organisms and humans. [The sequences reported in this paper have been deposited in GenBank under accession nos. U53139 and U86946–U86951.] PMID:9149950

  9. Validation of human papillomavirus genotyping by signature DNA sequence analysis.

    Science.gov (United States)

    Lee, Sin Hang; Vigliotti, Veronica S; Vigliotti, Jessica S; Pappu, Suri

    2009-05-22

    Screening with combined cytologic and HPV testing has led to the highest number of excessive colposcopic referrals due to high false positive rates of the current HPV testing in the USA. How best to capitalize on the enhanced sensitivity of HPV DNA testing while minimizing false-positive results from its lower specificity is an important task for the clinical pathologists. The HPV L1 gene DNA in liquid-based Pap cytology specimens was initially amplified by the degenerate MY09/MY11 PCR primers and then re-amplified by the nested GP5+/GP6+ primers, or the heminested GP6/MY11, heminested GP5/MY09 primers or their modified equivalent without sample purification or DNA extraction. The nested PCR products were used for direct automated DNA sequencing. A 34- to 50-base sequence including the GP5+ priming site was selected as the signature sequence for routine genotyping by online BLAST sequence alignment algorithms. Of 3,222 specimens, 352 were found to contain HPV DNA, with 92% of the positive samples infected by only 1 of the 35 HPV genotypes detected and 8% by more than 1 HPV genotype. The most common genotype was HPV-16 (68 isolates), followed by HPV-52 (25 isolates). More than half (53.7%) of the total number of HPV isolates relied on a nested PCR for detection although the majority of HPV-16, -18, -31, -33, -35 and -58 isolates were detected by a single MY09/MY11 PCR. Alignment of a 34-base sequence downstream of the GP5+ site failed to distinguish some isolates of HPV-16, -31 and -33. Novel variants of HPV with less than "100% identities" signature sequence match with those stored in the GenBank database were also detected by signature DNA sequencing in this rural and suburban population of the United States. Laboratory staff must be familiar with the limitations of the consensus PCR primers, the locations of the signature sequence in the L1 gene for some HPV genotypes, and HPV genotype sequence variants in order to perform accurate HPV genotyping.

  10. Validation of human papillomavirus genotyping by signature DNA sequence analysis

    Directory of Open Access Journals (Sweden)

    Vigliotti Jessica S

    2009-05-01

    Background: Screening with combined cytologic and HPV testing has led to the highest number of excessive colposcopic referrals due to high false positive rates of the current HPV testing in the USA. How best to capitalize on the enhanced sensitivity of HPV DNA testing while minimizing false-positive results from its lower specificity is an important task for the clinical pathologists. Methods: The HPV L1 gene DNA in liquid-based Pap cytology specimens was initially amplified by the degenerate MY09/MY11 PCR primers and then re-amplified by the nested GP5+/GP6+ primers, or the heminested GP6/MY11, heminested GP5/MY09 primers or their modified equivalent without sample purification or DNA extraction. The nested PCR products were used for direct automated DNA sequencing. A 34- to 50-base sequence including the GP5+ priming site was selected as the signature sequence for routine genotyping by online BLAST sequence alignment algorithms. Results: Of 3,222 specimens, 352 were found to contain HPV DNA, with 92% of the positive samples infected by only 1 of the 35 HPV genotypes detected and 8% by more than 1 HPV genotype. The most common genotype was HPV-16 (68 isolates), followed by HPV-52 (25 isolates). More than half (53.7%) of the total number of HPV isolates relied on a nested PCR for detection although the majority of HPV-16, -18, -31, -33, -35 and -58 isolates were detected by a single MY09/MY11 PCR. Alignment of a 34-base sequence downstream of the GP5+ site failed to distinguish some isolates of HPV-16, -31 and -33. Novel variants of HPV with less than "100% identities" signature sequence match with those stored in the GenBank database were also detected by signature DNA sequencing in this rural and suburban population of the United States. Conclusion: Laboratory staff must be familiar with the limitations of the consensus PCR primers, the locations of the signature sequence in the L1 gene for some HPV genotypes, and HPV genotype sequence

  11. DNA shotgun sequencing analysis of Garcinia mangostana L. variety Mesta

    Directory of Open Access Journals (Sweden)

    Syuhaidah Abu Bakar

    2017-06-01

    Mangosteen (Garcinia mangostana Linn.) is an ultra-tropical tree characterized by its unique dark purple fruits with white flesh. The xanthone-rich purple pericarp tissue contains valuable compounds with medicinal properties. Following previously reported genome sequencing of a common variety of mangosteen [1], we performed another whole genome sequencing of a commercially popular variety of this fruit species (var. Mesta) for comparative analysis of its genome composition. Raw reads of the DNA sequencing project were deposited to SRA database with the accession number SRX2709728.

  12. High-throughput DNA sequencing: a genomic data manufacturing process.

    Science.gov (United States)

    Huang, G M

    1999-01-01

    The progress trends in automated DNA sequencing operation are reviewed. Technological development in sequencing instruments, enzymatic chemistry and robotic stations has resulted in ever-increasing capacity of sequence data production. This progress leads to a higher demand on laboratory information management and data quality assessment. High-throughput laboratories face the challenge of organizational management, as well as technology management. Engineering principles of process control should be adopted in this biological data manufacturing procedure. While various systems attempt to provide solutions to automate different parts of, or even the entire process, new technical advances will continue to change the paradigm and provide new challenges.

  13. DNA recognition by F factor TraI36: highly sequence-specific binding of single-stranded DNA.

    Science.gov (United States)

    Stern, J C; Schildbach, J F

    2001-09-25

    The TraI protein has two essential roles in transfer of conjugative plasmid F Factor. As part of a complex of DNA-binding proteins, TraI introduces a site- and strand-specific nick at the plasmid origin of transfer (oriT), cutting the DNA strand that is transferred to the recipient cell. TraI also acts as a helicase, presumably unwinding the plasmid strands prior to transfer. As an essential feature of its nicking activity, TraI is capable of binding and cleaving single-stranded DNA oligonucleotides containing an oriT sequence. The specificity of TraI DNA recognition was examined by measuring the binding of oriT oligonucleotide variants to TraI36, a 36-kD amino-terminal domain of TraI that retains the sequence-specific nucleolytic activity. TraI36 recognition is highly sequence-specific for an 11-base region of oriT, with single base changes reducing affinity by as much as 8000-fold. The binding data correlate with plasmid mobilization efficiencies: plasmids containing sequences bound with lower affinities by TraI36 are transferred between cells at reduced frequencies. In addition to the requirement for high affinity binding to oriT, efficient in vitro nicking and in vivo plasmid mobilization requires a pyrimidine immediately 5' of the nick site. The high sequence specificity of TraI single-stranded DNA recognition suggests that despite its recognition of single-stranded DNA, TraI is capable of playing a major regulatory role in initiation and/or termination of plasmid transfer.

  14. The nucleotide sequence of two restriction fragments located in the gene AB region of bacteriophage S13.

    NARCIS (Netherlands)

    F.G. Grosveld (Frank); J.H. Spencer

    1977-01-01

    The nucleotide sequence of a double stranded DNA fragment from the gene AB region of bacteriophage S13 DNA has been determined. The fragment was isolated as two adjacent shorter fragments by cleavage of S13 replicative form (RF) DNA with restriction endonuclease III from Hemophilus

  15. Direct evidence for sequence-dependent attraction between double-stranded DNA controlled by methylation

    Science.gov (United States)

    Yoo, Jejoong; Kim, Hajin; Aksimentiev, Aleksei; Ha, Taekjip

    2016-03-01

    Although proteins mediate highly ordered DNA organization in vivo, theoretical studies suggest that homologous DNA duplexes can preferentially associate with one another even in the absence of proteins. Here we combine molecular dynamics simulations with single-molecule fluorescence resonance energy transfer experiments to examine the interactions between duplex DNA in the presence of spermine, a biological polycation. We find that AT-rich DNA duplexes associate more strongly than GC-rich duplexes, regardless of the sequence homology. Methyl groups of thymine act as a steric block, relocating spermine from major grooves to interhelical regions, thereby increasing DNA-DNA attraction. Indeed, methylation of cytosines makes attraction between GC-rich DNA as strong as that between AT-rich DNA. Recent genome-wide chromosome organization studies showed that remote contact frequencies are higher for AT-rich and methylated DNA, suggesting that direct DNA-DNA interactions that we report here may play a role in the chromosome organization and gene regulation.

  16. cDNA, genomic sequence cloning and overexpression of ribosomal ...

    African Journals Online (AJOL)

    Eukaryotic RPS16 is a component of the 40S small ribosomal subunit, is encoded by the RPS16 gene, and is a homolog of prokaryotic RPS9. The cDNA and genomic sequence of RPS16 were cloned successfully for the first time from the Giant Panda (Ailuropoda melanoleuca) using reverse transcription-polymerase chain ...

  17. cDNA, genomic sequence cloning and overexpression of ribosomal ...

    African Journals Online (AJOL)

    PRECIOUS

    2009-11-02

    RPS20 is a component of the 40S small ribosomal subunit encoded by RPS20 gene, which is conserved between eukaryotes, prokaryotes and archaebacteria. The cDNA and the genomic sequence of RPS20 were cloned successfully from the Giant Panda (Ailuropoda melanoleuca) using RT-PCR ...

  18. POSA : Perl objects for DNA sequencing data analysis

    NARCIS (Netherlands)

    Aerts, JA; Jungerius, BJ; Groenen, MA

    2004-01-01

    Background: Capillary DNA sequencing machines allow the generation of vast amounts of data with little hands-on time. With this expansion of data generation, there is a growing need for automated data processing. Most available software solutions, however, still require user intervention or provide

  19. POSA: perl objects for DNA sequencing data analysis

    NARCIS (Netherlands)

    Aerts, J.A.; Jungerius, B.J.; Groenen, M.A.M.

    2004-01-01

    Background - Capillary DNA sequencing machines allow the generation of vast amounts of data with little hands-on time. With this expansion of data generation, there is a growing need for automated data processing. Most available software solutions, however, still require user intervention or provide

  20. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional domain (30-740 aa, Ap-Vg-1) was expressed in Escherichia coli BL21 (DE3) cells. The recombinant Ap-Vg-1 proteins were purified and used for antibody preparation. The results showed that the intact DNA ...

  1. RNA-DNA sequence differences spell genetic code ambiguities

    DEFF Research Database (Denmark)

    Bentin, Thomas; Nielsen, Michael L

    2013-01-01

    A recent paper in Science by Li et al. 2011(1) reports widespread sequence differences in the human transcriptome between RNAs and their encoding genes termed RNA-DNA differences (RDDs). The findings could add a new layer of complexity to gene expression but the study has been criticized. ...

  2. cDNA, genomic sequence cloning and overexpression of ...

    African Journals Online (AJOL)

    Cytochrome c oxidase (COX) is a component of the mitochondrial respiratory chain. COX6b1 is one of the COX small subunits encoded by nuclear genes. In the current study, the cDNA and the genomic sequence of COX6b1 were successfully cloned from Ailuropoda melanoleuca with the RT-PCR technology and ...

  3. Direct multiplex sequencing (DMPS)--a novel method for targeted high-throughput sequencing of ancient and highly degraded DNA

    National Research Council Canada - National Science Library

    Stiller, Mathias; Knapp, Michael; Stenzel, Udo; Hofreiter, Michael; Meyer, Matthias

    2009-01-01

    Although the emergence of high-throughput sequencing technologies has enabled whole-genome sequencing from extinct organisms, little progress has been made in accelerating targeted sequencing from highly degraded DNA...

  4. Comparative analysis of complete mitochondrial DNA control region of four species of Strigiformes.

    Science.gov (United States)

    Xiao, Bing; Ma, Fei; Sun, Yi; Li, Qing-Wei

    2006-11-01

    The sequence of the whole mitochondrial (mt) DNA control region (CR) of four species of Strigiformes was obtained. Length of the CR was 3,290 bp, 2,848 bp, 2,444 bp, and 1,771 bp for Asio flammeus, Asio otus, Athene noctua, and Strix aluco, respectively. Interestingly, the length of the control region was maximum in Asio flammeus among all the avian mtDNA control regions sequenced thus far. In addition, the base composition and organization of mtDNA CR of Asio flammeus were identical to those reported for other birds. On the basis of the differential frequencies of base substitutions, the CR may be divided into two variable domains, I and III, and a central conserved domain, II. The 3' end of the CR contained many tandem repeats of varying lengths and repeat numbers. In Asio flammeus, the repeated sequences consisted of a 126 bp sequence that was repeated seven times and a 78 bp sequence that was repeated 14 times. In Asio otus, there were also two repeated sequences, namely a 127 bp sequence that was repeated eight times and a 78 bp sequence that was repeated six times. The control region of Athene noctua contained three sets of repeats: an 89 bp sequence that was repeated three times, a 77 bp sequence that was repeated four times, and a 71 bp sequence that was repeated six times. Strix aluco, however, had only one repeated sequence, a 78 bp sequence that was repeated five times. The results of this study seem to indicate that these tandem repeats may have resulted from slipped-strand mispairing during mtDNA replication. Moreover, there are many conserved motifs within the repeated units. These sequences could form stable stem-loop secondary structures, which suggests that these repeated sequences play an important role in regulating transcription and replication of the mitochondrial genome.
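
    The repeat figures in this abstract can be tallied to estimate how much of each control region the tandem arrays would occupy. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted above: it assumes every copy is a complete, non-overlapping unit, which the abstract does not state, and the names `CONTROL_REGIONS`, `repeat_span` and `repeat_fraction` are hypothetical.

```python
# species -> (control-region length in bp, [(repeat unit length in bp, copy number), ...])
# All values are taken from the abstract above.
CONTROL_REGIONS = {
    "Asio flammeus": (3290, [(126, 7), (78, 14)]),
    "Asio otus": (2848, [(127, 8), (78, 6)]),
    "Athene noctua": (2444, [(89, 3), (77, 4), (71, 6)]),
    "Strix aluco": (1771, [(78, 5)]),
}


def repeat_span(species: str) -> int:
    """Total bp covered by tandem repeats, assuming complete, non-overlapping copies."""
    _, repeats = CONTROL_REGIONS[species]
    return sum(unit * copies for unit, copies in repeats)


def repeat_fraction(species: str) -> float:
    """Share of the control region that the repeat arrays would occupy under that assumption."""
    length, _ = CONTROL_REGIONS[species]
    return repeat_span(species) / length


if __name__ == "__main__":
    for name in CONTROL_REGIONS:
        print(f"{name}: {repeat_span(name)} bp in repeats, {repeat_fraction(name):.1%} of the CR")
```

    Under that assumption the longest control region, that of Asio flammeus, would also be the one with the largest repeat content, which fits the abstract's emphasis on tandem repeats at the 3' end.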

  5. Sequence heterogeneity accelerates protein search for targets on DNA

    Energy Technology Data Exchange (ETDEWEB)

    Shvets, Alexey A.; Kolomeisky, Anatoly B., E-mail: tolya@rice.edu [Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005 (United States)

    2015-12-28

    The process of protein search for specific binding sites on DNA is fundamentally important since it marks the beginning of all major biological processes. We present a theoretical investigation that probes the role of DNA sequence symmetry, heterogeneity, and chemical composition in the protein search dynamics. Using a discrete-state stochastic approach with a first-passage events analysis, which takes into account the most relevant physical-chemical processes, a full analytical description of the search dynamics is obtained. It is found that, contrary to existing views, the protein search is generally faster on DNA with more heterogeneous sequences. In addition, the search dynamics might be affected by the chemical composition near the target site. The physical origins of these phenomena are discussed. Our results suggest that biological processes might be effectively regulated by modifying chemical composition, symmetry, and heterogeneity of a genome.

  6. mtDNA sequence diversity of Hazara ethnic group from Pakistan.

    Science.gov (United States)

    Rakha, Allah; Fatima; Peng, Min-Sheng; Adan, Atif; Bi, Rui; Yasmin, Memona; Yao, Yong-Gang

    2017-09-01

    The present study was undertaken to investigate mitochondrial DNA (mtDNA) control region sequences of Hazaras from Pakistan, so as to generate an mtDNA reference database for forensic casework in Pakistan and to analyze the phylogenetic relationship of this particular ethnic group with geographically proximal populations. Complete mtDNA control region (nt 16024-576) sequences were generated through Sanger sequencing for 319 Hazara individuals from Quetta, Baluchistan. The population sample set showed a total of 189 distinct haplotypes, belonging mainly to West Eurasian (51.72%), East & Southeast Asian (29.78%) and South Asian (18.50%) haplogroups. Compared with other populations from Pakistan, the Hazara population had a relatively high haplotype diversity (0.9945) and a lower random match probability (0.0085). The dataset has been incorporated into the EMPOP database under accession number EMP00680. The data herein comprise the largest, and likely most thoroughly examined, control region mtDNA dataset from Hazaras of Pakistan.
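
    The two summary statistics quoted in this abstract are commonly computed from haplotype frequencies as shown below. This is a sketch under assumptions: it uses the usual forensic conventions (random match probability as the sum of squared haplotype frequencies, and Nei's unbiased haplotype diversity), which the abstract itself does not spell out, and the function names and toy haplotype labels are hypothetical.

```python
from collections import Counter


def random_match_probability(haplotypes: list) -> float:
    """Sum of squared haplotype frequencies in the sample."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes: list) -> float:
    """Nei's unbiased estimator: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))


if __name__ == "__main__":
    toy_sample = ["H1", "H1", "H2", "H3"]   # invented labels, not data from the study
    print(random_match_probability(toy_sample))
    print(haplotype_diversity(toy_sample))

    # Consistency check using only figures quoted in the abstract (n = 319, RMP = 0.0085).
    implied_diversity = 319 / 318 * (1 - 0.0085)
    print(round(implied_diversity, 4))
```

    With the reported sample size and random match probability, these assumed formulas give a diversity of roughly 0.9946, which agrees with the reported 0.9945 to within the rounding of the inputs.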

  7. Quality assessment of DNA sequence data: autopsy of a mis-sequenced mtDNA population sample.

    Science.gov (United States)

    Bandelt, H-J; Kivisild, T

    2006-05-01

    Published DNA data sets constitute a body of sequencing results resting in silico that are supposed to reflect the variation of (once) living cells. In cases where the DNA variation reported is suspected to be fraught with artefacts, an autopsy of the full body of data is needed to clarify the amount and causes of mis-sequencing. In this paper we elaborate on strategies that allow a clear-cut identification of the problems in severely flawed mtDNA data. This approach is applied, by way of example, to a data set of HVS-I sequences from the Caucasus, published by Nasidze & Stoneking in 2001. These data bear numerous ambiguous nucleotide positions and suffer from an even higher number of phantom mutations, indicating that severe biochemical problems adversely influenced those sequencing results at the time. Furthermore, systematic omission of sequences with a long C-stretch (incurred by a transition at position 16189) must have severely biased the data set. Since no complete correction of these data has appeared to date, this example of mis-sequencing necessitates circumstantial evidence that is bullet-proof.

  8. Population subdivision in Europe's great bustard inferred from mitochondrial and nuclear DNA sequence variation.

    Science.gov (United States)

    Pitra, C; Lieckfeldt, D; Alonso, J C

    2000-08-01

    A continent-wide survey of sequence variation in mitochondrial (mt) and nuclear (n) DNA of the endangered great bustard (Otis tarda) was conducted to assess the extent of phylogeographic structure in a morphologically monotypic bird. DNA sequence variation in a combined 809 bp segment of the mtDNA genome from 66 individuals from the last six breeding regions showed relatively low levels of intraspecific sequence diversity (π = 0.32%) but significant differences in the regional distribution of 11 haplotypes (ΦST = 0.49). Despite their exceptional potential for dispersal, a complete and long-term historical separation between the populations from the Iberian Peninsula (Spain) and mainland Europe (Hungary, Slovakia, Germany, and Russia) was demonstrated. Divergence between populations based on a 3-bp insertion-deletion polymorphism within the intron region of the nuclear CHD-Z gene was geographically concordant with the primary subdivision identified within the mtDNA sequences. Inferred aspects of phylogeography were used to formulate conservation recommendations for this endangered species.

  9. The DNA sequence and biology of human chromosome 19

    Energy Technology Data Exchange (ETDEWEB)

    Grimwood, J; Gordon, L A; Olsen, A; Terry, A; Schmutz, J; Lamerdin, J; Hellsten, U; Goodstein, D; Couronne, O; Tran-Gyamfi, M

    2004-04-06

    Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high GC content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in Mendelian disorders, including familial hypercholesterolemia and insulin-resistant diabetes. Nearly one quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

  10. Ultra-deep sequencing of mouse mitochondrial DNA: mutational patterns and their origins.

    Directory of Open Access Journals (Sweden)

    Adam Ameur

    2011-03-01

    Somatic mutations of mtDNA are implicated in the aging process, but there is no universally accepted method for their accurate quantification. We have used ultra-deep sequencing to study genome-wide mtDNA mutation load in the liver of normally- and prematurely-aging mice. Mice that are homozygous for an allele expressing a proof-reading-deficient mtDNA polymerase (mtDNA mutator mice) have 10-times-higher point mutation loads than their wildtype siblings. In addition, the mtDNA mutator mice have increased levels of a truncated linear mtDNA molecule, resulting in decreased sequence coverage in the deleted region. In contrast, circular mtDNA molecules with large deletions occur at extremely low frequencies in mtDNA mutator mice and can therefore not drive the premature aging phenotype. Sequence analysis shows that the main proportion of the mutation load in heterozygous mtDNA mutator mice and their wildtype siblings is inherited from their heterozygous mothers, consistent with germline transmission. We found no increase in levels of point mutations or deletions in wildtype C57Bl/6N mice with increasing age, thus questioning the causative role of these changes in aging. In addition, there was no increased frequency of transversion mutations with time in any of the studied genotypes, arguing against oxidative damage as a major cause of mtDNA mutations. Our results from studies of mice thus indicate that most somatic mtDNA mutations occur as replication errors during development and do not result from damage accumulation in adult life.

  11. Environmental DNA sequencing primers for eutardigrades and bdelloid rotifers.

    Science.gov (United States)

    Robeson, Michael S; Costello, Elizabeth K; Freeman, Kristen R; Whiting, Jeremy; Adams, Byron; Martin, Andrew P; Schmidt, Steve K

    2009-12-11

    The time it takes to isolate individuals from environmental samples and then extract DNA from each individual is one of the problems with generating molecular data from meiofauna such as eutardigrades and bdelloid rotifers. The lack of consistent morphological information and the extreme abundance of these classes make morphological identification of rare, or even common cryptic taxa a large and unwieldy task. This limits the ability to perform large-scale surveys of the diversity of these organisms. Here we demonstrate a culture-independent molecular survey approach that enables the generation of large amounts of eutardigrade and bdelloid rotifer sequence data directly from soil. Our PCR primers, specific to the 18S small-subunit rRNA gene, were developed for both eutardigrades and bdelloid rotifers. The developed primers successfully amplified DNA of their target organism from various soil DNA extracts. This was confirmed by both the BLAST similarity searches and phylogenetic analyses. Tardigrades showed much better phylogenetic resolution than bdelloids. Both groups of organisms exhibited varying levels of endemism. The development of clade-specific primers for characterizing eutardigrades and bdelloid rotifers from environmental samples should greatly increase our ability to characterize the composition of these taxa in environmental samples. Environmental sequencing as shown here differs from other molecular survey methods in that there is no need to pre-isolate the organisms of interest from soil in order to amplify their DNA. The DNA sequences obtained from methods that do not require culturing can be identified post-hoc and placed phylogenetically as additional closely related sequences are obtained from morphologically identified conspecifics. Our non-cultured environmental sequence based approach will be able to provide a rapid and large-scale screening of the presence, absence and diversity of Bdelloidea and Eutardigrada in a variety of soils.

  12. DNA sequence analysis using hierarchical ART-based classification networks

    Energy Technology Data Exchange (ETDEWEB)

    LeBlanc, C.; Hruska, S.I. [Florida State Univ., Tallahassee, FL (United States)]; Katholi, C.R.; Unnasch, T.R. [Univ. of Alabama, Birmingham, AL (United States)]

    1994-12-31

    Adaptive resonance theory (ART) describes a class of artificial neural network architectures that act as classification tools which self-organize, work in real-time, and require no retraining to classify novel sequences. We have adapted ART networks to provide support to scientists attempting to categorize tandem repeat DNA fragments from Onchocerca volvulus. In this approach, sequences of DNA fragments are presented to multiple ART-based networks which are linked together into two (or more) tiers; the first provides coarse sequence classification while the subsequent tiers refine the classifications as needed. The overall rating of the resulting classification of fragments is measured using statistical techniques based on those introduced to validate results from traditional phylogenetic analysis. Tests of the Hierarchical ART-based Classification Network, or HABclass network, indicate its value as a fast, easy-to-use classification tool which adapts to new data without retraining on previously classified data.

  13. DNA barcode regions for differentiating Cattleya walkeriana and C. loddigesii

    Directory of Open Access Journals (Sweden)

    Hernando Rivera-Jiménez

    2017-05-01

    Full Text Available Growers appreciate Cattleya walkeriana and C. loddigesii for their striking shape and rarity. Thus, this study aimed to evaluate the feasibility of DNA barcode regions, namely ITS1, ITS2 and rpoC1, to discriminate between C. walkeriana and C. loddigesii species. DNA barcode regions were successfully amplified using primers designed to amplify plants. We also included sequences from public databases in order to test if these regions were able to discriminate C. walkeriana and C. loddigesii from other Cattleya species. Among these regions and their combinations, ITS1+ITS2 had the highest average interspecific distance (11.1%), followed by rpoC1 (1.06%). For species discrimination, ITS1+ITS2 provided the best results. The combined data set of ITS1+ITS2+rpoC1 also discriminated both species, but did not result in higher rates of discrimination. These results indicate that the ITS region is the best option for molecular identification of these two species and for distinguishing them from some other species of this genus.
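    The abstract above reports average interspecific distances without saying how they were computed. As a rough illustration only, the sketch below uses the uncorrected p-distance (the fraction of differing sites between two aligned sequences, skipping gap positions) and averages it over all between-species pairs. The choice of p-distance and the function names `p_distance` and `mean_interspecific_distance` are assumptions made for this example, not the authors' method.

```python
from itertools import product


def p_distance(a, b, gap="-"):
    """Uncorrected p-distance between two aligned sequences of equal length.

    Positions where either sequence has a gap are skipped (an assumption;
    other gap treatments exist).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def mean_interspecific_distance(seqs_by_species):
    """Average p-distance over every pair of sequences from different species."""
    species = sorted(seqs_by_species)
    dists = []
    for i, s1 in enumerate(species):
        for s2 in species[i + 1:]:
            for a, b in product(seqs_by_species[s1], seqs_by_species[s2]):
                dists.append(p_distance(a, b))
    if not dists:
        raise ValueError("need sequences from at least two species")
    return sum(dists) / len(dists)


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration (not real ITS data).
    toy = {"species_a": ["ACGT"], "species_b": ["ACGA", "ACGT"]}
    print(mean_interspecific_distance(toy))
```

    A marker with a larger mean interspecific distance separates species more clearly, which is the sense in which ITS1+ITS2 outperforms rpoC1 in the abstract.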

  14. VoSeq: a voucher and DNA sequence web application.

    Directory of Open Access Journals (Sweden)

    Carlos Peña

    Full Text Available There is an ever growing number of molecular phylogenetic studies published, due to, in part, the advent of new techniques that allow cheap and quick DNA sequencing. Hence, the demand for relational databases with which to manage and annotate the amassing DNA sequences, genes, voucher specimens and associated biological data is increasing. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically constructed with the aim of being an organizational tool for researchers working in phylogenetic inference. We here report a new software facilitating easy management of voucher and sequence data, consisting of a relational database as back-end for a graphic user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready to be used in commonly used phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publishing. It also has inbuilt BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL) and GBIF (by generating data-dumps that can be processed with GBIF's Integrated Publishing Toolkit).

  15. VoSeq: A Voucher and DNA Sequence Web Application

    Science.gov (United States)

    Peña, Carlos; Malm, Tobias

    2012-01-01

    There is an ever growing number of molecular phylogenetic studies published, due to, in part, the advent of new techniques that allow cheap and quick DNA sequencing. Hence, the demand for relational databases with which to manage and annotate the amassing DNA sequences, genes, voucher specimens and associated biological data is increasing. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically constructed with the aim of being an organizational tool for researchers working in phylogenetic inference. We here report a new software facilitating easy management of voucher and sequence data, consisting of a relational database as back-end for a graphic user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready to be used in commonly used phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publishing. It also has inbuilt BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL) and GBIF (by generating data-dumps that can be processed with GBIF's Integrated Publishing Toolkit). PMID:22720030

  16. Routine human papillomavirus genotyping by DNA sequencing in community hospital laboratories

    Directory of Open Access Journals (Sweden)

    Vigliotti Jessica S

    2007-06-01

    amplification by nested PCR and for preparation of clinical materials for genotyping by direct DNA sequencing. HPV genotyping is performed by on-line BLAST algorithm of a hypervariable L1 region. The DNA sequence is included in each report to the physician for comparison in following up patients with persistent HPV infection, a recognized tumor promoter in cancer induction.

  17. Using Next-Generation Sequencing for DNA Barcoding: Capturing Allelic Variation in ITS2.

    Science.gov (United States)

    Batovska, Jana; Cogan, Noel O I; Lynch, Stacey E; Blacket, Mark J

    2017-01-05

    Internal Transcribed Spacer 2 (ITS2) is a popular DNA barcoding marker; however, in some animal species it is hypervariable and therefore difficult to sequence with traditional methods. With next-generation sequencing (NGS) it is possible to sequence all gene variants despite the presence of single nucleotide polymorphisms (SNPs), insertions/deletions (indels), homopolymeric regions, and microsatellites. Our aim was to compare the performance of Sanger sequencing and NGS amplicon sequencing in characterizing ITS2 in 26 mosquito species represented by 88 samples. The suitability of ITS2 as a DNA barcoding marker for mosquitoes, and its allelic diversity in individuals and species, was also assessed. Compared to Sanger sequencing, NGS was able to characterize the ITS2 region to a greater extent, with resolution within and between individuals and species that was previously not possible. A total of 382 unique sequences (alleles) were generated from the 88 mosquito specimens, demonstrating the diversity present that has been overlooked by traditional sequencing methods. Multiple indels and microsatellites were present in the ITS2 alleles, which were often specific to species or genera, causing variation in sequence length. As a barcoding marker, ITS2 was able to separate all of the species, apart from members of the Culex pipiens complex, providing the same resolution as the commonly used Cytochrome Oxidase I (COI). The ability to cost-effectively sequence hypervariable markers makes NGS an invaluable tool with many applications in the DNA barcoding field, and provides insights into the limitations of previous studies and techniques. Copyright © 2017 Batovska et al.

  18. Using Next-Generation Sequencing for DNA Barcoding: Capturing Allelic Variation in ITS2

    Directory of Open Access Journals (Sweden)

    Jana Batovska

    2017-01-01

    Full Text Available Internal Transcribed Spacer 2 (ITS2) is a popular DNA barcoding marker; however, in some animal species it is hypervariable and therefore difficult to sequence with traditional methods. With next-generation sequencing (NGS) it is possible to sequence all gene variants despite the presence of single nucleotide polymorphisms (SNPs), insertions/deletions (indels), homopolymeric regions, and microsatellites. Our aim was to compare the performance of Sanger sequencing and NGS amplicon sequencing in characterizing ITS2 in 26 mosquito species represented by 88 samples. The suitability of ITS2 as a DNA barcoding marker for mosquitoes, and its allelic diversity in individuals and species, was also assessed. Compared to Sanger sequencing, NGS was able to characterize the ITS2 region to a greater extent, with resolution within and between individuals and species that was previously not possible. A total of 382 unique sequences (alleles) were generated from the 88 mosquito specimens, demonstrating the diversity present that has been overlooked by traditional sequencing methods. Multiple indels and microsatellites were present in the ITS2 alleles, which were often specific to species or genera, causing variation in sequence length. As a barcoding marker, ITS2 was able to separate all of the species, apart from members of the Culex pipiens complex, providing the same resolution as the commonly used Cytochrome Oxidase I (COI). The ability to cost-effectively sequence hypervariable markers makes NGS an invaluable tool with many applications in the DNA barcoding field, and provides insights into the limitations of previous studies and techniques.

  19. Assessing the fidelity of ancient DNA sequences amplified from nuclear genes

    DEFF Research Database (Denmark)

    Binladen, Jonas; Wiuf, Carsten Henrik; Gilbert, M. Thomas P.

    2006-01-01

    To date, the field of ancient DNA has relied almost exclusively on mitochondrial DNA (mtDNA) sequences. However, a number of recent studies have reported the successful recovery of ancient nuclear DNA (nuDNA) sequences, thereby allowing the characterization of genetic loci directly involved in ph...

  20. Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

    Science.gov (United States)

    Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

    2018-01-10

    Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from the COSMIC database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing

  1. [Cloning and analyzing of the cDNA sequence of CHS-A gene of Narcissus].

    Science.gov (United States)

    Huang, Yin Yi; Shen, Ming Shan; Chen, Liang; Li, Peng; Chen, Mu Zhuan

    2002-09-01

    Chalcone synthase (CHS) is a key enzyme in the biosynthesis of all classes of flavonoids. The production of flower pigment is specifically regulated by the activity of CHS. We cloned the cDNA sequence of the CHS-A gene from Narcissus by PCR and analyzed the coding sequence of the gene. The results demonstrated that the coding region was 1167 bp long, encoding a protein of 389 amino acids that shared more than 80% homology with the CHS of 8 other plants, such as Nicotiana tabacum and Solanum tuberosum.

  2. The influence of DNA sequence on epigenome-induced pathologies

    Science.gov (United States)

    2012-01-01

    Clear cause-and-effect relationships are commonly established between genotype and the inherited risk of acquiring human and plant diseases and aberrant phenotypes. By contrast, few such cause-and-effect relationships are established linking a chromatin structure (that is, the epitype) with the transgenerational risk of acquiring a disease or abnormal phenotype. It is not entirely clear how epitypes are inherited from parent to offspring as populations evolve, even though epigenetics is proposed to be fundamental to evolution and the likelihood of acquiring many diseases. This article explores the hypothesis that, for transgenerationally inherited chromatin structures, “genotype predisposes epitype”, and that epitype functions as a modifier of gene expression within the classical central dogma of molecular biology. Evidence for the causal contribution of genotype to inherited epitypes and epigenetic risk comes primarily from two different kinds of studies discussed herein. The first and direct method of research proceeds by the examination of the transgenerational inheritance of epitype and the penetrance of phenotype among genetically related individuals. The second approach identifies epitypes that are duplicated (as DNA sequences are duplicated) and evolutionarily conserved among repeated patterns in the DNA sequence. The body of this article summarizes particularly robust examples of these studies from humans, mice, Arabidopsis, and other organisms. The bulk of the data from both areas of research support the hypothesis that genotypes predispose the likelihood of displaying various epitypes, but for only a few classes of epitype. This analysis suggests that renewed efforts are needed in identifying polymorphic DNA sequences that determine variable nucleosome positioning and DNA methylation as the primary cause of inherited epigenome-induced pathologies. By contrast, there is very little evidence that DNA sequence directly determines the inherited

  3. The influence of DNA sequence on epigenome-induced pathologies

    Directory of Open Access Journals (Sweden)

    Meagher Richard B

    2012-07-01

    Full Text Available Clear cause-and-effect relationships are commonly established between genotype and the inherited risk of acquiring human and plant diseases and aberrant phenotypes. By contrast, few such cause-and-effect relationships are established linking a chromatin structure (that is, the epitype) with the transgenerational risk of acquiring a disease or abnormal phenotype. It is not entirely clear how epitypes are inherited from parent to offspring as populations evolve, even though epigenetics is proposed to be fundamental to evolution and the likelihood of acquiring many diseases. This article explores the hypothesis that, for transgenerationally inherited chromatin structures, “genotype predisposes epitype”, and that epitype functions as a modifier of gene expression within the classical central dogma of molecular biology. Evidence for the causal contribution of genotype to inherited epitypes and epigenetic risk comes primarily from two different kinds of studies discussed herein. The first and direct method of research proceeds by the examination of the transgenerational inheritance of epitype and the penetrance of phenotype among genetically related individuals. The second approach identifies epitypes that are duplicated (as DNA sequences are duplicated) and evolutionarily conserved among repeated patterns in the DNA sequence. The body of this article summarizes particularly robust examples of these studies from humans, mice, Arabidopsis, and other organisms. The bulk of the data from both areas of research support the hypothesis that genotypes predispose the likelihood of displaying various epitypes, but for only a few classes of epitype. This analysis suggests that renewed efforts are needed in identifying polymorphic DNA sequences that determine variable nucleosome positioning and DNA methylation as the primary cause of inherited epigenome-induced pathologies. By contrast, there is very little evidence that DNA sequence directly

  4. Temporal stability of epigenetic markers: sequence characteristics and predictors of short-term DNA methylation variations.

    Directory of Open Access Journals (Sweden)

    Hyang-Min Byun

    Full Text Available BACKGROUND: DNA methylation is an epigenetic mechanism that has been increasingly investigated in observational human studies, particularly on blood leukocyte DNA. Characterizing the degree and determinants of DNA methylation stability can provide critical information for the design and conduct of human epigenetic studies. METHODS: We measured DNA methylation in 12 gene-promoter regions (APC, p16, p53, RASSF1A, CDH13, eNOS, ET-1, IFNγ, IL-6, TNFα, iNOS, and hTERT) and 2 non-long terminal repeat elements (i.e., L1 and Alu) in blood samples obtained from 63 healthy individuals at baseline (Day 1) and after three days (Day 4). DNA methylation was measured by bisulfite-PCR-Pyrosequencing. We calculated intraclass correlation coefficients (ICCs) to measure the within-individual stability of DNA methylation between Day 1 and 4, after subtracting pyrosequencing error and adjusting for multiple covariates. RESULTS: Methylation markers showed different temporal behaviors ranging from high (IL-6, ICC = 0.89) to low stability (APC, ICC = 0.08) between Day 1 and 4. Multiple sequence and marker characteristics were associated with the degree of variation. Density of CpG dinucleotides near the sequence analyzed (measured as CpG(o/e) or G+C content within ±200 bp) was positively associated with DNA methylation stability. The 3' proximity to repeat elements and range of DNA methylation on Day 1 were also positively associated with methylation stability. An inverted U-shaped correlation was observed between mean DNA methylation on Day 1 and stability. CONCLUSIONS: The degree of short-term DNA methylation stability is marker-dependent and associated with sequence characteristics and methylation levels.
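    The two quantities named in the abstract above, the CpG observed/expected ratio and the intraclass correlation coefficient, can be sketched in a few lines. The example below is illustrative only. It assumes the widely used definition CpG(o/e) = (count of CG × length) / (count of C × count of G) and a plain one-way random-effects ICC. It does not reproduce the authors' subtraction of pyrosequencing error or their covariate adjustment, and the function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def cpg_obs_exp(seq):
    """CpG observed/expected ratio: (#CG * N) / (#C * #G)."""
    s = seq.upper()
    c, g, cpg = s.count("C"), s.count("G"), s.count("CG")
    if c == 0 or g == 0:
        return 0.0
    return cpg * len(s) / (c * g)


def icc_oneway(measurements):
    """One-way random-effects ICC for n subjects with k repeated measures each.

    `measurements` is a list of equal-length tuples, one per subject,
    e.g. (methylation on Day 1, methylation on Day 4).
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 measures per subject")
    subject_means = [sum(row) / k for row in measurements]
    grand_mean = sum(subject_means) / n
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(measurements, subject_means) for x in row
    ) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("ICC is undefined when all values are identical")
    return (msb - msw) / denom


if __name__ == "__main__":
    window = "ACGTCGCGATCG"  # toy sequence window, invented for illustration
    print(gc_content(window), cpg_obs_exp(window))
    # Toy Day 1 / Day 4 methylation percentages for three subjects.
    print(icc_oneway([(70.0, 71.0), (55.0, 54.0), (62.0, 63.0)]))
```

    An ICC near 1 means the Day 1 and Day 4 values of each individual agree closely relative to the spread between individuals; values near 0 mean within-individual variation dominates.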

  5. Early Lyme disease with spirochetemia - diagnosed by DNA sequencing

    Directory of Open Access Journals (Sweden)

    Jones William

    2010-11-01

    Full Text Available Background: A sensitive and analytically specific nucleic acid amplification test (NAAT) is valuable in confirming the diagnosis of early Lyme disease at the stage of spirochetemia. Findings: Venous blood drawn from patients with clinical presentations of Lyme disease was tested with the standard 2-tier screen and Western blot serology assay for Lyme disease, and also by a nested polymerase chain reaction (PCR) for B. burgdorferi sensu lato 16S ribosomal DNA. The PCR amplicon was sequenced for B. burgdorferi genomic DNA validation. A total of 130 patients visiting emergency room (ER) or Walk-in clinic (WALKIN), and 333 patients referred through the private physicians' offices were studied. While 5.4% of the ER/WALKIN patients showed DNA evidence of spirochetemia, none (0%) of the patients referred from private physicians' offices were DNA-positive. In contrast, while 8.4% of the patients referred from private physicians' offices were positive for the 2-tier Lyme serology assay, only 1.5% of the ER/WALKIN patients were positive for this antibody test. The 2-tier serology assay missed 85.7% of the cases of early Lyme disease with spirochetemia. The latter diagnosis was confirmed by DNA sequencing. Conclusion: Nested PCR followed by automated DNA sequencing is a valuable supplement to the standard 2-tier antibody assay in the diagnosis of early Lyme disease with spirochetemia. The best time to test for Lyme spirochetemia is when the patients living in the Lyme disease endemic areas develop unexplained symptoms or clinical manifestations that are consistent with Lyme disease early in the course of their illness.

  6. PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context

    OpenAIRE

    Zhou, Jiyun; Xu, Ruifeng; He, Yulan; Lu, Qin; Wang, Hongpeng; Kong, Bing

    2016-01-01

    Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most of the existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predi...

  7. Cloning and sequencing of cDNA and genomic DNA encoding PDM phosphatase of Fusarium moniliforme.

    Science.gov (United States)

    Yoshida, Hiroshi; Iizuka, Mari; Narita, Takao; Norioka, Naoko; Norioka, Shigemi

    2006-12-01

    PDM phosphatase was purified approximately 500-fold through six steps from the extract of dried powder of the culture filtrate of Fusarium moniliforme. The purified preparation appeared homogeneous on SDS-PAGE although the protein band was broad. Amino acid sequence information was collected on tryptic peptides from this preparation. cDNA cloning was carried out based on the information. A full-length cDNA was obtained and sequenced. The sequence had an open reading frame of 651 amino acid residues with a molecular mass of 69,988 Da. Cloning and sequencing of the genomic DNA corresponding to the cDNA was also conducted. The deduced amino acid sequence could account for many but not all of the tryptic peptides, suggesting presence of contaminant protein(s). SDS-PAGE analysis after chemical deglycosylation showed two proteins with molecular masses of 58 and 68 kDa. This implied that the 58 kDa protein had been copurified with PDM phosphatase. Homology search showed that PDM phosphatase belongs to the purple acid phosphatase family, which is widely distributed in the biosphere. Sequence data of fungal purple acid phosphatases were collected from the database. Processing of the data revealed presence of two types, whose evolutionary relationships were discussed.

  8. Phylogenetic relationships of the Gomphales based on nuc-25S-rDNA, mit-12S-rDNA, and mit-atp6-DNA combined sequences

    Science.gov (United States)

    Admir J. Giachini; Kentaro Hosaka; Eduardo Nouhra; Joseph Spatafora; James M. Trappe

    2010-01-01

    Phylogenetic relationships among Geastrales, Gomphales, Hysterangiales, and Phallales were estimated via combined sequences: nuclear large subunit ribosomal DNA (nuc-25S-rDNA), mitochondrial small subunit ribosomal DNA (mit-12S-rDNA), and mitochondrial atp6 DNA (mit-atp6-DNA). Eighty-one taxa comprising 19 genera and 58 species...

  9. Moloney murine sarcoma virus MuSVts110 DNA: cloning, nucleotide sequence, and gene expression.

    Science.gov (United States)

    Huai, L; Chiocca, S M; Gilbreth, M A; Ainsworth, J R; Bishop, L A; Murphy, E C

    1992-09-01

    We have cloned Moloney murine sarcoma virus (MuSV) MuSVts110 DNA by assembly of polymerase chain reaction (PCR)-amplified segments of integrated viral DNA from infected NRK cells (6m2 cells) and determined its complete sequence. Previously, by direct sequencing of MuSVts110 RNA transcribed in 6m2 cells, we established that the thermosensitive RNA splicing phenotype uniquely characteristic of MuSVts110 results from a deletion of 1,487 nucleotides of progenitor MuSV-124 sequences. As anticipated, the sequence obtained in this study contained precisely this same deletion. In addition, several other unexpected sequence differences were found between MuSVts110 and MuSV-124. For example, in the noncoding region upstream of the gag gene, MuSVts110 DNA contained a 52-nucleotide tract typical of murine leukemia virus rather than MuSV-124, suggesting that MuSVts110 originated as a MuSV-helper murine leukemia virus recombinant during reverse transcription rather than from a straightforward deletion within MuSV-124. In addition, both MuSVts110 long terminal repeats contained head-to-tail duplications of eight nucleotides in the U3 region. Finally, seven single-nucleotide substitutions were found scattered throughout MuSVts110 DNA. Three of the nucleotide substitutions were in the gag gene, resulting in one coding change in p15 and one in p30. All of the remaining nucleotide changes were found in the noncoding region between the 5' long terminal repeat and the gag gene. In NIH 3T3 cells transfected with the cloned MuSVts110 DNA, the pattern of viral RNA expression conformed with that observed in cells infected with authentic MuSVts110 virus in that viral RNA splicing was 30 to 40% efficient at growth temperatures between 28 and 33 degrees C but reduced to trace levels above 37 degrees C.

  10. Mouse tetranectin: cDNA sequence, tissue-specific expression, and chromosomal mapping

    DEFF Research Database (Denmark)

    Ibaraki, K; Kozak, C A; Wewer, U M

    1995-01-01

    ...regulation, mouse tetranectin cDNA was cloned from a 16-day-old mouse embryo library. Sequence analysis revealed a 992-bp cDNA with an open reading frame of 606 bp, which is identical in length to the human tetranectin cDNA. The deduced amino acid sequence showed high homology to the human cDNA with 76% identity and 87% similarity at the amino acid level. Sequence comparisons between mouse and human tetranectin and some C-type lectins confirmed a complete conservation in the position of six cysteines as well as numerous other amino acid residues, indicating an essential structure for potential function(s) of tetranectin. The sequence analysis revealed a difference in both sequence and size of the noncoding regions between mouse and human cDNAs. Northern analysis of the various tissues from mouse, rat, and cow showed the major transcript(s) to be approximately 1 kb, which is similar in size to that observed...

  11. Standardization of DNA extraction from sand flies: Application to genotyping by next generation sequencing.

    Science.gov (United States)

    Casaril, Aline Etelvina; de Oliveira, Liliane Prado; Alonso, Diego Peres; de Oliveira, Everton Falcão; Gomes Barrios, Suellem Petilim; de Oliveira Moura Infran, Jucelei; Fernandes, Wagner de Souza; Oshiro, Elisa Teruya; Ferreira, Alda Maria Teixeira; Ribolla, Paulo Eduardo Martins; de Oliveira, Alessandra Gutierrez

    2017-06-01

    Standardization of the methods for extraction of DNA from sand flies is essential for obtaining high efficiency during subsequent molecular analyses, such as the new sequencing methods. Information obtained using these methods may contribute substantially to taxonomic, evolutionary, and eco-epidemiological studies. The aim of the present study was to standardize and compare two methods for the extraction of genomic DNA from sand flies for obtaining DNA in sufficient quantities for next-generation sequencing. Sand flies were collected from the municipalities of Campo Grande, Camapuã, Corumbá and Miranda, state of Mato Grosso do Sul, Brazil. Three protocols using a silica column-based commercial kit (ReliaPrep™ Blood gDNA Miniprep System kit, Promega(®)), and three protocols based on the classical phenol-chloroform extraction method (Uliana et al., 1991), were compared with respect to the yield and quality of the extracted DNA. DNA was quantified using a Qubit 2.0 fluorometer. The presence of sand fly DNA was confirmed by PCR amplification of the IVS6 region (constitutive gene), followed by electrophoresis on a 1.5% agarose gel. A total of 144 male specimens were analyzed, 72 per method. Significant differences were observed between the two methods tested. Protocols 2 and 3 of phenol-chloroform extraction presented significantly better performance than all commercial kit extraction protocols tested. For phenol-chloroform extraction, protocol 3 presented significantly better performance than protocols 1 and 2. The IVS6 region was detected in 70 of 72 (97.22%) samples extracted with phenol, including all samples for protocols 2 and 3. This is the first study on the standardization of methods for the extraction of DNA from sand flies for application to next-generation sequencing, which is a promising tool for entomological and molecular studies of sand flies. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    DEFF Research Database (Denmark)

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences over all chromosomes of H. vulgare and the wild barley species H. bulbosum, H. marinum and H. murinum. Southern blot hybridization revealed different levels of polymorphism among barley species and the RFLP data were used to generate a phylogenetic tree for the genus Hordeum. Our data are in a good...

  13. The implementation of bit-parallelism for DNA sequence alignment

    Science.gov (United States)

    Setyorini; Kuspriyanto; Widyantoro, D. H.; Pancoro, A.

    2017-05-01

    Dynamic Programming (DP) remains the central algorithm of biological sequence alignment. Matching score computation is its most time-consuming step. Bit-parallelism is an approximate string matching technique that transforms DP matrix processing from single-cell units into word units (groups of cells). Bit-parallelism computes the scores column-wise. Adapted from the word-level processing of computer systems, this technique promises to reduce the time spent computing scores in the DP matrix. In this paper, we implement the bit-parallelism technique for DNA sequence alignment. Our bit-parallelism implementation takes less time for score computation but still needs improvement in the reconstruction process.
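
    The column-wise, word-level update described above can be illustrated with a minimal sketch. It assumes unit-cost edit distance and follows the well-known Myers bit-vector scheme in its global-distance form; it is not the authors' implementation, and the function name `bit_parallel_edit_distance` is hypothetical.

```python
def bit_parallel_edit_distance(pattern: str, text: str) -> int:
    """Unit-cost edit distance computed one DP column per text character.

    Each DP column is encoded as two bit-vectors of vertical differences
    (pv: +1 steps, mv: -1 steps), so a whole column is updated with a few
    word-level operations instead of one operation per cell.
    """
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1          # keep only the m bits of the column
    top = 1 << (m - 1)           # bit for the last pattern row
    # peq[c] has bit i set where pattern[i] == c
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    pv, mv, score = mask, 0, m   # first column: D[i][0] = i
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 differences
        mh = pv & xh                    # horizontal -1 differences
        if ph & top:
            score += 1
        elif mh & top:
            score -= 1
        # shift in a 1 because the top boundary row grows by one per column
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
    return score
```

    Python integers are unbounded, so the explicit `mask` stands in for the fixed machine word that a compiled implementation would use. Recovering the alignment itself (the reconstruction step the abstract mentions) would need the per-column bit-vectors to be stored, which this sketch does not do.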

  14. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within...... in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions....

  15. Using RNase sequence specificity to refine the identification of RNA-protein binding regions

    OpenAIRE

    Wang Xinguo; Li Lang; Shen Changyu; Wang Guohua; Wang Xin; Mooney Sean D; Edenberg Howard J; Sanford Jeremy R; Liu Yunlong

    2008-01-01

    Massively parallel pyrosequencing is a high-throughput technology that can sequence hundreds of thousands of DNA/RNA fragments in a single experiment. Combining it with immunoprecipitation-based biochemical assays, such as cross-linking immunoprecipitation (CLIP), provides a genome-wide method to detect the sites at which proteins bind DNA or RNA. In a CLIP-pyrosequencing experiment, the resolutions of the detected protein binding regions are partially determined by the length of the...

  16. Evaluation of intra- and interspecific divergence of satellite DNA sequences by nucleotide frequency calculation and pairwise sequence comparison

    Directory of Open Access Journals (Sweden)

    Kato Mikio

    2003-01-01

    Satellite DNA sequences are known to be highly variable and to have been subjected to concerted evolution that homogenizes member sequences within species. We have analyzed the mode of evolution of satellite DNA sequences in four fishes from the genus Diplodus by calculating the nucleotide frequency of the sequence array and the phylogenetic distances between member sequences. Calculation of nucleotide frequency and pairwise sequence comparison enabled us to characterize the divergence among member sequences in this satellite DNA family. The results suggest that the evolutionary rate of satellite DNA in D. bellottii is about two-fold greater than the average of the other three fishes, and that the sequence homogenization event occurred in D. puntazzo more recently than in the others. The procedures described here are effective for characterizing the mode of evolution of satellite DNA.
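
    The two calculations named in this abstract can be sketched as follows. This is an illustration only: it assumes pre-aligned monomer sequences of equal length and uses the simple proportion of differing sites (p-distance) as the pairwise measure, which may differ from the distance the author used. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def position_frequencies(seqs):
    """Per-column nucleotide frequencies for equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    freqs = []
    for column in zip(*seqs):
        counts = Counter(column)
        freqs.append({base: counts[base] / len(seqs) for base in "ACGT"})
    return freqs

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def pairwise_distances(seqs):
    """p-distance for every unordered pair, keyed by sequence indices."""
    return {(i, j): p_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

# Toy monomers, invented for illustration.
monomers = ["ACGT", "ACGA", "ACCT"]
freqs = position_frequencies(monomers)
dists = pairwise_distances(monomers)
```

    Columns whose frequencies sit close to 0 or 1 indicate homogenized positions, while intermediate values mark sites that still vary among member sequences.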

  17. A Statistical Thermodynamic Model for Investigating the Stability of DNA Sequences from Oligonucleotides to Genomes

    Science.gov (United States)

    Khandelwal, Garima; Lee, Rebecca A.; Jayaram, B.; Beveridge, David L.

    2014-01-01

    We describe the development and testing of a simple statistical mechanics methodology for duplex DNA applicable to sequences of any composition and extensible to genomes. The microstates of a DNA sequence are modeled in terms of blocks of basepairs that are assumed to be fully closed (paired) or open. This approach generates an ensemble of bubblelike microstates that are used to calculate the corresponding partition function. The energies of the microstates are calculated as additive contributions from hydrogen bonding, basepair stacking, and solvation terms parameterized from a comprehensive series of molecular dynamics simulations including solvent and ions. Thermodynamic properties and nucleotide stability constants for DNA sequences follow directly from the partition function. The methodology was tested by comparing computed free energies per basepair with the experimental melting temperatures of 60 oligonucleotides, yielding a correlation coefficient of −0.96. The thermodynamic stability of genic/nongenic regions was tested in terms of nucleotide stability constants versus sequence for the Escherichia coli K-12 genome. It showed clear differentiation of the genes from promoters and captures genic regions with a sensitivity of 0.94. The statistical thermodynamic model presented here provides a seemingly new handle on the challenging problem of interpreting genomic sequences. PMID:24896126

  18. A statistical thermodynamic model for investigating the stability of DNA sequences from oligonucleotides to genomes.

    Science.gov (United States)

    Khandelwal, Garima; Lee, Rebecca A; Jayaram, B; Beveridge, David L

    2014-06-03

    We describe the development and testing of a simple statistical mechanics methodology for duplex DNA applicable to sequences of any composition and extensible to genomes. The microstates of a DNA sequence are modeled in terms of blocks of basepairs that are assumed to be fully closed (paired) or open. This approach generates an ensemble of bubblelike microstates that are used to calculate the corresponding partition function. The energies of the microstates are calculated as additive contributions from hydrogen bonding, basepair stacking, and solvation terms parameterized from a comprehensive series of molecular dynamics simulations including solvent and ions. Thermodynamic properties and nucleotide stability constants for DNA sequences follow directly from the partition function. The methodology was tested by comparing computed free energies per basepair with the experimental melting temperatures of 60 oligonucleotides, yielding a correlation coefficient of -0.96. The thermodynamic stability of genic/nongenic regions was tested in terms of nucleotide stability constants versus sequence for the Escherichia coli K-12 genome. It showed clear differentiation of the genes from promoters and captures genic regions with a sensitivity of 0.94. The statistical thermodynamic model presented here provides a seemingly new handle on the challenging problem of interpreting genomic sequences. Copyright © 2014 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  19. Nonrepetitive DNA Sequence Representation in Sea Urchin Embryo Messenger RNA

    Science.gov (United States)

    Goldberg, Robert B.; Galau, Glenn A.; Britten, Roy J.; Davidson, Eric H.

    1973-01-01

    Messenger RNA was prepared from developing sea urchin gastrulae by puromycin release from polyribosomes. Approximately 60% of the total mRNA radioactivity of the postnuclear supernatant was recovered and shown to be free of any other labeled RNA species such as ribosomal and nuclear RNA. The mRNA was examined by hybridization to DNA present in great excess. The mRNA hybridizes almost exclusively with nonrepetitive DNA. Almost all of the messenger RNA molecules of sea urchin gastrulae therefore consist of transcripts from nonrepetitive sequences. It appears that the structural genes expressed at this stage are typically not repeated in the genome and the mRNA does not include recognizable repetitive sequence. PMID:4519642

  20. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure tha...

  1. High-Throughput Sequencing of Three Lemnoideae (Duckweeds) Chloroplast Genomes from Total DNA

    Science.gov (United States)

    Wang, Wenqin; Messing, Joachim

    2011-01-01

    Background: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. Methods: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. Conclusions: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. More than 1,000-fold coverage of the chloroplast genome from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power. PMID:21931804

  2. High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.

    Directory of Open Access Journals (Sweden)

    Wenqin Wang

    BACKGROUND: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. METHODS: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. CONCLUSIONS: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. More than 1,000-fold coverage of the chloroplast genome from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  3. DNA sequence heterogeneity of Campylobacter jejuni CJIE4 prophages and expression of prophage genes.

    Directory of Open Access Journals (Sweden)

    Clifford G Clark

    Campylobacter jejuni carry temperate bacteriophages that can affect the biology or virulence of the host bacterium. Known effects include genomic rearrangements and resistance to DNA transformation. C. jejuni prophage CJIE1 shows sequence variability and variability in the content of morons. Homologs of the CJIE1 prophage enhance both adherence and invasion to cells in culture and increase the expression of a specific subset of bacterial genes. Other C. jejuni temperate phages have so far not been well characterized. In this study we describe investigations into the DNA sequence variability and protein expression in a second prophage, CJIE4. CJIE4 sequences were obtained de novo from DNA sequencing of five C. jejuni isolates, as well as from whole genome sequences submitted to GenBank by other research groups. These CJIE4 DNA sequences were heterogeneous, with several different insertions/deletions (indels) in different parts of the prophage genome. Two variants of a 3-4 kb region inserted within CJIE4 had different gene content that distinguished two major conserved CJIE4 prophage families. Additional indels were detected throughout the prophage. Detection of proteins in the five isolates characterized in our laboratory in isobaric Tags for Relative and Absolute Quantitation (iTRAQ) experiments indicated that prophage proteins within each of the two large indel variants were expressed during growth of the bacteria on Mueller Hinton agar plates. These proteins included the extracellular DNase associated with resistance to DNA transformation and prophage repressor proteins. Other proteins associated with known or suspected roles in prophage biology were also expressed from CJIE4, including capsid protein, the phage integrase, and MazF, a type II toxin-antitoxin system protein. Together with the results previously obtained for the CJIE1 prophage these results demonstrate that sequence variability and expression of moron genes are both general...

  4. The exceptional genomic word symmetry along DNA sequences

    OpenAIRE

    Afreixo, Vera; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Silva, Raquel M.

    2016-01-01

    Background: Chargaff's second parity rule and its extensions are recognized as universal phenomena in DNA sequences. However, parity of the frequencies of reverse complementary oligonucleotides could be a mere consequence of the single nucleotide parity rule, if nucleotide independence is assumed. Exceptional symmetry (symmetry beyond that expected under an independent nucleotide assumption) was proposed previously as a meaningful measure of the extension of the second parity rule to oligo...
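
    The parity being discussed can be made concrete with a small sketch that compares, on a single strand, the frequency of each word of length k with that of its reverse complement. It measures plain symmetry only; it does not implement the authors' exceptional-symmetry measure, which compares against an independence expectation. The names `reverse_complement`, `kmer_counts` and `symmetry_gaps` are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(word):
    """Reverse complement of a DNA word over the alphabet A, C, G, T."""
    return word.translate(COMPLEMENT)[::-1]

def kmer_counts(seq, k):
    """Counts of all overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def symmetry_gaps(seq, k):
    """Absolute frequency gap between each word and its reverse complement.

    Keys are the lexicographically smaller member of each word pair;
    a gap of 0 means the pair is perfectly symmetric on this strand.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    gaps = {}
    for word in counts:
        key = min(word, reverse_complement(word))
        partner = reverse_complement(key)
        gaps[key] = abs(counts[key] - counts[partner]) / total
    return gaps
```

    A sequence that equals its own reverse complement, such as "AACGTT", gives a gap of zero for every word pair, whereas a homopolymer such as "AAAA" gives the maximal gap for the A/T pair.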

  5. A two-locus DNA sequence database for typing plant and human pathogens within the Fusarium oxysporum species complex

    DEFF Research Database (Denmark)

    O'Donnell, Kerry; Gueidan, C; Sink, S

    2009-01-01

    We constructed a two-locus database, comprising partial translation elongation factor (EF-1alpha) gene sequences and nearly full-length sequences of the nuclear ribosomal intergenic spacer region (IGS rDNA) for 850 isolates spanning the phylogenetic breadth of the Fusarium oxysporum species compl...

  6. Changes in DNA methylation patterns and repetitive sequences in blood lymphocytes of aged horses.

    Science.gov (United States)

    Wnuk, Maciej; Lewinska, Anna; Gurgul, Artur; Zabek, Tomasz; Potocki, Leszek; Oklejewicz, Bernadetta; Bugno-Poniewierska, Monika; Wegrzyn, Magdalena; Slota, Ewa

    2014-02-01

    It is known that aged organisms have modified epigenomes. Epigenetic modifications, such as changes in global and locus-specific DNA methylation, and histone modifications are suspected to play an important role in cancer development and aging. In the present study, with the well-established horse aging model, we showed the global loss of DNA methylation in blood lymphocytes during juvenile-to-aged period. Additionally, we tested a pattern of DNA methylation of ribosomal DNA and selected genes such as IGF2 and found no significant changes during development and aging. We asked if genetic components such as polymorphisms within DNA methyltransferase genes, DNMT1, DNMT3a, and DNMT3b, may contribute to observed changes in global DNA methylation status. The analysis of seven intragenic polymorphisms did not reveal any significant association with changes in global DNA methylation. Telomere shortage and a loss of pericentromeric heterochromatin during juvenile-to-aged period were also observed. Transcriptional rDNA activity, assessed as the number and size of nucleolar organizer regions, reflecting physiological state of the cell, and mitotic index were decreased with increasing horse donor age. Moreover, changes during juvenile-to-aged period and adult-to-aged period were compared and discussed. Taken together, changes in global DNA methylation status originating in development and affecting the stability of repetitive sequences may be associated with previously reported genomic instability during horse aging.

  7. Comparison of DNA Quantification Methods for Next Generation Sequencing.

    Science.gov (United States)

    Robin, Jérôme D; Ludlow, Andrew T; LaRanger, Ryan; Wright, Woodring E; Shay, Jerry W

    2016-04-06

    Next Generation Sequencing (NGS) is a powerful tool that depends on loading a precise amount of DNA onto a flowcell. NGS strategies have expanded our ability to investigate genomic phenomena by referencing mutations in cancer and diseases through large-scale genotyping, developing methods to map rare chromatin interactions (4C; 5C and Hi-C) and identifying chromatin features associated with regulatory elements (ChIP-seq, Bis-Seq, ChiA-PET). While many methods are available for DNA library quantification, there is no unambiguous gold standard. Most techniques use PCR to amplify DNA libraries to obtain sufficient quantities for optical density measurement. However, increased PCR cycles can distort the library's heterogeneity and prevent the detection of rare variants. In this analysis, we compared new digital PCR technologies (droplet digital PCR; ddPCR, ddPCR-Tail) with standard methods for the titration of NGS libraries. DdPCR-Tail is comparable to qPCR and fluorometry (QuBit) and allows sensitive quantification by analysis of barcode repartition after sequencing of multiplexed samples. This study provides a direct comparison between quantification methods throughout a complete sequencing experiment and provides the impetus to use ddPCR-based quantification for improvement of NGS quality.

  8. Genetic characterization of Phytophthora nicotianae by the analysis of polymorphic regions of the mitochondrial DNA.

    Science.gov (United States)

    A new method based on the analysis of mitochondrial intergenic regions characterized by intraspecific variation in DNA sequences was developed and applied to the study of the plant pathogen Phytophthora nicotianae. Two regions flanked by genes trny and rns and trnw and cox2 were identified by compa...

  9. The complementarity-determining region sequences in IgY antivenom hypervariable regions

    Directory of Open Access Journals (Sweden)

    David Gitirana da Rocha

    2017-08-01

    The data presented in this article are related to the research article entitled "Development of IgY antibodies against anti-snake toxins endowed with highly lethal neutralizing activity" (da Rocha et al., 2017 [1]). Complementarity-determining region (CDR) sequences are variable antibody (Ab) sequences that respond with specificity, duration and strength to identify and bind to antigen (Ag) epitopes. B lymphocytes isolated from hens immunized with Bitis arietans (Ba) and anti-Crotalus durissus terrificus (Cdt) venoms and expressing high specificity, affinity and toxicity neutralizing antibody titers were used as DNA sources. The VLF1, CDR1, CDR2, VLR1 and CDR3 sequences were validated by BLASTp, and values corresponding to IgY VL and VH anti-Ba or anti-Cdt venoms were identified, registered [Gallus gallus IgY Fv Light chain (GU815099)/Gallus gallus IgY Fv Heavy chain (GU815098)] and used for molecular modeling of IgY scFv anti-Ba. The resulting CDR1, CDR2 and CDR3 sequences were combined to construct the three-dimensional structure of the Ab paratope.

  10. DNA sequence and analysis of human chromosome 9

    OpenAIRE

    Humphray, S. J.; Oliver, K.; Hunt, A. R.; Plumb, R. W.; Loveland, J. E.; Howe, K. L.; Andrews, T. D.; Searle, S.; Hunt, S. E.; Scott, C. E.; Jones, M. C.; Ainscough, R.; Almeida, J. P.; Ambrose, K. D.; Ashwell, R. I. S.

    2004-01-01

    Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6–8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the l...

  11. Bisulfite sequencing of chromatin immunoprecipitated DNA (BisChIP-seq) directly informs methylation status of histone-modified DNA

    NARCIS (Netherlands)

    Statham, A.L.; Robinson, M.D.; Song, J.Z.; Coolen, M.W.; Stirzaker, C.; Clark, S. J.

    2012-01-01

    The complex relationship between DNA methylation, chromatin modification, and underlying DNA sequence is often difficult to unravel with existing technologies. Here, we describe a novel technique based on high-throughput sequencing of bisulfite-treated chromatin immunoprecipitated DNA (BisChIP-seq),

  12. Mixed sequence reader: a program for analyzing DNA sequences with heterozygous base calling.

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  13. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Directory of Open Access Journals (Sweden)

    Chun-Tien Chang

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

  14. A sequence-dependent rigid-base model of DNA.

    Science.gov (United States)

    Gonzalez, O; Petkevičiūtė, D; Maddocks, J H

    2013-02-07

    A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. For example, due to the presence of frustration, the model can...
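
    For reference, when the two densities being compared are both Gaussian, the Kullback-Leibler divergence has a standard closed form. The notation below (means \mu_1, \mu_2, covariance matrices \Sigma_1, \Sigma_2, and dimension n of the configuration space) is assumed here for illustration and is not taken from the abstract.

```latex
D_{\mathrm{KL}}\bigl(\mathcal{N}(\mu_1,\Sigma_1)\,\|\,\mathcal{N}(\mu_2,\Sigma_2)\bigr)
  = \frac{1}{2}\Bigl[
      \operatorname{tr}\bigl(\Sigma_2^{-1}\Sigma_1\bigr)
      + (\mu_2-\mu_1)^{\mathsf T}\,\Sigma_2^{-1}\,(\mu_2-\mu_1)
      - n
      + \ln\frac{\det\Sigma_2}{\det\Sigma_1}
    \Bigr]
```

    The divergence is zero only when the two Gaussians coincide and is not symmetric in its arguments, so the order in which a model and its reference are compared matters.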

  15. Ribosomal DNA sequence analysis shows that the basidiomycete C30 belongs to the genus Trametes.

    Science.gov (United States)

    Klonowska, Agnieszka; Gaudin, Christian; Ruzzi, Maurizio; Colao, Maria Chiara; Tron, Thierry

    2003-01-01

    The basidiomycete C30 was considered as an isolate of a population of Marasmius quercophilus collected on evergreen oak litter from the Mediterranean forest. Recent phenotypic studies have clearly shown that it differs from newly characterized M. quercophilus isolates. Subsequent analysis of laccase genes revealed that C30 sequences are similar to laccase encoding sequences from organisms belonging to the polyporoid clade. Comparison of sequences of the C30 ITS regions, including 5.8S rDNA, with those found in databanks confirmed that C30 is not a Marasmius. Finally, 25S rDNA analysis revealed that C30 is closely related to the Coriolaceae and, in particular, to Trametes trogii.

  16. Realistic artificial DNA sequences as negative controls for computational genomics

    Science.gov (United States)

    Caballero, Juan; Smit, Arian F. A.; Hood, Leroy; Glusman, Gustavo

    2014-01-01

    A common practice in computational genomic analysis is to use a set of ‘background’ sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such ‘background’ sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by ‘shuffling’ real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/. PMID:24803667
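
    The 'shuffling' baseline mentioned in this abstract can be sketched in a few lines. The example below is a plain mononucleotide shuffle that preserves base composition only; it is the simple control the authors caution about, not their method for generating artificial sequences. The function name `shuffled_control` and the `seed` parameter are hypothetical.

```python
import random

def shuffled_control(seq, seed=None):
    """Return a random permutation of seq with identical base composition.

    Higher-order structure (dinucleotide content, repeats, complexity)
    is destroyed, which is why such controls can look unrealistically
    easy to prediction tools.
    """
    rng = random.Random(seed)   # local generator keeps results reproducible
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

# Toy input, invented for illustration.
original = "ACGTACGTAAAACCCC"
control = shuffled_control(original, seed=1)
```

    Passing a fixed seed makes the control reproducible across runs, which helps when false-positive rates from different tools are compared on the same background set.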

  17. Realistic artificial DNA sequences as negative controls for computational genomics.

    Science.gov (United States)

    Caballero, Juan; Smit, Arian F A; Hood, Leroy; Glusman, Gustavo

    2014-07-01

    A common practice in computational genomic analysis is to use a set of 'background' sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such 'background' sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by 'shuffling' real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences.

    Science.gov (United States)

    Lippold, Sebastian; Xu, Hongyang; Ko, Albert; Li, Mingkun; Renaud, Gabriel; Butthof, Anne; Schröder, Roland; Stoneking, Mark

    2014-01-01

    Comparisons of maternally-inherited mitochondrial DNA (mtDNA) and paternally-inherited non-recombining Y chromosome (NRY) variation have provided important insights into the impact of sex-biased processes (such as migration, residence pattern, and so on) on human genetic variation. However, such comparisons have been limited by the different molecular methods typically used to assay mtDNA and NRY variation (for example, sequencing hypervariable segments of the control region for mtDNA vs. genotyping SNPs and/or STR loci for the NRY). Here, we report a simple capture array method to enrich Illumina sequencing libraries for approximately 500 kb of NRY sequence, which we use to generate NRY sequences from 623 males from 51 populations in the CEPH Human Genome Diversity Panel (HGDP). We also obtained complete mtDNA genome sequences from the same individuals, allowing us to compare maternal and paternal histories free of any ascertainment bias. We identified 2,228 SNPs in the NRY sequences and 2,163 SNPs in the mtDNA sequences. Our results confirm the controversial assertion that genetic differences between human populations on a global scale are bigger for the NRY than for mtDNA, although the differences are not as large as previously suggested. More importantly, we find substantial regional variation in patterns of mtDNA versus NRY variation. Model-based simulations indicate very small ancestral effective population sizes (<100) for the out-of-Africa migration as well as for many human populations. We also find that the ratio of female effective population size to male effective population size (Nf/Nm) has been greater than one throughout the history of modern humans, and has recently increased due to faster growth in Nf than Nm. The NRY and mtDNA sequences provide new insights into the paternal and maternal histories of human populations, and the methods we introduce here should be widely applicable for further such studies.

  19. Profiling the genome-wide DNA methylation pattern of porcine ovaries using reduced representation bisulfite sequencing.

    Science.gov (United States)

    Yuan, Xiao-Long; Gao, Ning; Xing, Yan; Zhang, Hai-Bin; Zhang, Ai-Ling; Liu, Jing; He, Jin-Long; Xu, Yuan; Lin, Wen-Mian; Chen, Zan-Mou; Zhang, Hao; Zhang, Zhe; Li, Jia-Qi

    2016-02-25

    Substantial evidence has shown that DNA methylation regulates the initiation of ovarian and sexual maturation. Here, we investigated the genome-wide profile of DNA methylation in porcine ovaries at single-base resolution using reduced representation bisulfite sequencing. The biological variation was minimal among the three ovarian replicates. We found hypermethylation frequently occurred in regions with low gene abundance, while hypomethylation in regions with high gene abundance. The DNA methylation around transcriptional start sites was negatively correlated with their own CpG content. Additionally, the methylation level in the bodies of genes was higher than that in their 5' and 3' flanking regions. The DNA methylation pattern of the low CpG content promoter genes differed obviously from that of the high CpG content promoter genes. The DNA methylation level of the porcine ovary was higher than that of the porcine intestine. Analyses of the genome-wide DNA methylation in porcine ovaries would advance the knowledge and understanding of the porcine ovarian methylome.

  20. [Polymorphism of hypervariable region in D-loop of mitochondrial DNA].

    Science.gov (United States)

    Takada, Y; Mukaida, M

    1999-06-01

    DNA sequences of PCR products from mitochondrial DNA (mtDNA) of 80 healthy Japanese volunteers (40 pairs, mother and child) were determined by the direct sequencing method for polymorphism. Thirty (15 pairs) of 80 samples analyzed showed a T-to-C transition at position 16189 (T16189C) of the C-stretch region in the hypervariable region of mtDNA. For seven pairs randomly selected from the 15 T16189C pairs (C-stretch) and a single pair without the transition (non C-stretch), PCR products from the D-loop region were cloned and then sequenced. The repeat number of C in the C-stretch region was found to show heteroplasmy by sequencing multiple clones from each mtDNA. Statistical analyses of the distribution patterns of the repeat number revealed no significant differences between the mother and child in each lineage but significant differences between the lineages. The seven lineages could then be classified into four groups. Our data confirmed the existence of heteroplasmic polymorphism in the C-stretch region and the inheritance of the heteroplasmy from mother to child. Therefore, the analysis of heteroplasmy is applicable to individual identification.

  1. [Examination of processed vegetable foods for the presence of common DNA sequences of genetically modified tomatoes].

    Science.gov (United States)

    Kitagawa, Mamiko; Nakamura, Kosuke; Kondo, Kazunari; Ubukata, Shoji; Akiyama, Hiroshi

    2014-01-01

    The contamination of processed vegetable foods with genetically modified tomatoes was investigated by the use of qualitative PCR methods to detect the cauliflower mosaic virus 35S promoter (P35S) and the kanamycin resistance gene (NPTII). DNA fragments of P35S and NPTII were detected in vegetable juice samples, possibly due to contamination with the genomes of cauliflower mosaic virus infecting juice ingredients of Brassica species and soil bacteria, respectively. Therefore, to detect the transformation construct sequences of GM tomatoes, primer pairs were designed for qualitative PCR to specifically detect the border region between P35S and NPTII, and the border region between nopaline synthase gene promoter and NPTII. No amplification of the targeted sequences was observed using genomic DNA purified from the juice ingredients. The developed qualitative PCR method is considered to be a reliable tool to check contamination of products with GM tomatoes.

  2. DNA Sequence Determinants Controlling Affinity, Stability and Shape of DNA Complexes Bound by the Nucleoid Protein Fis.

    Science.gov (United States)

    Hancock, Stephen P; Stella, Stefano; Cascio, Duilio; Johnson, Reid C

    2016-01-01

    The abundant Fis nucleoid protein selectively binds poorly related DNA sequences with high affinities to regulate diverse DNA reactions. Fis binds DNA primarily through DNA backbone contacts and selects target sites by reading conformational properties of DNA sequences, most prominently intrinsic minor groove widths. High-affinity binding requires Fis-stabilized DNA conformational changes that vary depending on DNA sequence. In order to better understand the molecular basis for high affinity site recognition, we analyzed the effects of DNA sequence within and flanking the core Fis binding site on binding affinity and DNA structure. X-ray crystal structures of Fis-DNA complexes containing variable sequences in the noncontacted center of the binding site or variations within the major groove interfaces show that the DNA can adapt to the Fis dimer surface asymmetrically. We show that the presence and position of pyrimidine-purine base steps within the major groove interfaces affect both local DNA bending and minor groove compression to modulate affinities and lifetimes of Fis-DNA complexes. Sequences flanking the core binding site also modulate complex affinities, lifetimes, and the degree of local and global Fis-induced DNA bending. In particular, a G immediately upstream of the 15 bp core sequence inhibits binding and bending, and A-tracts within the flanking base pairs increase both complex lifetimes and global DNA curvatures. Taken together, our observations support a revised DNA motif specifying high-affinity Fis binding and highlight the range of conformations that Fis-bound DNA can adopt. The affinities and DNA conformations of individual Fis-DNA complexes are likely to be tailored to their context-specific biological functions.

  3. rMotifGen: random motif generator for DNA and protein sequences

    Directory of Open Access Journals (Sweden)

    Hardin C Timothy

    2007-08-01

    Background: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating a number of random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Results: Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction, and the other serving as a command line interface. The second implementation has the added advantages of platform independence and being able to be called in a batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. Conclusion: rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where the instance of each motif can be incorporated using a position-specific scoring matrix (PSSM) or by creating an instance mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/.
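
    The idea of planting a PSSM-derived motif instance into a random sequence can be sketched as follows. This is an illustrative sketch only, not rMotifGen's code or interface: the function names, the uniform background model, and the four-position example matrix are all assumptions made for the example.

```python
import random

BASES = "ACGT"


def sample_motif(pssm, rng):
    """Draw one motif instance; pssm is a list of per-position base probabilities."""
    return "".join(
        rng.choices(BASES, weights=[column[b] for b in BASES])[0]
        for column in pssm
    )


def plant_motif(length, pssm, rng):
    """Return (sequence, start, motif): one motif instance placed at a random
    position in a uniform random background (an assumed background model)."""
    motif = sample_motif(pssm, rng)
    start = rng.randrange(length - len(motif) + 1)
    background = [rng.choice(BASES) for _ in range(length)]
    background[start:start + len(motif)] = motif  # overwrite with the instance
    return "".join(background), start, motif


# Hypothetical 4-position matrix whose consensus is TATA.
EXAMPLE_PSSM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]

rng = random.Random(42)
seq, start, motif = plant_motif(50, EXAMPLE_PSSM, rng)
print(seq, start, motif)
```

    Because the planted position and instance are returned alongside the sequence, a test set built this way carries its own ground truth for scoring a motif finder.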

  4. Sequences from first settlers reveal rapid evolution in Icelandic mtDNA pool.

    Directory of Open Access Journals (Sweden)

    Agnar Helgason

    2009-01-01

    A major task in human genetics is to understand the nature of the evolutionary processes that have shaped the gene pools of contemporary populations. Ancient DNA studies have great potential to shed light on the evolution of populations because they provide the opportunity to sample from the same population at different points in time. Here, we show that a sample of mitochondrial DNA (mtDNA) control region sequences from 68 early medieval Icelandic skeletal remains is more closely related to sequences from contemporary inhabitants of Scotland, Ireland, and Scandinavia than to those from the modern Icelandic population. Due to a faster rate of genetic drift in the Icelandic mtDNA pool during the last 1,100 years, the sequences carried by the first settlers were better preserved in their ancestral gene pools than among their descendants in Iceland. These results demonstrate the inferential power gained in ancient DNA studies through the application of population genetics analyses to relatively large samples.

  5. Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures.

    Science.gov (United States)

    Taly, Jean-Francois; Magis, Cedrik; Bussotti, Giovanni; Chang, Jia-Ming; Di Tommaso, Paolo; Erb, Ionas; Espinosa-Carrasco, Jose; Kemena, Carsten; Notredame, Cedric

    2011-11-01

    T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here show how the package can be used to multiply align proteins, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode, M-Coffee, a meta version able to combine several third party aligners into one, PSI (position-specific iterated)-Coffee, the homology extended mode suitable for remote homologs and Expresso, the structure-based multiple aligner. We then also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for using R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific of promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is an open-source freeware available from http://www.tcoffee.org/.

  6. Satellite DNA Sequences in Canidae and Their Chromosome Distribution in Dog and Red Fox.

    Science.gov (United States)

    Vozdova, Miluse; Kubickova, Svatava; Cernohorska, Halina; Fröhlich, Jan; Rubes, Jiri

    2016-01-01

    Satellite DNA is a characteristic component of mammalian centromeric heterochromatin, and a comparative analysis of its evolutionary dynamics can be used for phylogenetic studies. We analysed satellite and satellite-like DNA sequences available in NCBI for 4 species of the family Canidae (red fox, Vulpes vulpes, VVU; domestic dog, Canis familiaris, CFA; arctic fox, Vulpes lagopus, VLA; raccoon dog, Nyctereutes procyonoides procyonoides, NPR) by comparative sequence analysis, which revealed 86-90% intraspecies and 76-79% interspecies similarity. Comparative fluorescence in situ hybridisation in the red fox and dog showed signals of the red fox satellite probe in canine and vulpine autosomal centromeres, on VVUY, B chromosomes, and in the distal parts of VVU9q and VVU10p which were shown to contain nucleolus organiser regions. The CFA satellite probe stained autosomal centromeres only in the dog. The CFA satellite-like DNA did not show any significant sequence similarity with the satellite DNA of any species analysed and was localised to the centromeres of 9 canine chromosome pairs. No significant heterochromatin block was detected on the B chromosomes of the red fox. Our results show extensive heterogeneity of satellite sequences among Canidae and prove close evolutionary relationships between the red and arctic fox. © 2017 S. Karger AG, Basel.

  7. Frequency of Epstein-Barr virus DNA sequences in human gliomas

    Directory of Open Access Journals (Sweden)

    Renata Fragelli Fonseca

    CONTEXT AND OBJECTIVE: The Epstein-Barr virus (EBV) is the most common cause of infectious mononucleosis and is also associated with several human tumors, including Burkitt's lymphoma, Hodgkin's lymphoma, some cases of gastric carcinoma and nasopharyngeal carcinoma, among other neoplasms. The aim of this study was to screen 75 primary gliomas for the presence of specific EBV DNA sequences by means of the polymerase chain reaction (PCR), with confirmation by direct sequencing. DESIGN AND SETTING: Prevalence study on EBV molecular genetics at a molecular pathology laboratory in a university hospital and at an applied genetics laboratory in a national institution. METHODS: A total of 75 primary glioma biopsies and 6 others from other tumors from the central nervous system were obtained. The tissues were immediately frozen for subsequent DNA extraction by means of traditional methods using proteinase K digestion and extraction with a phenol-chloroform-isoamyl alcohol mixture. DNA was precipitated with ethanol, resuspended in buffer and stored. The PCRs were carried out using primers for amplification of the EBV BamM region. Positive and negative controls were added to each reaction. The PCR products were used for direct sequencing for confirmation. RESULTS: The viral sequences were positive in 11/75 (14.7%) of our samples. CONCLUSION: The prevalence of EBV DNA was 11/75 (14.7%) in our glioma collection. Further molecular and epidemiological studies are needed to establish the possible role played by EBV in the tumorigenesis of gliomas.

  8. Characterization of North American Armillaria species: Genetic relationships determined by ribosomal DNA sequences and AFLP markers

    Science.gov (United States)

    M. -S. Kim; N. B. Klopfenstein; J. W. Hanna; G. I. McDonald

    2006-01-01

    Phylogenetic and genetic relationships among 10 North American Armillaria species were analysed using sequence data from ribosomal DNA (rDNA), including intergenic spacer (IGS-1), internal transcribed spacers with associated 5.8S (ITS + 5.8S), and nuclear large subunit rDNA (nLSU), and amplified fragment length polymorphism (AFLP) markers. Based on rDNA sequence data,...

  9. [Study on molecular phylogeny of Schistosoma bovis based on mitochondrial DNA sequence and gene order].

    Science.gov (United States)

    Xiao, Jing-ying; Cai, Lian-shun; Mitsuru, Nagataki; Shinji, Tokuhiro; Jarilla Blanca, R; Masaaki, Shimada; Blair, David; Takeshi, Agatsuma

    2010-08-01

    To determine the nucleotide sequence of the partial mitochondrial (mt) genome and the order of the mitochondrial protein-coding genes for Schistosoma bovis for analysis of possible phylogenetic position of this species in the genus Schistosoma. The genomic DNA of adult worms was extracted by the GNT-K method. The target regions were amplified by PCR using a degenerate primer and a specific primer. The PCR products were purified before ligating into the pGEM1 T-vector system. Recombinant plasmids were amplified in Escherichia coli, extracted and purified using routine methods. The nucleotide sequences were determined with an ABI PRISM 3100-Avant DNA sequencer using a BigDye Terminators v3.1 Cycle Sequencing Kit (Applied Biosystems, CA, U.S.A.) with two T-vector specific primers (T7 and SP6). Positive colonies were sequenced with two internal specific primers to obtain the full sequence of each fragment on both strands by means of primer walking. Sequences of related schistosomes were retrieved from GenBank and aligned with our data. Gene trees were constructed using neighbor joining methods. The nucleotide sequence was determined and the gene order of this region in S. bovis was found as follows: NADH dehydrogenase 4 (nad4)-trnQ (Gln)-trnK (Lys)-NADH dehydrogenase 3 (nad3)-trnD (Asp)-NADH dehydrogenase 1 (nad1). The gene order covering this region of S. bovis was similar to that of the African Schistosoma species, but strikingly different from the Asian species. Phylogenetic trees inferred from the alignment including partial nad4, nad3, partial nad1 and partial nad4+nad3+nad1 sequence for 8 other Schistosoma spp., respectively, revealed that S. bovis is placed proximally to S. haematobium in the African sub-group, which is identical with the placement by gene order in the African clade. The mtDNA analysis based on mitochondrial DNA sequence and the gene order strongly supports the hypothesis that S. bovis belongs to the African schistosome clade rather than the Asian

  10. The identification of FANCD2 DNA binding domains reveals nuclear localization sequences.

    Science.gov (United States)

    Niraj, Joshi; Caron, Marie-Christine; Drapeau, Karine; Bérubé, Stéphanie; Guitton-Sert, Laure; Coulombe, Yan; Couturier, Anthony M; Masson, Jean-Yves

    2017-08-21

    Fanconi anemia (FA) is a recessive genetic disorder characterized by congenital abnormalities, progressive bone-marrow failure, and cancer susceptibility. The FA pathway consists of at least 21 FANC genes (FANCA-FANCV), and the encoded protein products interact in a common cellular pathway to gain resistance against DNA interstrand crosslinks. After DNA damage, FANCD2 is monoubiquitinated and accumulates on chromatin. FANCD2 plays a central role in the FA pathway, using yet unidentified DNA binding regions. By using synthetic peptide mapping and a DNA binding screen by electromobility shift assays, we found that FANCD2 bears two major DNA binding domains predominantly consisting of evolutionarily conserved lysine residues. Furthermore, one domain at the N-terminus of FANCD2 also bears nuclear localization sequences for the protein. Mutations in the bifunctional DNA binding/NLS domain lead to a reduction in FANCD2 monoubiquitination and an increase in mitomycin C sensitivity. Such phenotypes are not fully rescued by fusion with a heterologous NLS, which enables separation of the DNA binding and nuclear import functions within this domain that are necessary for FANCD2 functions. Collectively, our results highlight the importance of DNA binding and NLS residues in FANCD2 to activate an efficient FA pathway. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Cloning and sequence analysis of a partial cDNA for chicken cartilage proteoglycan core protein.

    Science.gov (United States)

    Sai, S; Tanaka, T; Kosher, R A; Tanzer, M L

    1986-01-01

    A chicken embryo sternal cartilage cDNA library, created in the plasmid expression vector pUC9, was screened for sequences coding for immunologically detectable core protein of the large, major proteoglycan of cartilage. A 1229-base-pair cDNA clone was isolated that contained only one extended open reading frame, which had sequences coding for a polypeptide of 379 amino acid residues. These deduced sequences corresponded to those anticipated from current models of proteoglycan structure; a deduced sequence encompassing 21 amino acids was almost identical to a known sequence of bovine nasal cartilage proteoglycan. Significant homology was found between the deduced amino acid sequence of the proteoglycan and two regions of a chicken hepatic lectin. Immunoprecipitation of the products of cell-free translation yielded a component of about 340 kDa, and transfer blot hybridization of sternal cartilage RNA showed a single mRNA of about 8.1 kilobases. Hybridizable mRNA sequences were readily detectable by dot-blot analyses of the cytoplasm of cartilaginous tissues of the chicken embryo, whereas similar analyses of prechondrogenic limb mesenchymal cells did not demonstrate such hybridizable mRNA signals. PMID:3460082

  12. DNA Targeting Sequence Improves Magnetic Nanoparticle-Based Plasmid DNA Transfection Efficiency in Model Neurons.

    Science.gov (United States)

    Vernon, Matthew M; Dean, David A; Dobson, Jon

    2015-08-17

    Efficient non-viral plasmid DNA transfection of most stem cells, progenitor cells and primary cell lines currently presents an obstacle for many applications within gene therapy research. From a standpoint of efficiency and cell viability, magnetic nanoparticle-based DNA transfection is a promising gene vectoring technique because it has demonstrated rapid and improved transfection outcomes when compared to alternative non-viral methods. Recently, our research group introduced oscillating magnet arrays that resulted in further improvements to this novel plasmid DNA (pDNA) vectoring technology. Continued improvements to nanomagnetic transfection techniques have focused primarily on magnetic nanoparticle (MNP) functionalization and transfection parameter optimization: cell confluence, growth media, serum starvation, magnet oscillation parameters, etc. Noting that none of these parameters can assist in the nuclear translocation of delivered pDNA following MNP-pDNA complex dissociation in the cell's cytoplasm, inclusion of a cassette feature for pDNA nuclear translocation is theoretically justified. In this study incorporation of a DNA targeting sequence (DTS) feature in the transfecting plasmid improved transfection efficiency in model neurons, presumably from increased nuclear translocation. This observation became most apparent when comparing the response of the dividing SH-SY5Y precursor cell to the non-dividing and differentiated SH-SY5Y neuroblastoma cells.

  13. Using Synthetic Nanopores for Single-Molecule Analyses: Detecting SNPs, Trapping DNA Molecules, and the Prospects for Sequencing DNA

    Science.gov (United States)

    Dimitrov, Valentin V.

    2009-01-01

    This work focuses on studying properties of DNA molecules and DNA-protein interactions using synthetic nanopores, and it examines the prospects of sequencing DNA using synthetic nanopores. We have developed a method for discriminating between alleles that uses a synthetic nanopore to measure the binding of a restriction enzyme to DNA. There exists…

  14. Generating Exome Enriched Sequencing Libraries from Formalin-Fixed, Paraffin-Embedded Tissue DNA for Next Generation Sequencing

    Science.gov (United States)

    Marosy, Beth A.; Craig, Brian D.; Hetrick, Kurt N.; Witmer, P. Dane; Ling, Hua; Griffith, Sean M.; Myers, Ben; Ostrander, Elaine A.; Stanford, Janet L.; Brody, Lawrence C.; Doheny, Kimberly F.

    2016-01-01

    This unit describes a protocol for generating exome enriched sequencing libraries using DNA extracted from Formalin Fixed Paraffin Embedded (FFPE) samples. Utilizing commercially available kits, we present a low input FFPE workflow starting with 50ng of DNA. This procedure includes a repair step to address damage caused by FFPE preservation that improves sequence quality. Subsequently, libraries undergo an in-solution targeted selection for exons, followed by sequencing using the Illumina next generation short read sequencing platform. PMID:28075488

  15. Random-breakage mapping method applied to human DNA sequences

    Science.gov (United States)

    Lobrich, M.; Rydberg, B.; Cooper, P. K.; Chatterjee, A. (Principal Investigator)

    1996-01-01

    The random-breakage mapping method [Game et al. (1990) Nucleic Acids Res., 18, 4453-4461] was applied to DNA sequences in human fibroblasts. The methodology involves NotI restriction endonuclease digestion of DNA from irradiated cells, followed by pulsed-field gel electrophoresis, Southern blotting and hybridization with DNA probes recognizing the single copy sequences of interest. The Southern blots show a band for the unbroken restriction fragments and a smear below this band due to radiation induced random breaks. This smear pattern contains two discontinuities in intensity at positions that correspond to the distance of the hybridization site to each end of the restriction fragment. By analyzing the positions of those discontinuities we confirmed the previously mapped position of the probe DXS1327 within a NotI fragment on the X chromosome, thus demonstrating the validity of the technique. We were also able to position the probes D21S1 and D21S15 with respect to the ends of their corresponding NotI fragments on chromosome 21. A third chromosome 21 probe, D21S11, has previously been reported to be close to D21S1, although an uncertainty about a second possible location existed. Since both probes D21S1 and D21S11 hybridized to a single NotI fragment and yielded a similar smear pattern, this uncertainty is removed by the random-breakage mapping method.

  16. Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing.

    Science.gov (United States)

    Lutz, Kerry A; Wang, Wenqin; Zdepski, Anna; Michael, Todd P

    2011-05-20

    High throughput sequencing (HTS) technologies have revolutionized the field of genomics by drastically reducing the cost of sequencing, making it feasible for individual labs to sequence or resequence plant genomes. Obtaining high quality, high molecular weight DNA from plants poses significant challenges due to the high copy number of chloroplast and mitochondrial DNA, as well as high levels of phenolic compounds and polysaccharides. Multiple methods have been used to isolate DNA from plants; the CTAB method is commonly used to isolate total cellular DNA from plants that contain nuclear DNA, as well as chloroplast and mitochondrial DNA. Alternatively, DNA can be isolated from nuclei to minimize chloroplast and mitochondrial DNA contamination. We describe optimized protocols for isolation of nuclear DNA from eight different plant species encompassing both monocot and eudicot species. These protocols use nuclei isolation to minimize chloroplast and mitochondrial DNA contamination. We also developed a protocol to determine the number of chloroplast and mitochondrial DNA copies relative to the nuclear DNA using quantitative real time PCR (qPCR). We compared DNA isolated from nuclei to total cellular DNA isolated with the CTAB method. As expected, DNA isolated from nuclei consistently yielded nuclear DNA with fewer chloroplast and mitochondrial DNA copies, as compared to the total cellular DNA prepared with the CTAB method. This protocol will allow for analysis of the quality and quantity of nuclear DNA before starting a plant whole genome sequencing or resequencing experiment. Extracting high quality, high molecular weight nuclear DNA in plants has the potential to be a bottleneck in the era of whole genome sequencing and resequencing. The methods that are described here provide a framework for researchers to extract and quantify nuclear DNA in multiple types of plants.
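
    How a qPCR readout can be turned into an organellar-to-nuclear copy ratio can be sketched with the common delta-Ct calculation. This is an assumption about the arithmetic, not the authors' stated protocol: the function name, the default of perfect doubling per cycle, and the Ct values are hypothetical.

```python
def relative_copy_number(ct_organelle, ct_nuclear, efficiency=2.0):
    """Estimate organellar copies per nuclear copy from qPCR threshold cycles.

    Assumes both amplicons amplify with the same per-cycle efficiency
    (2.0 means perfect doubling), so each cycle of difference in Ct
    corresponds to one fold-change step.
    """
    return efficiency ** (ct_nuclear - ct_organelle)


# Hypothetical Ct values: the chloroplast target crosses threshold
# 5 cycles earlier than the single-copy nuclear target.
ratio = relative_copy_number(ct_organelle=20.0, ct_nuclear=25.0)
print(ratio)  # 2 ** 5 = 32.0 chloroplast copies per nuclear copy
```

    Under these assumptions, a nuclei-enriched preparation would show a smaller Ct gap, and therefore a lower ratio, than a total-DNA preparation of the same tissue.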

  17. Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing

    Directory of Open Access Journals (Sweden)

    Zdepski Anna

    2011-05-01

    Background: High throughput sequencing (HTS) technologies have revolutionized the field of genomics by drastically reducing the cost of sequencing, making it feasible for individual labs to sequence or resequence plant genomes. Obtaining high quality, high molecular weight DNA from plants poses significant challenges due to the high copy number of chloroplast and mitochondrial DNA, as well as high levels of phenolic compounds and polysaccharides. Multiple methods have been used to isolate DNA from plants; the CTAB method is commonly used to isolate total cellular DNA from plants that contain nuclear DNA, as well as chloroplast and mitochondrial DNA. Alternatively, DNA can be isolated from nuclei to minimize chloroplast and mitochondrial DNA contamination. Results: We describe optimized protocols for isolation of nuclear DNA from eight different plant species encompassing both monocot and eudicot species. These protocols use nuclei isolation to minimize chloroplast and mitochondrial DNA contamination. We also developed a protocol to determine the number of chloroplast and mitochondrial DNA copies relative to the nuclear DNA using quantitative real time PCR (qPCR). We compared DNA isolated from nuclei to total cellular DNA isolated with the CTAB method. As expected, DNA isolated from nuclei consistently yielded nuclear DNA with fewer chloroplast and mitochondrial DNA copies, as compared to the total cellular DNA prepared with the CTAB method. This protocol will allow for analysis of the quality and quantity of nuclear DNA before starting a plant whole genome sequencing or resequencing experiment. Conclusions: Extracting high quality, high molecular weight nuclear DNA in plants has the potential to be a bottleneck in the era of whole genome sequencing and resequencing. The methods that are described here provide a framework for researchers to extract and quantify nuclear DNA in multiple types of plants.

  18. Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing

    Science.gov (United States)

    2011-01-01

    Background High throughput sequencing (HTS) technologies have revolutionized the field of genomics by drastically reducing the cost of sequencing, making it feasible for individual labs to sequence or resequence plant genomes. Obtaining high quality, high molecular weight DNA from plants poses significant challenges due to the high copy number of chloroplast and mitochondrial DNA, as well as high levels of phenolic compounds and polysaccharides. Multiple methods have been used to isolate DNA from plants; the CTAB method is commonly used to isolate total cellular DNA from plants that contain nuclear DNA, as well as chloroplast and mitochondrial DNA. Alternatively, DNA can be isolated from nuclei to minimize chloroplast and mitochondrial DNA contamination. Results We describe optimized protocols for isolation of nuclear DNA from eight different plant species encompassing both monocot and eudicot species. These protocols use nuclei isolation to minimize chloroplast and mitochondrial DNA contamination. We also developed a protocol to determine the number of chloroplast and mitochondrial DNA copies relative to the nuclear DNA using quantitative real time PCR (qPCR). We compared DNA isolated from nuclei to total cellular DNA isolated with the CTAB method. As expected, DNA isolated from nuclei consistently yielded nuclear DNA with fewer chloroplast and mitochondrial DNA copies, as compared to the total cellular DNA prepared with the CTAB method. This protocol will allow for analysis of the quality and quantity of nuclear DNA before starting a plant whole genome sequencing or resequencing experiment. Conclusions Extracting high quality, high molecular weight nuclear DNA in plants has the potential to be a bottleneck in the era of whole genome sequencing and resequencing. The methods that are described here provide a framework for researchers to extract and quantify nuclear DNA in multiple types of plants. PMID:21599914

  19. High-Resolution Melting (HRM) of Hypervariable Mitochondrial DNA Regions for Forensic Science.

    Science.gov (United States)

    Dos Santos Rocha, Alípio; de Amorim, Isis Salviano Soares; Simão, Tatiana de Almeida; da Fonseca, Adenilson de Souza; Garrido, Rodrigo Grazinoli; Mencalha, Andre Luiz

    2017-08-23

    Forensic strategies commonly rely on analysis of short tandem repeats (STRs); however, additional strategies have been proposed for forensic science. This article therefore standardized high-resolution melting (HRM) of DNA for forensic analyses. For HRM, mitochondrial DNA (mtDNA) from eight individuals was extracted from mucosa swabs with DNAzol reagent; samples were amplified by PCR and submitted to HRM analysis to identify differences in hypervariable (HV) regions I and II. To confirm the HRM results, all PCR products were sequenced. The data suggest that it is possible to discriminate DNA from different samples by HRM curves. In addition, an uncommon dual dissociation was identified in a single PCR product, extending HRM analyses through evaluation of melting peaks. Thus, HRM is accurate and useful for screening small differences in the HVI and HVII regions of mtDNA and can increase the efficiency of laboratory routines based on forensic genetics. © 2017 American Academy of Forensic Sciences.

  20. Genetic variability of Taenia saginata inferred from mitochondrial DNA sequences.

    Science.gov (United States)

    Rostami, Sima; Salavati, Reza; Beech, Robin N; Babaei, Zahra; Sharbatkhori, Mitra; Harandi, Majid Fasihi

    2015-04-01

    Taenia saginata is an important tapeworm, infecting humans in many parts of the world. The present study was undertaken to identify inter- and intraspecific variation of T. saginata isolated from cattle in different parts of Iran using two mitochondrial CO1 and 12S rRNA genes. Up to 105 bovine specimens of T. saginata were collected from 20 slaughterhouses in three provinces of Iran. DNA were extracted from the metacestode Cysticercus bovis. After PCR amplification, sequencing of CO1 and 12S rRNA genes were carried out and two phylogenetic analyses of the sequence data were generated by Bayesian inference on CO1 and 12S rRNA sequences. Sequence analyses of CO1 and 12S rRNA genes showed 11 and 29 representative profiles respectively. The level of pairwise nucleotide variation between individual haplotypes of CO1 gene was 0.3-2.4% while the overall nucleotide variation among all 11 haplotypes was 4.6%. For 12S rRNA sequence data, level of pairwise nucleotide variation was 0.2-2.5% and the overall nucleotide variation was determined as 5.8% among 29 haplotypes of 12S rRNA gene. Considerable genetic diversity was found in both mitochondrial genes particularly in 12S rRNA gene.
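
    The pairwise variation figures quoted above are proportions of differing sites between haplotypes. A minimal sketch of that calculation follows, assuming the sequences are already aligned to equal length and that gaps and ambiguous bases are simply skipped; the function names and the toy haplotypes are hypothetical and are not data or code from the study.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":  # skip gaps and ambiguity codes
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def pairwise_range(haplotypes: dict) -> tuple:
    """Smallest and largest pairwise p-distance over all haplotype pairs."""
    distances = [p_distance(a, b) for a, b in combinations(haplotypes.values(), 2)]
    return min(distances), max(distances)


# Hypothetical toy haplotypes, for illustration only.
toy = {"H1": "ACGTACGTAC", "H2": "ACGTACGTAT", "H3": "ACGAACGTAT"}
low, high = pairwise_range(toy)
print(f"pairwise variation: {low:.1%} to {high:.1%}")
```

    Published analyses often use model-corrected distances rather than the raw proportion shown here, so this sketch only illustrates the simplest form of the measure.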

  1. Hardware Accelerator for the Multifractal Analysis of DNA Sequences.

    Science.gov (United States)

    Duarte-Sanchez, Jorge E; Velasco-Medina, Jaime; Moreno, Pedro A

    2017-07-24

    The multifractal analysis has allowed to quantify the genetic variability and non-linear stability along the human genome sequence. It has some implications in explaining several genetic diseases given by some chromosome abnormalities, among other genetic particularities. The multifractal analysis of a genome is carried out by dividing the complete DNA sequence in smaller fragments and calculating the generalized dimension spectrum of each fragment using the chaos game representation and the box-counting method. This is a time consuming process because it involves the processing of large data sets using floating-point representation. In order to reduce the computation time, we designed an application-specific processor, here called multifractal processor, which is based on our proposed hardware-oriented algorithm for calculating efficiently the generalized dimension spectrum of DNA sequences. The multifractal processor was implemented on a low-cost SoC-FPGA and was verified by processing a complete human genome. The execution time and numeric results of the Multifractal processor were compared with the results obtained from the software implementation executed in a 20-core workstation, achieving a speed up of 2.6x and an average error of 0.0003%.

  2. DNA Sequencing as a Tool to Monitor Marine Ecological Status

    Directory of Open Access Journals (Sweden)

    Kelly D. Goodwin

    2017-05-01

    Many ocean policies mandate integrated, ecosystem-based approaches to marine monitoring, driving a global need for efficient, low-cost bioindicators of marine ecological quality. Most traditional methods to assess biological quality rely on specialized expertise to provide visual identification of a limited set of specific taxonomic groups, a time-consuming process that can provide a narrow view of ecological status. In addition, microbial assemblages drive food webs but are not amenable to visual inspection and thus are largely excluded from detailed inventory. Molecular-based assessments of biodiversity and ecosystem function offer advantages over traditional methods and are increasingly being generated for a suite of taxa using a “microbes to mammals” or “barcodes to biomes” approach. Progress in these efforts coupled with continued improvements in high-throughput sequencing and bioinformatics pave the way for sequence data to be employed in formal integrated ecosystem evaluation, including food web assessments, as called for in the European Union Marine Strategy Framework Directive. DNA sequencing of bioindicators, both traditional (e.g., benthic macroinvertebrates, ichthyoplankton) and emerging (e.g., microbial assemblages, fish via eDNA), promises to improve assessment of marine biological quality by increasing the breadth, depth, and throughput of information and by reducing costs and reliance on specialized taxonomic expertise.

  3. Discovering motifs in ranked lists of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Eran Eden

    2007-03-01

    Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked

  4. A MapReduce Framework for DNA Sequencing Data Processing

    Directory of Open Access Journals (Sweden)

    Samy Ghoneimy

    2016-12-01

    Genomics and Next Generation Sequencers (NGS) like Illumina Hiseq produce data in the order of 200 billion base pairs in a single one-week run for a 60x human genome coverage, which requires modern high-throughput experimental technologies that can only be tackled with high performance computing (HPC) and specialized software algorithms called “short read aligners”. This paper focuses on the implementation of the DNA sequencing as a set of MapReduce programs that will accept a DNA data set as a FASTQ file and finally generate a VCF (variant call format) file, which has variants for a given DNA data set. In this paper MapReduce/Hadoop along with Burrows-Wheeler Aligner (BWA) and Sequence Alignment/Map (SAM) tools are fully utilized to provide various utilities for manipulating alignments, including sorting, merging, indexing, and generating alignments. The Map-Sort-Reduce process is designed to be suited for a Hadoop framework in which each cluster is a traditional N-node Hadoop cluster to utilize all of the Hadoop features like HDFS, program management and fault tolerance. The Map step performs multiple instances of the short read alignment algorithm (BoWTie) that run in parallel in Hadoop. The ordered list of the sequence reads is used as input tuples and the output tuples are the alignments of the short reads. In the Reduce step many parallel instances of the Short Oligonucleotide Analysis Package for SNP (SOAPsnp) algorithm run in the cluster. Input tuples are sorted alignments for a partition and the output tuples are SNP calls. Results are stored via HDFS, and then archived in SOAPsnp format. The proposed framework enables extremely fast discovery of somatic mutations, inference of population genetic parameters, and association tests directly based on sequencing data without explicit genotyping or linkage-based imputation. It also demonstrates that this method achieves comparable

  5. Legume genomics: understanding biology through DNA and RNA sequencing

    Science.gov (United States)

    O'Rourke, Jamie A.; Bolon, Yung-Tsi; Bucciarelli, Bruna; Vance, Carroll P.

    2014-01-01

    Background The legume family (Leguminosae) consists of approx. 17 000 species. A few of these species, including, but not limited to, Phaseolus vulgaris, Cicer arietinum and Cajanus cajan, are important dietary components, providing protein for approx. 300 million people worldwide. Additional species, including soybean (Glycine max) and alfalfa (Medicago sativa), are important crops utilized mainly in animal feed. In addition, legumes are important contributors to biological nitrogen, forming symbiotic relationships with rhizobia to fix atmospheric N2 and providing up to 30 % of available nitrogen for the next season of crops. The application of high-throughput genomic technologies including genome sequencing projects, genome re-sequencing (DNA-seq) and transcriptome sequencing (RNA-seq) by the legume research community has provided major insights into genome evolution, genomic architecture and domestication. Scope and Conclusions This review presents an overview of the current state of legume genomics and explores the role that next-generation sequencing technologies play in advancing legume genomics. The adoption of next-generation sequencing and implementation of associated bioinformatic tools has allowed researchers to turn each species of interest into their own model organism. To illustrate the power of next-generation sequencing, an in-depth overview of the transcriptomes of both soybean and white lupin (Lupinus albus) is provided. The soybean transcriptome focuses on analysing seed development in two near-isogenic lines, examining the role of transporters, oil biosynthesis and nitrogen utilization. The white lupin transcriptome analysis examines how phosphate deficiency alters gene expression patterns, inducing the formation of cluster roots. Such studies illustrate the power of next-generation sequencing and bioinformatic analyses in elucidating the gene networks underlying biological processes. PMID:24769535

  6. Peptide Synthesis on a Next-Generation DNA Sequencing Platform.

    Science.gov (United States)

    Svensen, Nina; Peersen, Olve B; Jaffrey, Samie R

    2016-09-02

    Methods for displaying large numbers of peptides on solid surfaces are essential for high-throughput characterization of peptide function and binding properties. Here we describe a method for converting the >10^7 flow cell-bound clusters of identical DNA strands generated by the Illumina DNA sequencing technology into clusters of complementary RNA, and subsequently peptide clusters. We modified the flow-cell-bound primers with ribonucleotides thus enabling them to be used by poliovirus polymerase 3Dpol. The primers hybridize to the clustered DNA thus leading to RNA clusters. The RNAs fold into functional protein- or small molecule-binding aptamers. We used the mRNA-display approach to synthesize flow-cell-tethered peptides from these RNA clusters. The peptides showed selective binding to cognate antibodies. The methods described here provide an approach for using DNA clusters to template peptide synthesis on an Illumina flow cell, thus providing new opportunities for massively parallel peptide-based assays. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Exome sequencing generates high quality data in non-target regions

    Directory of Open Access Journals (Sweden)

    Guo Yan

    2012-05-01

    Abstract Background Exome sequencing using next-generation sequencing technologies is a cost efficient approach to selectively sequencing coding regions of human genome for detection of disease variants. A significant amount of DNA fragments from the capture process fall outside target regions, and sequence data for positions outside target regions have been mostly ignored after alignment. Result We performed whole exome sequencing on 22 subjects using Agilent SureSelect capture reagent and 6 subjects using Illumina TrueSeq capture reagent. We also downloaded sequencing data for 6 subjects from the 1000 Genomes Project Pilot 3 study. Using these data, we examined the quality of SNPs detected outside target regions by computing consistency rate with genotypes obtained from SNP chips or the Hapmap database, transition-transversion (Ti/Tv) ratio, and percentage of SNPs inside dbSNP. For all three platforms, we obtained high-quality SNPs outside target regions, and some far from target regions. In our Agilent SureSelect data, we obtained 84,049 high-quality SNPs outside target regions compared to 65,231 SNPs inside target regions (a 129% increase). For our Illumina TrueSeq data, we obtained 222,171 high-quality SNPs outside target regions compared to 95,818 SNPs inside target regions (a 232% increase). For the data from the 1000 Genomes Project, we obtained 7,139 high-quality SNPs outside target regions compared to 1,548 SNPs inside target regions (a 461% increase). Conclusions These results demonstrate that a significant amount of high quality genotypes outside target regions can be obtained from exome sequencing data. These data should not be ignored in genetic epidemiology studies.
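
    Two of the quantities in this abstract are simple enough to sketch. The first function below computes a transition-transversion (Ti/Tv) ratio from a list of reference/alternate base pairs; the second reproduces the reported percentages, which appear to equal the outside-target SNP count expressed as a percentage of the inside-target count. Function names and the toy SNP list are hypothetical; only the six SNP counts come from the abstract.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps) -> float:
    """Number of transitions divided by number of transversions."""
    ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")


def outside_as_percent_of_inside(outside: int, inside: int) -> int:
    """Outside-target SNP count as a rounded percentage of the inside-target count."""
    return round(100 * outside / inside)


# Hypothetical toy SNP list: four transitions and two transversions.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print("Ti/Tv:", ti_tv_ratio(toy_snps))

# Counts quoted in the abstract (outside, inside) for each data set.
reported = {
    "Agilent SureSelect": (84049, 65231),
    "Illumina TrueSeq": (222171, 95818),
    "1000 Genomes": (7139, 1548),
}
for platform, (outside, inside) in reported.items():
    print(platform, outside_as_percent_of_inside(outside, inside), "%")
```

    Read this way, the figures describe outside-target SNPs relative to inside-target SNPs rather than a change over time.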

  8. A Simulation of DNA Sequencing Utilizing 3M Post-It[R] Notes

    Science.gov (United States)

    Christensen, Doug

    2009-01-01

    An inexpensive and equipment free approach to teaching the technical aspects of DNA sequencing. The activity described requires an instructor with a familiarity of DNA sequencing technology but provides a straight forward method of teaching the technical aspects of sequencing in the absence of expensive sequencing equipment. The final sequence…

  9. Entropy and long-range correlations in DNA sequences.

    Science.gov (United States)

    Melnik, S S; Usatenko, O V

    2014-12-01

    We analyze the structure of DNA molecules of different organisms by using the additive Markov chain approach. Transforming nucleotide sequences into binary strings, we perform statistical analysis of the corresponding "texts". We develop the theory of N-step additive binary stationary ergodic Markov chains and analyze their differential entropy. Supposing that the correlations are weak we express the conditional probability function of the chain by means of the pair correlation function and represent the entropy as a functional of the pair correlator. Since the model uses two point correlators instead of probability of block occurring, it makes possible to calculate the entropy of subsequences at much longer distances than with the use of the standard methods. We utilize the obtained analytical result for numerical evaluation of the entropy of coarse-grained DNA texts. We believe that the entropy study can be used for biological classification of living species. Copyright © 2014. Published by Elsevier Ltd.

  10. A DNA sequence obtained by replacement of the dopamine RNA aptamer bases is not an aptamer

    DEFF Research Database (Denmark)

    Álvarez-Martos, Isabel; Ferapontova, Elena

    2017-01-01

    of dopamine is a 57-nucleotide-long RNA sequence reported in 1997 (Biochemistry, 1997, 36, 9726). Later, it was suggested that the DNA homologue of the RNA aptamer retains the specificity of dopamine binding (Biochem. Biophys. Res. Commun., 2009, 388, 732). Here, we show that the DNA sequence obtained...... by the replacement of the RNA aptamer bases with their DNA analogues is not capable of specific biorecognition of dopamine, in contrast to the original RNA aptamer sequence. This DNA sequence binds dopamine and structurally related catecholamine neurotransmitters non-specifically, as any DNA sequence, and, thus...

  11. Sequence-dependent structural changes in a self-assembling DNA oligonucleotide.

    Science.gov (United States)

    Saoji, Maithili; Paukstelis, Paul J

    2015-12-01

    DNA has proved to be a remarkable molecule for the construction of sophisticated two-dimensional and three-dimensional architectures because of its programmability and structural predictability provided by complementary Watson-Crick base pairing. DNA oligonucleotides can, however, exhibit a great deal of local structural diversity. DNA conformation is strongly linked to both environmental conditions and the nucleobase identities inherent in the oligonucleotide sequence, but the exact relationship between sequence and local structure is not completely understood. This study examines how a single-nucleotide addition to a class of self-assembling DNA 13-mers leads to a significantly different overall structure under identical crystallization conditions. The DNA 13-mers self-assemble in the presence of Mg(2+) through a combination of Watson-Crick and noncanonical base-pairing interactions. The crystal structures described here show that all of the predicted Watson-Crick base pairs are present, with the major difference being a significant rearrangement of noncanonical base pairs. This includes the formation of a sheared A-G base pair, a junction of strands formed from base-triple interactions, and tertiary interactions that generate structural features similar to tandem sheared G-A base pairs. The adoption of this alternate noncanonical structure is dependent in part on the sequence in the Watson-Crick duplex region. These results provide important new insights into the sequence-structure relationship of short DNA oligonucleotides and demonstrate a unique interplay between Watson-Crick and noncanonical base pairs that is responsible for crystallization fate.

  12. Automated methods for single-stranded DNA isolation and dideoxynucleotide DNA sequencing reactions on a robotic workstation.

    Science.gov (United States)

    Mardis, E R; Roe, B A

    1989-09-01

    Automated procedures have been developed for both the simultaneous isolation of 96 single-stranded M13 chimeric template DNAs in less than two hours, and for simultaneously pipetting 24 dideoxynucleotide sequencing reactions on a commercially available laboratory workstation. The DNA sequencing results obtained by either radiolabeled or fluorescent methods are consistent with the premise that automation of these portions of DNA sequencing projects will improve the reproducibility of the DNA isolation and the procedures for these normally labor-intensive steps provides an approach for rapid acquisition of large amounts of high quality, reproducible DNA sequence data.

  13. DNA interaction with platinum-based cytostatics revealed by DNA sequencing.

    Science.gov (United States)

    Smerkova, Kristyna; Vaculovic, Tomas; Vaculovicova, Marketa; Kynicky, Jindrich; Brtnicky, Martin; Eckschlager, Tomas; Stiborova, Marie; Hubalek, Jaromir; Adam, Vojtech

    2017-12-15

    The main mechanism of action of platinum-based cytostatic drugs - cisplatin, oxaliplatin and carboplatin - is the formation of DNA cross-links, which restricts the transcription due to the disability of DNA to enter the active site of the polymerase. The polymerase chain reaction (PCR) was employed as a simplified model of the amplification process in the cell nucleus. PCR with fluorescently labelled dideoxynucleotides commonly employed for DNA sequencing was used to monitor the effect of platinum-based cytostatics on DNA in terms of decrease in labeling efficiency dependent on a presence of the DNA-drug cross-link. It was found that significantly different amounts of the drugs - cisplatin (0.21 μg/mL), oxaliplatin (5.23 μg/mL), and carboplatin (71.11 μg/mL) - were required to cause the same quenching effect (50%) on the fluorescent labelling of 50 μg/mL of DNA. Moreover, it was found that even though the amounts of the drugs was applied to the reaction mixture differing by several orders of magnitude, the amount of incorporated platinum, quantified by inductively coupled plasma mass spectrometry, was in all cases at the level of tenths of μg per 5 μg of DNA. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Mechanically unzipping dsDNA with built-in sequence inhomogeneities and bound proteins.

    Science.gov (United States)

    Lu, Ping; Pegg, Ian L; Sarkar, Abhijit

    2013-02-01

    We theoretically analyze the force signal expected during unzipping through a single double-stranded DNA (dsDNA) molecule with designed subsequences or inhomogeneities with large stability differentials compared to the rest of the molecule. Our calculations describe experiments where the extension between the unzipped ends is fixed--the so-called fixed-extension ensemble--and the equilibrium force is measured. Two different types of force traces are obtained depending on the inhomogeneity length and strength. For short inhomogeneities, a "sawtooth" force trace is obtained, with a force jump corresponding to release of the entire inhomogeneity at once, as observed in unzipping of natural-sequence DNA (Bockelmann et al., Biophy. J. 82, 1537 (2002)). For longer inhomogeneities, traces with force plateaus are obtained, corresponding to a gradual unpeeling of the strongly bound region. We find that inhomogeneities are disrupted in sequence giving rise to a succession of force spikes superposed on the baseline unzipping force of 15pN. The height of the force pulses diminishes as regions further down the molecule are unzipped, and asymptotically the force response approaches that of DNA without large stability-enhanced islands. Our model also allows us to describe the transition between intact and disrupted binding zones by thermally activated kinetics. We then analyze the related situation where multiple proteins are bound at specific points on the DNA with or without cooperative interactions between proteins, and where the removal of each protein is required for unzipping to proceed further along the DNA. The proteins bind stably to dsDNA and also to single-stranded DNA (ssDNA) but with a lower binding enthalpy. The force jumps correspond to the extra mechanical work that has to be done to overcome the large protein binding enthalpy to either dsDNA or ssDNA. 
    Each force jump leads to the dissociation of a corresponding protein, but we do not find simultaneous release of

  15. Next-generation DNA barcoding: using next-generation sequencing to enhance and accelerate DNA barcode capture from single specimens.

    Science.gov (United States)

    Shokralla, Shadi; Gibson, Joel F; Nikbakht, Hamid; Janzen, Daniel H; Hallwachs, Winnie; Hajibabaei, Mehrdad

    2014-09-01

    DNA barcoding is an efficient method to identify specimens and to detect undescribed/cryptic species. Sanger sequencing of individual specimens is the standard approach in generating large-scale DNA barcode libraries and identifying unknowns. However, the Sanger sequencing technology is, in some respects, inferior to next-generation sequencers, which are capable of producing millions of sequence reads simultaneously. Additionally, direct Sanger sequencing of DNA barcode amplicons, as practiced in most DNA barcoding procedures, is hampered by the need for relatively high-target amplicon yield, coamplification of nuclear mitochondrial pseudogenes, confusion with sequences from intracellular endosymbiotic bacteria (e.g. Wolbachia) and instances of intraindividual variability (i.e. heteroplasmy). Any of these situations can lead to failed Sanger sequencing attempts or ambiguity of the generated DNA barcodes. Here, we demonstrate the potential application of next-generation sequencing platforms for parallel acquisition of DNA barcode sequences from hundreds of specimens simultaneously. To facilitate retrieval of sequences obtained from individual specimens, we tag individual specimens during PCR amplification using unique 10-mer oligonucleotides attached to DNA barcoding PCR primers. We employ 454 pyrosequencing to recover full-length DNA barcodes of 190 specimens using 12.5% capacity of a 454 sequencing run (i.e. two lanes of a 16 lane run). We obtained an average of 143 sequence reads for each individual specimen. The sequences produced are full-length DNA barcodes for all but one of the included specimens. In a subset of samples, we also detected Wolbachia, nontarget species, and heteroplasmic sequences. Next-generation sequencing is of great value because of its protocol simplicity, greatly reduced cost per barcode read, faster throughput and added information content. © 2014 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
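
    Retrieving per-specimen reads from a pooled run, as described above, amounts to binning reads by their 10-mer tag. The sketch below assumes the tag sits at the very start of each read and must match exactly; real pipelines usually tolerate sequencing errors and also check the primer. The function name, tags, and reads are hypothetical.

```python
def demultiplex(reads, tag_to_specimen, tag_length=10):
    """Bin reads by their leading tag; return (bins, unassigned reads).

    Assumes an exact-match tag at the 5' end of every read (illustrative only).
    """
    bins = {}
    unassigned = []
    for read in reads:
        tag = read[:tag_length].upper()
        specimen = tag_to_specimen.get(tag)
        if specimen is None:
            unassigned.append(read)
        else:
            # Store the read with its tag trimmed off.
            bins.setdefault(specimen, []).append(read[tag_length:])
    return bins, unassigned


# Hypothetical 10-mer tags and reads, for illustration only.
tags = {"ACGTACGTAC": "specimen_1", "TTGCATTGCA": "specimen_2"}
reads = [
    "ACGTACGTACGGTCAACAAA",
    "TTGCATTGCAGGTCAACAAT",
    "ACGTACGTACGGTCAACAAG",
    "GGGGGGGGGGGGTCAACAAA",  # unknown tag
]
bins, unassigned = demultiplex(reads, tags)
for specimen, specimen_reads in sorted(bins.items()):
    print(specimen, len(specimen_reads), "reads")
print("unassigned:", len(unassigned))
print(f"run capacity used by 2 of 16 lanes: {2 / 16:.1%}")
```

    Exact matching keeps the sketch short; allowing one mismatch per tag would recover more reads at the cost of requiring tags that differ from each other at several positions.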

  16. Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries.

    Science.gov (United States)

    Carpenter, Meredith L; Buenrostro, Jason D; Valdiosera, Cristina; Schroeder, Hannes; Allentoft, Morten E; Sikora, Martin; Rasmussen, Morten; Gravel, Simon; Guillén, Sonia; Nekhrizov, Georgi; Leshtakov, Krasimir; Dimitrova, Diana; Theodossiev, Nikola; Pettener, Davide; Luiselli, Donata; Sandoval, Karla; Moreno-Estrada, Andrés; Li, Yingrui; Wang, Jun; Gilbert, M Thomas P; Willerslev, Eske; Greenleaf, William J; Bustamante, Carlos D

    2013-11-07

    Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain <1% endogenous DNA, with the majority of sequencing capacity taken up by environmental DNA. Here we present a capture-based method for enriching the endogenous component of aDNA sequencing libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062-147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217-73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  17. PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context.

    Science.gov (United States)

    Zhou, Jiyun; Xu, Ruifeng; He, Yulan; Lu, Qin; Wang, Hongpeng; Kong, Bing

    2016-06-10

    Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most of the existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machines (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from spatial context provide more information than those from sequence context and that the combination of the two gives a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and the existing methods indicates that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web-server of our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely accessible to the biological research community.

  18. Clinical characterization and mitochondrial DNA sequence variations in Leber hereditary optic neuropathy

    Science.gov (United States)

    Kumar, Manoj; Kaur, Punit; Kumar, Manoj; Saxena, Rohit; Sharma, Pradeep

    2012-01-01

    Purpose: Leber hereditary optic neuropathy (LHON), a maternally inherited disorder, results from point mutations in mitochondrial DNA (mtDNA). MtDNA is highly polymorphic in nature, with a very high mutation rate that is 10–17 fold higher than that of the nuclear genome. Identification of new mtDNA sequence variations is necessary to establish a clear link with human disease. This study therefore aimed to evaluate LHON patients for novel mtDNA sequence variations. Materials and Methods: Twenty LHON patients were selected from the neuro-ophthalmology clinic of the All India Institute of Medical Sciences, New Delhi, India. DNA was isolated from whole blood samples. The entire coding region of the mitochondrial genome was amplified by PCR in 20 patients and 20 controls. For structural analysis (molecular modeling and simulation) the MODELER 9.2 program in Discovery Studio (DS 2.0) was used. Results: MtDNA sequencing revealed a total of 47 nucleotide variations in the 20 LHON patients and 29 variations in 20 controls. Of the 47 changes in patients, 21.28% (10/47) were nonsynonymous and the remaining 78.72% (37/47) were synonymous. Five nonsynonymous changes, including primary LHON mutations (NADH dehydrogenase subunit 1 [ND1]:p.A52T, NADH dehydrogenase subunit 6 [ND6]:p.M64V, adenosine triphosphate [ATP] synthase subunit a (F-ATPase protein 6) [ATPase6]:p.M181T, NADH dehydrogenase subunit 4 [ND4]:p.R340H, and cytochrome B [CYB]:p.F181L), were found to be pathogenic. A greater number of changes were present in complex I (53.19%; 25/47), followed by complex III (19.14%; 9/47), then complex IV (19.14%; 9/47), then complex V (8.5%; 4/47). Nonsynonymous variations may impair respiratory chain and oxidative phosphorylation (OXPHOS) pathways, which results in low ATP production and elevated reactive oxygen species (ROS) levels. Oxidative stress is the underlying etiology in various diseases and also plays a crucial role in LHON. Conclusions: This study describes the role of mtDNA

  19. Association of genetic variations in the mitochondrial DNA control region with presbycusis.

    Science.gov (United States)

    Falah, Masoumeh; Farhadi, Mohammad; Kamrava, Seyed Kamran; Mahmoudian, Saeid; Daneshi, Ahmad; Balali, Maryam; Asghari, Alimohamad; Houshmand, Massoud

    2017-01-01

    The prominent role of mitochondria in the generation of reactive oxygen species, cell death, and energy production contributes to the importance of this organelle in the intracellular mechanism underlying the progression of the common sensory disorder of the elderly, presbycusis. Reduced mitochondrial DNA (mtDNA) gene expression and coding region variation have frequently been reported as being associated with the development of presbycusis. The mtDNA control region regulates gene expression and replication of the genome of this organelle. To comprehensively understand the role of mitochondria in the progression of presbycusis, we compared variations in the mtDNA control region between subjects with presbycusis and controls. A total of 58 presbycusis patients and 220 control subjects were enrolled in the study after examination by the otolaryngologist and audiology tests. Variations in the mtDNA control region were investigated by polymerase chain reaction and Sanger sequencing. A total of 113 sequence variants were observed in mtDNA, and variants were detected in 100% of patients, with 84% located in hypervariable regions. The frequencies of the variants, 16,223 C>T, 16,311 T>C, 16,249 T>C, and 15,954 A>C, were significantly different between presbycusis and control subjects. The statistically significant difference in the frequencies of four nucleotide variants in the mtDNA control region of presbycusis patients and controls is in agreement with previous experimental evidence and supports the role of mitochondria in the intracellular mechanism underlying presbycusis development. Moreover, these variants have potential as diagnostic markers for individuals at a high risk of developing presbycusis. The data also suggest the possible presence of changes in the mtDNA control region in presbycusis, which could alter regulatory factor binding sites and influence mtDNA gene expression and copy number.

  20. Maternal Plasma DNA and RNA Sequencing for Prenatal Testing.

    Science.gov (United States)

    Tamminga, Saskia; van Maarle, Merel; Henneman, Lidewij; Oudejans, Cees B M; Cornel, Martina C; Sistermans, Erik A

    2016-01-01

    Cell-free DNA (cfDNA) testing has recently become indispensable in diagnostic testing and screening. In the prenatal setting, this type of testing is often called noninvasive prenatal testing (NIPT). With a number of techniques, using either next-generation sequencing or single nucleotide polymorphism-based approaches, fetal cfDNA in maternal plasma can be analyzed to screen for rhesus D genotype, common chromosomal aneuploidies, and increasingly for testing other conditions, including monogenic disorders. With regard to screening for common aneuploidies, challenges arise when implementing NIPT in current prenatal settings. Depending on the method used (targeted or nontargeted), chromosomal anomalies other than trisomy 21, 18, or 13 can be detected, either of fetal or maternal origin, also referred to as unsolicited or incidental findings. For various biological reasons, there is a small chance of having either a false-positive or false-negative NIPT result, or no result, also referred to as a "no-call." Both pre- and posttest counseling for NIPT should include discussing potential discrepancies. Since NIPT remains a screening test, a positive NIPT result should be confirmed by invasive diagnostic testing (either by chorionic villus biopsy or by amniocentesis). As the scope of NIPT is widening, professional guidelines need to discuss the ethics of what to offer and how to offer. In this review, we discuss the current biochemical, clinical, and ethical challenges of cfDNA testing in the prenatal setting and its future perspectives including novel applications that target RNA instead of DNA. © 2016 Elsevier Inc. All rights reserved.

  1. Wiggle-predicting functionally flexible regions from primary sequence.

    Directory of Open Access Journals (Sweden)

    Jenny Gu

    2006-07-01

    The Wiggle series are support vector machine-based predictors that identify regions of functional flexibility using only protein sequence information. Functionally flexible regions are defined as regions that can adopt different conformational states and are assumed to be necessary for bioactivity. Many advances have been made in understanding the relationship between protein sequence and structure. This work contributes to those efforts by making strides to understand the relationship between protein sequence and flexibility. A coarse-grained protein dynamic modeling approach was used to generate the dataset required for support vector machine training. We define our regions of interest based on the participation of residues in correlated large-scale fluctuations. Even with this structure-based approach to computationally define regions of functional flexibility, predictors successfully extract sequence-flexibility relationships that have been experimentally confirmed to be functionally important. Thus, a sequence-based tool to identify flexible regions important for protein function has been created. The ability to identify functional flexibility using a sequence based approach complements structure-based definitions and will be especially useful for the large majority of proteins with unknown structures. The methodology offers promise to identify structural genomics targets amenable to crystallization and the possibility to engineer more flexible or rigid regions within proteins to modify their bioactivity.

  2. Distinguishing forest and savanna African elephants using short nuclear DNA sequences.

    Science.gov (United States)

    Ishida, Yasuko; Demeke, Yirmed; van Coeverden de Groot, Peter J; Georgiadis, Nicholas J; Leggett, Keith E A; Fox, Virginia E; Roca, Alfred L

    2011-01-01

    A more complete description of African elephant phylogeography would require a method that distinguishes forest and savanna elephants using DNA from low-quality samples. Although mitochondrial DNA is often the marker of choice for species identification, the unusual cytonuclear patterns in African elephants make nuclear markers more reliable. We therefore designed and utilized genetic markers for short nuclear DNA regions that contain fixed nucleotide differences between forest and savanna elephants. We used M13 forward and reverse sequences to increase the total length of PCR amplicons and to improve the quality of sequences for the target DNA. We successfully sequenced fragments of nuclear genes from dung samples of known savanna and forest elephants in the Democratic Republic of Congo, Ethiopia, and Namibia. Elephants at previously unexamined locations were found to have nucleotide character states consistent with their status as savanna or forest elephants. Using these and results from previous studies, we estimated that the short-amplicon nuclear markers could distinguish forest from savanna African elephants with more than 99% accuracy. Nuclear genotyping of museum, dung, or ivory samples will provide better-informed conservation management of Africa's elephants.

  3. Isolation and sequence analysis of the wheat B genome subtelomeric DNA

    Directory of Open Access Journals (Sweden)

    Huneau Cecile

    2009-09-01

    Abstract Background: Telomeric and subtelomeric regions are essential for genome stability and regular chromosome replication. In this work, we have characterized the wheat BAC (bacterial artificial chromosome) clones containing Spelt1 and Spelt52 sequences, which belong to the subtelomeric repeats of the B/G genomes of wheats and Aegilops species from the section Sitopsis. Results: The BAC library from Triticum aestivum cv. Renan was screened using Spelt1 and Spelt52 as probes. Nine positive clones were isolated; of them, clone 2050O8 was localized mainly to the distal parts of wheat chromosomes by in situ hybridization. The distribution of the other clones indicated the presence of different types of repetitive sequences in BACs. Use of different approaches allowed us to prove that seven of the nine isolated clones belonged to the subtelomeric chromosomal regions. Clone 2050O8 was sequenced and its sequence of 119 737 bp was annotated. It is composed of 33% transposable elements (TEs), 8.2% Spelt52 (namely, the subfamily Spelt52.2) and five non-TE-related genes. DNA transposons are predominant, making up 24.6% of the entire BAC clone, whereas retroelements account for 8.4% of the clone length. The full-length CACTA transposon Caspar covers 11 666 bp, encoding a transposase and CTG-2 proteins, and this transposon accounts for 40% of the DNA transposons. The in situ hybridization data for 2050O8-derived subclones in combination with the BLAST search against wheat mapped ESTs (expressed sequence tags) suggest that clone 2050O8 is located in the terminal bin 4BL-10 (0.95-1.0). Additionally, four of the predicted 2050O8 genes showed significant homology to four putative orthologous rice genes in the distal part of rice chromosome 3S and confirm the synteny to wheat 4BL. Conclusion: Satellite DNA sequences from the subtelomeric regions of diploid wheat progenitor can be used for selecting the BAC clones from the corresponding regions of hexaploid wheat

  4. Length and sequence heterogeneity in 5S rDNA of Populus deltoides.

    Science.gov (United States)

    Negi, Madan S; Rajagopal, Jyothi; Chauhan, Neeti; Cronn, Richard; Lakshmikumaran, Malathi

    2002-12-01

    The 5S rRNA genes and their associated non-transcribed spacer (NTS) regions are present as repeat units arranged in tandem arrays in plant genomes. Length heterogeneity in 5S rDNA repeats was previously identified in Populus deltoides and was also observed in the present study. Primers were designed to amplify the 5S rDNA NTS variants from the P. deltoides genome. The PCR-amplified products from the two accessions of P. deltoides (G3 and G48) suggested the presence of length heterogeneity of 5S rDNA units within and among accessions, and the size of the spacers ranged from 385 to 434 bp. Sequence analysis of the non-transcribed spacer (NTS) revealed two distinct classes of 5S rDNA within both accessions: class 1, which contained GAA trinucleotide microsatellite repeats, and class 2, which lacked the repeats. The class 1 spacer shows length variation owing to the microsatellite, with two clones exhibiting 10 GAA repeat units and one clone exhibiting 16 such repeat units. However, distance analysis shows that class 1 spacer sequences are highly similar inter se, yielding nucleotide diversity (pi) estimates that are less than 0.15% of those obtained for class 2 spacers (pi = 0.0183 vs. 0.1433, respectively). The presence of microsatellite in the NTS region leading to variation in spacer length is reported and discussed for the first time in P. deltoides.
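
    Nucleotide diversity (pi), as reported for the two spacer classes, is commonly computed as the mean proportion of differing sites over all pairs of aligned sequences. The sketch below assumes equal-length, pre-aligned sequences and treats every character (including gaps) as an ordinary state; it is not the authors' distance analysis, and the example sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites among aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be non-empty and of equal length")
    pair_distances = [
        sum(a != b for a, b in zip(first, second)) / length
        for first, second in combinations(aligned, 2)
    ]
    return sum(pair_distances) / len(pair_distances)


# Hypothetical four-site alignment of three clones.
print(round(nucleotide_diversity(["AAAA", "AAAT", "AATT"]), 4))
```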

  5. Recent progress in atomistic simulation of electrical current DNA sequencing.

    Science.gov (United States)

    Kim, Han Seul; Kim, Yong-Hoon

    2015-07-15

    We review recent advances in the DNA sequencing method based on measurements of transverse electrical currents. Device configurations proposed in the literature are classified according to whether the molecular fingerprints appear as the major (Mode I) or perturbing (Mode II) current signals. Scanning tunneling microscope and tunneling electrode gap configurations belong to the former category, while the nanochannels with or without an embedded nanopore belong to the latter. The molecular sensing mechanisms of Modes I and II roughly correspond to electron tunneling and electrochemical gating, respectively. Special emphasis is placed on computer simulation studies, which have been playing a critical role in the initiation and development of the field. We also highlight low-dimensional nanomaterials such as carbon nanotubes, graphene, and graphene nanoribbons that allow the novel Mode II approach. Finally, several issues in previous computational studies are discussed, pointing to future research directions toward more reliable simulation of electrical current DNA sequencing devices. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Eucaryotic DNA primase does not prefer to synthesize primers at pyrimidine rich DNA sequences when nucleoside triphosphates are present at concentrations found in whole cells.

    Science.gov (United States)

    Kirk, B W; Harrington, C; Perrino, F W; Kuchta, R D

    1997-06-03

    The critical role of NTP concentration in determining where calf thymus DNA primase synthesizes a primer on a DNA template was examined. Varying the concentration of NTPs dramatically altered the template sequences at which primase synthesized primers. At the low NTP concentrations typically used for in vitro experiments (100 microM), primase greatly preferred to synthesize primers at pyrimidine rich DNA sequences. However, when the concentrations of NTPs were increased to levels typically found in whole cells, primers were now synthesized in all regions of the template. Importantly, synthesis of primers in all regions of the DNA template, not just the pyrimidine rich sequences, is the pattern of primer synthesis observed during DNA replication in whole cells. With low concentrations of NTPs (i.e., Vmax/K(M) conditions), primers are only synthesized at the most preferred synthesis sites, namely, those that are pyrimidine rich. In contrast, under conditions of high NTP concentrations, primer synthesis will occur at the first potential synthesis site to which primase binds. Now, the primase x DNA complex will be immediately converted to a primase x DNA x NTP x NTP complex that is poised for primer synthesis.

  7. Combined hybridization capture and shotgun sequencing for ancient DNA analysis of extinct wild and domestic dromedary camel.

    Science.gov (United States)

    Mohandesan, Elmira; Speller, Camilla F; Peters, Joris; Uerpmann, Hans-Peter; Uerpmann, Margarethe; De Cupere, Bea; Hofreiter, Michael; Burger, Pamela A

    2017-03-01

    The performance of hybridization capture combined with next-generation sequencing (NGS) has seen limited investigation with samples from hot and arid regions until now. We applied hybridization capture and shotgun sequencing to recover DNA sequences from bone specimens of ancient-domestic dromedary (Camelus dromedarius) and its extinct ancestor, the wild dromedary from Jordan, Syria, Turkey and the Arabian Peninsula, respectively. Our results show that hybridization capture increased the percentage of mitochondrial DNA (mtDNA) recovery by an average 187-fold and in some cases yielded virtually complete mitochondrial (mt) genomes at multifold coverage in a single capture experiment. Furthermore, we tested the effect of hybridization temperature and time by using a touchdown approach on a limited number of samples. We observed no significant difference in the number of unique dromedary mtDNA reads retrieved with the standard capture compared to the touchdown method. In total, we obtained 14 partial mitochondrial genomes from ancient-domestic dromedaries with 17-95% length coverage and 1.27-47.1-fold read depths for the covered regions. Using whole-genome shotgun sequencing, we successfully recovered endogenous dromedary nuclear DNA (nuDNA) from domestic and wild dromedary specimens with 1-1.06-fold read depths for covered regions. Our results highlight that despite recent methodological advances, obtaining ancient DNA (aDNA) from specimens recovered from hot, arid environments is still problematic. Hybridization protocols require specific optimization, and samples at the limit of DNA preservation need multiple replications of DNA extraction and hybridization capture as has been shown previously for Middle Pleistocene specimens. © 2016 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.

  8. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Smith, David R.; Lee, Robert W.; Cushman, John C.; Magnuson, Jon K.; Tran, Duc; Polle, Juergen E.

    2010-05-07

    Abstract Background: Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results: The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions: These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the development of a viable

  9. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA.

    Science.gov (United States)

    Smith, David Roy; Lee, Robert W; Cushman, John C; Magnuson, Jon K; Tran, Duc; Polle, Jürgen E W

    2010-05-07

    Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of beta-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. The D. salina organelle genomes are large, circular-mapping molecules with approximately 60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: approximately 1.5 and approximately 0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the development of a viable

  10. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    Directory of Open Access Journals (Sweden)

    Tran Duc

    2010-05-01

    Abstract Background: Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results: The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions: These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the

  11. cDNA sequences reveal considerable gene prediction inaccuracy in the Plasmodium falciparum genome

    Directory of Open Access Journals (Sweden)

    Valenzuela Jesus G

    2007-07-01

    Abstract Background: The completion of the Plasmodium falciparum genome represents a milestone in malaria research. The genome sequence allows for the development of genome-wide approaches such as microarray and proteomics that will greatly facilitate our understanding of the parasite biology and accelerate new drug and vaccine development. The design and application of these genome-wide assays, however, require accurate information on gene prediction and genome annotation. Unfortunately, the genes in the parasite genome databases were mostly identified using computer software that could make some erroneous predictions. Results: We aimed to obtain cDNA sequences to examine the accuracy of gene prediction in silico. We constructed cDNA libraries from mixed blood stages of P. falciparum parasite using the SMART cDNA library construction technique and generated 17332 high-quality expressed sequence tags (EST), including 2198 from primer-walking experiments. Assembly of our sequence tags produced 2548 contigs and 2671 singletons versus 5220 contigs and 5910 singletons when our EST were assembled with EST in public databases. Comparison of all the assembled EST/contigs with predicted CDS and genomic sequences in the PlasmoDB database identified 356 genes with predicted coding sequences fully covered by EST, including 85 genes (23.6%) with introns incorrectly predicted. Careful automatic software and manual alignments found an additional 308 genes that have introns different from those predicted, with 152 new introns discovered and 182 introns with sizes or locations different from those predicted. Alternatively spliced and antisense transcripts were also detected. Matching cDNA to predicted genes also revealed silent chromosomal regions, mostly at subtelomere regions. Conclusion: Our data indicated that approximately 24% of the genes in the current databases were predicted incorrectly, although some of these inaccuracies could represent alternatively

  12. A novel method for comparative analysis of DNA sequences by Ramanujan-Fourier transform.

    Science.gov (United States)

    Yin, Changchuan; Yin, Xuemeng E; Wang, Jiasong

    2014-12-01

    Alignment-free sequence analysis approaches provide important alternatives to multiple sequence alignment (MSA) in biological sequence analysis because alignment-free approaches have low computational complexity and are not dependent on a high level of sequence identity. However, most of the existing alignment-free methods do not employ the full information content of sequences and thus cannot accurately reveal similarities and differences among DNA sequences. We present a novel alignment-free computational method for sequence analysis based on the Ramanujan-Fourier transform (RFT), in which complete information of DNA sequences is retained. We represent DNA sequences as four binary indicator sequences and apply RFT to the indicator sequences to convert them into the frequency domain. The Euclidean distance between the complete RFT coefficients of DNA sequences is used as the similarity measure. To address the different lengths of RFT coefficients in Euclidean space, we pad zeros to short DNA binary sequences so that the binary sequences equal the longest length in the comparison sequence data. Thus, the DNA sequences are compared in the same dimensional frequency space without information loss. We demonstrate the usefulness of the proposed method by presenting experimental results on hierarchical clustering of genes and genomes. The proposed method opens a new channel to biological sequence analysis, classification, and structural module identification.
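
    The pipeline described in this abstract (binary indicator sequences, zero-padding to a common length, a Ramanujan-Fourier transform, then a Euclidean distance) can be sketched in a few lines. The version below is a minimal illustration, not the authors' implementation: the finite-length normalisation of the coefficients is an assumption, all function names are hypothetical, and the brute-force loops are only practical for short sequences.

```python
from math import cos, gcd, pi, sqrt


def indicator_sequences(seq, length):
    """Four binary indicator sequences (A, C, G, T), zero-padded to `length`."""
    seq = seq.upper()
    padding = [0] * (length - len(seq))
    return {base: [1 if ch == base else 0 for ch in seq] + padding for base in "ACGT"}


def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1."""
    return sum(cos(2 * pi * k * n / q) for k in range(1, q + 1) if gcd(k, q) == 1)


def totient(q):
    """Euler's totient: count of 1 <= k <= q coprime with q."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)


def rft(x):
    """Finite-length RFT coefficients for q = 1..N (normalisation is an assumption)."""
    size = len(x)
    return [
        sum(x[n - 1] * ramanujan_sum(q, n) for n in range(1, size + 1)) / (totient(q) * size)
        for q in range(1, size + 1)
    ]


def rft_distance(seq_a, seq_b):
    """Euclidean distance between the RFT coefficients of two DNA sequences."""
    length = max(len(seq_a), len(seq_b))
    ind_a = indicator_sequences(seq_a, length)
    ind_b = indicator_sequences(seq_b, length)
    total = 0.0
    for base in "ACGT":
        coeff_a, coeff_b = rft(ind_a[base]), rft(ind_b[base])
        total += sum((u - v) ** 2 for u, v in zip(coeff_a, coeff_b))
    return sqrt(total)


# Hypothetical toy sequences of unequal length.
print(rft_distance("ACGTAC", "ACGT"))
```

    A matrix of such pairwise distances could then be passed to any hierarchical clustering routine, which is the use the abstract describes.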

  13. A C→T substitution at nt -101 in a conserved DNA sequence of the promotor region of the beta-globin gene is associated with "silent" beta-thalassemia.

    Science.gov (United States)

    Gonzalez-Redondo, J M; Stoming, T A; Kutlar, A; Kutlar, F; Lanclos, K D; Howard, E F; Fei, Y J; Aksoy, M; Altay, C; Gurgey, A

    1989-05-01

    Sequence analyses and dot-blot analyses with synthetic oligonucleotide probes have identified eight individuals in three Turkish families and one Bulgarian family with one chromosome having a C→T mutation at nucleotide position -101 relative to the Cap site of the beta-globin gene. This nucleotide is part of one of the conserved blocks of nucleotides within the promoter region; in vitro expression analyses with the chloramphenicol acetyltransferase system showed that this substitution will decrease the effectiveness of transcription. Five subjects had a thalassemia intermedia due to the additional presence of a known classical high hemoglobin (Hb) A2 beta-thalassemia mutation on the second chromosome; their hematologic condition was relatively mild. The three persons with a heterozygosity for the -101 C→T mutation had normal hematologic data without microcytosis but with high-normal levels of Hb A2 and a mild imbalance in chain synthesis. The newly discovered mutation is considered one of the silent types of beta-thalassemia. It is relatively rare because it was absent among several hundred normal and beta-thalassemia chromosomes.

  14. A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties.

    Science.gov (United States)

    Pan, Gaofeng; Jiang, Limin; Tang, Jijun; Guo, Fei

    2018-02-08

    DNA methylation is an important biochemical process that is closely connected with many types of cancer. Research on DNA methylation can help us understand regulatory mechanisms and epigenetic reprogramming, so it is important to recognize the methylation sites in a DNA sequence. In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. In order to accurately identify whether or not a nucleotide residue is methylated in a specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to predict DNA methylation. Five criteria are used to evaluate the prediction results of our method: area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity. On the benchmark dataset, we reached 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cell datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them in prediction accuracy. The improvement in AUC by our method compared to other methods was at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.
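
    Two pieces of this abstract lend themselves to a small illustration: k-gram features and the threshold-based evaluation criteria (ACC, SN, specificity, MCC). The sketch below uses the standard textbook definitions; the function names and the confusion-matrix counts are hypothetical, and the other feature types, the sparse Bayesian model and AUC are not reproduced.

```python
from itertools import product
from math import sqrt


def kgram_frequencies(seq, k):
    """Relative frequency of every DNA k-gram, in lexicographic A/C/G/T order."""
    keys = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(keys, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows containing other symbols (e.g. N) are skipped
            counts[window] += 1
    total = sum(counts.values())
    return [counts[key] / total if total else 0.0 for key in keys]


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "SN": tp / (tp + fn) if tp + fn else 0.0,
        "SP": tn / (tn + fp) if tn + fp else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical confusion-matrix counts, for illustration only.
print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
print(kgram_frequencies("ACGTAC", 2)[:4])
```

    MCC is reported alongside accuracy because it remains informative when methylated and unmethylated sites are imbalanced.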

  15. Completion of the DNA sequence of mouse adenovirus type 1: sequence of E2B, L1, and L2 (18-51 map units).

    Science.gov (United States)

    Meissner, J D; Hirsch, G N; LaRue, E A; Fulcher, R A; Spindler, K R

    1997-09-01

    The DNA sequence of 9991 nt, corresponding to 18-51 map units of mouse adenovirus type 1 (MAV-1), was determined, completing the sequence of the Larsen strain of MAV-1. The length of the complete MAV-1 genome is 30,946 nucleotides, consistent with previous experimental estimates. The 18-51 map unit region encodes early region 2B proteins necessary for adenoviral replication as well as late region L1 and L2 structural and packaging proteins. Sequence comparison in this region with human adenoviruses indicates broad similarities, including colinear preservation of all recognized open reading frames (ORFs), with highest amino acid identity occurring in the DNA polymerase and polypeptide III (penton base subunit) ORFs. Virus-associated (VA) RNA is not encoded in the region where VA RNAs are found in the human adenoviruses, between E2B and L1, nor is it encoded anywhere in the entire MAV-1 genome. The MAV-1 polypeptide III lacks the arginine-glycine-aspartic acid (RGD) motif which is involved in an association with cell-surface integrins. Only one RGD sequence is found in an identified coding region in the entire MAV-1 genome. Similar to the porcine adenovirus, this RGD sequence is found in the C-terminus of the MAV-1 fiber protein.

  16. The complete mitochondrial DNA sequence of Crotalus horridus (timber rattlesnake).

    Science.gov (United States)

    Hall, Jacob B; Cobb, Vincent A; Cahoon, A Bruce

    2013-04-01

    The complete mitogenome of the timber rattlesnake (Crotalus horridus) was completed using Sanger sequencing. It is 17,260 bp with 13 protein-coding genes, 21 tRNAs, two rRNAs and two control regions. Gene synteny is consistent with other snakes with the exception of a missing redundant tRNA (Ser). This mitogenome should prove to be a useful addition of a well-known member of the Viperidae snake family.

  17. Reconstructing the History of Mesoamerican Populations through the Study of the Mitochondrial DNA Control Region

    OpenAIRE

    Amaya Gorostiza; Víctor Acunha-Alonzo; Lucía Regalado-Liu; Sergio Tirado; Julio Granados; David Sámano; Héctor Rangel-Villalobos; Antonio González-Martín

    2012-01-01

    The study of genetic information can help reconstruct the history of human populations. We sequenced the entire mtDNA control region (positions 16024 to 576 following the Cambridge Reference Sequence, CRS) of 605 individuals from seven Mesoamerican indigenous groups and one Aridoamerican group from the Greater Southwest previously defined, all of them in present-day Mexico. Samples were collected directly from the indigenous populations; the application of an individual survey made it possible to remo...

  18. The nucleotide sequence of the right-hand terminus of adenovirus type 5 DNA: Implications for the mechanism of DNA replication

    NARCIS (Netherlands)

    Steenbergh, P.H.; Sussenbach, J.S.

    The nucleotide sequence of the right-hand terminal 3% of adenovirus type 5 (Ad5) DNA has been determined, using the chemical degradation technique developed by Maxam and Gilbert (1977). This region of the genome comprises the 1003 basepair long HindIII-I fragment and the first 75 nucleotides of the

  19. Determination of cDNA and genomic DNA sequences of hevamine, a chitinase from the rubber tree Hevea brasiliensis

    NARCIS (Netherlands)

    Bokma, E; Spiering, M; Chow, KS; Mulder, PPMFA; Subroto, T; Beintema, JJ

    Hevamine is a chitinase from the rubber tree Hevea brasiliensis and belongs to the family 18 glycosyl hydrolases. This paper describes the cloning of hevamine DNA and cDNA sequences. Hevamine contains a signal peptide at the N-terminus and a putative vacuolar targeting sequence at the C-terminus

  20. Construction of a Sequencing Library from Circulating Cell-Free DNA.

    Science.gov (United States)

    Fang, Nan; Löffert, Dirk; Akinci-Tolun, Rumeysa; Heitz, Katja; Wolf, Alexander

    2016-04-01

    Circulating DNA is cell-free DNA (cfDNA) in serum or plasma that can be used for non-invasive prenatal testing, as well as cancer diagnosis, prognosis, and stratification. High-throughput sequence analysis of the cfDNA with next-generation sequencing technologies has proven to be a highly sensitive and specific method in detecting and characterizing mutations in cancer and other diseases, as well as aneuploidy during pregnancy. This unit describes detailed procedures to extract circulating cfDNA from human serum and plasma and generate sequencing libraries from a wide concentration range of circulating DNA. Copyright © 2016 John Wiley & Sons, Inc.

  1. Genetic diversity in the mtDNA control region and population ...

    African Journals Online (AJOL)

    We investigated the genetic structure and phylogeographical patterns of Sardinella zunasi in Northwestern Pacific. The mitochondrial DNA control region was sequenced for 77 individuals of S.zunasi from four localities over most of the species range. A total of 215 polymorphic sites (72 parsimony informative) and 69 ...

  2. Analysis of mtDNA hypervariable region II for increasing the ...

    African Journals Online (AJOL)

    aghomotsegin

    2015-03-11

    Mitochondrial DNA is a useful genetic marker for answering evolutionary questions due to its high copy number, maternal mode of inheritance, and its high rate of evolution. The aims of this research were to study the mitochondrial noncoding region by using the Sanger sequencing technique and establish ...

  3. Analysis of mtDNA hypervariable region II for increasing the ...

    African Journals Online (AJOL)

    Mitochondrial DNA is a useful genetic marker for answering evolutionary questions due to its high copy number, maternal mode of inheritance, and its high rate of evolution. The aims of this research were to study the mitochondrial noncoding region by using the Sanger sequencing technique and establish the degree of ...

  4. The evolution processes of DNA sequences, languages and carols

    Science.gov (United States)

    Hauck, Jürgen; Henkel, Dorothea; Mika, Klaus

    2001-04-01

    The sequences of bases A, T, C and G of about 100 enolase, secA and cytochrome DNA were analyzed for attractive or repulsive interactions by the numbers T1, T2, T3; r of nearest, next-nearest and third neighbor bases of the same kind and the concentration r = other bases/analyzed base. The area of possible T1, T2 values is limited by the linear borders T2 = 2T1 - 2, T2 = 0 or T1 = 0 for clustering, attractive or repulsive interactions and the border T2 = -2T1 + 2(2 - r) for a variation from repulsive to attractive interactions at r ≤ 2. Clustering is preferred by most bases in sequences of enolases and secA's. Major deviations with repulsive interactions of some bases are observed for archaea bacteria in secA and for highly developed animals and the human species in enolase sequences. The borders of the structure map for enthalpy stabilized structures with maximum interactions are approached in few cases. Most letters of the natural languages and some music notes are at the borders of the structure map.

  5. Characterization of Cryptocaryon irritans isolates from marine fishes in Mainland China by ITS ribosomal DNA sequences.

    Science.gov (United States)

    Sun, H Y; Zhu, X Q; Xie, M Q; Wu, X Y; Li, A X; Lin, R Q; Song, H Q

    2006-07-01

    Seven isolates of Cryptocaryon irritans from different host species and geographical locations in Mainland China were characterized by the first (ITS-1) and second (ITS-2) internal transcribed spacers (ITS) of nuclear ribosomal DNA (rDNA) using two isolates of Ichthyophthirius multifiliis for comparative purposes. The rDNA region including the ITS-1, 5.8S, ITS-2, and flanking 18S and 28S sequences were amplified by polymerase chain reaction and the amplicons were sequenced directly. The ITS-1, 5.8S, and ITS-2 sequences were 129, 160, and 190 bp in length, respectively, for all seven C. irritans isolates, whereas the corresponding sequences for the two I. multifiliis isolates were 142, 153, and 194 bp, respectively. While sequence variation among the seven C. irritans isolates ranged from 0 to 1.6% in both the ITS-1 and ITS-2, and the two I. multifiliis isolates differed by 1.4% in the ITS-1 and 1.0% in the ITS-2; C. irritans differed from I. multifiliis by 57.1-60.9% in the ITS-1 and 79.4-83.0% in the ITS-2, indicating that ITS sequences provide reliable genetic markers for the identification and differentiation of the two species. Phylogenetic analysis using the sequence pairwise-distance data using the neighbor-joining method inferred that the seven C. irritans isolates from Mainland China and two other isolates (T.A and Aus.C) from other countries clustered together to show monophyly, which could be readily distinguished from the other monophyletic group all from other regions. Therefore, ITS sequence data and phylogenetic analysis provided strong support that C. irritans isolates from Mainland China represent a single species. The definition of genetic markers in the ITS rDNA provide opportunities for studying the ecology and population genetic structures of the C. irritans from Mainland China and elsewhere and is also relevant to the diagnosis and control of fish diseases they cause.
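
    The percentage differences quoted in this abstract are pairwise sequence distances. A minimal sketch of one common way to compute such a value (the uncorrected p-distance over an existing pairwise alignment) is given below. The function name `percent_difference` and the choice to skip gapped columns are assumptions made for illustration; the abstract does not state how gaps were handled.

```python
def percent_difference(seq_a, seq_b, gap="-"):
    """Uncorrected pairwise difference (%) between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * diffs / compared


if __name__ == "__main__":
    # Toy sequences, not real ITS data.
    print(percent_difference("ACGTACGT", "ACGTACGA"))
```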

  6. Long-range correlations and charge transport properties of DNA sequences

    Science.gov (United States)

    Liu, Xiao-liang; Ren, Yi; Xie, Qiong-tao; Deng, Chao-sheng; Xu, Hui

    2010-04-01

    By using Hurst's analysis and transfer approach, the rescaled range functions and Hurst exponents of human chromosome 22 and enterobacteria phage lambda DNA sequences are investigated and the transmission coefficients, Landauer resistances and Lyapunov coefficients of finite segments based on above genomic DNA sequences are calculated. In a comparison with quasiperiodic and random artificial DNA sequences, we find that λ-DNA exhibits anticorrelation behavior characterized by a Hurst exponent 0.5sequence displays a transition from correlation behavior to anticorrelation behavior. The resonant peaks of the transmission coefficient in genomic sequences can survive in longer sequence length than in random sequences but in shorter sequence length than in quasiperiodic sequences. It is shown that the genomic sequences have long-range correlation properties to some extent but the correlations are not strong enough to maintain the scale invariance properties.
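
    For readers unfamiliar with Hurst's analysis, the sketch below estimates a Hurst exponent by the classical rescaled-range (R/S) procedure: the R/S statistic is averaged over non-overlapping windows of several sizes, and the exponent is the least-squares slope of log(R/S) against log(window size). The purine/pyrimidine mapping to +1/-1 and all function names are assumptions for illustration; the abstract does not specify the encoding used by the authors. At least two window sizes are needed for the fit.

```python
import math


def dna_to_walk(seq):
    """Map purines (A, G) to +1 and pyrimidines (C, T) to -1 (assumed encoding)."""
    return [1 if base in "AG" else -1 for base in seq.upper() if base in "ACGT"]


def rescaled_range(values):
    """R/S of one window: range of cumulative mean deviations over the std."""
    n = len(values)
    mean = sum(values) / n
    cum = lo = hi = sq = 0.0
    for v in values:
        cum += v - mean
        lo, hi = min(lo, cum), max(hi, cum)
        sq += (v - mean) ** 2
    std = math.sqrt(sq / n)
    return (hi - lo) / std if std > 0 else None  # None for constant windows


def hurst_exponent(series, window_sizes):
    """Least-squares slope of log(mean R/S) versus log(window size)."""
    xs, ys = [], []
    for n in window_sizes:
        rs = []
        for start in range(0, len(series) - n + 1, n):
            r = rescaled_range(series[start:start + n])
            if r is not None:
                rs.append(r)
        if rs:
            xs.append(math.log(n))
            ys.append(math.log(sum(rs) / len(rs)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    import random
    random.seed(0)
    walk = dna_to_walk("".join(random.choice("ACGT") for _ in range(4096)))
    print(hurst_exponent(walk, [16, 32, 64, 128, 256]))
```

    Values near 0.5 indicate an uncorrelated sequence, values above 0.5 persistent correlation, and values below 0.5 anticorrelation; short windows are known to bias the R/S estimate, so the result should be read as indicative only.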

  7. Long-range correlations and charge transport properties of DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Liu Xiaoliang, E-mail: xlliucsu@yahoo.com.c [College of Physical Science and Technology and College of Metallurgical Science and Engineering, Central South University, Changsha 410083 (China); Ren, Yi [College of Physical Science and Technology and College of Metallurgical Science and Engineering, Central South University, Changsha 410083 (China); Xie, Qiong-tao [Key Laboratory of Low Dimensional Quantum Structures and Quantum Control of Ministry of Education (Hunan Normal University), Changsha 410081 (China); Deng, Chao-sheng; Xu, Hui [College of Physical Science and Technology and College of Metallurgical Science and Engineering, Central South University, Changsha 410083 (China)

    2010-04-26

    By using Hurst's analysis and transfer approach, the rescaled range functions and Hurst exponents of human chromosome 22 and enterobacteria phage lambda DNA sequences are investigated and the transmission coefficients, Landauer resistances and Lyapunov coefficients of finite segments based on above genomic DNA sequences are calculated. In a comparison with quasiperiodic and random artificial DNA sequences, we find that lambda-DNA exhibits anticorrelation behavior characterized by a Hurst exponent 0.5sequence displays a transition from correlation behavior to anticorrelation behavior. The resonant peaks of the transmission coefficient in genomic sequences can survive in longer sequence length than in random sequences but in shorter sequence length than in quasiperiodic sequences. It is shown that the genomic sequences have long-range correlation properties to some extent but the correlations are not strong enough to maintain the scale invariance properties.

  8. Non-random expression of ribosomal DNA units in a grasshopper showing high intragenomic variation for the ITS2 region.

    Science.gov (United States)

    Ruiz-Estévez, M; Ruiz-Ruano, F J; Cabrero, J; Bakkali, M; Perfectti, F; López-León, M D; Camacho, J P M

    2015-06-01

    We analyse intragenomic variation of the ITS2 internal transcribed spacer of ribosomal DNA (rDNA) in the grasshopper Eyprepocnemis plorans, by means of tagged PCR 454 amplicon sequencing performed on both genomic DNA (gDNA) and RNA-derived complementary DNA (cDNA), using part of the ITS2 flanking coding regions (5.8S and 28S rDNA) as an internal control for sequencing errors. Six different ITS2 haplotypes (i.e. variants for at least one nucleotide in the complete ITS2 sequence) were found in a single population, one of them (Hap4) being specific to a supernumerary (B) chromosome. The analysis of both gDNA and cDNA from the same individuals provided an estimate of the expression efficiency of the different haplotypes. We found random expression (i.e. about similar recovery in gDNA and cDNA) for three haplotypes (Hap1, Hap2 and Hap5), but significant underexpression for three others (Hap3, Hap4 and Hap6). Hap4 was the most extremely underexpressed and, remarkably, it showed the lowest sequence conservation for the flanking 5.8-28S coding regions in the gDNA reads but the highest conservation (100%) in the cDNA ones, suggesting the preferential expression of mutation-free rDNA units carrying this ITS2 haplotype. These results indicate that the ITS2 region of rDNA is far from complete homogenization in this species, and that the different rDNA units are not expressed at random, with some of them being severely downregulated. © 2015 The Royal Entomological Society.

  9. Chromosome characterization in Thinopyrum ponticum (Triticeae, Poaceae) using in situ hybridization with different DNA sequences

    Directory of Open Access Journals (Sweden)

    Brasileiro-Vidal Ana Christina

    2003-01-01

    Thinopyrum ponticum (2n = 10x = 70, JJJJsJs) belongs to the Triticeae tribe, and is currently used as a source of pathogen resistance genes in wheat breeding. In order to characterize its chromosomes, the number and position of 45S and 5S rDNA sites, as well as the distribution of the repetitive DNA sequences pAs1 and pSc119.2, were identified by fluorescent in situ hybridization. The number of nucleoli and NORs was also recorded after silver nitrate staining. Seventeen 45S and twenty 5S rDNA sites were observed on the short arms of 17 chromosomes; the 45S rDNA was always located terminally. On three other chromosomes, only the 5S rDNA site was observed. Silver staining revealed a high number of Ag-NORs (14 to 17) on metaphase chromosomes, whereas on interphase nuclei there was a large variation in number of nucleoli (one to 15), most of them (82.8%) ranging between four and nine. The pAs1 probe hybridized to the terminal region of both arms of all 70 chromosomes. In addition, a disperse labeling was observed throughout the chromosomes, except in centromeric and most pericentromeric regions. When the pSc119.2 sequence was used as a probe, terminal labeling was observed on the short arms of 17 chromosomes and on the long arms of five others. The relative position of 45S and 5S rDNA sites, together with the hybridization pattern of pAs1 and pSc119.2 probes, should allow whole chromosomes or chromosome segments of Th. ponticum to be identified in inbred lines of wheat x Th. ponticum.

  10. Structural analysis of DNA sequence: evidence for lateral gene transfer in Thermotoga maritima

    DEFF Research Database (Denmark)

    Worning, Peder; Jensen, Lars Juhl; Nelson, K. E.

    2000-01-01

    The recently published complete DNA sequence of the bacterium Thermotoga maritima provides evidence, based on protein sequence conservation, for lateral gene transfer between Archaea and Bacteria. We introduce a new method of periodicity analysis of DNA sequences, based on structural parameters..., which brings independent evidence for the lateral gene transfer in the genome of T. maritima. The structural analysis relates the Archaea-like DNA sequences to the genome of Pyrococcus horikoshii. Analysis of 24 complete genomic DNA sequences shows different periodicity patterns for organisms...

  11. DNA sequence explains seemingly disordered methylation levels in partially methylated domains of Mammalian genomes.

    Directory of Open Access Journals (Sweden)

    Dimos Gaidatzis

    2014-02-01

    For the most part metazoan genomes are highly methylated and harbor only small regions with low or absent methylation. In contrast, partially methylated domains (PMDs), recently discovered in a variety of cell lines and tissues, do not fit this paradigm as they show partial methylation for large portions (20%-40%) of the genome. While in PMDs methylation levels are reduced on average, we found that at single CpG resolution, they show extensive variability along the genome outside of CpG islands and DNase I hypersensitive sites (DHS). Methylation levels range from 0% to 100% in a roughly uniform fashion with only little similarity between neighboring CpGs. A comparison of various PMD-containing methylomes showed that these seemingly disordered states of methylation are strongly conserved across cell types for virtually every PMD. Comparative sequence analysis suggests that DNA sequence is a major determinant of these methylation states. This is further substantiated by a purely sequence based model which can predict 31% (R^2) of the variation in methylation. The model revealed CpG density as the main driving feature promoting methylation, opposite to what has been shown for CpG islands, followed by various dinucleotides immediately flanking the CpG and a minor contribution from sequence preferences reflecting nucleosome positioning. Taken together we provide a reinterpretation for the nucleotide-specific methylation levels observed in PMDs, demonstrate their conservation across tissues and suggest that they are mainly determined by specific DNA sequence features.

  12. Partial DNA sequencing of Douglas-fir cDNAs used in RFLP mapping

    Science.gov (United States)

    K.D. Jermstad; D.L. Bassoni; C.S. Kinlaw; D.B. Neale

    1998-01-01

    DNA sequences from 87 Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) cDNA RFLP probes were determined. Sequences were submitted to the GenBank dbEST database and searched for similarity against nucleotide and protein databases using the BLASTn and BLASTx programs. Twenty-one sequences (24%) were assigned putative functions; 18 of which...

  13. Comparative Genomic Sequence Analysis of the Human Chromosome 21 Down Syndrome Critical Region

    Science.gov (United States)

    Toyoda, Atsushi; Noguchi, Hideki; Taylor, Todd D.; Ito, Takehiko; Pletcher, Mathew T.; Sakaki, Yoshiyuki; Reeves, Roger H.; Hattori, Masahira

    2002-01-01

    Comprehensive knowledge of the gene content of human chromosome 21 (HSA21) is essential for understanding the etiology of Down syndrome (DS). Here we report the largest comparison of finished mouse and human sequence to date for a 1.35-Mb region of mouse chromosome 16 (MMU16) that corresponds to human chromosome 21q22.2. This includes a portion of the commonly described “DS critical region,” thought to contain a gene or genes whose dosage imbalance contributes to a number of phenotypes associated with DS. We used comparative sequence analysis to construct a DNA feature map of this region that includes all known genes, plus 144 conserved sequences ≥100 bp long that show ≥80% identity between mouse and human but do not match known exons. Twenty of these have matches to expressed sequence tag and cDNA databases, indicating that they may be transcribed sequences from chromosome 21. Eight putative CpG islands are found at conserved positions. Models for two human genes, DSCR4 and DSCR8, are not supported by conserved sequence, and close examination indicates that low-level transcripts from these loci are unlikely to encode proteins. Gene prediction programs give different results when used to analyze the well-conserved regions between mouse and human sequences. Our findings have implications for evolution and for modeling the genetic basis of DS in mice. [Sequence data described in this paper have been submitted to the DDBJ/GenBank under accession nos. AP003148 through AP003158, and AB066227. Supplemental material is available at http://www.genome.org.] PMID:12213769
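
    The criterion quoted in this abstract (segments at least 100 bp long with at least 80% identity between mouse and human) can be illustrated with a simple sliding-window scan over a pairwise alignment. This is only a sketch under stated assumptions: the function name `conserved_windows`, the fixed-window approach, and the treatment of gap columns as mismatches are hypothetical and are not described in the source, which does not say which tool or procedure the authors used.

```python
def conserved_windows(aln_a, aln_b, window=100, min_identity=0.8):
    """Return start indices of alignment windows meeting the identity threshold.

    Gap columns ('-') count as mismatches (an assumption for this sketch).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [
        int(a == b and a != "-")
        for a, b in zip(aln_a.upper(), aln_b.upper())
    ]
    hits = []
    running = sum(matches[:window])  # matches in the first window
    for start in range(0, len(matches) - window + 1):
        if start > 0:
            # Slide by one column: add the new column, drop the old one.
            running += matches[start + window - 1] - matches[start - 1]
        if running / window >= min_identity:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy alignment, not real mouse/human sequence.
    print(conserved_windows("ACGTAAAA", "ACGAAAAA", window=4, min_identity=0.75))
```

    Overlapping hits would normally be merged into maximal segments and then filtered against known exons; that step is left out here.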

  14. Bloom DNA helicase facilitates homologous recombination between diverged homologous sequences.

    Science.gov (United States)

    Kikuchi, Koji; Abdel-Aziz, H Ismail; Taniguchi, Yoshihito; Yamazoe, Mitsuyoshi; Takeda, Shunichi; Hirota, Kouji

    2009-09-25

    Bloom syndrome caused by inactivation of the Bloom DNA helicase (Blm) is characterized by increases in the level of sister chromatid exchange, homologous recombination (HR) associated with cross-over. It is therefore believed that Blm works as an anti-recombinase. Meanwhile, in Drosophila, DmBlm is required specifically to promote synthesis-dependent strand annealing (SDSA), a type of HR not associated with cross-over. However, conservation of Blm function in SDSA through higher eukaryotes has been a matter of debate. Here, we demonstrate the function of Blm in SDSA type HR in the chicken DT40 B lymphocyte line, where Ig gene conversion diversifies the immunoglobulin V gene through intragenic HR between diverged homologous segments. This reaction is initiated by the activation-induced cytidine deaminase enzyme-mediated uracil formation at the V gene, which in turn is converted into an abasic site, presumably leading to a single strand gap. Ig gene conversion frequency was drastically reduced in BLM(-/-) cells. In addition, BLM(-/-) cells used limited donor segments harboring higher identity compared with other segments in Ig gene conversion event, suggesting that Blm can promote HR between diverged sequences. To further understand the role of Blm in HR between diverged homologous sequences, we measured the frequency of gene targeting induced by an I-SceI-endonuclease-mediated double-strand break. BLM(-/-) cells showed a more severe defect in the gene targeting frequency as the number of heterologous sequences increased at the double-strand break site. Conversely, the overexpression of Blm, even an ATPase-defective mutant, strongly stimulated gene targeting. In summary, Blm promotes HR between diverged sequences through a novel ATPase-independent mechanism.

  15. TRX-LOGOS - a graphical tool to demonstrate DNA information content dependent upon backbone dynamics in addition to base sequence.

    Science.gov (United States)

    Fortin, Connor H; Schulze, Katharina V; Babbitt, Gregory A

    2015-01-01

    It is now widely accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Thus, sequence logo plots are severely limited in that they convey no explicit information regarding the structural dynamics of DNA backbone, a feature often critical to binding specificity. We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered at both SourceForge and as a download supplement at this journal. To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly more information content at phosphate linkages than can be observed at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database, and discovered that many general signatures of transcription factor binding are locally more information rich at the level of DNA backbone dynamics than nucleobase sequence. We used TRX-logos in combination with MEGA 6.0 software
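
    The "Shannon information content calculated at single nucleobases" that underlies a traditional sequence logo can be sketched as follows: for each aligned column, the information is log2(4) minus the Shannon entropy of the observed base frequencies. This sketch covers only the conventional nucleobase logo, not the dinucleotide BI-BII extension described in the abstract, and it omits the small-sample correction that logo software commonly applies. The function name `information_content` and the example binding sites are hypothetical.

```python
import math
from collections import Counter


def information_content(aligned_sites, alphabet="ACGT"):
    """Per-column information in bits: log2(|alphabet|) minus Shannon entropy.

    Assumes equal-length sites and at least one alphabet symbol per column.
    """
    length = len(aligned_sites[0])
    max_bits = math.log2(len(alphabet))
    result = []
    for i in range(length):
        counts = Counter(site[i] for site in aligned_sites)
        total = sum(counts[b] for b in alphabet)
        entropy = 0.0
        for b in alphabet:
            if counts[b]:
                p = counts[b] / total
                entropy -= p * math.log2(p)
        result.append(max_bits - entropy)
    return result


if __name__ == "__main__":
    # Made-up aligned sites, not taken from Yeastract or JASPAR.
    sites = ["ACGT", "ACGA", "ACCA", "ATCA"]
    print(information_content(sites))
```

    A fully conserved column carries 2 bits and a column with all four bases equally frequent carries 0 bits, which is what sets the letter-stack heights in a logo plot.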

  16. Quality Control of the Traditional Patent Medicine Yimu Wan Based on SMRT Sequencing and DNA Barcoding.

    Science.gov (United States)

    Jia, Jing; Xu, Zhichao; Xin, Tianyi; Shi, Linchun; Song, Jingyuan

    2017-01-01

    Substandard traditional patent medicines may lead to global safety-related issues. Protecting consumers from the health risks associated with the integrity and authenticity of herbal preparations is of great concern. Of particular concern is quality control for traditional patent medicines. Here, we establish an effective approach for verifying the biological composition of traditional patent medicines based on single-molecule real-time (SMRT) sequencing and DNA barcoding. Yimu Wan (YMW), a classical herbal prescription recorded in the Chinese Pharmacopoeia, was chosen to test the method. Two reference YMW samples were used to establish a standard method for analysis, which was then applied to three different batches of commercial YMW samples. A total of 3703 and 4810 circular-consensus sequencing (CCS) reads from two reference and three commercial YMW samples were mapped to the ITS2 and psbA-trnH regions, respectively. Moreover, comparison of intraspecific genetic distances based on SMRT sequencing data with reference data from Sanger sequencing revealed an ITS2 and psbA-trnH intergenic spacer that exhibited high intraspecific divergence, with the sites of variation showing significant differences within species. Using the CCS strategy for SMRT sequencing analysis was adequate to guarantee the accuracy of identification. This study demonstrates the application of SMRT sequencing to detect the biological ingredients of herbal preparations. SMRT sequencing provides an affordable way to monitor the legality and safety of traditional patent medicines.

  17. Quality Control of the Traditional Patent Medicine Yimu Wan Based on SMRT Sequencing and DNA Barcoding

    Directory of Open Access Journals (Sweden)

    Jing Jia

    2017-05-01

    Substandard traditional patent medicines may lead to global safety-related issues. Protecting consumers from the health risks associated with the integrity and authenticity of herbal preparations is of great concern. Of particular concern is quality control for traditional patent medicines. Here, we establish an effective approach for verifying the biological composition of traditional patent medicines based on single-molecule real-time (SMRT) sequencing and DNA barcoding. Yimu Wan (YMW), a classical herbal prescription recorded in the Chinese Pharmacopoeia, was chosen to test the method. Two reference YMW samples were used to establish a standard method for analysis, which was then applied to three different batches of commercial YMW samples. A total of 3703 and 4810 circular-consensus sequencing (CCS) reads from two reference and three commercial YMW samples were mapped to the ITS2 and psbA-trnH regions, respectively. Moreover, comparison of intraspecific genetic distances based on SMRT sequencing data with reference data from Sanger sequencing revealed an ITS2 and psbA-trnH intergenic spacer that exhibited high intraspecific divergence, with the sites of variation showing significant differences within species. Using the CCS strategy for SMRT sequencing analysis was adequate to guarantee the accuracy of identification. This study demonstrates the application of SMRT sequencing to detect the biological ingredients of herbal preparations. SMRT sequencing provides an affordable way to monitor the legality and safety of traditional patent medicines.

  18. Ribosomal DNA ITS-1 and ITS-2 sequence comparisons as a tool for predicting genetic relatedness.

    Science.gov (United States)

    Coleman, A W; Mai, J C

    1997-08-01

    The determination of the secondary structure of the internal transcribed spacer (ITS) regions separating nuclear ribosomal RNA genes of Chlorophytes has improved the fidelity of alignment of nuclear ribosomal ITS sequences from related organisms. Application of this information to sequences from green algae and plants suggested that a subset of the ITS-2 positions is relatively conserved. Organisms that can mate are identical at all of these 116 positions, or differ by, at most, one nucleotide change. Here we sequenced and compared the ITS-1 and ITS-2 of 40 green flagellates in search of the nearest relative to Chlamydomonas reinhardtii. The analysis clearly revealed one unique candidate, C. incerta. Several ancillary benefits of the analysis included the identification of mislabelled cultures, the resolution of confusion concerning C. smithii, the discovery of misidentified sequences in GenBank derived from a green algal contaminant, and an overview of evolutionary relationships among the Volvocales, which is congruent with that derived from rDNA gene sequence comparisons but improves upon its resolution. The study further delineates the taxonomic level at which ITS sequences, in comparison to ribosomal gene sequences, are most useful in systematic and other studies.

  19. In vitro footprinting of promoter regions within supercoiled plasmid DNA.

    Science.gov (United States)

    Sun, Daekyu

    2010-01-01

    Polypurine/polypyrimidine (pPu/pPy) tracts, which exist in the promoter regions of many growth-related genes, have been proposed to be very dynamic in their conformation. In this chapter, we describe a detailed protocol for DNase I and S1 nuclease footprinting experiments with supercoiled plasmid DNA containing the promoter regions to probe whether there are conformational transitions to B-type DNA, melted DNA, and G-quadruplex structures within this tract. This is demonstrated with the proximal promoter region of the human vascular endothelial growth factor (VEGF) gene, which also contains multiple binding sites for Sp1 and Egr-1 transcription factors.

  20. Beyond DNA Sequencing in Space: Current and Future Omics Capabilities of the Biomolecule Sequencer Payload

    Science.gov (United States)

    Wallace, Sarah

    2017-01-01

    Why do we need a DNA sequencer to support the human exploration of space? (A) Operational environmental monitoring; (1) Identification of contaminating microbes, (2) Infectious disease diagnosis, (3) Reduce down mass (sample return for environmental monitoring, crew health, etc.). (B) Research; (1) Human, (2) Animal, (3) Microbes/Cell lines, (4) Plant. (C) Med Ops; (1) Response to countermeasures, (2) Radiation, (3) Real-time analysis can influence medical intervention. (D) Support astrobiology science investigations; (1) Technology superiorly suited to in situ nucleic acid-based life detection, (2) Functional testing for integration into robo